Week2 Combine

This document provides an outline and overview of an introductory course on C programming. It begins with defining what C is, noting that it is a high-level language that is also relatively low-level, small, and general purpose. The document then discusses C's strengths, potential weaknesses, and history. It introduces procedural programming paradigms and how C programs are organized with functions. The remainder of the document provides examples of defining, writing, and calling functions in C programs.

Uploaded by

raley

HIGH-LEVEL PROGRAMMING I

Intro to C Programming (Part 2/3) by Prasanna Ghali
Outline
 What is C?
 C’s Strengths
 Why not C?
 History of C
 Procedural Paradigm
 Organization of C Programs
 Writing and Calling Functions
 Creating and Running a C Program
 C Programs with Multiple Source Files
 Header Files: Interface vs. Implementation
What is C? (1/3)
 C is a high-level language
 Provides constructs for structured programming
 Supports functions and procedural paradigm
 Supports user-definable derived data types

 C is a (relatively) low-level language


 Supports full range of native machine types
 Supports access to machine-level addresses
 Provides operations that correspond closely to
machine’s built-in instructions
What is C? (2/3)
 C is a small language and is therefore easy to learn
 Provides limited set of features compared to other languages
 Uses library for commonly required features including input/output, file management, memory management, math, …
 We’ll use about 34 keywords; these are the keywords we’ve come across:
signed, unsigned, char, short, int, long, float, double, void, if, else, while, return
What is C? (3/3)
 C is a general purpose programming language
 Can be used to program variety of different applications
 Most widely used systems programming language
for implementing operating systems, compilers,
databases, …
 De facto standard for programming embedded
systems
 Provides excellent foundation for learning ALL
other languages
C’s Strengths
 Simple
 Efficient

 Integration with Unix/Linux

 Portable

 Standard library

 Powerful

 Flexible
Why Not C?
 C programs can be error-prone
 Permissiveness and flexibility may provide too wide latitude for some programmers
 C programs can be terse
 Brevity may be difficult for some programmers
 Large scale C programs can be difficult to maintain
 Language itself doesn’t provide support for modular
and object-oriented paradigms
History of C (1/2)
[Figure: timeline from 1950 to 1970 of languages that preceded C: Fortran, Lisp, COBOL, Algol 60, BCPL, Algol 68, Pascal, and B]
History of C (2/2)
[Figure: timeline from 1970 to 2020 showing BCPL, B, Classic C, C89, C99, C11, and C18 on the C line, and Simula, C++, C++98, C++11, and C++20 on the C++ line]

This course will use C11
Consider Complex Problem (1/2)
 You have to send flowers to your grandmother (who lives in Japan) for her birthday
 Plant flowers
 Water flowers
 Pick flowers
 Fly to Japan with flowers
Consider Complex Problem (2/2)
 You have to send flowers to your grandmother (who lives in Japan) for her birthday
 Plant flowers
 Make trip to nursery to make purchases
 Prepare soil in pot
 Plant seeds in pot
 …
 Water flowers
 Pick flowers
 Fly to Japan with flowers
Another Complex Problem (1/2)
 You’re asked to organize catering for a wedding
 Make up guest list
 Invite guests
 Select appropriate menu
 Book reception hall
 …
Another Complex Problem (2/2)
 You’re asked to organize catering for a wedding
 Make up guest list
 Get list from groom
 Get list from bride
 Check for conflicts
 Check with bride about groom’s list
 Check with groom about bride’s list
 Check final list with groom’s parents
 Check final list with bride’s parents
 …
 Invite guests
 …
 Select appropriate menu
 Book reception hall
 …
Procedural Programming Paradigm (1/2)
 Breaking down tasks into smaller subtasks is
good plan of attack for solving complex
programming problems too
 Each “large” task is decomposed into smaller
subtasks and so forth
 Process is continued until subtask can be
implemented by single algorithm
 Synonyms for this strategy: top-down design,
procedural abstraction, functional decomposition,
divide-and-conquer, stepwise refinement
Procedural Programming Paradigm (2/2)
 In C/C++, algorithm is packaged into building
block called function
 Other languages refer to function as procedure or
subroutine or method
 Program is organized into these smaller,
independent units called functions

[Figure: a function (algorithm) takes Input 1, …, Input n and produces Output]
Advantages of Functions
 Divide and conquer approach to complexity
 Divide complicated whole into simpler and more
manageable units
 Standalone, independent functions are easier to understand,
implement, maintain, test and debug
 Cost and pace of development
 Different people can work on different functions
simultaneously
 Building blocks for programs
 Write function once and use it many times in same program
or many other programs
Organization of C Programs (1/3)
 C was designed to use procedural paradigm to solve programming problems
 Program is synchronization of different functions to solve problem
[Figure: functions f1 through f7 connected between input and output]
Organization of C Programs (2/3)
[Figure: three source files, source-file-1.c, source-file-2.c, and source-file-3.c, hold functions A, B, C, and main; function main calls function A, function B, and function C]

 Every C program must have one and only one function called main (main is not a C/C++ keyword!!!)
Organization of C Programs (3/3)
 Related functions are organized into a source file
 Think of C program as one or more source files with each source file containing one or more related functions

// source-file-1.c
preprocessing directives
function prototypes
data declarations (global)
return-type main(parameter declarations)
{
  data declarations (local)
  statements
}
other functions

⋯

// source-file-n.c
preprocessing directives
function prototypes
data declarations (global)
return-type function-name(parameter declarations)
{
  data declarations (local)
  statements
}
other functions
How to Define a Function (1/3)
[Figure: a function takes Input 1, …, Input n and produces Output]

return-type function-name(parameter-list)
{
  statements
}

 return-type specifies type of output value
 function-name: every function must have a name, and C has rules for specifying names
 parameter-list is comma-separated list of input values
 { and } are used to group statements that implement function
 In C, simplest statement is expression followed by ;
How to Define a Function (2/3)
abs(num): takes num (integer) and returns an integer

int abs(int num)
{
  if (num < 0) {
    num = -num;
  }
  return num;
}

 Variable in parameter list is called formal parameter or just parameter, i.e., num is parameter of type int
 C statements are terminated by ;
 Statement return num; says “function will return an output value num of type int”
How to Define a Function (3/3)
max(x, y): takes x (integer) and y (integer) and returns an integer

int max(int x, int y)
{
  if (x > y) {
    return x;
  } else {
    return y;
  }
}
Simplest C Program: Does Nothing!!!

int main(void)
{
  return 0;
}

 Every C program must have one and only one function named main
 main must return an output value of type int
 Here, we’re saying that main takes no input value(s): void is keyword indicating no value!!!
 { and } are used to group statements that implement function
 return is keyword used in a statement to terminate function or to return output value
 Statement return 0; says “function will return an output value 0 of type int”
 Every C statement must terminate with ;
How to Call a Function

int abs(int num) {
  if (num < 0) {
    num = -num;
  }
  return num;
}

int main(void) {
  int x = -10;
  x = abs(x);
  return 0;
}

 Variable num in parameter list is called parameter
 Variable x is declared type int and initialized with value -10
 Function abs is called by using its name followed by () that enclose a value (x here)
 Value x in function call operator is called argument

1) At run-time, client function main calls function abs using function call operator ()
2) Argument x is evaluated to value -10
3) Result of evaluation is used to initialize parameter num to value -10
4) Function abs returns value 10 back to client before terminating
5) So, result of calling function abs is value 10 of type int; this value is then assigned to x
Creating and Running a C Program

// this is file nothing.c
int main(void)
{
  return 0;
}

[Figure: Editor produces nothing.c; Compiler Toolchain either reports Errors back to Editor or produces nothing.out; Loader loads nothing.out; Execution follows]
 Loader is OS program that loads executable nothing.out into memory
 Program nothing.out begins execution once loader has installed executable in memory

gcc -std=c11 -pedantic-errors -Wstrict-prototypes -Wall -Wextra -Werror nothing.c -o nothing.out
gcc Options
 -std=c11 Uses C11 standard
 -pedantic-errors Gives an error (not just a warning) if code is not following C11 standard
 -Wstrict-prototypes Warns if function is declared or defined without specifying parameter types, which old C standards allow; we want C code to be compatible with C++
 -Wall Warns about anything that compiler finds shady
 -Wextra Warns about more shady things than -Wall
 -Werror Converts warnings to errors so that code with warnings is never successfully compiled
 -o nothing.out Specifies name of output file; otherwise file name defaults to a.out

gcc -std=c11 -pedantic-errors -Wstrict-prototypes -Wall -Wextra -Werror nothing.c -o nothing.out
Next Simplest C Program: Print a Greeting!!! (1/2)

// this is file hello.c
#include <stdio.h>

int main(void)
{
  printf("Hello World\n");
  return 0;
}

Notes on preprocessor:
1) Strange syntax that is not a part of C!!!
2) Any line that begins with # is directive to another program called preprocessor.
3) Think of preprocessor as editor that modifies C source file according to directives that begin with # character

Notes on libraries:
1) C has no facilities for I/O.
2) Instead, it has set of libraries to support I/O, math, date & time, and many other functionalities.
3) Functionalities in each library are declared in a standard header file.
4) To use a particular library’s functions, add preprocessor #include command
Next Simplest C Program: Print a Greeting!!! (2/2)

// this is file hello.c
#include <stdio.h>

int main(void)
{
  printf("Hello World\n");
  return 0;
}

Notes on #include <stdio.h>:
1) File stdio.h is called header file
2) It contains prototype of function printf
3) Makes name printf and function’s parameter list and return type known to compiler
4) C/C++ require all names used in source file to be declared before their first use
5) Preprocessor will replace #include line with contents of header file
6) < and > delimiting header file name simply tell preprocessor where to find the file

Notes on printf:
1) Function printf is part of standard C library
2) It is used to print information to standard output (the screen)
3) It is one of the most complex functions in the library
4) Argument "Hello World\n" represents a string (sequence of characters) with \n representing a newline
Compilation Phases (1/2)
Compiler toolchain consists of four phases: preprocessor, compiler, assembler, and linker

// this is file hello.c
#include <stdio.h>

int main(void)
{
  printf("Hello World\n");
  return 0;
}

[Figure: Editor produces hello.c; Preprocessor produces hello.i; Compiler produces hello.s or reports Errors; Assembler produces hello.o; Linker combines hello.o with Object files & Libraries to produce hello.out or reports Errors; Loader loads hello.out for Execution]
Compilation Phases (2/2)
 Preprocess only (I’m not using all required gcc options for brevity)
 gcc -std=c11 -E hello.c -o hello.i
 Compile only
 gcc -std=c11 -S hello.i -o hello.s
 Assemble: gcc -c hello.s -o hello.o
 Link: gcc hello.o -o hello.out
 Execute: ./hello.out
Creating a C Program (1/2)
[Figure: each of Source File 1, Source File 2, …, Source File n is preprocessed and compiled into Object File 1, Object File 2, …, Object File n; Link combines these with Library Object File 1, …, Library Object File m into Executable File]

One of these source files must define function main
Creating a C Program (2/2)
 Compiler consumes each source file individually without being aware of presence of other source files!!!
 Linker consumes all object files together to
create executable file
Multiple Source Files (1/6)
 Deconstruct hello.c into two source files and one header file
// this is file hello.c
#include <stdio.h>

int main(void)
{
printf("Hello World\n");
return 0;
}
Multiple Source Files (2/6): hello-decl.h

// this is file hello-decl.h
#include <stdio.h>

// hello prints a greeting to standard output
void hello(void);

1) File stdio.h is included to provide function prototype (or declaration) of standard C library function printf
2) Declaration of identifier hello: hello is a function that takes no input and returns no output
Multiple Source Files (3/6): hello-defn.c

#include "hello-decl.h"

// definition of function hello
void hello(void) {
  printf("Hello World!!!\n");
}

1) File hello-decl.h is included to provide function prototype (or declaration) of standard C library function printf and function hello
2) Pair of double quote delimiters " tells preprocessor to search for header file first in directory containing the source file
3) Compile only!!! (option -c)

gcc -std=c11 -pedantic-errors -Wstrict-prototypes -Wall -Wextra -Werror -c hello-defn.c -o hello-defn.o
Multiple Source Files (4/6): driver.c

#include "hello-decl.h"

int main(void) {
  hello();
  return 0;
}

 File hello-decl.h is included to provide function prototype (or declaration) of function hello
1) Compiler doesn’t care where or how function hello is defined
2) Compiler only cares whether call to hello matches declaration in hello-decl.h

gcc -std=c11 -pedantic-errors -Wstrict-prototypes -Wall -Wextra -Werror -c driver.c -o driver.o
Multiple Source Files (5/6): driver.c

#include "hello-decl.h"
#include <stdio.h>
int main(void) {
  hello();
  return 0;
}

 File hello-decl.h is included to provide function declaration (or prototype) of function hello
 File hello-decl.h includes C standard library file stdio.h
1) Can have multiple declarations of function printf!!!
2) But, there can be only one definition of a function (and a variable)
3) Otherwise, linker will not create executable.

gcc -std=c11 -pedantic-errors -Wstrict-prototypes -Wall -Wextra -Werror -c driver.c -o driver.o
Multiple Source Files (6/6): Linking Object & Library Files
 Two object files:
 driver.o with definition of function main
 hello-defn.o with definition of function hello
 C standard library: libc.a with definition of function printf
 Find location of C standard library file through this command: gcc -print-file-name=libc.a
 Link these files into executable hello.out

gcc driver.o hello-defn.o -o hello.out


Interface versus Implementation
// this is file hello-decl.h
#include <stdio.h>

void hello(void);

1) Header file specifies function prototypes
2) Function prototype is an interface that only specifies what the function does

// this is file hello-defn.c
#include "hello-decl.h"

// definition of function hello
void hello(void) {
  printf("Hello World!!!\n");
}

1) Source file specifies function definitions
2) Function definition is an implementation that specifies how function accomplishes purpose advertised by interface
Mathematical Functions
#include <math.h>
#include <stdio.h>

int main(void) {
  double px = 0.0, py = 0.0;
  double qx = 3.0, qy = 4.0;
  double w = qx - px, h = qy - py;
  double dist = sqrt(w*w + h*h);
  printf("Distance is %f\n", dist);
  return 0;
}

 File math.h is included to provide function prototype (or declaration) of standard C library function sqrt
 Call to printf displays following line: Distance is 5.000000
 -lm is shorthand for link object files with library file libm.a

gcc -std=c11 -pedantic-errors -Wstrict-prototypes -Wall -Wextra -Werror dist.c -o dist.out -lm
Summary
 C program consists of source files with each file consisting of functions
 Function is encapsulation of algorithm
 Think of function as statements grouped together and given a name
 C programs must have one and only one function called main
 main function is starting point of execution of C program
 Each source file must be individually compiled into object file
 Object files are linked together into an executable
 Must explicitly link to C math standard library functions
Introduction to Compilation Process [Prasanna Ghali]

Introduction to Compilation Process


In the early days of computing, programmers used free software exclusively. Even computer
companies often distributed free software. By the 1980s, almost all software was proprietary,
which means that it had owners who forbade and prevented cooperation among users. Every
computer user needs an operating system; if there is no free operating system, then you can't
even get started using a computer without resorting to proprietary software. This was the
motivation for the launch of the GNU Project in 1983: to develop a complete, free Unix-like
operating system called GNU. A Unix-like operating system includes a kernel, compilers,
editors, text formatters, mail software, graphical interfaces, libraries, games and many
other things. Thus, writing a whole operating system is a very large job. By 1990, the GNU Project
had either found or written all the major components except one: the kernel. Then Linux, a Unix-
like kernel, was developed by Linus Torvalds in 1991 and made free software in 1992. Combining
Linux with the almost-complete GNU system resulted in a complete operating system: the
GNU/Linux system, more commonly referred to simply as Linux or a Linux distro [short for
distribution]. Every desktop in DigiPen labs has a Linux distribution from Ubuntu.

In the process of building the GNU system, a number of free and open-source programming tools
were incubated including the GNU C compiler (GCC). Later a C++ compiler was developed, and
translators and compilers for languages such as Ada, Fortran, Objective-C, Objective-C++, and
Go were added. The addition of languages beyond C led to GCC now standing for The GNU
Compiler Collection.

When building ready-to-run applications from C source code, a compiler is not sufficient; run-time
libraries, an assembler, and a linker, are also needed. The whole set of these tools is called a
compiler tool chain or compiler driver. This document describes the process of writing, compiling,
linking, and running C programs using the GCC C compiler driver.

Compiling a simple C program


Suppose you wish to create a computer program to compute the sum of the first $n$ terms of the
series $1 + 2 + 3 + \cdots$ where $n$ is a positive natural number. For example, if the program is
given value $100$ as input, it would return $5050$ since $1 + 2 + \cdots + 100 = 5050$. The creation
of this computer program can be divided into a six-step problem-solving phase followed by a
three-step implementation phase.

Suppose that the six-step problem-solving phase concludes with an algorithm called SUM.

The next phase called the implementation phase will use a specific programming language to
convert algorithm SUM into a computer program. This phase comprises three steps: Code,
Test, and Debug. This document is concerned with filling in the details required by a C
programmer to implement the Code step. Once the executable program is created, the
programmer can continue with the Test and Debug steps of the implementation phase; these
steps are not discussed here.

The Code step for algorithm SUM


The Code step can be further divided into two stages: Edit and Compile. The Edit stage uses a text
editor to create and edit a source file containing the implementation of algorithm SUM. The
subsequent Compile stage involves the use of a compiler toolchain to convert the contents of
the source file into machine code that is stored in a file known as an executable file or a binary file.

Editing source file sum.c


The first process of the Code step is to create a source file using a text editor. A source file
contains code or human readable text representing an algorithm's implementation using the
syntax of a specific programming language. This act of using a text editor to create a source file
and typing English-like text representing C syntax and semantics is called editing.

Word processors such as Microsoft Word are not suitable for editing source files
because they save files in proprietary native formats incompatible with programming tools. Since
writing and editing code is a fundamental job of programmers, specialized text editors called
source code editors have been invented to make programmers more productive and
efficient. The list of source code editors is lengthy, ranging from the simple, no-frills text editors
such as vi on Linux and Notepad on Windows to complex integrated development environments
such as Visual Studio on Windows. We recommend Visual Studio Code since it is free, modern,
multi-platform [can be installed on macOS, Windows 10, Linux], supports almost every
programming language currently in use, and provides many functionalities useful to
programmers. Visual Studio Code is installed in all DigiPen labs. Begin here to get an overview of
the editor.

Begin by creating a folder in Windows called test. Type wsl on the command line to switch
from Windows to Linux. Using Visual Studio Code, create a file called sum.c in folder test. This
is easily done by typing the following in the bash shell [note that the $ represents the Linux shell
prompt and is not a part of the command]:

$ code sum.c

Type the following C code implementing algorithm SUM in source file sum.c:

#include <stdio.h>

int main(void) {
  int n;
  printf("Enter a positive natural number: ");
  scanf("%d", &n);

  // add the terms 1, 2, ..., n
  int sum = 0;
  for (int i = 1; i <= n; ++i) {
    sum += i;
  }
  // print sum; loop variable i is no longer in scope here
  printf("Sum of first %d natural numbers is: %d\n", n, sum);
  return 0;
}

Compiling source file sum.c


Compilation refers to the process of converting a program from the textual source code, in a
programming language such as C or C++, into machine code. This machine code is then stored in
a file known as an executable file, sometimes referred to as a binary file. In fact, compilation is an
umbrella term that represents all of the individual stages of the compilation process. The
complete set of tools used in the compilation process is referred to as a compiler driver or
compiler toolchain.

To compile the file sum.c with GNU C compiler, use the following command:

$ gcc -std=c11 -pedantic-errors -Wstrict-prototypes -Wall -Wextra -Werror sum.c -o sum.out

This command will compile source code in sum.c to machine code and store it in an executable
file sum.out . The output file for the machine code is specified using -o option. If the -o option is
omitted, output is written to a default file called a.out .

Executing binary file sum.out


To run executable program sum.out , type the executable's pathname like this:

$ ./sum.out
Enter a positive natural number: 100
Sum of first 100 natural numbers is: 5050

The loader program in Linux will load executable file sum.out from disk to memory and cause
the CPU to begin executing the first instruction in function main .


Why does Linux require pathname ./sum.out and not the simpler and straightforward sum.out ?
When the name of an executable such as code or gcc is typed in the shell, the operating system
will search in directories specified in a variable called PATH . In fact, these directories can be
displayed by typing the command echo $PATH . If the plain sum.out is typed, command not
found is displayed by the shell since directory test [in which executable sum.out was created]
is not specified in variable PATH. In every shell, character . on the command line means current
directory. So, by typing ./sum.out, you're telling Linux "don't worry about directories in variable
PATH, just run executable sum.out in the current directory".

Compiler options and dealing with errors


The compiler driver will catch straightforward syntax errors. To demonstrate this, edit source file
sum.c with a missing ; on line 10:

 1 #include <stdio.h>
 2
 3 int main(void) {
 4   int n;
 5   printf("Enter a positive natural number: ");
 6   scanf("%d", &n);
 7
 8   int sum = 0;
 9   for (int i = 1; i <= n; ++i) {
10     sum += i
11   }
12   printf("Sum of first %d natural numbers is: %d\n", n, sum);
13   return 0;
14 }

Compiling sum.c with no options other than enforcing C11 [for brevity], the compiler prints the
following error message:

$ gcc -std=c11 sum.c -o sum.out
sum.c: In function ‘main’:
sum.c:10:13: error: expected ‘;’ before ‘}’ token
   10 |     sum += i
      |             ^
      |             ;
   11 |   }
      |   ~

In addition to flagging incorrect syntax as errors, compilers provide many diagnostic warning
messages about potential coding errors. To demonstrate this, a subtle error is introduced in line
12: in function printf, the correct integer format specifier %d is replaced with incorrect floating-
point format specifier %f:

 1 #include <stdio.h>
 2
 3 int main(void) {
 4   int n;
 5   printf("Enter a positive natural number: ");
 6   scanf("%d", &n);
 7
 8   int sum = 0;
 9   for (int i = 1; i <= n; ++i) {
10     sum += i;
11   }
12   printf("Sum of first %d natural numbers is: %f\n", n, sum);
13   return 0;
14 }

Compiling the now buggy source file sum.c with no options other than enforcing C11, the
following diagnostic warning is produced by the compiler:

$ gcc -std=c11 sum.c -o sum.out
sum.c: In function ‘main’:
sum.c:12:48: warning: format ‘%f’ expects argument of type ‘double’, but argument 3 has type ‘int’ [-Wformat=]
   12 |   printf("Sum of first %d natural numbers is: %f\n", n, sum);
      |                                               ~^        ~~~
      |                                                |        |
      |                                                double   int
      |                                               %d

Notice that the compiler does create executable sum.out . To prevent programmers from passing
off buggy software, gcc provides the -Werror option that prevents gcc from successful
compilation when diagnostic warnings are generated:

$ gcc -std=c11 -Werror sum.c -o sum.out
sum.c: In function ‘main’:
sum.c:12:48: error: format ‘%f’ expects argument of type ‘double’, but argument 3 has type ‘int’ [-Werror=format=]
   12 |   printf("Sum of first %d natural numbers is: %f\n", n, sum);
      |                                               ~^        ~~~
      |                                                |        |
      |                                                double   int
      |                                               %d
cc1: all warnings being treated as errors

In other words, -Werror option converts diagnostic warnings to full blown errors and prevents
programmers from proceeding any further until these warnings are heeded by repairing source
code. Options -Wall and -Wextra turn on warning messages for many other common coding
errors listed here. What do these compilation options such as -std=c11, -pedantic-errors, and
-Wstrict-prototypes mean? These options support the creation of compliant code and a
detailed explanation of these compilation options is provided in this section. Finding bugs is hard
and programmers appreciate when compilers provide varied options that flag potential bugs by
generating warnings. Therefore, it is important for students to use these options, as in:

$ gcc -std=c11 -pedantic-errors -Wstrict-prototypes -Wall -Wextra -Werror sum.c -o sum.out

before submitting any code that will be used in assessments for this course. Non-compliance
could result in submitted code receiving zero points and/or a failing grade.

How the compiler works


This section describes in more detail how source files are transformed to an executable file.
Compilation is a multi-stage process involving several tools, including the compiler itself such as
gcc , the assembler as , and the linker ld . The complete set of tools used in the compilation
process is known as the compiler toolchain or compiler driver. The sequence of commands
executed by a single invocation of gcc consists of preprocessing, compilation proper, assembly,
and linking stages:

As an example, the individual compilation stages will be examined using source file hello.c :

1 #include <stdio.h>
2
3 int main(void) {
4   int year = 2020;
5   printf("Hello World %d!!!\n", year);
6   return 0;
7 }

Although the code in source file hello.c is simple, it uses an external header file (on line 1) and
calls a C standard library function (on line 5), and therefore exercises the entire compiler
toolchain.

The Preprocessor
The first stage of the compilation toolchain process is the use of the preprocessor. Think of the
preprocessor as a text editor that modifies a C source file according to preprocessing directives.
Before interpreting directives, the preprocessor performs a more basic global transformation on
the source file: all single-line and multi-line comments are replaced with single spaces.


Preprocessing directives are lines in a source file that begin with character # . The # is followed
by an identifier that is the directive name. The include directive name involves inclusion of
header files. The define directive name involves macro expansion where a macro is a fragment
of C code which has been given a name. Directive names if , else , and elif allow conditional
compilation of the source file by allowing certain parts of the source file to be included or
excluded from the compilation process. This document is only concerned with include directive
name.

A header file is a file containing C declarations and macro definitions to be shared between
multiple source files. A programmer requests the use of a header file in a source file with
preprocessing directive #include. Line 1 of source file hello.c looks like this: #include
<stdio.h>. The author is asking the preprocessor to search for C standard library header file
stdio.h on disk and then to replace line 1 with the contents of header file stdio.h. The
delimiters < and > indicate to the preprocessor that it must search the standard list of system
directories special to the compiler toolchain being used.

This stage of the compiler toolchain can be manually invoked like this:

$ gcc -std=c11 -E hello.c -o hello.i

In gcc, the output file is specified with the -o option. The resulting file hello.i is a
transformation of source file hello.c with comments replaced by a single space and line 1 of
source file hello.c replaced with contents of system header file stdio.h followed by the
remaining lines of the source file. This is confirmed by examining the contents of hello.i:

 1 // hundreds of lines omitted for brevity ...
 2
 3 extern int printf (const char *__restrict __format, ...);
 4
 5 // many hundreds of lines omitted for brevity ...
 6
 7 # 3 "hello.c"
 8 int main(void) {
 9   int year = 2020;
10   printf("Hello World %d!!!\n", year);
11   return 0;
12 }

Notice that the purpose of including header file stdio.h is served because hello.i now
contains a function prototype of printf on line 3 so that the compiler proper can understand
the call to function printf on line 10. This satisfies the cardinal rule in C and C++ that all names
must be declared before their first use.

The Compiler proper


The next stage of the compiling toolchain process is the actual compilation of preprocessed
source code to assembly language, for a specific CPU. The command-line option -S instructs gcc
to only convert preprocessed C source code in file hello.i to assembly language:

$ gcc -std=c11 -S hello.i

By default, the resulting assembly language is stored in file hello.s . A partial listing of the
assembly language for an x86-64 CPU looks like this:

1 // many lines deleted for brevity ...
2 main:
3 .LFB0:
4 .cfi_startproc
5 endbr64
6 pushq %rbp
7 // many lines deleted for brevity ...
8 subq $16, %rsp
9 movl $2020, -4(%rbp)
10 movl -4(%rbp), %eax
11 movl %eax, %esi
12 leaq .LC0(%rip), %rdi
13 movl $0, %eax
14 call printf@PLT
15 movl $0, %eax
16 leave
17 .cfi_def_cfa 7, 8
18 ret
19 .cfi_endproc
20 .LFE0:
21 // many lines deleted for brevity ...
22

Notice that line 14 of the assembly language code above contains a call to C standard library
function printf .

The Assembler
The assembler is a translator that transforms assembly language code into machine code and
generates an object file. When there are calls to external functions in the assembly source file - as
with line 14 in the assembly language code above - the assembler leaves the addresses of external
functions undefined, to be filled in later by the linker. The assembler is invoked with the -c option of
gcc :

$ gcc -c hello.s -o hello.o

or by directly calling the assembler:

$ as hello.s -o hello.o

The resulting object file hello.o contains the machine instructions for the source code in file
hello.c , with an undefined reference to printf . Unlike other intermediate files generated by
gcc , object file hello.o is a binary file and is therefore not human-readable.

The Linker
The final stage of compilation uses the ld program to link object files into an executable
file. In practice, an executable requires many external functions from system and C
run-time libraries. Consequently, the actual link commands used internally by GCC are complicated:

/usr/lib/gcc/x86_64-linux-gnu/9/collect2
-plugin /usr/lib/gcc/x86_64-linux-gnu/9/liblto_plugin.so
-plugin-opt=/usr/lib/gcc/x86_64-linux-gnu/9/lto-wrapper
-plugin-opt=-fresolution=/tmp/ccPZPAYK.res
-plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s
-plugin-opt=-pass-through=-lc
-plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s
--build-id --eh-frame-hdr -m elf_x86_64 --hash-style=gnu --as-needed
-dynamic-linker /lib64/ld-linux-x86-64.so.2 -pie -z now -z relro -o hello.out
/usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/Scrt1.o
/usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/crti.o
/usr/lib/gcc/x86_64-linux-gnu/9/crtbeginS.o
-L/usr/lib/gcc/x86_64-linux-gnu/9
-L/usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu
-L/usr/lib/gcc/x86_64-linux-gnu/9/../../../../lib
-L/lib/x86_64-linux-gnu -L/lib/../lib
-L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib
-L/usr/lib/gcc/x86_64-linux-gnu/9/../../..
hello.o -lgcc --push-state --as-needed -lgcc_s --pop-state
-lc -lgcc --push-state --as-needed -lgcc_s --pop-state
/usr/lib/gcc/x86_64-linux-gnu/9/crtendS.o
/usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/crtn.o

Fortunately, there is rarely any need to invoke the ld command directly. gcc can
transparently handle the entire linking process, as in:

$ gcc hello.o

This links object file hello.o with the C standard library. That is, it combines the printf function and
other dependent functions from the C standard library with the machine language version of
function main defined in object file hello.o to produce an executable file a.out . The program can be
executed by using the pathname of the executable a.out , as in:

$ ./a.out
Hello World 2020!!!

Alternatively, you can use the -o option to give the executable a name other than a.out :

$ gcc hello.o -o hello.out

Linking with external libraries


A library is a collection of precompiled object files which can be linked into programs. Libraries are
typically stored in special archive files with extension .a , referred to as static libraries. They are
created from object files with a separate tool, the GNU archiver ar , and used by the linker to
resolve references to functions at link time. In the Linux distro on DigiPen lab desktops, the C
standard library containing functions specified in C11 standard such as printf is located at
/usr/lib/x86_64-linux-gnu/libc.a . By default, libc.a is linked by gcc . For historical reasons,
the math portion of C standard library is stored in a separate static library /usr/lib/x86_64-
linux-gnu/libm.a . The functions in the math library are declared in header file <math.h> .
However, again for historical reasons, gcc does not automatically link libm.a .

Consider the following source file sin.c that makes a call to external function sin in math
library libm.a :

#include <stdio.h>
#include <math.h>

int main(void) {
    double angle = 0.785398; // 45 degrees in radians
    printf("sin(%f) = %f\n", angle, sin(angle));
    return 0;
}

Source file sin.c compiles without a hiccup:

$ gcc -std=c11 -pedantic-errors -Wstrict-prototypes -Wall -Wextra -Werror -c sin.c -o sin.o

However, the linker throws an error when an attempt is made to generate an executable:

$ gcc sin.o -o sin.out
/usr/bin/ld: sin.o: in function `main':
sin.c:(.text+0x23): undefined reference to `sin'
collect2: error: ld returned 1 exit status

The problem is that function sin is defined neither in source file sin.c nor in
the default C standard library libc.a . Instead, it is defined in external math library libm.a , and
gcc does not link to file libm.a unless it is explicitly selected. This is done by passing the
-lm option to the linker stage of gcc :

$ gcc sin.o -o sin.out -lm

Option -lm is shorthand for linking the object files with the math library, here library file
/usr/lib/x86_64-linux-gnu/libm.a .

Compiling multiple source files


Software development is a group activity with many programmers collaborating and working
together to create a software application. Editing a single monolithic source file is not only
cumbersome but also error-prone. Instead, common practice is to divide the program into sub-
programs with each sub-program further divided into functions. Related functions are grouped in
a source file and assigned to a specific programmer. This division of labor and purpose through
multiple source files makes it easier and faster to design, implement, and test software because
individual programmers can independently implement, compile, and test their source files. In fact,
an individual programmer will never need access to the source code of other programmers to
create the executable. The following picture illustrates the process by which multiple source files
and external libraries are combined to generate an executable.

In the following example, the implementation of the summation algorithm is divided into two source
files main.c and sum_fn.c and a header file sum.h . The work is divided between two programmers:
a client programmer implements main.c , which uses the summation algorithm, and a second
programmer independently implements the algorithm in source file sum_fn.c and also provides an
interface or header file sum.h . Neither programmer is aware of
nor has access to the other programmer's source files [note that header files are by design
supposed to be shared].

Here is the implementation of driver source file main.c containing the definition of function
main [recall that every C program must contain one and only one function called main which is
the CPU's entry point to the program] by the client programmer:

1 #include <stdio.h>
2 #include "sum.h"
3
4 int main(void) {
5 int n;
6 printf("Enter a positive natural number: ");
7 scanf("%d", &n);
8
9 printf("Sum of first %d natural numbers is: %d\n", n, sum(n));
10 return 0;
11 }
12

The implementation of the summation algorithm in the previous version of the program in source file
sum.c has been replaced by a call to a new external function sum , which from the perspective of
both the author of main.c and the compiler is defined somewhere else.

Another new entry in source file main.c is the addition of preprocessor directive #include
"sum.h" on line 2. This is an interface file provided by function sum 's author containing its
function prototype. A function prototype specifies the function's name, its parameter list, and its
return type. The compiler will use this function prototype to check for the correct use of function
sum by the author of main.c . The compiler does this check by ensuring that the function call's
arguments and the use of its return value match up correctly with the prototype's parameters and
return type. In simpler terms, since the compiler doesn't have access to the
definition of function sum in file sum_fn.c , it will use the function prototype in header file sum.h
to pass judgement on whether the call to function sum in main.c is correct.

Finally, notice the difference in syntax for #include directives in lines 1 ( #include <stdio.h> )
and 2 ( #include "sum.h" ). The delimiters < and > indicate to the preprocessor that it must
search the standard list of system directories special to the compiler toolchain being used. In the
Linux distro on DigiPen lab desktops, C standard library header files are located at
/usr/include . The delimiters " and " indicate to the preprocessor that it must search the
current directory before looking in the standard list of system directories.

Suppose header file sum.h supplied by the implementer of file sum_fn.c looks like this:

int sum(int);

The header file contains a single line containing the function prototype for sum .

The implementer of file main.c now has everything required to create an object file:

$ gcc -std=c11 -pedantic-errors -Wstrict-prototypes -Wall -Wextra -Werror -c main.c -o main.o

Independently, the second programmer has defined function sum in a separate file sum_fn.c :

1 #include "sum.h"
2
3 int sum(int N) {
4 int total = 0;
5 for (int i = 1; i <= N; ++i) {
6 total += i;
7 }
8 return total;
9 }
10

Notice that line 1 contains a preprocessor directive to include header file sum.h . This is a
recommended (and sometimes necessary) practice to ensure that the function prototype in
the header file and the definition in the source file match up. The author of sum_fn.c can independently
compile and generate a corresponding object file:

$ gcc -std=c11 -pedantic-errors -Wstrict-prototypes -Wall -Wextra -Werror -c sum_fn.c -o sum_fn.o

Either of the two programmers or any other programmer can now link together function main in
object file main.o , function sum in object file sum_fn.o , and C standard library functions printf
and scanf defined in static C standard library libc.a into a single executable, say new-sum.out :

$ gcc main.o sum_fn.o -o new-sum.out

To run executable program new-sum.out , type the pathname of the executable like this:

$ ./new-sum.out
Enter a positive natural number: 100
Sum of first 100 natural numbers is: 5050

Compilation flags for GCC C and Clang compilers


The aim of these compilation flags in GCC C and Clang compilers is to make the best possible
attempt to ensure that C code stays within the subset of C that is also valid C++. Note that the Clang driver and language
features are intentionally designed to be as compatible with the GCC C compiler as reasonably
possible, easing migration from GCC C compiler to Clang. In most cases, code "just works".
Therefore, only links to the GCC options are provided here. Information and details related to the
Clang compiler can be found in the Clang compiler user's manual.

-std=c11 : Selects the ISO C11 language standard. Information about the C11 standard is available
in the C11 N1570 standard draft.

-pedantic-errors : Gives an error whenever base standard C11 requires a diagnostic message to
be produced. For example, C89 allows the declaration of a variable, function argument, or structure
member to omit the type specifier, implicitly defaulting its type to int . Although legal in
C89, this is considered illegal in C99, C11, and C++:

#include <stdio.h>
int main(void) {
    static x = 10;
    printf("x: %d\n", x);
    return 0;
}

Compiling this code with a C11 compiler, as in: gcc -std=c11 tester.c , elicits only
a warning message. Compiling the same code with the -pedantic-errors option, however,
produces an error.

According to the C11 N1570 standard draft,

A conforming implementation shall produce at least one diagnostic message
(identified in an implementation-defined manner) if a preprocessing translation unit or
translation unit contains a violation of any syntax rule or constraint, even if the
behavior is also explicitly specified as undefined or implementation-defined. Diagnostic
messages need not be produced in other circumstances. 9)

9) The intent is that an implementation should identify the nature of, and where
possible localize, each violation. Of course, an implementation is free to produce
any number of diagnostics as long as a valid program is still correctly translated. It
may also successfully translate an invalid program.

Based on the above text, -pedantic-errors cannot be solely used to check programs for
strict C conformance. The flag finds some non-standard practices, but not all - only those for
which C requires a diagnostic, and some others for which diagnostics have been added.

-Wall : This enables all the warnings about constructions that some users consider
questionable, and that are easy to avoid [or modify to prevent the warning]. Note that some
warning flags are not implied by -Wall . Some of them
warn about constructions that users generally do not consider questionable, but which
occasionally you might wish to check for; others warn about constructions that are necessary
or hard to avoid in some cases, and there is no simple way to modify the code to suppress
the warning. Some of them are enabled by -Wextra but many of them must be enabled
individually. The entire list of warning flags enabled by -Wall for GCC C compiler can be
found in the GCC manual's section on warning options.

-Wextra : This enables some extra warning flags that are not enabled by -Wall . The entire
list can be found in the GCC manual's section on warning options.

-Werror : Converts diagnostic messages generated by the base compiler and warnings
generated by flags -Wall and -Wextra to errors. This is a necessary feature for generating
cleanly compiled code that doesn't generate any warning messages.

-Wstrict-prototypes : In C89, C99, and C11 it is legal to write a function declaration as:

return-type f();

C89, C99, and C11 compilers read the above declaration as: f is a function that takes an
unknown number of parameters of unknown types and returns a value of type return-
type . This means the following code in a source file test.c will compile and link, but its
behavior is undefined when executed because the second call passes three arguments to a
function defined with two parameters.

#include <stdio.h>

extern int foo();

int main(void) {
    int x = 2, y = 4, z = 6;
    printf("%d + %d = %d\n", x, y, foo(x, y));
    printf("%d + %d + %d = %d\n", x, y, z, foo(x, y, z));
    return 0;
}

int foo(int i, int j) {
    return i + j;
}

The code will compile with GCC C and Clang compilers (and also with Microsoft Compiler)
without any diagnostic messages:

$ gcc -std=c11 -pedantic-errors -Wall -Wextra -Werror test.c

However, the same code will not compile with a C++ compiler:

$ g++ -std=c++11 test.c

since the declaration return-type f(); in C++ compilers indicates that f is a function that
takes no parameters and returns a value of type return-type .

To ensure compatibility with C++ standards, the -Wstrict-prototypes option must be used with
GCC and Clang (together with -Werror ) so that test.c doesn't successfully compile.

Make and Makefiles


The previous section has emphasized the necessity for using a variety of GCC options to ensure
code is cleanly compiled without any warnings. Typing the entire set of required options each
time can be annoying. When writing complex programs consisting of multiple (think tens or
hundreds or thousands) source files, making small changes to a few files will require the entire set
of source files to be recompiled. These recompilations may occur hundreds of times every day
causing substantial delays as programmers wait for the executable to be created. More
importantly, programmers will have to remember dependencies between different files. For
example, if source file b.c includes header file a.h and if a.h is updated, then b.c must be
recompiled even though it was not altered.

It can be difficult to remember the entire list of source files and the dependencies required to
create an executable from them. To solve this problem, a program called make is used. The
version of make provided by the GNU project is GNU Make, which is invoked as make . make is a
facility for automatically maintaining and building executables from source files. make uses a makefile
that specifies the dependencies between files and the commands that will bring all files up to date
and build an executable from these up-to-date files. In short, a makefile contains the following information:

the names of the source and header files comprising the program
the interdependencies between these files
the commands that are required to create the executable

A simple makefile consists of rules with each rule consisting of three parts: a target, a list of
prerequisites, and a command. A typical rule has the form:

target : prereq-1 prereq-2 ...
	command1
	command2
	...

target is the name of the file to be created or an action to be performed by make . prereq-1 ,
prereq-2 , and so on represent the files that will be used as input to create target . If any of the
prerequisites have changed more recently than target , or if target does not exist, then make will
create target by executing commands command1 , command2 , and so on. make will stop if any
command is unsuccessful. Note that every command must be preceded by a tab and not spaces!

Here's an example:

1 example.out : main.o file1.o file2.o
2 	gcc -std=c11 -pedantic-errors -Wstrict-prototypes -Wall -Wextra -Werror main.o file1.o file2.o -o example.out

Line 1 says that target example.out must be remade (or made if it doesn't exist) if any of the
prerequisite files [ main.o , file1.o , file2.o ] have been changed more recently than the target.
Before checking the times prerequisite files were changed, make will look for rules whose target is
one of the prerequisite files. If such a rule is found, make will first remake that prerequisite if any of
its own prerequisites are newer than it. After checking that all prerequisite files are up to date and
remaking any that are not, make brings example.out up to date.

Line 2 tells make how it should remake target example.out . This involves calling gcc with the
usual and required GCC options to link object files main.o , file1.o , and file2.o .

A makefile can also contain macro definitions where a macro is simply a name for something. A
macro definition has the form:

NAME = value

The value of macro NAME is accessed by either $(NAME) or ${NAME} . make will replace every
occurrence of either $(NAME) or ${NAME} in makefile with value .

Here's a complete annotated example:

1 # makefile for example.out
2 # the # symbol means the rest of the line is a comment
3
4 # this is definition of macro GCC_OPTIONS
5 GCC_OPTIONS = -std=c11 -pedantic-errors -Wstrict-prototypes -Wall -Wextra -Werror
6 # this is definition of macro OBJS
7 OBJS = main.o file1.o file2.o
8
9 # this rule says that target example.out will be built if prerequisite files
10 # main.o file1.o file2.o have changed more recently than example.out
11 # the text $(OBJS) will be substituted with the list of object files in line 7
12 # line 15 says to build example.out using command gcc
13 # the text $(GCC_OPTIONS) will be substituted with list of options in line 5
14 example.out : $(OBJS)
15 	gcc $(GCC_OPTIONS) $(OBJS) -o example.out
16
17 # the next line says main.o depends on main.c
18 # the line after it says to create main.o with the command gcc
19 main.o : main.c
20 gcc $(GCC_OPTIONS) -c main.c -o main.o
21
22 # file1.o depends on both file1.c and file1.h
23 # and is created with command gcc $(GCC_OPTIONS) -c file1.c -o file1.o
24 file1.o : file1.c file1.h
25 gcc $(GCC_OPTIONS) -c file1.c -o file1.o
26
27 # file2.o depends on both file2.c and file1.h
28 file2.o : file2.c file1.h
29 gcc $(GCC_OPTIONS) -c file2.c -o file2.o
30
31 # clean is a target with no prerequisites;
32 # typing the command in the shell
33 # make clean
34 # will only execute the command which is to delete the object files
35 clean :
36 rm $(OBJS)

Target clean on line 35 is different from the other targets; it has no prerequisites. If the following
command is issued in the shell:

$ make clean

then make will execute only the command on line 36 in rule clean and then exit.

Let's conclude this section by writing a simple makefile called Makefile for the new_sum program
that consists of two source files main.c and sum_fn.c and a header file sum.h . The default
makefile is named makefile or Makefile ; other names can be used, but make must then be given
the non-default makefile name with the -f option.

1 GCC_OPTIONS = -std=c11 -pedantic-errors -Wstrict-prototypes -Wall -Wextra -Werror
2 OBJS = main.o sum_fn.o
3 EXEC = new_sum.out
4
5 $(EXEC) : $(OBJS)
6 	gcc $(GCC_OPTIONS) $(OBJS) -o $(EXEC)
7
8 main.o : main.c sum.h
9 gcc $(GCC_OPTIONS) -c main.c -o main.o
10
11 sum_fn.o : sum_fn.c sum.h
12 gcc $(GCC_OPTIONS) -c sum_fn.c -o sum_fn.o
13
14 clean:
15 rm $(OBJS) $(EXEC)

The most common error with a makefile is forgetting to put a horizontal tab at the
beginning of a command line and placing space characters there instead. Here's what happens if
line 12 is prefixed with space characters rather than a tab:

Makefile:12: *** missing separator. Stop.

Annotated First C Programs [Prasanna Ghali]

Annotated First C Programs


Things to know before writing first C program
Everything you'll learn this semester about C programming language is relevant to the C++ programming
language that you'll learn next semester. Since the course is concerned with the C subset of C++
programming language, concepts described in this and other documents are applicable not only to C but also
to C++. If you follow lectures, tutorials, and assignments correctly, every C program you'll write this semester
will also be a C++ program.

Everything related to C in this course refers to the C11 standard of the language. Certain code that will be
demonstrated throughout the semester might fail to compile in older C standards.

C programs follow the von Neumann architecture. Both instructions and data of a program reside in memory.
Programming is then an endeavor that requires programmers to write appropriate instructions that
transform input data into required output data.

A program consists of source file(s).

Source file(s) contain functions that communicate with other functions in the program by passing and
returning values.

A function encapsulates an algorithm. An algorithm is a finite sequence of instructions that operate on data.
Think of a function as a black box that takes certain input, performs actions that transform the input, and
then returns the transformed data as output to the function's caller.

A C program is made up of functions which in turn are made up of statements. A statement is the atomic unit
of a C program. Informally, statements in an algorithm correspond to C statements.

Data manipulated by a function consists of two types: literals and variables.

1. Literals express constant values such as 7 or 123.56 or a character such as 'a' or a sequence of
characters such as "Hello" .
2. Variables identify named memory locations at which data of interest is located. There are two values
associated with variables. The first is the physical memory address that variables represent. The second
is the contents of these physical memory locations. Unlike in many other high-level programming languages,
both the address and the value stored at the address are accessible to C programmers.
All data, represented by both constants and variables, is typed. A data type specifies the set of values and the
set of operations that can be applied on these values. The type specified by a programmer for a constant or a
variable allows the compiler to translate to the machine the nature of the data stored and how the machine is
to interpret the data. If a variable doesn't have a type associated with it, it will be impossible for C compilers
to convey to the machine how to correctly interpret the contents of the memory locations associated with
that variable. For this reason, both C and C++ are said to be statically typed languages. On the other hand,
Python is an example of a dynamically typed language.

Using decimal notation is natural for ten-fingered humans, but the language of a computer, called machine
language, is a sequence of 0s and 1s. Each of the digits, 0 and 1, is called a binary digit or bit. A bit can be only
a 0 or a 1 - never anything else. This is a fundamental concept. Every piece of
information stored in a computer or processed by a computer, whether it is your name, or your street
address, or the amount you owe on your credit card, is stored as strings of 0s and 1s.

In isolation, a single bit is not very useful since it can represent only two values. When groups of bits are
combined together and some interpretation is applied that gives meaning to the different possible bit
patterns, it becomes possible to represent the elements of any finite set. A sequence of bits is referred to as a
binary number. For example, using a binary number system, groups of bits can be used to encode integers. By
using a standard character code such as ASCII, the letters and symbols on a keyboard can be encoded as
binary numbers to represent text in a document.

Although humans can represent arbitrarily large numbers using the decimal, or octal, or some other number
system, machines are only capable of representing binary numbers having specific number of bits. Machines
don't let you collect together or process binary numbers having an arbitrary number of bits. Instead,
machines represent instructions and data using binary numbers having certain fixed number of bits. Since
the early days of computing, a sequence of eight bits, called a byte, has become the de facto standard for a
unit of digital information. A byte represents the smallest addressable data item and the unit of storage in modern
computers.

CPUs have evolved from 8 bits to incorporate binary numbers having 16, 32, and 64 bits. In addition, every
CPU has a word size, indicating the number of bits in a binary number that can be atomically [that is, as a
single unit] processed by the CPU. Most modern CPUs have a 64-bit word size. Currently, for each specific bit
size [8, 16, 32, and 64 bits], computers specify binary numbers using unsigned, signed, and floating-point
representations:

Unsigned representation for integers greater than or equal to 0 encoded in traditional binary
form. For 8 bits, unsigned numbers ranging from 0 to 2^8 - 1 = 255 can be represented. For 16 bits,
unsigned numbers ranging from 0 to 2^16 - 1 = 65535 can be represented. And so on for 32 and 64
bits.
Signed representation for both positive and negative integers is encoded in two's-complement form. For
8 bits, signed numbers ranging from -2^7 = -128 to 2^7 - 1 = 127 can be represented. For 16 bits,
signed numbers ranging from -2^15 = -32768 to 2^15 - 1 = 32767 can be represented. And so on for 32
and 64 bits.
Floating-point representation for rational numbers encoded in the IEEE 754
format.
The fundamental, numeric data types for the 64-bit C11 compiler used in this course are:

Note that types signed short int , signed int , signed long int , and signed long long int are
equivalent to abbreviated types short , int , long , and long long , respectively. Similarly, unsigned short
int , unsigned long int , and unsigned long long int are equivalent to abbreviated types unsigned
short , unsigned long , and unsigned long long , respectively.

In programming languages, the term identifier means the name given to variables, functions, macros, and so
on. Except literals, everything in a C program such as variables and functions must have an identifier so that
they can be uniquely identified by programmers and compilers.

Identifiers in C consist of a sequence of Latin letters [ a through z , A through Z ], underscores _ , and
digits 0 through 9 . The first symbol of an identifier must be either a Latin letter or an underscore. C
identifiers are case sensitive. Valid identifiers are, for example, monkey , Monkey , an_id_23 , big_monkey ,
small_monkey , _the_monkey , or __special_monkey . In general, it is never a good idea to name identifiers
beginning with two underscores, as in __bad_monkey , because such names are reserved for the compiler and
C standard library and your identifier might clash with a system defined identifier.

Before a variable or function is used in a program, their identifiers must first be declared to the compiler, and
presumably to any human reader of the program. More specifically, the compiler must be provided with the
type associated with the identifier. This allows the compiler to correctly translate to the machine the values
represented by these identifiers. Note that ISO C11 standard specifies that variables can be declared
anywhere in a function with the only caveat that they be declared before their first use.

By default, every program is provided three input/output streams to interact with its environment: standard
input [ stdin ] for the program to read input from the keyboard device by default, standard output [ stdout ]
for the program to write output to the computer display screen by default, and standard error [ stderr ] for
the program to write error or diagnostic messages to the computer display screen by default.

Before continuing with this document, you must understand how compilers work. Compilation is a multi-
stage process involving several tools, including the compiler itself gcc , the assembler as , and the linker ld .
The complete set of tools used in the compilation process is known as the compiler toolchain or compiler
driver. The sequence of commands executed by a single invocation of gcc consists of the following stages:
preprocessing, compilation proper, assembly, and linking.

By default, GCC C compiler gcc expects source files to have .c suffix.

This course will use the ISO C11 standard. Code presented in this document and throughout this semester
may not compile in older C standards.

The minimal C program


Consider the following source file nothing.c that - as the name implies - does nothing:

1 // this is the minimal C program
2 int main(void)
3 {
4 return 0;
5 }
6

1. The double slash // on line 1 begins a single-line comment that extends to the end of the line. A comment is
for consumption by the human reader and not the compiler. The preprocessor will strip away comments and
replace each with a single space character. This means that the compiler proper will never see comments in a
source file.

2. Line 2 declares a function main .

Line 2 consists of three identifiers int , main , and void . An identifier is a sequence of characters used
to denote names of variables, functions, and other entities such as macros and types. An identifier may
contain letters, digits, and underscores, but must begin with a letter or underscore. This page provides
information and rules specific to C identifiers.

Not only are int and void identifiers, they're also C keywords. A keyword is a predefined identifier that
has special meaning to C compilers.

Line 2 declares function main . Every C program uses function main as the entry point to the program
and therefore there must be exactly one function called main in a C program.

Recall that a function encapsulates an algorithm that transforms input value(s) to output value(s). Inputs
to a function are specified between a pair of parentheses ( and ) as a sequence of comma-separated
values called function parameters. Function main is written with void in place of a parameter list, meaning
it takes no parameters, and returns a value of type int . Although functions can take as many parameters as
dictated by the author, they can only return a single value.

Identifier void is a C keyword indicating it is a C data type specifying no value. The use of void data
type in the current context where it is delimited within parentheses ( and ) indicates to the compiler
and human readers that function main won't receive any data, or value, or information from the
function that invoked it.

C++ standards allow programmers to skip the use of void in function parameters. That is, C++
compilers treat an empty parameter list as equivalent to a void parameter list, so in C++ function main
receives no input values if the function is written as:

1 int main() // valid in C++ - but not a prototype in C
2 {
3     return 0;
4 }
5

In C11, however, an empty parameter list does not give the compiler a prototype: calls to such a function
are not checked against a parameter list, and the form is an obsolescent feature. gcc option
-Wstrict-prototypes warns about it and option -Werror turns that warning into an error. Therefore, with the
options required in this course, the above code will not compile with the C compiler and must be
rewritten as

1 int main(void) // this is how this course will declare main
2 {
3     return 0;
4 }
5

int is a C keyword indicating a data type that on both 32-bit and 64-bit machines represents 32-bit
signed integers ranging in value from -2,147,483,648 (that is, -2^31) to 2,147,483,647 (that is, 2^31 - 1). In the
current context, the program's author is indicating to the compiler that function main will return a value of type
int to the function that invoked it. Recall that type int is an abbreviated and equivalent form of the
wordier type signed int .

To summarize, line 2 of the source text declares identifier main to be a function that takes no values and
returns a value of type int to the operating system.
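
The size and range of int can be checked on a given machine rather than assumed. The sketch below uses the standard header limits.h , which defines CHAR_BIT , INT_MIN , and INT_MAX ; the function names int_is_32_bits and print_int_range are hypothetical and exist only for this illustration.

```c
#include <limits.h>
#include <stdio.h>

// hypothetical helper: reports whether int occupies 32 bits on this platform
int int_is_32_bits(void) {
    return sizeof(int) * CHAR_BIT == 32;
}

// hypothetical helper: prints the limits recorded in <limits.h> and
// returns printf's result (the number of characters written)
int print_int_range(void) {
    return printf("int ranges from %d to %d\n", INT_MIN, INT_MAX);
}
```

On the platforms described above, int_is_32_bits is expected to return 1; a platform with a different int size would report 0.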

3. Line 3:

Curly braces are used in C to group together stuff including all statements required to implement an
algorithm.
Left curly brace { on line 3 starts the function body or code block of function main and will contain statements
that implement the algorithm encapsulated by function main . The right curly brace } on line 5 matches
the left curly brace and ends the body of function main .
4. Line 4 contains a C statement.

Notice that statement(s) within a function are indented to easily identify to human readers that these
indented statements constitute a code block. Unlike Python, C and C++ are free form and programmers
have very few restrictions on how they present source code to compilers.

All statements are delimited by a semi-colon ; . Here, statement return 0; will result in function main
[and therefore the program] returning value 0 of type int to the operating system indicating that the
program executed successfully. Note that the type of the value returned by the return statement
[literal 0 has type int ] matches the return type [ int ] specified on line 2.

C programs return nonzero values to indicate failure. Not every operating system makes use of that
return value: Linux-based operating systems do, but Windows systems rarely do.

Add return to your list of C keywords - the others are int and void .

C11 and all standards of C++ allow programmers to skip the explicit return statement in function main .
If there is no explicit return statement in function main , these C/C++ compilers will implicitly add
statement return 0; to indicate successful completion. This means that the most minimal C/C++
program will look like this:

1 int main(void)
2 {
3 }
4

For consistency with older C standards, my documented examples will always contain an explicit return
statement.

5. Line 6 contains an empty line. According to ISO C standards, every source file must terminate with a newline.

6. C is a free form language implying that programmers have very few restrictions on how they present source
code to compilers. For example, the entire source code in nothing.c can be written on a single line:

1 // this is the minimal C program
2 int main(void) { return 0; }
3

Writing code like this decreases the vertical spread of the source code while increasing its horizontal spread.
Jamming several things into a single line makes code hard to read and maintain for programmers. Instead, I
prefer a compromise that decreases vertical spread a little bit while increasing horizontal spread a little bit.
My code will follow this template:

1 // this is the minimal C program
2 int main(void) {
3     return 0;
4 }
5

7. Open a Windows command prompt. Change your current working directory to C:\sandbox . Switch to Linux
bash shell using Windows shell command wsl . Open Code from the bash shell using command: code
nothing.c . Use Code to enter the code described in nothing.c . After saving the file, use GCC C compiler
gcc to compile nothing.c (note that the $ symbol below represents the Linux bash shell command prompt
and is not part of the gcc command):

1 $ gcc -std=c11 -pedantic-errors -Wstrict-prototypes -Wall -Wextra -Werror -c nothing.c -o nothing.o

The six options to gcc : -std=c11 , -pedantic-errors , -Wstrict-prototypes , -Wall , -Wextra , -Werror
are required every time you compile a source file.

Option -c indicates that source file nothing.c is only to be compiled but not linked. That is, source file
nothing.c is converted to an object file containing binary machine code for a specific CPU but not into
an executable program.

Option -o nothing.o gives the name nothing.o to the output object file generated by the compiler.
Although compilers will default the object file's name to the name of the source file with an extension of
.o , I explicitly specify the name of the object file.

Link object file nothing.o with standard library object files to create an executable:

1 $ gcc nothing.o -o nothing.out

Since this particular example describes a minimal C program, object file nothing.o contains no calls to
C standard library functions for the linker to resolve. If option -o is not used, the linker will default the
executable file to a.out .

To run executable program nothing.out , type the executable's pathname like this:

1 $ ./nothing.out
2 $

Repeat the compile and link steps using verbose option -v . This option prints the commands that
execute the different compilation stages to standard error.
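
The convention of returning zero for success and a nonzero value for failure can be sketched with the macros EXIT_SUCCESS and EXIT_FAILURE from the standard header stdlib.h . The function check_divisor below is a hypothetical helper, not part of nothing.c ; it is meant to show the kind of value that main could pass back to the operating system.

```c
#include <stdio.h>
#include <stdlib.h>

// check_divisor is a hypothetical helper: it returns a status value that
// main could return to the operating system unchanged
int check_divisor(int divisor) {
    if (divisor == 0) {
        fprintf(stderr, "error: divisor must be nonzero\n");  // diagnostic
        return EXIT_FAILURE;                                   // nonzero status
    }
    return EXIT_SUCCESS;                                       // zero status
}
```

If main ended with a statement such as return check_divisor(d); , the bash shell would make the status available in its $? variable after the program finishes.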

Second C program: Using the C standard library


Consider the code in source file hello.c that prints a greeting to stdout :

1 #include <stdio.h>
2
3 /*
4    Print a greeting to the world!!!
5 */
6 int main(void) {
7     printf("Hello World!!!\n");
8
9     return 0;
10 }
11

1. Let's start with line 1 containing unusual characters # , < , and > : #include <stdio.h>

The preprocessor is a program used by the C compiler to provide certain text utility functionalities. Think
of the C preprocessor's behavior and capabilities as being similar to a specialized text editor.
Identifier include is a directive to the preprocessor to replace line 1 with contents of file stdio.h .
Delimiters < and > tell the preprocessor to search for file stdio.h only in the standard include paths
that were established when the compiler was installed on a computer, along with any directories added
using compiler option -I . The directory containing source file hello.c is not searched.
What is the purpose behind including file stdio.h ? This file is supplied by the compiler vendor and
contains declarations of input and output functions defined in C standard library. Recall that C itself is a
fairly small language and it is up to the C standard library to provide input/output capabilities to C
programs. Specific to source file hello.c , stdio.h contains a declaration of function printf that
prints text to stdout . Files such as stdio.h are called header files because C requires identifiers be
declared before their first use and related identifiers are collected in a file which is conveniently added at
the head or top of the source file so that these identifiers are visible throughout the source file.
A function declaration introduces a reference to a function defined elsewhere (in this case, in the C
standard library) by specifying the function's name, its parameter list, and its return type. A declaration
for function printf is required in source file hello.c for the compiler to ensure the call to printf
matches the number and types of parameters in the declaration of printf and further ensure that the
return value from printf is used correctly by the caller in hello.c . Note that the code in hello.c
ignores the return value from function printf . In short, the compiler will use the function declaration of
printf to check for the correct use of function printf in source file hello.c .
Function declarations as discussed in this document and throughout this course are known as function
prototypes to distinguish them from an older style of function declarations in which the parameter list is
left empty. This course will leave this bit of ugly history behind and instead follow C++ terminology: the
terms function declaration and function prototype are considered synonyms and the term function
declaration will be used in lieu of function prototype.
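
The return value that hello.c ignores can be captured, and the declaration in stdio.h is what tells the compiler that this value has type int . The sketch below assumes a hypothetical function print_greeting ; on success printf returns the number of characters it wrote.

```c
#include <stdio.h>

// print_greeting is a hypothetical helper: it keeps the value that
// printf returns instead of discarding it
int print_greeting(void) {
    int written = printf("Hello World!!!\n");  // 14 visible characters + newline
    return written;
}
```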
2. Line 2 is an empty line which is left untouched by the preprocessor. The space, tab, and newline are collectively
known as whitespace characters. These characters are ignored [there are a few cases where they're not
ignored but let's not dwell on those obscure details at this early stage] and their main purpose is to provide
punctuation.

3. Lines 3, 4, and 5:

Lines 3, 4, and 5 specify a multi-line comment. A multi-line comment is program text delimited by
characters /* and characters */ . A multi-line comment can be on a line by itself, or it can be on the
same line as a statement, or can extend over several lines.
Comments are only meant for consumption by human readers - they're supposed to provide the reader
with a clear and easy-to-understand description of what is the algorithm and how the algorithm is being
implemented by sequences of statements. Although comments are optional, good style requires
comments be used throughout a program to improve its readability and to document the algorithm.
Since comments don't have meaning to a compiler, C standards require the preprocessor to strip the
source file of comments by replacing them with single space characters.
4. Line 6:

The declaration of function main has been seen in an earlier program. Now, let's expand on the concept
of declarations by introducing definitions. Line 6 begins the declaration and also the definition of function
main .

Think of a function definition as the physical manifestation of what is described in a function declaration.
While a function declaration tells the compiler [and presumably human readers] about the function's
name, list of function parameters, and type of value returned by the function, a function definition creates
memory storage for the instructions necessary to implement the algorithm that is encapsulated by the
function and defines its parameters and return value. The following code snippet illustrates the
difference between a function declaration and function definition:

1 /*
2 this is a function declaration (prototype): it tells the
3 compiler that inc is a function that takes a parameter of
4 type int and returns a value of type int
5 */
6 int inc(int x);
7
8 /*
9 this is a function definition: it not only tells the compiler
10 that inc is a function that takes a parameter of type int and
11 returns a value of type int; in addition a function definition
12 implements the function using statements.
13 */
14 int inc(int x) {
15 return x+1;
16 }
17

All C programs must define a single function named main . This function will become the entry point of
the program - that is, main will be the first function executed when the program is started. Returning
from this function terminates the program, and the returned value is treated as an indication of program
success or failure.

Any time you want to group things in C you put these things between opening curly brace { and closing
curly brace } . Compilers understand these braces as punctuator symbols that enclose these things. With
functions, the statements implementing an algorithm are the things that must be enclosed between {
and } . Notice that unlike previous programs, left curly brace { is not placed on its own line but is
instead separated from closing parenthesis ) by whitespace. All subsequent statements until a
corresponding right curly brace } specify the function's body. In fact, left curly brace { need not be
separated from the closing parenthesis ) by whitespace nor does the first statement have to be
whitespace separated from the preceding { :

1 #include <stdio.h>
2 /*Print a greeting to the world!!!*/
3 int main(void){printf("Hello World!!!\n");return 0;}
4

C is a free form language that only requires whitespace to provide punctuation. For example, on line 3 of
the above code, the compiler will require a whitespace between int and main to distinguish these two
discrete identifiers. Otherwise, intmain will be interpreted as a single identifier.

The following text is also valid C syntax:

1 #include <stdio.h>
2 /*Print a greeting to the world!!!*/int main
3 (void){printf("Hello World!!!\n");return 0;}
4

There doesn't need to be whitespace between { and printf because the compiler understands { to
be a punctuator symbol that indicates the start of a new block of code and the subsequent identifier
printf is considered as part of the first statement in this new block of code.

5. Line 7 begins with identifier printf .

Since identifier printf is followed by left parenthesis ( , followed by a bunch of stuff, followed by right
parenthesis ) , the compiler will understand that function printf is being called. This function is
defined in the C standard library and is used for printing formatted data to standard output.

Between the left and right parentheses, the text enclosed in a pair of double quotes "Hello
World!!!\n" is called a string literal. A string literal is a sequence of characters delimited by a pair of
double quotes " .

The string literal "Hello World!!!\n" is the argument or value passed to function printf .

You might expect the sequence of characters Hello World!!!\n to be printed to stdout by function
printf . However, that is not the case - only the sequence of characters Hello World!!! is printed
to stdout , followed by a newline. The backslash \ is called an escape character when it
is used in a string. The compiler combines it with the character that follows it and then attaches a special
meaning to the combination of characters. For example, \n represents a skip to newline. The cursor is a
moving place marker that indicates the next position in stdout [on the screen, for example] where
information will be displayed. When executing a printf function call, the cursor is advanced to the start
of the next line in stdout if the \n escape sequence is encountered in the string passed to the function.
A printf string often ends with a \n newline escape sequence so that the call to printf produces a
completed line of output. If no characters are printed on the next line before another newline character
is printed, a blank line will appear in the output. For example, the calls

1 printf("today is a good day.\n");
2 printf("\ntomorrow will be a better day.\n");

produce two lines of text with a blank line in between:

1 today is a good day.
2
3 tomorrow will be a better day.

The first call to printf places the cursor at the start of line 2. Since \n newline escape sequence is the
first character of the string in the argument to the second call to printf , the cursor will move to the
start of line 3. Because the second call to printf also terminates with \n , the call will print the
characters tomorrow will be a better day. on line 3 and then shift the cursor to the start of line 4.

In addition to \n , a number of additional escape characters are recognized. For example, the sequence
\\ is used to insert a single backslash in a string, and the sequence \" will insert a double quote in a
string. Thus, the call to function printf :

1 printf("\"The End.\"\n");

will print to stdout a line containing

1 "The End."

Some of the commonly used escape characters recognized by C are:

In short, function main is calling function printf with string literal "Hello World!!!\n" as argument
that is printed to stdout by printf as a line containing Hello World!!! followed by a newline.
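
A few escape sequences can be seen in action in the sketch below. The function name print_escape_demo and the strings are assumptions made for illustration; the sequence \t (horizontal tab) is standard C but is not discussed above. Because each escape sequence becomes a single character, the counts returned by printf are smaller than the number of characters typed in the source text.

```c
#include <stdio.h>

// print_escape_demo is a hypothetical helper: it prints three lines that
// use escape sequences and returns the total characters written
int print_escape_demo(void) {
    int count = 0;
    count += printf("name:\tvalue\n");       // \t is one tab character
    count += printf("path: C:\\sandbox\n");  // \\ is one backslash
    count += printf("\"quoted\"\n");         // \" is one double quote
    return count;
}
```

The three calls write 12, 17, and 9 characters respectively.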

6. Line 8 is an empty line that was introduced by the programmer to make the code more readable to
other human readers. As indicated earlier, a newline is left untouched by the preprocessor.

7. Line 9 contains statement return 0; which indicates that function main is returning a value of type int .
Since the definition of function main on line 6 indicates that main is a function that takes no value and
returns a value of type int , the function must return a value of type int .

8. Line 10 contains right curly brace } which matches left curly brace { on line 6 indicating the end of the body
of function main .

9. Line 11 is empty because every C source file is required to terminate with a newline.

10. Compile source file hello.c using GCC C compiler with the full suite of options:

1 $ gcc -std=c11 -pedantic-errors -Wstrict-prototypes -Wall -Wextra -Werror -c hello.c -o hello.o

11. Link object file hello.o with C standard library to create executable file hello.out :

1 $ gcc hello.o -o hello.out

12. To run executable program hello.out , type the executable's pathname like this:

1 $ ./hello.out
2 Hello World!!!
3

Things to try: Deciphering messages from compiler


1. What happens when you insert characters /* at the start of line in source file hello.c ?

2. What happens when you add characters /* at the beginning and characters */ at the end of line in the
above code?

3. What happens when you add characters // at the beginning of line in the above code?

It is clear that C11 prohibits the nesting of multi-line comments. However, single-line comments can be
nested inside multi-line comments.

4. Compile the source file after commenting line ? Does the compiler generate diagnostic messages? Do you
understand why the compiler doesn't generate diagnostic messages?
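
The interaction between the two comment styles noted above can be sketched as follows. The function comment_demo is hypothetical; the point is that a // marker inside a multi-line comment is ordinary comment text, so the assignment on that line never reaches the compiler.

```c
/*
   A multi-line comment ends at the first closing delimiter,
   so a // marker inside it is ordinary comment text.
*/
int comment_demo(void) {
    int value = 1;
    /* value = 2; // disabled together with the rest of this comment */
    value += 1;   // single-line comment: runs to the end of the line
    return value; /* a multi-line comment may also trail a statement */
}
```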

Third C program: Multiple source files


1. Now, we'll reinforce basic concepts crucial to understanding the structure, configuration, and vocabulary of C
programs: declarations and definitions. This is done by deconstructing the Hello World!!! program into a
function hello that prints Hello World!!! to stdout and a client function main that uses the services of
function hello by making a call to it. This deconstruction consisting of two functions main and hello is
implemented using two source files driver.c and hello-defn.c , and a header file hello-decl.h . It is
further assumed that files hello-defn.c and hello-decl.h are authored by one programmer while
driver.c is implemented by a second programmer and neither programmer is aware of the other.

2. Since function hello will be used by other parts of the program and function hello in turn calls C standard
library function printf , a header file hello-decl.h is created to declare these two functions. This is the
purpose of header files - to gather together declarations of related entities in a file and provide clients the
service of including this header file rather than having to individually declare each entity. More specifically,
line 4 in the following header file contains the declaration of function hello . As described earlier, a function
declaration specifies the function's name, its parameter list, and its return type. The compiler will use this
function declaration to check for the correct use of function hello by clients that wish to use this function in
their source files. File hello-decl.h will look like this:

1 #include <stdio.h>
2
3 // declaration (prototype) of function hello
4 void hello(void);
5

3. Source file hello-defn.c containing the definition of function hello will look like this:

1 #include "hello-decl.h"
2
3 // definition of function hello
4 void hello(void) {
5 printf("Hello World!!!\n");
6 }
7

Notice that line 1 of source file hello-defn.c contains a preprocessor directive to include header file
hello-decl.h . This is a recommended practice because it lets the compiler check that the function declarations
in the header file and their definitions in the source file match up. Also notice that line 1 has an include
directive that delimits the name of the header file hello-decl.h between a pair of double quotes " . When the
header file is delimited by angle brackets <> , the preprocessor will search for the header file only in the
standard include paths of the compiler. With a pair of double quotes " used as delimiters, the preprocessor will
first search for the header file in the directory in which source file hello-defn.c is located and, if the file is
not found there, fall back to the standard include paths.

The author of file hello-defn.c now has everything required to successfully compile the source file to an
object file:

1 $ gcc -std=c11 -pedantic-errors -Wstrict-prototypes -Wall -Wextra -Werror -c hello-defn.c -o hello-defn.o

4. Independently, the second programmer has defined function main in a separate file driver.c :

1 #include "hello-decl.h"
2
3 int main(void) {
4 hello();
5
6 return 0;
7 }
8

Similar to source file hello-defn.c , driver.c has an include directive on line 1 that delimits the name of
header file hello-decl.h between a pair of double quotes " . When hello-decl.h is included in a source file
such as driver.c , the preprocessor will replace the line containing include directive with contents of file
hello-decl.h . The transformed driver.c will look like this:

1 #include <stdio.h>
2
3 void hello(void);
4
5 int main(void) {
6 hello();
7
8 return 0;
9 }
10

The preprocessor will notice that the transformed version of driver.c again contains an include directive
and will further transform the previously transformed file by replacing line 1 of transformed driver.c with
the contents of file stdio.h . This process will recursively continue if stdio.h itself contains include
directives. The transformed version driver.c will look like this:

1 contents of stdio.h here ...
2
3 void hello(void);
4
5 int main(void) {
6     hello();
7
8     return 0;
9 }
10

The author of file driver.c now has everything required to create an object file without access to the source
file containing the definition of function hello :

1 $ gcc -std=c11 -pedantic-errors -Wstrict-prototypes -Wall -Wextra -Werror -c driver.c -o driver.o

5. Either of the two programmers or any other programmer can now link together function main in object file
driver.o , function hello in object file hello-defn.o , and C standard library function printf defined in
static C standard library libc.a into a single executable, say new-hello.out :

1 $ gcc driver.o hello-defn.o -o new-hello.out

6. To run executable program new-hello.out , type the executable's pathname like this:

1 $ ./new-hello.out
2 Hello World!!!
3

7. One way to think of hello-decl.h and hello-defn.c is that hello-decl.h is an interface file while
hello-defn.c is an implementation file. Other programmers include interface files in their source files to call
functions declared in these interface files without either the programmers or the compiler needing to know the
implementation details of these functions. Instead, programmers and compilers are only interested in
whether these declared functions are referenced or called correctly in these other source files. It is the linker
that combines the implementations of these declared functions with the functions
in these other source files to create a complete executable file.
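
The split between interface and implementation can be sketched inside a single source file. In the sketch below, hello_count is a hypothetical variant of hello that returns the result of printf , and greet_twice is a hypothetical client; neither name appears in the course files. The client is written using only the declaration, exactly as driver.c is compiled using only hello-decl.h .

```c
#include <stdio.h>

/* interface (what a header such as hello-decl.h would carry):
   hello_count is a hypothetical variant of hello that returns printf's result */
int hello_count(void);

/* client code: written and checked using only the declaration above */
int greet_twice(void) {
    return hello_count() + hello_count();
}

/* implementation (what a source file such as hello-defn.c would carry) */
int hello_count(void) {
    return printf("Hello World!!!\n");  // 15 characters on success
}
```

Moving the last function into a separate source file would not change the client at all, which is the design benefit the interface file provides.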

Things to try:
1. What happens when you try to compile driver.c by commenting out line 1?

If line 1 of driver.c is removed, that is, if driver.c doesn't include header file hello-decl.h , the compiler
will implicitly assume that function hello is declared as:

1 int hello();
2

This declaration says that hello is a function that takes an unknown number of parameters and returns an
int . That is, if the declaration of function hello is not present in the source file, the compiler will implicitly
assume that hello is declared as int hello(); and print a diagnostic message to indicate this assumption:

1 $ gcc -std=c11 -Wstrict-prototypes -Wall -Wextra -c driver.c -o driver.o
2 driver.c: In function ‘main’:
3 driver.c:4:3: warning: implicit declaration of function ‘hello’ [-Wimplicit-function-declaration]
4     4 |   hello();
5       |   ^~~~~
6

Notice that source file driver.c is successfully compiled into object file driver.o . In fact, this object file
driver.o can be linked with object file hello-defn.o and C standard library to create an executable that
prints the required greeting to stdout :

1 $ gcc driver.o hello-defn.o -o new-hello.out
2 $ ./new-hello.out
3 Hello World!!!

Although the executable seems to run correctly, remember that the compiler has made a spurious
assumption in the absence of an explicit declaration of function hello that hello is a function that takes an
unknown number of parameters and returns an int even though hello is defined as a function that takes
zero parameters and returns nothing. This particular behavior of C means that it is possible to create runtime
security holes in the program by authoring functions that call function hello with arguments. C++ designers
correctly identified this drawback of C as a serious security threat and therefore C++ was designed to flag
omissions of function declarations as errors. Implicit function declarations were removed from the C language
in C99, but so that C code developed many decades earlier still compiles, gcc by default reports them only as
warnings, as the output above shows.

The C code created in this course must compile cleanly with C++ compilers and thus the code must explicitly
avoid certain drawbacks of C standards. Therefore, it is important that you always declare every function you
use in a source file before its first use so that the compiler is not allowed to make implicit assumptions that
can cause runtime bugs and security holes. You enforce this policy using gcc compiler option -pedantic-errors ,
which turns such implicit declarations into errors instead of letting them pass with a diagnostic warning:

1 $ gcc -std=c11 -pedantic-errors -Wstrict-prototypes -Wall -Wextra -c driver.c -o driver.o
2 driver.c: In function ‘main’:
3 driver.c:4:3: error: implicit declaration of function ‘hello’ [-Wimplicit-function-declaration]
4     4 |   hello();
5       |   ^~~~~
6

Notice how option -pedantic-errors has converted a diagnostic warning to an error, thereby preventing the
object file from being created.

2. In the previous question, you saw an example of a compiler error. To distinguish between compiler errors and
linker errors, compile and link driver.c using option -Werror like this:

1 $ gcc -std=c11 -pedantic-errors -Wstrict-prototypes -Wall -Wextra -Werror driver.c

What happens when you replace the pair of double quote delimiters #include "hello-decl.h" in line 1 of
driver.c with angle brackets < and > , as in #include <hello-decl.h> ?

3. Likewise, what happens when you replace the angle bracket delimiters #include <stdio.h> in line 1 of
hello-decl.h with a pair of double quote delimiters #include "stdio.h" ?

Fourth C program: Mathematical functions and linking with C standard math library

Arithmetic expressions that solve science and engineering problems often require computations beyond basic
addition, subtraction, multiplication, and division. Much problem solving requires the use of exponentiation,
logarithms, exponentials, and trigonometric functions. This section introduces mathematical functions available in
the C standard library.

The process begins with the use of the following preprocessor directive in any source file referencing
mathematical functions in the C standard library:

1 #include <math.h>

This directive specifies that function prototypes and macros be added to the source file to aid the compiler when it
converts calls to mathematical functions in the C standard library.

If your algorithms involve extensive math operations then you should look up the wide variety of functions
prototyped in math.h to see which functionality is already implemented and what you may have to build from
scratch. There's an important detail relating to trigonometric functions that stymies beginner programmers:
trigonometric functions assume that their argument is in radians. For example, if you've a variable theta
containing a value in degrees, that angle must be converted to radians (recall from trigonometry that
180 degrees = π radians). The following code fragment does the trick:

1 #define PI (3.141593)
2 #define DEG_TO_RAD (PI/180.0)
3 ...
4 double theta_rad = theta * DEG_TO_RAD;
5 double x = sin(theta_rad);
6 // conversion can also be specified within function reference
7 double y = sin(theta * DEG_TO_RAD);

Distance between two points


Recall the problem statement:

and the algorithm that was devised to solve this problem:

First, an include preprocessing directive for math.h and function prototype for function distance are provided
in header file distance.h :

1 #include <math.h>
2
3 /*!
4 @author pghali
5 @brief Computes the distance between two points.
6
7 This function takes coordinates of point (px, py) and
8 point (qx, qy) and returns the distance between P and Q.
9
10 @param px - double-precision floating-point value specifying px.
11 @param py - double-precision floating-point value specifying py.
12 @param qx - double-precision floating-point value specifying qx.
13 @param qy - double-precision floating-point value specifying qy.
14 @return - a double-precision floating-point value measuring the
15 distance between P and Q.
16 *//*_____________________________________________________________*/
17 double distance(double px, double py, double qx, double qy);
18

The following code in source file distance.c defines function distance :

1 #include "distance.h"
2
3 double distance(double px, double py, double qx, double qy) {
4 // compute sides of right triangle formed by two points
5 double width = qx - px;
6 double height = qy - py;
7 return sqrt(width*width + height*height);
8 }
9
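
As a worked check of what the return statement in distance.c computes, the distance formula can be written out and evaluated by hand. The symbols below mirror the parameter names px , py , qx , and qy , and the sample points P(0, 0) and Q(3, 4) are the ones used by the test driver that follows.

```latex
\begin{align*}
d(P, Q) &= \sqrt{(q_x - p_x)^2 + (q_y - p_y)^2} \\
        &= \sqrt{(3 - 0)^2 + (4 - 0)^2} = \sqrt{9 + 16} = \sqrt{25} = 5
\end{align*}
```

Here width corresponds to the first difference and height to the second.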

Source file distance.c is [only] compiled in the usual manner:

1 $ gcc -std=c11 -pedantic-errors -Wstrict-prototypes -Wall -Wextra -Werror -c distance.c -o distance.o

Function distance can be tested by calling it from function main which is implemented in file test-dist.c :

#include <stdio.h>
#include "distance.h"

int main(void) {
    double px = 0.0, py = 0.0, qx = 3.0, qy = 4.0; // input portion
    // compute distance from P(0, 0) to Q(3, 4)
    double dist = distance(px, py, qx, qy);
    printf("Distance is %f units\n", dist); // output portion
    return 0;
}

Source file test-dist.c is [only] compiled in the usual manner:

$ gcc -std=c11 -pedantic-errors -Wstrict-prototypes -Wall -Wextra -Werror -c test-dist.c -o test-dist.o

However, the linker throws an error when an attempt is made to generate an executable:

$ gcc test-dist.o distance.o -o dist.out
/usr/bin/ld: distance.o: in function `distance':
distance.c:(.text+0x59): undefined reference to `sqrt'
collect2: error: ld returned 1 exit status

The problem is that function sqrt is not defined in the default C standard library libc.a . Instead, it is defined in
the separate math library libm.a and, for historical reasons, gcc does not link against libm.a unless it
is explicitly requested. This is done by passing option -lm to the linker stage of gcc :

$ gcc test-dist.o distance.o -o test-dist.out -lm

Option -lm is shorthand for linking the object files with library file /usr/lib/x86_64-linux-gnu/libm.a .

Run executable program test-dist.out to test function distance :

$ ./test-dist.out
Distance is 5.000000 units

The two source files distance.c and test-dist.c can also be compiled and then linked together with the C standard
library and the math library in a single call to gcc :

$ gcc -std=c11 -pedantic-errors -Wstrict-prototypes -Wall -Wextra -Werror distance.c test-dist.c -o test-dist.out -lm

Printing output from program


The printf function was first introduced in the Hello World!!! program to print a sequence of characters
comprising a greeting to stdout [the computer's screen by default]. In addition to printing explanatory text, the
printf function was also used in the distance program to print values to stdout . Consider the following
statement that prints the value of a double variable named dist :

printf("Distance is %f units\n", dist);

This printf statement contains two arguments: a format string and a print list containing an identifier to specify
the value to be printed.

A format string is enclosed in a pair of double quotes " , and can contain text, format specifiers, or both. A format
specifier begins with the character % and describes the format to use in printing the value of a variable. If the
format string doesn't contain format specifiers, the characters in the format string will be printed to stdout as is.
In this example, the format string specifies that the characters Distance is are to be printed. The next group of
characters %f represents a format specifier that indicates that a floating-point value is to be printed next, which
will then be followed by the characters units . The next combination of characters \n represents a newline
indicator; it causes a skip to a newline on stdout after the information has been printed. The second argument in
the printf statement is a variable dist ; it is matched to the format specifier %f in the format string. Function
printf is written to print a float or a double when the format specifier is %f [floating-point form]. Thus, in
this example, since the value of dist is 5.0, this value will be printed to stdout :

Distance is 5.000000 units

Why does printf print 5.000000 and not 5.0 ? This matter will be explained in the next section.

Printing floating-point values


Floating-point values are displayed in one of several formats as indicated in the following table. Please note that
you don't need to memorize any of these details. They're only being provided here so that you know that there
are many ways in which floating-point values can be printed. When you need to print floating-point values a
certain way, you should look up a reference page to see what your options are.

Format specifiers f and F always print at least one digit to the left of the decimal point. Values displayed
with format specifiers f and F print 6 digits of precision to the right of the decimal point by default. This is
the reason why printf prints 5.000000 and not 5.0 in the previous example.

Format specifiers e and E display floating-point values in exponential notation - the computer equivalent of
scientific notation used in math. For example, the value 150.4582 is represented in scientific notation as
$1.504582 \times 10^{2}$ and in exponential notation as 1.504582e+02 by the computer. This notation indicates that
1.504582 is multiplied by 10 raised to the second power ( e+02 ) where e stands for exponent. Format
specifiers e and E print lowercase e and uppercase E , respectively, preceding the exponent, and print
exactly one digit to the left of the decimal point. Values displayed with format specifiers e and E print
6 digits of precision to the right of the decimal point by default.

Format specifier g [or G ] prints in either e (or E ) or f format with no trailing zeros. For example,
1.234000 is printed as 1.234 . The rules for which format is picked are too detailed for beginners and the interested
reader can look up the details here.

The following code illustrates the various floating-point format specifiers:

#include <stdio.h> // contains function prototype of printf

int main(void) {
    double d = 1234567.89;
    float f = d;

    // format specifier f to print float value
    printf("Print float using format specifier %%f: %f\n", f);
    // format specifier f to print double value
    printf("Print double using format specifier %%f: %f\n", d);
    // format specifier F to print float value
    printf("Print float using format specifier %%F: %F\n", f);
    // format specifier F to print double value
    printf("Print double using format specifier %%F: %F\n", d);
    // format specifier e to print float value
    printf("Print float using format specifier %%e: %e\n", f);
    // format specifier e to print double value
    printf("Print double using format specifier %%e: %e\n", d);
    // format specifier E to print float value
    printf("Print float using format specifier %%E: %E\n", f);
    // format specifier E to print double value
    printf("Print double using format specifier %%E: %E\n", d);
    // format specifier g to print float value
    printf("Print float using format specifier %%g: %g\n", f);
    // format specifier g to print double value
    printf("Print double using format specifier %%g: %g\n", d);
    // format specifier G to print float value
    printf("Print float using format specifier %%G: %G\n", f);
    // format specifier G to print double value
    printf("Print double using format specifier %%G: %G\n", d);

    return 0;
}

The printf statements print the following text to stdout :

Print float using format specifier %f: 1234567.875000
Print double using format specifier %f: 1234567.890000
Print float using format specifier %F: 1234567.875000
Print double using format specifier %F: 1234567.890000
Print float using format specifier %e: 1.234568e+06
Print double using format specifier %e: 1.234568e+06
Print float using format specifier %E: 1.234568E+06
Print double using format specifier %E: 1.234568E+06
Print float using format specifier %g: 1.23457e+06
Print double using format specifier %g: 1.23457e+06
Print float using format specifier %G: 1.23457E+06
Print double using format specifier %G: 1.23457E+06

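Notice that the e , E , g , and G format specifiers visibly round the value when it is printed to stdout , because
their default precision keeps fewer significant digits than the value has. Format specifier f also rounds, but only
after the sixth digit to the right of the decimal point, so no rounding is visible here. The float value prints as
1234567.875000 because single precision cannot represent 1234567.89 exactly.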

Printing integer values


Integer values are displayed using one of the several format specifiers shown in the following table:

The following code illustrates the various integer format specifiers:

#include <stdio.h> // contains function prototype of printf

int main(void) {
    int val = 345;

    // format specifier d to print signed integer value
    printf("Format specifier %%d: %d\n", +val);
    // format specifier d to print signed integer value
    printf("Format specifier %%d: %d\n", -val);
    // format specifier i to print signed integer value
    printf("Format specifier %%i: %i\n", -val);
    // format specifier u to print unsigned integer value
    printf("Format specifier %%u: %u\n", val);
    // format specifier u to print unsigned integer value
    printf("Format specifier %%u: %u\n", -val);
    // format specifier o to print unsigned integer value in octal
    printf("Format specifier %%o: %o\n", val);
    // format specifier o to print unsigned integer value in octal
    printf("Format specifier %%o: %o\n", -val);
    // format specifier x to print unsigned integer value in hexadecimal
    printf("Format specifier %%x: %x\n", val);
    // format specifier X to print unsigned integer value in
    // hexadecimal with uppercase digits
    printf("Format specifier %%x: %X\n", val);
    // format specifier x to print unsigned integer value in hexadecimal
    printf("Format specifier %%x: %x\n", -val);
    // length modifier h used with format specifier d to
    // print signed integer value as short int
    printf("Format specifier %%hd: %hd\n", val);
    // length modifier h used with format specifier d to print signed int
    // value 65535 which is -1 as signed short int
    printf("Format specifier %%hd: %hd\n", 65535);
    // length modifier l used with format specifier d to print
    // signed integer value as long int
    printf("Format specifier %%ld: %ld\n", -20000000L);
    // length modifier l used with format specifier u to print
    // unsigned integer value as unsigned long int
    // UL suffix means constant is of type unsigned long int
    printf("Format specifier %%lu: %lu\n", 20000000UL);
    // length modifier l used with format specifier u to print
    // signed long int value as unsigned long int
    // L suffix means constant is of type signed long int
    printf("Format specifier %%lu: %lu\n", -20000000L);
    // length modifier ll used with format specifier d to print
    // signed long long int value as signed long long int value
    // LL suffix means constant is of type signed long long int
    printf("Format specifier %%lld: %lld\n", -20000000LL);
    // length modifier ll used with format specifier u to print
    // unsigned long long int value as unsigned long long int value
    // LLU suffix means constant is of type unsigned long long int
    printf("Format specifier %%llu: %llu\n", 20000000LLU);

    return 0;
}

Notice that the minus sign is printed for -val , while the plus sign is suppressed for +val . Also, format specifier i
behaves the same as format specifier d . Finally, when -val is printed with format specifier u , the value -345 is
interpreted as the unsigned value 4294966951 (that is, $2^{32} - 345$ for a 32-bit int ). The printf statements print
the following text to stdout :

Format specifier %d: 345
Format specifier %d: -345
Format specifier %i: -345
Format specifier %u: 345
Format specifier %u: 4294966951
Format specifier %o: 531
Format specifier %o: 37777777247
Format specifier %x: 159
Format specifier %x: 159
Format specifier %x: fffffea7
Format specifier %hd: 345
Format specifier %hd: -1
Format specifier %ld: -20000000
Format specifier %lu: 20000000
Format specifier %lu: 18446744073689551616
Format specifier %lld: -20000000
Format specifier %llu: 20000000

Printing multiple values


It is possible to print multiple values to stdout using a single printf statement with a format string containing
multiple format specifiers. The distance program is rewritten to print the coordinates of the source and destination
points in addition to the computed distance:

#include <stdio.h>
#include "distance.h"

int main(void) {
    double px = 0.0, py = 0.0, qx = 3.0, qy = 4.0; // input portion
    // distance is computed by the call in the print list
    printf("Distance from (%f, %f) to (%f, %f) is %f units\n",
           px, py, qx, qy, distance(px, py, qx, qy));
    return 0;
}

Since there are five format specifiers in the format string, five corresponding variables or expressions must follow
in the print list. The first format specifier corresponds to variable px , the second specifier corresponds to variable
py , and so on. Because values displayed with format specifier f print 6 digits of precision to the right of the
decimal point by default, the text printed to stdout will be:

Distance from (0.000000, 0.000000) to (3.000000, 4.000000) is 5.000000 units

Reading input from user


Suppose the programmer wishes to test the distance program with position coordinates other than
P(0, 0) and Q(3, 4). The programmer would have to change the values of variables px , py , qx , and qy , and
then recompile test-dist.c , relink, and re-execute the program to obtain the distance for a different set of points.
Alternatively, if function scanf from the C standard library is used to read position coordinates, there is no need to
recompile and relink the program; the program only needs to be re-executed.

The new version of function main in source file test-dist-new.c looks like this:

#include <stdio.h>
#include "distance.h"

int main(void) {
    // input portion
    double px, py, qx, qy;
    printf("Enter point P: ");
    scanf("%lf %lf", &px, &py);
    printf("Now, enter point Q: ");
    scanf("%lf %lf", &qx, &qy);

    // call to function distance
    double dist_pq = distance(px, py, qx, qy);

    // output portion
    printf("Distance from P(%.3f, %.3f) to Q(%.3f, %.3f) is %.3f\n",
           px, py, qx, qy, dist_pq);
    return 0;
}

Consider the first call to function scanf : scanf("%lf %lf", &px, &py) . The first argument of the scanf
function is a format string "%lf %lf" that specifies the types of variables whose values are to be entered from
stdin [keyboard]. The entire (and complex) list of type specifiers is provided here. A simpler list of type specifiers
is shown in the following table:

From the table, the specifiers for an int variable are %i or %d ; the specifiers for a float variable are %f , %e ,
and %g ; and the specifiers for a double variable are %lf , %le , and %lg . It is critical to use the correct specifier:
reading the value for a double variable with the %f specifier is undefined behavior, and a compiler invoked without
format warnings enabled may accept it silently - your program will fail miserably!!!

The remaining two arguments in the call to scanf are memory locations that correspond to the specifiers in the
format string. These memory locations are indicated with address operator & . This operator is a unary operator
[meaning it requires a single operand] that determines the memory address of the operand with which it is
associated. A common error is to omit the address operator in front of a variable's identifier.

Since the values to be entered through stdin (the keyboard) are double-precision floating-point values to be stored in
variables px and py , the two arguments are &px and &py . Since the program must read two values, the
numbers must be separated by at least one whitespace character, which could be a space, a tab, or a
newline. Note that a number may contain a decimal point, but doesn't have to.

In order to prompt the user to enter the values, a scanf statement is preceded by a printf statement that
describes the information that the user should enter from the keyboard:

printf("Enter point P: ");

Compile this source file, then link it with distance.o and the math library to create an executable:

$ gcc -std=c11 -pedantic-errors -Wstrict-prototypes -Wall -Wextra -Werror -c test-dist-new.c -o test-dist-new.o
$ gcc test-dist-new.o distance.o -o test-dist-new.out -lm

Run the executable program and the interaction with the user looks like this:

$ ./test-dist-new.out
Enter point P: 10.12 12.10
Now, enter point Q: 13.45 45.13
Distance from P(10.120, 12.100) to Q(13.450, 45.130) is 33.197
$

Things to review
1. What is an identifier? What is the legal way to write an identifier?
2. What is a keyword? List the keywords you've come across in this document.
3. What is a function?
4. What is the difference between a function declaration [or function prototype] and a function definition?
5. What are function parameters?
6. How do functions specify that they will not take any parameters? How do functions specify that they're not
returning values?
7. What is the purpose of function main in a C program?
8. What is a C comment? How many different ways can you comment a C source file?
9. What is the preprocessor?
10. What are whitespace characters? What purpose do whitespace characters serve?
11. What is a data type in the context of a programming language?
12. What is a variable?

13. What is a literal value? How does it differ from a variable? What is a string literal? What are the other literals in
this tutorial?
14. Using function printf how will you print values of type int and values of type double ?
15. Using function scanf how will you read values of type int and values of type double ?
16. What does a compiler do? What is the input to a compiler and what is its output?
17. What is the purpose of a linker? What are its inputs and what does it generate as output?
HIGH-LEVEL PROGRAMMING I
Intro to C Programming (Part 1/3) by Prasanna Ghali
Outline
• What is a Computer Program?
• What is Computer Programming?
• What is a Programming Language?
• How Computers Work?
• How Computers Store Data?
• Data Representation in Machines and C
• Machine Languages
• Assembly Languages
• Disadvantages of Low-Level Languages
• High-Level Programming Languages
• Compilers and Interpreters
What is a Computer Program?
• Program is specific implementation of an algorithm in particular programming language
What is Computer Programming?
• Science and art of encoding algorithm into computer program
What is a Programming Language?
• Framework that allows programmers to precisely communicate their algorithms to computers
• Syntax: What are rules of language?
  ◦ Vocabulary: alphabet, words, sentences
  ◦ Grammar: rules governing language constructs
  ◦ Symbols representing syntax use 7-bit ASCII or UTF-8 encoding
• Semantics: What is the meaning of sentences?
Organization of Computer
• Typical organization of modern computers based on von Neumann architecture described in 1945
[Diagram: processor/CPU containing PC, registers r0 to r3, ALU, and control unit, connected to memory and I/O devices]
Stored-Program Computer (1/4)
• Earliest machines were hardwired for specific applications
Stored-Program Computer (2/4)
• Computers based on von Neumann architecture use stored-program concept:
  ◦ Program that manipulates data is stored in memory
  ◦ Data to be manipulated by program is also stored in memory
[Diagram: processor/CPU connected to memory that holds both instructions and data]
Stored-Program Computer (3/4)
[Diagram: processor/CPU (PC, registers r0 to r3, ALU, control unit) with I/O devices and memory that holds instructions and data]
Stored-Program Computer (4/4)
[Diagram: processor/CPU next to memory that holds several programs as machine code (accounting program, editor program, web browser, …), each stored together with its data]
Bits
• Computers are digital electronic devices
• Digital devices represent information with sequences of 0s and 1s
  ◦ Low voltage represents 0 while high voltage represents 1
  ◦ Each 0 or 1 digit is called binary digit or bit
  ◦ This is fundamental concept - computers can only store and process information as strings of 0s and 1s
Bits and Bytes
• Language of computers is binary - sequence of 0s and 1s
• Sequence of 8 bits, called byte, has become de facto standard for unit of digital information
Bit position: 7 6 5 4 3 2 1 0
Bit value:    1 1 1 1 1 1 1 0
(bit 7 is most significant bit; bit 0 is least significant bit)


Computer Memory
• Picture illustrates organization of computer memory as linear array of 1000 bytes
[Diagram: column of 1000 cells, each 8 bits = 1 byte wide, at addresses 0, 1, 2, …, 999]
Computer Memory Capacity
Name      Symbol  Size (exponential)  Size in bytes (explicit)
Kilobyte  KB      $2^{10}$ bytes      1,024
Megabyte  MB      $2^{20}$ bytes      1,048,576
Gigabyte  GB      $2^{30}$ bytes      1,073,741,824
Terabyte  TB      $2^{40}$ bytes      1,099,511,627,776
What is Data Type?
• Data type is a set of values and set of operations that can be applied on these values
  ◦ Set of values indicates kind of data
  ◦ Set of operations indicates what can be done with data
CPU Data Types
• Recall digital devices represent and process information using binary numbers which are sequences of 0s and 1s
• CPUs can only represent numbers of two data types:
  ◦ Integer
  ◦ Floating-point
Integer Data Types (1/2)
Type      What values?           Representation
Unsigned  Positive (incl. 0)     Traditional binary encoding
Signed    Negative and positive  2’s complement
Integer Data Types (2/2)
• Because of way in which digital computing has evolved, modern CPUs use collection of bits grouped into units of eight:
  ◦ 8 bits (byte)
  ◦ 16 bits
  ◦ 32 bits
  ◦ 64 bits
Bit size  Unsigned            Signed
8-bit     $[0, 2^{8}-1]$      $[-2^{7}, 2^{7}-1]$
16-bit    $[0, 2^{16}-1]$     $[-2^{15}, 2^{15}-1]$
32-bit    $[0, 2^{32}-1]$     $[-2^{31}, 2^{31}-1]$
64-bit    $[0, 2^{64}-1]$     $[-2^{63}, 2^{63}-1]$
C/C++ Integer Data Types
• Data types for 64-bit C11 compiler used in this course
• Note: Other compilers might show different behavior for 32- and 64-bit sizes
Bit size  Unsigned                Signed
8-bit     unsigned char           signed char
16-bit    unsigned short int      signed short int
32-bit    unsigned int            signed int
64-bit    unsigned long int       signed long int
64-bit    unsigned long long int  signed long long int
Floating-Point Types (1/2)
• Floating-point types useful for representing very small and very large numbers, but not precisely
• Floating-point values represented in IEEE 754 format
• Rational numbers $\frac{p}{q}$, where $p$ and $q$ are integers, are represented as $(-1)^s \times m \times 2^e$ where $s$ is sign, $m$ is fixed bit length fraction (mantissa), and $e$ is exponent
• Term floating-point refers to fact that these numbers can move binary point in rational number to adjust precision
• Precision (how many fractional digits?) is used to distinguish floating-point values
Floating-Point Types (2/2)
• Floating-point representation is $(-1)^s \times m \times 2^e$
  ◦ SP means single-precision with 32-bits
  ◦ DP means double-precision with 64-bits
  ◦ EP means extended-precision with 128-bits
Type  Sign bit  Mantissa bits  Exponent bits
SP    1         23             8
DP    1         52             11
EP    1         112            15
Type  Smallest value                           Largest value
SP    $\pm 1.175494351 \times 10^{-38}$          $\pm 3.40282346 \times 10^{38}$
DP    $\pm 2.2250738585072014 \times 10^{-308}$  $\pm 1.7976931348623158 \times 10^{308}$
C Floating-Point Types
• C has corresponding equivalent types:
Bit size  C type
32-bits   float
64-bits   double
128-bits  long double
• Many languages (C, C++, Python) use double as their basic data type for representing rational numbers
Integer vs Floating-Point (1/2)
• General rule of thumb for programmers: prefer integer numbers; use floating-point numbers and arithmetic with caution
  ◦ Integer types encode relatively small range of values, which are exact
  ◦ Floating-point types encode large range of values, but only approximately
    ▪ Cannot exactly represent many values such as 0.1, 0.2, … which are then either rounded up or down to nearest representable number
    ▪ May not obey arithmetic rules because of rounding
    ▪ Good precision for numbers around zero; precision decreases for larger numbers
      ▫ Single-precision values have precision of about 6 decimal digits
      ▫ Double-precision values have precision of about 15 decimal digits
Integer vs Floating-Point (2/2)
• Best advice from me to avoid pain and suffering:
  ◦ For integer values, use 32-bit signed int type
    ▪ Because of rules used by programming languages when signed int and unsigned int values are mixed, results can be unexpected
    ▪ To avoid surprises, stick to signed int even if you expect numbers to be only positive
  ◦ For fractional values, use 64-bit double-precision double type
Programming Languages: Classification
• Programming languages can be classified in many different ways
• We’ll broadly classify in two ways: low-level languages and high-level languages
Instruction Set Architecture (1/2)
• CPU hardwired by computer architect with set of basic instructions called Instruction Set
• Instruction is represented as sequence of 0s and 1s
• Instruction referred to as operation code or opcode
• Things or values that instruction works on are called operands
[Diagram: processor/CPU with PC, registers r0 to r3, ALU, and control unit]
Instruction Set Architecture (2/2)
• Machine instructions generally fall into 3 categories:
  ◦ Data movement: Load, Store, Move
  ◦ Control flow: Branch, Jump, Goto
  ◦ Arithmetic and Logic: Add, Sub, Mul, Div, And, Or, …
• Computer architects implement digital circuitry in:
  ◦ Control Unit to interpret these instructions
  ◦ ALU to execute these instructions
[Diagram: processor/CPU with PC, registers r0 to r3, ALU, and control unit]
Fetch-Decode-Execute Cycle (1/12)
• Instruction cycle (also known as Fetch-Decode-Execute cycle) is basic operational process of CPU
Fetch-Decode-Execute Cycle (2/12)
• Code snippet in some machine language:
x = 5
y = 3
z = x + y
[Diagram: processor/CPU (PC, registers r0 to r3, ALU, control unit) next to empty memory]
Fetch-Decode-Execute Cycle (3/12)
• Code snippet transferred to memory
[Diagram: memory holds instructions x = 5, y = 3, z = x + y alongside a data area]
Fetch-Decode-Execute Cycle (4/12)
1. Code begins execution
2. Memory storage assigned for variables x, y, and z
[Diagram: data area of memory contains named memory locations x, y, and z]
Fetch-Decode-Execute Cycle (5/12)
[Diagram: r0 = 5; memory: x = 5]
Fetch-Decode-Execute Cycle (6/12)
[Diagram: r0 = 3; memory: x = 5, y = 3]
Fetch-Decode-Execute Cycle (7/12)
[Diagram: r0 = 3; memory: x = 5, y = 3]
Fetch-Decode-Execute Cycle (8/12)
[Diagram: r0 = 5; memory: x = 5, y = 3]
Fetch-Decode-Execute Cycle (9/12)
[Diagram: r0 = 5, r1 = 3; memory: x = 5, y = 3]
Fetch-Decode-Execute Cycle (10/12)
[Diagram: r0 = 5, r1 = 3; memory: x = 5, y = 3]
Fetch-Decode-Execute Cycle (11/12)
[Diagram: r0 = 5, r1 = 3, r2 = 8; memory: x = 5, y = 3]
Fetch-Decode-Execute Cycle (12/12)
[Diagram: r0 = 5, r1 = 3, r2 = 8; memory: x = 5, y = 3, z = 8]
Machine Languages
• Recall that computer architects provide CPUs with set of basic machine instructions represented in numbers called Instruction Set
• Instruction set of CPU with additional tools called Machine Language
  ◦ Unique to each CPU family (x86, PowerPC, ARM, …)
  ◦ Fixed-width patterns of 1’s and 0’s correspond to opcodes and operands
Machine Language: Example
• Euclid’s GCD algorithm:
  ◦ To compute greatest common divisor of integers a and b, check to see if a and b are equal. If so, print one of them and stop. Otherwise, replace the larger one by their difference and repeat.
• Machine code (Intel x86) for Euclid’s GCD algorithm looks like this:
55 89 e5 53 83 ec 04 83 e4 f0 e8 31 00 00 00 89 c3 e8 2a 00
00 00 39 c3 74 10 8d b6 00 00 00 00 39 c3 7e 13 29 c3 39 c3
75 f6 89 1c 24 e8 6e 00 00 00 8b 5d fc c9 c3 29 d8 eb eb 90
Machine Languages: Disadvantage
• Earliest computers could only be programmed in their machine languages
  ◦ Programming was tedious, cumbersome, and error-prone process
Assembly Language (1/2)
• BIG IDEA: Abstraction
  ◦ When something is hard, use abstraction to hide complexity
  ◦ Build hierarchical layers with each lower layer hiding details from the layer above
  ◦ Use simpler intermediate language to provide abstraction between low-level languages and programmer
Assembly Language (2/2)
• Assembly language is first intermediate language to be invented
  ◦ Provides English-like mnemonics to replace binary numbers in machine language programs
  ◦ Example: machine instruction 01110011 00001001 00000011 specified as ADD r9, r3
• Translator called assembler now required to convert assembly code into machine code
Assembly Language: Example
• Assembly code (Intel x86) for Euclid’s GCD:
    pushl %ebp
    movl %esp, %ebp
    pushl %ebx
    subl $4, %esp
    andl $-16, %esp
    call getint
    movl %eax, %ebx
    call getint
    cmpl %eax, %ebx
    je C
A:  cmpl %eax, %ebx
    jle D
    subl %eax, %ebx
B:  cmpl %eax, %ebx
    jne A
C:  movl %ebx, (%esp)
    call putint
    movl -4(%ebp), %ebx
    leave
    ret
D:  subl %ebx, %eax
    jmp B
Low-Level Languages: Disadvantages (1/2)
• Programming in low-level languages is machine-centered enterprise
  ◦ Instructions can only be specified at machine level
  ◦ For example, mathematicians cannot express solutions to numerical problems using mathematical functions such as $a \times \sin(2 \times \pi + b)/c$
  ◦ Difficult to support data types not native to machine
Low-Level Languages: Disadvantages (2/2)
• Low-level programs are not portable
  ◦ Each CPU family has to be programmed in its own machine or assembly language
  ◦ Expensive and error-prone as CPUs evolve and competing designs are developed
Evolution of Computing (1/2)
• Complexity of CPUs continues to grow in quantum leaps
  ◦ Difficult for humans to keep track of wealth of details
• Computers progressively being used to solve problems of increasing complexity
  ◦ Advanced algorithms, more complex data structures become difficult to implement in low-level languages
Evolution of Computing (2/2)
• Programmers began to wish for machine-independent languages
• Idea of abstraction again came to the rescue
• Why not create high-level languages to abstract away low-level machine details?
  ◦ Programmers need not work directly with nor worry about registers, memory addresses, …
Trend Towards High-Level Languages
• Thousands of high-level languages invented since 1950s such as COBOL, FORTRAN, ALGOL, Forth, Ada, C, C++, Java, …
High-Level Languages (1/2)
• Term high in high-level language means
  ◦ closer to way humans think
  ◦ closer to problems being solved
• Uses English-like mnemonics for groups of actions and data
  ◦ a = sqrt(b);
  ◦ if (ammo > 0) fire_weapon();
High-Level Languages (2/2)
• Easier to tackle complex problems
  ◦ Programmers can spend more time on high-level concepts such as algorithm design
  ◦ Data can be expressed in hierarchy of data types derived from built-in machine data types
• Programs are portable
  ◦ Program written in high-level language can be translated for different machines
• Programming becomes more accessible
C/C++ and Python: Examples
• Euclid’s algorithm for GCD: To compute the greatest common divisor of integers a and b, check to see if a and b are equal. If so, print one of them and stop. Otherwise, replace the larger one by their difference and repeat.
// C/C++ code for GCD
int gcd(int a, int b) {
    while (a != b) {
        if (a > b) {
            a = a - b;
        } else {
            b = b - a;
        }
    }
    return a;
}
# Python code for GCD
def gcd(a, b):
    while a != b:
        if a > b:
            a = a - b
        else:
            b = b - a
    return a
Translators
54

 Compilers and interpreters are translators to


convert programs written in high-level
language into machine language
Compilers: C, C++, …
55

 Compiler translates source program (written in


high-level language) into target program
(usually in machine language) using a number
of phases Source Program

Compiler translates Compiler is machine


Compiler
and then goes away language program

At later time, user


tells OS to run Input Target Output
target program Program
Interpreters: Python, JavaScript
56

 Interpreter provides virtual machine that:


 Reads one high-level statement at a time
 Converts statement into machine language
instructions
 Has CPU execute these instructions

Source Program
Interpreter Output
Input
No target program exists
Compilers vs. Interpreters (1/2)
  In general, interpretation leads to
      Greater flexibility – program can generate new
     pieces of itself and execute them on the fly
      Better diagnostics – because interpreter is
     executing source code directly, it can provide
     debug information
  Compilation, by contrast, leads to better
performance
      In general, a decision made at compile time is a
     decision that doesn’t need to be made at run time
Compilers vs. Interpreters (2/2)
  While conceptual differences are clear, most
language implementations include mixture of
both
      Java, C#
[Figure: Source Program -> Translator -> Intermediate Program; Intermediate Program and Input -> Virtual Machine -> Output]
Summary (1/2)
  Typical organization of modern computers based on von
Neumann architecture and stored-program concept
  Language of computers is binary - sequence of 0s and 1s
      Modern CPUs use collection of bits grouped into units of eight
      Sequence of 8 bits, called byte, has become de facto standard for unit of
     digital information
      Larger units: 16, 32, and 64 bits
  Data type is a set of values and set of operations that can be
applied on these values
  CPUs can only represent numbers which are either integer or
floating-point data types
      Integer: signed and unsigned 8-, 16-, 32-, and 64-bit
      Floating-point: single-, double-, and extended-precision
  In this course, use signed int and double as standard
data types for integer and floating-point data types
Summary (2/2)
  Programming languages broadly classified as low-level and high-level languages
      Machine languages use Instruction Set Architecture that consists of
     instructions hardwired into CPU
          Instruction consists of opcode and operands
          Instructions executed using fetch-decode-execute cycle
      Assembly languages use mnemonics; assembler is required to convert
     assembly code to machine language
      Using low-level languages is tedious, cumbersome, and error-prone, and
     impractical for humans when programs are large
  High-level languages provide portability and higher levels of abstraction
from underlying machine
      Syntax and semantics
      Compilers (C, C++) and interpreters (Python, JavaScript) are translators of
     high-level language code into machine language
      Some languages (Java, C#) use combination of compilation and interpretation