Intro To C - Module 2

Uploaded by Andrew Fu
Copyright © All Rights Reserved

Introduction to C: Module 2 (August 26, 2024)

Weekly Reading: Beej's Guide, Chapters 3-4

Anatomy of a C Program

Let's look at a C program.

#include <stdio.h> // @A

// for now, ignore this function.
void put_int(int x) {
    printf("%d\n", x);
}

void say_hello(char* name, int age) { // @B
    puts("Hello"); // @C
    puts(name);
    puts("You were born in ");
    put_int(2024 - age);
}

int main() { // @D
    say_hello("Bob", 30); // @E
    return 0; // @F
}

This is a working, complete program. When compiled, it generates an executable that can be
run to print:

Hello
Bob
You were born in
1994

to the console. The output ain't pretty (we'll get back to that) but we see that the program does
something useful.

At the top of the file, we have an #include directive (@A) that tells the compiler to include functions from stdio ("standard I/O") in the standard library. Technically, it only needs to include (and only does include) a header file that specifies the type signatures; the functions themselves are linked in later, but ignore the distinction for now. A "helper" function called say_hello is defined (@B), and it uses the puts function (@C) to print strings to the console. puts adds a newline after each string, which is why every string appears on its own line.
The entry point of any executable program is a function named main (@D). It returns an integer (an int); in this program, it takes no arguments. This main calls (@E) its helper function with a string (for now, interpret char* that way) and an int and, after doing so, returns (@F) an exit status of 0, indicating that nothing exceptional (i.e., no error condition) occurred.

Exceptions exist, but most executables return 0 if and only if they run without errors, and nonzero values to indicate what went wrong. On Linux, only the lowest 8 bits of the value returned from main are passed back as the exit code, so it always lands in the range 0 to 255; higher bits are discarded.

How Functions Work

Thus far, function declarations behave similarly to those in Java; the main difference from a language like Python is that they require type information. A function declaration of:

double square_root(int x) {
// ...
}

means that square_root takes an int (integer) and returns a double, a floating-point number that is 64 bits wide on virtually all modern systems. This is backwards relative to modern notation: in a language like OCaml or Scala, we'd write the type as something like "int -> double", but in C, the return type is written first.

A function that doesn't return a value—it is used only for effects—is given the return type void.
Above, we have a say_hello function that writes to the console, but we don't need any
computation back from it, so we have it "return void".

The program above isn't very interesting. Let's now write a program we can interact with.

#include <stdio.h>
#include <stdlib.h>

int square(int x) {
    return x * x;
}

int main(int argc, char** argv) { // @G
    int default_val = 137; // @H
    int input = default_val;

    if (argc > 1) { // @I
        input = atoi(argv[1]);
    }
    int result = square(input);
    printf("%d squared is %d\n", input, result);
    return 0;
}

We can compile it...

$ gcc -o square_demo square_demo.c

and run it:

$ ./square_demo 15
15 squared is 225

$ ./square_demo
137 squared is 18769

Our main function (@G) has a different signature here: it takes an integer argc (as in "argument
count") and a char**—for now, think of it as an array of strings—called argv (as in "argument
values"). Whenever we run a program, this is where the operating system puts our
command-line arguments. The number of them, including the executable's name, is given as
argc, and they are listed in argv.

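Arrays are zero-indexed, and argv holds argc strings followed by a null pointer (argv[argc] is guaranteed to be NULL). Therefore, if argc is 2, argv[1] is the last real argument; argv[2] is only that NULL end marker, so there is no string there to read. Note that we get the user's first argument in argv[1], the second position. That's because argv[0] is used for the name of the executable, which is not what we need.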

As I said before, C doesn't check bounds for array access; you have to do that. We set a default value (@H) of 137 for the number to be squared and, using conditional execution (@I) to make sure argv[1] exists before reading it, replace the default if it does. Since command-line arguments are strings, not numbers, we use the atoi (ASCII-to-int) function (from <stdlib.h>) to extract the int.

All Base Types Are Numbers

The four most important base types in C are char, int, double, and size_t.

A char is a one-byte integer type; it's misnamed because:

● characters these days can be longer than one byte.
● it has use cases other than text (for which it is no longer adequate).

You should only ever put numbers in the range [0...127] (i.e., 7-bit non-negative integers) into char; it is implementation-defined whether char is signed (range [-128...127]) or unsigned (range [0...255]). Either way, it is large enough to deal with ASCII characters, or, equivalently, tiny non-negative integers, but it should not be used for anything else.

For example, if you're working with generic binary data (byte arrays), then you should prefer unsigned char or, better yet, the uint8_t that becomes available if you #include <stdint.h> (or <inttypes.h>, which includes it).

Because char is an integer type, it is legal to write:

char x = 65; // prefer when you're working with tiny numbers

or:

char x = 'A'; // prefer when you're working with text

or:

unsigned char x = 0x41; // prefer when working w/ binary data

They all store the same value: on ASCII-based systems, 'A' is just another way of writing 65 (hexadecimal 0x41).

The int type is inescapable. It's an integer, obviously. Of what size? That's less obvious. It is guaranteed to be at least 2 bytes (16 bits), so it will handle all values between -32,767 and +32,767 (and probably -32,768, but that's another discussion), but on most systems it will use 4 bytes instead. Unsigned integers have "odometer" or modular behavior in case of overflow (UINT_MAX + 1 == 0), whereas signed integers have... well, we'll discuss it in more detail in the next module, but it's bad: the standard calls it undefined behavior. Don't overflow a signed integer, ever.

If, for the sake of portability, a fixed size is necessary, then use types like int16_t or uint32_t
from <inttypes.h>. When you use int, you're letting the system and the compiler decide. If
that's not what you want, be explicit.

The floating-point types, which correspond to 32- and 64-bit floating-point numbers, are float
and double. Your default tool should be the double-precision floating point number; smaller
floats are used in graphics and machine learning, but they won't be covered here.

Finally, size_t is an unsigned integer type that exists to hold the size of an object (such as an array) in bytes. Objects can be very big, so on 64-bit systems size_t is usually 8 bytes, or 64 bits, capable of holding any number from 0 to 18,446,744,073,709,551,615 (2^64 - 1).
You can learn what these sizes are on your system by using the sizeof operator, which determines at compile time the size of a type or value.

#include <stdio.h>

int main() {
    // %zu is the printf conversion for size_t, the type sizeof yields
    printf("Size of char: %zu\n", sizeof(char));
    printf("Size of int: %zu\n", sizeof(int));
    printf("Size of size_t: %zu\n", sizeof(size_t));
    return 0;
}

Your results, on a modern desktop computer, will very likely be identical to mine:

$ ./m2_sizes
Size of char: 1
Size of int: 4
Size of size_t: 8

This is probably, for most, an excruciatingly boring detail, and it's one you never have to think about when using, for example, Python. Python's int type is unbounded: you can use the interpreter to evaluate 17 ** 400 (17^400, a 1635-bit number) and it will give you the correct answer. Whenever there is risk of overflow, it detects the fact and silently promotes integers from fast 64-bit integers to a "bignum" data structure that can store arbitrarily large ones, sacrificing speed but retaining correctness. Overflow detection, however, is expensive; if you're writing high-performance code, and know your integer variables are never going to exceed 2^64, you don't want it. C's default is to give you the fastest, most unsafe option: if you want checked arithmetic, write it.

Control Flow

Conditional execution is similar to that in other languages. This Python code, for example:

if x < 100:
    a()
elif x < 200:
    b()
else:
    c()

would be written in C like so:

if (x < 100) {
    a();
} else if (x < 200) {
    b();
} else {
    c();
}

There is no elif, parentheses around branching conditions are mandatory, and blocks are delimited by curly braces rather than indentation, but the two are conceptually the same.

Loops, likewise, are similar to those in Python, so:

while x < 0:
    f()
    g()

has the C counterpart of:

while (x < 0) {
    f();
    g();
}

Historically, C used integer types in lieu of booleans (you will often see while (1) as a counterpart to Python's while True), with 0 representing falsity and all nonzero values indicating truth. These days, however, you should use the bool type and the constants true and false that become available if you #include <stdbool.h>. (As of C23, bool, true, and false are built-in keywords, so the header is no longer required.)

C does not have the for-each semantics of Python's for, which iterates directly over the items of a list or other iterable; the behavior of the for-loop, which will be familiar to Java users, is like so:

for (int i = 0; i < 100; i++) { // for (initialization; test; increment)
    f(i); // loop body
}

is equivalent to:

int i = 0;
while (i < 100) {
    f(i);
    i++;
}

where i++ (the "increment operator") is shorthand for i += 1 or i = i + 1.


C has break and continue with the same semantics as their counterparts in Python: break exits the nearest enclosing loop immediately, while continue abandons the current iteration and moves on to the next one (in a for-loop, the increment step still runs before the test). Thus, the following loops

while (a < 100) {
    f();
}

and

while (true) { // or: while (1)
    if (a >= 100) break;
    f();
}

are equivalent. From a stylistic perspective, prefer the former, as it's more idiomatic and
therefore easier to understand. There are some who avoid break and continue; I will say that
they are occasionally useful but should be used with caution—I almost never resort to the latter.

C has an idiom you might not have seen in other languages—the do-while loop, which places
the test at the end of the block, ensuring that the loop runs at least once. Thus:

do {
    f();
} while (a < 100);

is equivalent to:

while (true) {
    f();
    if (a >= 100) break;
}

or

f();
while (a < 100) {
    f();
}

Note that the do-while construction requires a semicolon at the end, whereas ordinary
while-loops do not.
The switch statement will be familiar to Java users, but new to Python programmers. It
evaluates an integer expression and dispatches on the value. For example:

switch (suit) {
    case 0:
        printf("Diamond\n");
        break;
    case 1:
        printf("Spade\n");
        break;
    case 2:
        printf("Heart\n");
        break;
    case 3:
        printf("Club\n");
        break;
}

If suit is 2, then this block will print "Heart" to the console. The break statement is necessary; if it weren't there, then execution would "fall through" to execute the case 3 code. In this way, switch blocks can be thought of as a limited version of the hated "goto" statement (which exists, but is rarely used, in C). When no case applies (say that, above, suit were 4), there is no match and the block does nothing; however, a default case can be added as a catch-all.

In Python, for this you use a cascading if-else tree:

if suit == 0:
    print('Diamond')
elif suit == 1:
    print('Spade')
elif suit == 2:
    print('Heart')
elif suit == 3:
    print('Club')
and, to be honest, there's nothing wrong with this. The advantage of a switch/case block is
performance. The if/else cascade requires a check for each value, and is O(n) in the number
of cases, whereas a compiler can, when it is faster to do so, turn a switch/case block into a
jump table.

Week 2 Questions
2.1: What do you think the following loop does? Make your guess, then run a program to test
your intuition.

unsigned int i = 1;
while (i > 0) {
    i++;
}

2.2: Would you expect the behavior of the loop above to change if a regular (signed) int were
used? Can you think of any circumstances in which it might not do that?

2.3: Investigate sizeof. Is this a function, or something else? If it's not a function, then how is it
different?

Week 2 Project

Build a command-line tool that sums its arguments, which will all be integers, and prints the
result to the console. An example session might look like this:

$ ./sum
0
$ ./sum 1 2 3
6
$ ./sum 61 18 -42 100
137

All arguments will be between -1000 and 1000, inclusive; there can be as many as 100 of them.

2.4: Include your code for this project in your PDF.

Writeup (Due September 5)

Please submit your answers to 2.1-2.4 by PDF in Canvas. Your answers to the first three questions should fit within one page. Include 2.4 on a separate page or pages.
