Cs136 Post
Data Abstraction
Official calendar entry: This course builds on the techniques and
patterns learned in CS 135 while making the transition to use of an
imperative language. It introduces the design and analysis of
algorithms, the management of information, and the programming
mechanisms and methodologies required in implementations.
Topics discussed include iterative and recursive sorting algorithms;
lists, stacks, queues, trees, and their application; abstract data types
and their implementations.
Web page:
http://www.student.cs.uwaterloo.ca/~cs136/
While time is spent learning some of the C syntax, this is not a “learn
C” course.
See the website and attend tutorials for how to use Seashell.
CS 136 Fall 2018 01: Introduction 7
Course materials
Textbooks:
Course notes:
Available on the web page and as a printed coursepack from
media.doc (MC 2018).
• You can show your code to others to help them (or to get help),
but copying code is not allowed
(electronic transfer, copying code from the screen, printouts,
etc.)
If you submit any work that is not your own, you must still cite
the origin of the work in your source code.
You must log into Marmoset to view your private test results
(after the deadline).
Not all learning goals can be achieved just by listening to the lecture.
Some goals require reading the text or using Seashell to complete
the assignments.
(4 * 5) / 2 ⇒ 10
4 * (5 / 2) ⇒ 8
-5 / 2 ⇒ -2
† C99 standardized the “round toward zero” behaviour of integer division.
9 % 2 ⇒ 1
9 % 3 ⇒ 0
9 % 5 ⇒ 4
Braces ({}) indicate the beginning and end of the function body,
known in C as the function block.
In Racket, the types of the parameter and the produced (“returned”) value are
both dynamic.
C uses static typing: all types must be known before the program
is run and they cannot change.
apply ⇒ call
consume ⇒ pass
produce ⇒ return
int my_num(void) {
return my_add(40, 2);
}
You should follow the course style. The course staff and markers
may not understand your code if it is poorly formatted.
The style we have chosen is the most widely accepted style for C
(and C++) projects (e.g., it conforms to the Google style guide).
Every C program must have one (and only one) main function.
† main has optional parameters (discussed in Section 13).
int main(void) {
//...
return 0; // this is optional
}
(define (my-sqr n)
(* n n))
2
49
// my C program
1 + 1; // INVALID
int my_sqr(int n) {
return n * n;
}
my_sqr(7); // INVALID
1 + 1 => 2
my_sqr(7) => 49
You can leave the tracing in your code. It is ignored in our tests
and does not affect your results (no need to comment it out).
// My first C program
#include "cs136.h"
int my_sqr(int n) {
return n * n;
}
int main(void) {
trace_int(1 + 1);
trace_int(my_sqr(7));
}
A comparison or logical expression (e.g., ==, <, !, &&) produces zero (0) for “false” and one (1) for “true”.
negation !
multiplicative * / %
additive + -
comparison < <= > >=
equality == !=
and &&
or ||
is:
assert(my_sqr(7) == 49);
#include "cs136.h"
int my_sqr(int n) {
return n * n;
}
int main(void) {
assert(my_sqr(0) == 0);
assert(my_sqr(1) == 1);
assert(my_sqr(2) == 4);
assert(my_sqr(32) == 1024);
assert(my_sqr(-1) == 1);
assert(my_sqr(-32) == 1024);
}
// my_divide(x, y) ....
// requires: y is not 0
// my_function(x, y, z) ....
// requires: x is positive
// y < z
bool is_even(int n) {
return (n % 2) == 0;
}
bool my_negate(bool v) {
return !v;
}
int my_abs(int n) {
if (n < 0) {
return -n;
} else {
return n;
}
}
There can be more than one return in a function, but only one
value is returned. The function “exits” when the first return is
reached.
int sum_first(int n) {
if (n <= 0) {
return 0;
} else {
return n + sum_first(n - 1);
}
}
C’s if statement does not produce a value: it only controls the “flow
of execution” and cannot be similarly used within an expression.
For example:
(v >= 0) ? v : -v // abs(v)
(a > b) ? a : b // max(a, b)
You may use the ?: operator in this course, but use it sparingly.
Overuse of the ?: operator can make your code hard to follow.
(define n 5)
(add1 n) ; => 6
n ; => 5
Similarly, (sort lon) returns a new list that is sorted, but the
original list lon does not change.
(f 10)
=> 40
(mystery 10)
=> ???
Racket (implicitly) always uses the begin special form:
(define (mystery n)
(begin ; explicitly showing begin
(/ n 2)
(+ n 3)
(* n 4)))
A side effect changes the state of the program (or “the world”).
(scary 10)
=> 40
(define (scary n)
(begin
(turn-the-lights 'off)
(play-mp3 "scary.mp3")
(shout "Boo!")
(* n 4)))
A program may also interact with non-human entities such as: a file,
a GPS, a printer or even a different computer on the internet.
#include "cs136.h"
int main(void) {
printf("Hello, World");
}
Hello, World
int main(void) {
printf("Hello, World");
printf("C is fun!");
}
† Blocks can also contain local variable definitions, which are not
statements.
printf("Hello, World\n");
printf("C is\nfun!\n");
Hello, World
C is
fun!
Similarly,
int quiet_sqr(int n) {
return n * n;
}
// noisy_sqr(n) squares n
// effects: produces output
int noisy_sqr(int n) {
printf("I'm squaring %d\n", n);
return n * n;
}
int main(void) {
assert(quiet_sqr(3) == 9);
assert(quiet_sqr(7) == 49);
assert(noisy_sqr(3) == 9);
assert(noisy_sqr(7) == 49);
}
• With the [RUN] button, input is read from the keyboard. Any
output is displayed in the console (“screen”).
• With the [I/O TEST] button, input is read from input file(s)
(e.g., testfile.in) instead of the keyboard.
If a corresponding output test file exists
(e.g., testfile.expect), Seashell checks the output
against the expected output test file to see if they match.
int main(void) {
assert(noisy_sqr(3) == 9);
assert(noisy_sqr(7) == 49);
}
You should always use our tracing tools to help debug your code.
int quiet_sqr(int n) {
trace_msg("quiet_sqr was called");
trace_int(n);
return n * n;
}
int main(void) {
trace_msg("main started");
assert(quiet_sqr(3) == 9);
trace_int(quiet_sqr(7));
}
"main started"
"quiet_sqr was called"
n => 3
"quiet_sqr was called"
n => 7
quiet_sqr(7) => 49
When you [RUN] your code, the two output streams (printf output and
tracing output) may appear mixed together in the screen (console) output.
void say_hello(void) {
printf("hello!\n");
return; // this is optional
}
printf("hello!\n");
int main(void) {
noisy_sqr(3);
printf("These are ");
printf("expression ");
printf("statements.\n");
10 + 3;
}
The five values from the previous example are never used.
• expression statements
for generating side effects
We have seen how a side effect changes the state of a program (or
“the world”). For example, printf changes the state of the output.
int my_variable = 7;
The equal sign (=) and semicolon (;) complete the syntax.
int main(void) {
int my_local_variable = 11;
//...
}
Global variables are defined outside of functions (at the “top level”).
For now, always make sure you define your variables above any
code that references them.
Variables with the same name can shadow other variables from
outer scopes, but this is obviously very poor style. The following
code defines three different variables named n.
int n = 1;
int main(void) {
trace_int(n); // n => 1
int n = 2;
trace_int(n); // n => 2
{
int n = 3;
trace_int(n); // n => 3
}
}
int main(void) {
int m = 5;
trace_int(m);
m = 6; // mutation!
trace_int(m);
}
m => 5
m => 6
m = m + 1;
• The RHS must be an expression with the same type as the LHS.
• The variable on the LHS is changed (mutated) to store the value
of the RHS. In other words, the RHS value is assigned to the
variable.
x = y;
y = x;
x = 0;
if (x = 1) {
printf("disaster!\n");
}
Both initialization and assignment use the equal sign (=), but they
have different semantics.
n = 6; // assignment operator
The distinction is not too important now, but the subtle difference
becomes important later.
int f(int n) {
int k = 3;
k = k + 1;
return n * k;
}
The function f mutates the local variable k, but it does not modify
any global variables or use any I/O.
int g(int n) {
n = n * 4;
return n;
}
As with the previous example, the function g itself does not have any
side effects.
int main(void) {
int x = 10;
trace_int(x);
trace_int(g(x));
trace_int(x);
}
x => 10
g(x) => 40
x => 10
int count = 0;
int increment(void) {
count = count + 1;
return count;
}
int main(void) {
trace_int(increment());
trace_int(increment());
trace_int(increment());
}
increment() => 1
increment() => 2
increment() => 3
int n = 10;
int addn(int k) {
return k + n;
}
int main(void) {
trace_int(addn(5));
n = 100;
trace_int(addn(5));
}
addn(5) => 15
addn(5) => 105
The difference between x++ and ++x, and the relationship between
their values and their side effects, is tricky:
x = 5;
j = ++x; // j = 6, x = 6
x = 5;
j = x++; // j = 5, x = 6
In this course, the term “variable” is used for both variable and
constant identifiers.
int count_even_inputs(void) {
int n = read_int();
if (n == READ_INT_FAIL) {
return 0;
} else if (n % 2 == 0) {
return 1 + count_even_inputs();
} else {
return count_even_inputs();
}
}
int main(void) {
printf("%d\n", count_even_inputs());
}
To indicate that there is no more input, press the [EOF] (End Of File)
button, or type Ctrl-D.
test1.in (one integer per line): 1 2 2 3 4 4 5 6
test1.expect: 5
One of the great features of the [I/O TEST] in Seashell is that you
can add multiple test files.
test2.in: 1 3 3 7
test2.expect: 0
test3.in: 6 6 6
test3.expect: 3
The input:
1
2
3
4
5
and:
1 2 3
4 5
The first read_int() reads in the first int, but then that value is
now “lost”. The next read_int() reads in the second int, which
is likely not the desired behaviour.
4 23skidoo 57
• produce output
• read input
• mutate a global variable
If the side effect does not always occur, preface it with “may” in your
contract.
// effects: reads input
// may produce output
// may mutate secret
void update_secret(void) {
int n = read_int();
if (n == READ_INT_FAIL) {
printf("error: could not read in number\n");
} else {
secret = n;
}
}
(+ 2 (my-sqr (+ 3 1)))
=> (+ 2 (my-sqr 4))
=> (+ 2 (* 4 4))
=> (+ 2 16)
=> 18
• control flow
• memory
• function calls
• conditionals (i.e., if statements)
• iteration (i.e., loops)
int f(int x) {
return 2 * x + g(x);
}
int main(void) {
int a = f(2);
//...
}
The syntax of if is
if (expression) statement
if (expression) {
statement(s)
} else if (expression) {
statement(s)
} else if (expression) {
statement(s)
} else {
statement(s)
}
int sum(int k) {
if (k <= 0) {
return 0;
}
return k + sum(k - 1);
}
The difference is that while repeatedly “loops back” and executes the
statement until the expression is false.
Like with if, you should always use braces ({}) for a compound
statement, even if there is only a single statement.
variable value
i 2
⇒ int i = 2;
while (i >= 0) {
printf("%d\n", i);
--i;
}
OUTPUT:
// recursion
int sum(int k) {
if (k <= 0) {
return 0;
}
return k + sum(k - 1);
}
// iteration
int sum(int k) {
int s = 0;
while (k > 0) {
s += k;
--k;
}
return s;
}
******
*****
****
***
**
*
do {
printf("try to guess my number!\n");
guess = read_int();
} while (guess != my_number && guess != READ_INT_FAIL);
while (1) {
n = read_int();
if (n == READ_INT_FAIL) break;
//...
}
is equivalent to
for (i = 100; i >= 0; --i) {
printf("%d\n", i);
}
// Counting up from 1 to n
for (i = 1; i <= n; ++i) {...}
This is very convenient for defining a variable that only has local
(block) scope within the for loop.
You can use the comma operator (,) to use more than one
expression in the setup and update statements of a for loop.
See CP:AMA 6.3 for more details.
for (i = 1, j = 100; i < j; ++i, --j) {...}
address contents
0x00000 00101001
0x00001 11001101
... ...
0xFFFFE 00010111
0xFFFFF 01110011
int n = 0;
trace_int(sizeof(int));
trace_int(sizeof(n));
sizeof(int) => 4
sizeof(n) => 4
In this course, you should only use int, and there are always
32 bits in an int.
int n = 0;
identifier type size (bytes) address
n int 4 0x5000
C updates the contents of the 4 bytes to store the initial value (0).
You should never assume what the value of an int will be after
an overflow occurs.
trace_int(bil);
trace_int(four_bil);
trace_int(nine_bil);
There are only 2^8 (256) possible values for a char and the range of
values is (−128 . . .127) in our Seashell environment.
Because of this limited range, chars are rarely used for calculations.
As the name implies, they are often used to store characters.
The only control character we use in this course is the line feed (10),
which is the newline \n character.
/*
32 space 48 0 64 @ 80 P 96 ` 112 p
33 ! 49 1 65 A 81 Q 97 a 113 q
34 " 50 2 66 B 82 R 98 b 114 r
35 # 51 3 67 C 83 S 99 c 115 s
36 $ 52 4 68 D 84 T 100 d 116 t
37 % 53 5 69 E 85 U 101 e 117 u
38 & 54 6 70 F 86 V 102 f 118 v
39 ' 55 7 71 G 87 W 103 g 119 w
40 ( 56 8 72 H 88 X 104 h 120 x
41 ) 57 9 73 I 89 Y 105 i 121 y
42 * 58 : 74 J 90 Z 106 j 122 z
43 + 59 ; 75 K 91 [ 107 k 123 {
44 , 60 < 76 L 92 \ 108 l 124 |
45 - 61 = 77 M 93 ] 109 m 125 }
46 . 62 > 78 N 94 ^ 110 n 126 ~
47 / 63 ? 79 O 95 _ 111 o
*/
letter_a in decimal: 97
ninety_seven in decimal: 97
C also has a double type that is still inexact but has significantly
better precision.
trace_int(p.x);
trace_int(p.y);
p.x => 3
p.y => 4
p = q;
p.x = 23;
trace_int(p.x);
trace_int(p.y);
p.x => 23
p.y => 4
p.x = 5; // VALID
p.y = 6;
// alternatively:
struct posn new_p = {5, 6};
p = new_p;
Also, printf only works with elementary types. You have to print
each field of a structure individually:
struct posn {
int x;
int y;
};
struct s1 {
char c;
int i;
char d;
};
struct s2 {
char c;
char d;
int i;
};
trace_int(sizeof(struct s1));
trace_int(sizeof(struct s2));
sizeof(struct s1) => 12
sizeof(struct s2) => 8
Code
Read-Only Data
Global Data
Heap
Stack
This machine code is then placed into the code section of memory
where it can be executed.
• First, the code from the entire program is scanned and all global
variables are identified.
In this course, we use the name of the calling function and a line
number (or an arrow) to represent the return address.
When the function returns, the variable (and the entire frame) is
popped and effectively “disappears”.
void print_size(int n) {
if (n > 1000000) {
printf("n is huge\n");
} else if (n > 10) {
printf("n is big\n");
} else {
printf("n is tiny\n");
}
}
In practice, the “bottom” of the stack (i.e., where the main stack
frame is placed) is placed at the highest available memory address.
Each additional stack frame is then placed at increasingly lower
addresses. The stack “grows” toward lower addresses.
If the stack grows too large, it can “collide” with other sections of
memory. This is called “stack overflow” and can occur with very
deep (or infinite) recursion.
int i;
int g = 0;
void mystery(void) {
int k;
printf("the value of k is: %d\n", k);
}
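[Figure: memory layout with Code, Read-Only Data, Global Data, Heap and Stack; the Stack is at the high addresses and grows (↑) toward lower addresses]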
• make sure you show any variables in the global and read-only
sections, separate from the stack
k => 0
k => 1
k => 2
• use structures in C
int main(void) {
printf("the value of g is: %d\n", g);
printf("the address of g is: %p\n", &g);
}
the value of g is: 42
the address of g is: 0x71a0a0
int i = 42;
int *p = &i; // p "points at" i
trace_int(i);
trace_ptr(&i);
trace_ptr(p);
trace_ptr(&p);
i => 42
&i => 0xf020
p => 0xf020
&p => 0xf024
i int 0xf020 42
p int * 0xf024 0xf020
sizeof(int *) ⇒ 8
sizeof(char *) ⇒ 8
int i = 42;
int *p = &i; // pointer p points at i
trace_ptr(p);
trace_int(*p);
p => 0xf020
*p => 42
*p ⇒ 42
int i = 42;
int *p1 = &i; // pointer p1 points at i
int **p2 = &p1; // pointer p2 points at p1
if (p) ...
if (p != NULL) ...
p = NULL;
i = *p; // crash!
int *p = &i;
int *q = &j;
p = q;
int *p = &i;
int *q = &j;
the statement
*p = *q;
does not change the value of p: it changes the value of what p
points at. In this example, it changes the value of i to 6, even
though i was not used in the statement.
int i = 1;
int *p1 = &i;
int *p2 = p1;
int **p3 = &p1;
trace_int(i);
*p1 = 10; // i changes...
trace_int(i);
*p2 = 100; // without being used directly
trace_int(i);
**p3 = 1000;
trace_int(i);
i => 1
i => 10
i => 100
i => 1000
int main(void) {
int x = 5;
inc(x);
trace_int(x); // 5 or 6 ?
}
The inc function is free to change its own copy of the argument (in
the stack frame) without changing the original variable.
int main(void) {
int x = 5;
trace_int(x);
inc(&x); // note the &
trace_int(x);
}
x => 5
x => 6
int main(void) {
int a = 3;
int b = 4;
trace_int(a); trace_int(b);
swap(&a, &b); // Note the &
trace_int(a); trace_int(b);
}
a => 3
b => 4
a => 4
b => 3
• produce output
• read input
• mutate a global variable
• mutate a variable through a pointer parameter
// effects: modifies *px and *py
void swap(int *px, int *py) {
int temp = *px;
*px = *py;
*py = temp;
}
This will help you debug your code and facilitate our testing.
The return value can also be the special constant value EOF to
indicate that the End Of File (EOF) has been reached.
if (retval != 1) {
printf("Fail! I could not read in an integer!\n");
}
This function performs division and “returns” both the quotient and
the remainder.
void divide(int num, int denom, int *quot, int *rem) {
*quot = num / denom;
*rem = num % denom;
}
int *bad_idea(int n) {
return &n; // NEVER do this
}
int *bad_idea2(int n) {
int a = n*n;
return &a; // NEVER do this
}
For structures, the entire structure is copied into the frame. For large
structures, this can be inefficient.
struct bigstruct {
int a; int b; int c; int d; int e; ... int y; int z;
};
Large structures also increase the size of the stack frame. This can
be especially problematic with recursive functions, and may even
cause a stack overflow to occur.
int main(void) {
struct posn p1 = {2, 4};
struct posn p2 = {5, 8};
The rule is “const applies to the type to the left of it, unless it’s
first, and then it applies to the type to the right of it”.
Because a copy of the argument is made for the stack, it does not
matter if the original argument value is constant or not.
The type of a function pointer includes the return type and all of the
parameter types, which makes the syntax a little messy.
int main(void) {
int (*fp)(int, int) = NULL;
fp = my_add;
trace_int(fp(7, 3));
fp = my_sub;
trace_int(fp(7, 3));
}
fp(7, 3) => 10
fp(7, 3) => 4
For larger programs, keeping all of the code in one file is unwieldy.
† Modules can provide elements that are not functions (e.g., data
structures and variables) but their primary purpose is to provide
functions.
While the terms file and module are often used interchangeably, a
file is only a module if it provides functions for use outside of the file.
There must be a “root” (or main file) that acts only as a client.
This is the program file that defines main and is “run”.
Imagine that some integers are more “fun” than others, and we want
to create a fun module that provides an is_fun function.
// fun.c [MODULE]
bool is_fun(int n) {
return (n == -3 || n == 42 || n == 136 ||
n == 1337 || n == 4010 || n == 8675309);
}
int main(void) {
//...
b = is_fun(k); // OK
//...
}
int main(void) {
trace_int(1 + 1);
trace_int(my_sqr(7)); // OK
}
int f(int n) {
return n + my_variable; // this is now ok
}
• local identifiers are only visible inside of the function (or block)
where they are defined
• global identifiers are defined at the top level, and are visible to
all code following the definition
// fun.h [INTERFACE]
The client can also “copy & paste” the function declarations from the
interface file to make the module functions available.
// main.c [CLIENT]
int main(void) {
//...
b = is_fun(k); // OK
//...
}
//////////////////////////////////////////////////////////////
// main.c [CLIENT]
#include "fun.h"
int main(void) {
//...
b = is_fun(k);
//...
}
The function is_fun is fully documented in the interface file for the
client, so in the implementation a simple comment referring the
reader to the interface file is sufficient.
// see fun.h for details
bool is_fun(int n) {
//...
}
The caller (client) does not need to know if the function mutates
the copy of the argument value.
int my_add(int n) {
return n + MY_NUMBER;
}
#include <stdio.h>
#include "mymodule.h"
#include <assert.h>
#include "fun.h"
int main(void) {
assert(is_fun(42));
assert(!is_fun(13));
//...
}
High cohesion means that all of the interface functions are related
and working toward a “common goal”. A module with many
unrelated interface functions is poorly designed.
An opaque structure is like a “black box” that the client cannot “see”
inside of.
// stopwatch.h [INTERFACE]
struct stopwatch;
// stopwatch.h [INTERFACE]
// stopwatch.c [IMPLEMENTATION]
struct stopwatch {
int min;
int sec;
};
// requires: 0 <= min
// 0 <= sec <= 59
// stopwatch.c [IMPLEMENTATION]
#include "cs136.h"
#include "stopwatch.h"
int main(void) {
struct stopwatch *sw = stopwatch_create();
stopwatch_add_time(sw, 1, 59);
stopwatch_add_time(sw, 3, 30);
trace_int(stopwatch_get_minutes(sw));
trace_int(stopwatch_get_seconds(sw));
stopwatch_destroy(sw);
}
stopwatch_get_minutes(sw) => 5
stopwatch_get_seconds(sw) => 29
// stopwatch.c [IMPLEMENTATION]
struct stopwatch {
int seconds;
};
// requires: 0 <= seconds
However, the client doesn’t need to know how the data is structured.
The client only requires an abstract understanding that a
stopwatch stores time information.
As the client, if you have a data structure, you know how the data
is “structured” and you can access the data directly in any manner
you desire.
With an ADT, the client does not know how the data is structured
and can only access the data through the interface functions
(operations) provided by the ADT.
1234567 "Sally"
3141593 "Archie"
8675309 "Jenny"
You likely have an intuition that BSTs are “more efficient” than
association lists. In Section 08 we introduce a formal notation to
describe the efficiency of an implementation.
• stack
• queue
• sequence
Stacks are often used in browser histories (“back”) and text editor
histories (“undo”).
• structures
• arrays
Because arrays are built-in to C, they are used for many tasks
where lists are used in Racket, but arrays and lists are very
different. In Section 11 we construct Racket-like lists in C.
int j = a[0]; // j is 4
int *p = &a[j - 1]; // p points at a[3]
In this example, a and b are character arrays and are not valid
strings. This will be revisited in Section 09.
int main(void) {
int a[A_LEN] = {4, 8, 15, 16, 23, 42};
// ...
int some_function(int n) {
int m = n * 2;
int a[m]; // length determined at run time
// ...
If a is an integer array with six elements (int a[6]) the size of a is:
(6 × sizeof(int)) = 6 × 4 = 24.
Not everyone uses the same terminology for length and size.
a => 0x5000
&a => 0x5000
&a[0] => 0x5000
Even though a and &a have the same value, they have different
types, and cannot be used interchangeably.
trace_int(a[0]);
trace_int(*a);
a[0] => 4
*a => 4
This is more efficient than copying the entire array to the stack.
Functions should require that the length is valid, but there is no way
for a function to assert that requirement.
int main(void) {
int my_array[6] = {4, 8, 15, 16, 23, 42};
trace_int(sum_array(my_array, 6));
}
sum_array(my_array, 6) => 108
It’s good style to use the const keyword to both prevent mutation
and communicate that no mutation occurs.
int sum_array(const int a[], int len) {
int sum = 0;
for (int i = 0; i < len; ++i) {
sum += a[i];
}
return sum;
}
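• Adding an integer to a pointer (p + i) produces the address
p + i × sizeof(*p).
• Subtracting an integer from a pointer (p - i) works in the
same way.
• Subtracting two pointers of the same type (p - q) produces the
number of elements between them: (p − q)/sizeof(*p), computed on the addresses.
In other words, if p = q + i then i = p - q.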
Recall that for an array a, the value of a is the address of the first
element (&a[0]).
int main(void) {
int a[6] = {4, 8, 15, 16, 23, 42};
print_array(a, 6);
}
4, 8, 15, 16, 23, 42.
int add1(int i) {
return i + 1;
}
int sqr(int i) {
return i * i;
}
int main(void) {
int a[6] = {4, 8, 15, 16, 23, 42};
print_array(a, 6);
array_map(add1, a, 6);
print_array(a, 6);
array_map(sqr, a, 6);
print_array(a, 6);
}
4, 8, 15, 16, 23, 42.
5, 9, 16, 17, 24, 43.
25, 81, 256, 289, 576, 1849.
8 6 7 5 3 0 9
0 6 7 5 3 8 9
0 6 7 5 3 8 9
and then we swap that element with the second one, and so forth...
0 3 7 5 6 8 9
We then “insert” the second element into the existing sequence into
the correct position, and then the third element, and so on.
For each iteration of Insertion sort, the first i elements are sorted.
We then “insert” the element a[i] into the correct position, moving
all of the elements greater than a[i] one to the right to “make
room” for a[i].
3 7 8 5 4 9 0
3 7 5 8 4 9 0
3 5 7 8 4 9 0
3 5 7 8 4 9 0
// Notes:
// i: loops from 1 ... len-1 and represents the
// "next" element to be replaced
// j: loops from i ... 1 and is "inserting"
// the element that was at a[i] until it
// reaches the correct position
The len field keeps track of the actual length of the stack.
struct stack {
int len;
int maxlen;
int data[100];
};
if (something_bad) {
printf("FATAL ERROR: Something bad happened!\n");
exit(EXIT_FAILURE);
}
• use both array index notation ([]) and array pointer notation and
convert between the two
† We revisit the issue of built-in functions later.
You are not expected to count the exact number of
operations.
Homer and Bart are debating the best algorithm (strategy) for
implementing check_array.
For Bart’s code, the best case is the same as the worst case.
Is it more “fair” to compare against the best case or the worst case?
example: orders
When multiplying two orders, the result is the product of the two
orders.
In CS 240 and CS 341 you will study orders and Big O notation
much more rigorously.
In this course, our goal is to give you experience and work toward
building your intuition:
int sum_array(const int a[], int len) {
int sum = 0;
for (int i = 0; i < len; ++i) {
sum += a[i];
}
return sum;
}
For example, the running time to add two large positive integers
is O(log n), where n is the largest number.
List functions that process the full list are typically O(n):
length last reverse append
† This highlights another difference between symbols & strings.
* unless x is exponential (e.g., O(2^i)).
sum = 0;
for (i = 0; i < n; ++i) {
sum += i;
}
$\sum_{i=1}^{n} O(1) = O(n)$
Outer loop: $\sum_{i=0}^{n-1} \left( O(1) + O(i) \right) = O(n^2)$
When sorting strings or large data structures, you must also include
the time to compare each element.
$T(n) = \sum_{i=1}^{n} \sum_{j=1}^{i} O(1) = O(n^2)$
However, in the best case, the array is already sorted, and the inner
loop terminates immediately. This best case running time is O(n).
Despite its worst case behaviour, quick sort is still popular and in
widespread use. The average case behaviour is quite good and
there are straightforward methods that can be used to improve
the selection of the pivot.
From this table, it might appear that insertion sort is the best choice.
In Section 10, we will see merge sort, which is O(n log n) in the
worst case.
Binary search
In Section 07, we implemented binary search on a sorted array.
int find_sorted(int item, const int a[], int len) {
// ...
while (low <= high) {
mid = (low + high) / 2;
// ...
if (a[mid] < item) {
low = mid + 1;
} else {
high = mid - 1;
//...
If two algorithms have the same time complexity but different space
complexity, it is likely that the one with the lower space complexity is
faster.
Both functions return the same result and both functions have a time
complexity T(n) = O(n).
The significant difference is that asum uses accumulative recursion.
The sum expression “grows” to O(n) +’s, but the asum expression
does not use any additional space.
The measured run-time of asum is significantly faster than sum
(in an experiment with a list of one million 1’s, over 40 times faster).
But both functions make the same number of recursive calls, how is
this explained?
With tail recursion, the previous stack frame can be reused for the
next recursion (or the previous frame can be discarded before the
new stack frame is created).
n^2 ∈ O(n^100)
n^3 ∈ O(2^n)
While you can say that n^2 is in the set O(n^100), it’s not very useful
information.
Many confuse these two topics but they are completely separate
concepts. You can asymptotically define the best case and the
worst case behaviour of an algorithm.
For example, the best case of insertion sort is O(n), while the worst
case is O(n^2).
For example:
[Figure: f(n) stays below c · g(n) for all n ≥ n0]
Because they all have a null terminator, they are also strings.
The strlen function returns the length of the string, not necessarily
the length of the array. It does not include the null character.
trace_bool(strcmp(a, b) == 0);
trace_bool(!strcmp(a, b));
trace_bool(!strcmp(a, c));
char name[81];
printf("What is your first name?\n");
scanf("%s", name);
You must be very careful to reserve enough space for the string to
be read in, and do not forget the null character.
The input:
Samantha Bob [EOF]
B o b \0 n t h a \0 \0
char name[81];
printf("What is your first name?\n");
scanf("%s", name);
int main(void) {
char name[8];
char message[] = "Hello.";
char prompt[] = "What is your name?";
while (1) {
printf("message: %s\n", message);
printf("prompt: %s\n", prompt);
if (scanf("%s", name) != 1) break;
printf("Welcome, %s!\n", name);
}
}
In practice you would never use this insecure method for reading in
a string.
It writes four chars into the four bytes where balance is stored.
The value of balance is a “re-interpretation” of those four bytes as
an int, instead of four chars.
You should always ensure that the dest array is large enough (and
don’t forget the null terminator).
printf("literal\n");
strcpy(dst, "literal");
int i = strlen("literal");
scanf("%d", &i);
In the code, the occurrence of the string literal is replaced with the
address of the corresponding array.
int main(void) {
char a[] = "mutable char array";
char *p = string_literal_1;
//...
}
• The second reserves space for a char pointer (p) in the stack
frame (8 bytes), initialized to point at a string literal
(const char array) created in the read-only data section.
• a has the same value as &a, while p and &p have different
values
Note that the string literal used with printf must always be
constant length (i.e., printf("literal")).
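[Figure: memory layout with Code, Read-Only Data, Global Data, Heap and Stack; the Heap grows (↓) toward higher addresses and the Stack, at the high addresses, grows (↑) toward lower addresses]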
For example, if you want enough space for an array of 100 ints:
int *my_array = malloc(100 * sizeof(int));
Seashell allows
int *my_array = malloc(400);
instead of
int *my_array = malloc(100 * sizeof(int));
int main(void) {
int *arr1 = malloc(10 * sizeof(int));
int *arr2 = malloc(5 * sizeof(int));
//...
}
In practice it’s good style to check every malloc return value and
gracefully handle a NULL instead of crashing.
In the “real world” you should always perform this check, but in this
course, you do not have to check for a NULL return value unless
instructed otherwise.
Pointer variables may still contain the address of the memory that
was freed, so it is often good style to assign NULL to a freed
pointer variable.
In this example, the address from the original malloc has been
overwritten.
That memory is now “lost” (or leaked) and so it can never be freed.
The arrays are divided into two smaller problems, which are then
sorted (conquered). The results are combined to solve the original
problem.
merge_sort(left, llen);
merge_sort(right, rlen);
free(left);
free(right);
}
When allocating memory for strings, don’t forget to include space for
the null terminator.
As we will see shortly, this is not how it is done in practice, but this is
an illustrative example.
// my_array has a length of 100
int *my_array = malloc(100 * sizeof(int));
// stuff happens...
char *readstr(void) {
char c;
if (scanf(" %c", &c) != 1) return NULL; // ignore initial WS
int len = 1;
char *str = malloc(len * sizeof(char));
str[0] = c;
while (1) {
if (scanf("%c", &c) != 1) break;
if (c == ' ' || c == '\n') break;
++len;
str = realloc(str, len * sizeof(char));
str[len - 1] = c;
}
str = realloc(str, (len + 1) * sizeof(char));
str[len] = '\0';
return str;
}
By using this doubling strategy, the total run time for readstr is
now only O(n).
In Section 06, the first ADT we saw was a simple stopwatch ADT. It
demonstrated information hiding, which provides both security and
flexibility.
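It used an opaque structure, which meant that the client could not create a stopwatch itself. Because the fields of the structure are hidden, the client can only work with pointers to stopwatches that the ADT creates.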
struct stopwatch;
struct stopwatch {
int seconds;
};
// requires: 0 <= seconds
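Because the client cannot create a stopwatch itself, the ADT has to provide functions that allocate and free one. The sketch below puts the interface and the implementation in one file. The function names stopwatch_create, stopwatch_destroy, stopwatch_add and stopwatch_get_seconds are hypothetical and do not necessarily match the Section 06 interface.

```c
#include <assert.h>
#include <stdlib.h>

// interface (.h): the client only sees the incomplete declaration
struct stopwatch;

// implementation (.c): the structure is completed here
struct stopwatch {
  int seconds;
};
// requires: 0 <= seconds

// stopwatch_create() returns a new stopwatch set to 0 seconds
// effects: allocates memory (client must call stopwatch_destroy)
struct stopwatch *stopwatch_create(void) {
  struct stopwatch *sw = malloc(sizeof(struct stopwatch));
  sw->seconds = 0;
  return sw;
}

// stopwatch_destroy(sw) frees all memory used by sw
void stopwatch_destroy(struct stopwatch *sw) {
  assert(sw);
  free(sw);
}

// stopwatch_add(sw, s) adds s seconds to sw
// requires: s >= 0 (so the invariant 0 <= seconds is preserved)
void stopwatch_add(struct stopwatch *sw, int s) {
  assert(sw);
  assert(s >= 0);
  sw->seconds += s;
}

// stopwatch_get_seconds(sw) returns the number of seconds on sw
int stopwatch_get_seconds(const struct stopwatch *sw) {
  assert(sw);
  return sw->seconds;
}
```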
struct stack;
struct stack {
int len;
int maxlen;
int *data;
};
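With the len and maxlen fields, a stack can use the same doubling strategy as readstr, which makes push O(1) amortized. The sketch below assumes an initial capacity of 1. The function names are hypothetical and use struct stack * directly rather than a typedef.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct stack {
  int len;      // number of items on the stack
  int maxlen;   // number of items the data array can hold
  int *data;
};

// stack_create() returns a new empty stack
// effects: allocates memory (caller must call stack_destroy)
struct stack *stack_create(void) {
  struct stack *s = malloc(sizeof(struct stack));
  s->len = 0;
  s->maxlen = 1;
  s->data = malloc(s->maxlen * sizeof(int));
  return s;
}

// stack_is_empty(s) determines if s is empty
bool stack_is_empty(const struct stack *s) {
  assert(s);
  return s->len == 0;
}

// stack_push(item, s) puts item on top of s
// time: O(1) amortized (the array doubles when it is full)
void stack_push(int item, struct stack *s) {
  assert(s);
  if (s->len == s->maxlen) {
    s->maxlen *= 2;
    s->data = realloc(s->data, s->maxlen * sizeof(int));
  }
  s->data[s->len] = item;
  ++s->len;
}

// stack_pop(s) removes and returns the top item of s
// requires: s is not empty
int stack_pop(struct stack *s) {
  assert(s);
  assert(s->len > 0);
  --s->len;
  return s->data[s->len];
}

// stack_destroy(s) frees all memory used by s
void stack_destroy(struct stack *s) {
  assert(s);
  free(s->data);
  free(s);
}
```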
Each node contains an item and a link (pointer) to the next node in the list.
• If we explicitly free all of the memory for list a, then list b will
become invalid.
To avoid mixing paradigms, we use the following guidelines when
implementing linked lists in C:
struct llist {
struct llnode *front;
};
And the following code inserts a new node to the front of the list.
void add_front(int i, struct llist *lst) {
lst->front = new_node(i, lst->front);
}
insert( 5, a);
insert(30, a);
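The excerpt uses new_node and insert without showing them. The sketch below supplies hypothetical definitions: new_node(i, next) allocates a node, and insert(i, lst) is assumed to keep the list sorted in ascending order. It also includes list_destroy, which frees every node. This is the only way to actually remove nodes from memory; setting a length field to zero does not do it.

```c
#include <assert.h>
#include <stdlib.h>

struct llnode {
  int item;
  struct llnode *next;
};

struct llist {
  struct llnode *front;
};

// new_node(i, pnext) returns a new node containing i that links to pnext
struct llnode *new_node(int i, struct llnode *pnext) {
  struct llnode *node = malloc(sizeof(struct llnode));
  node->item = i;
  node->next = pnext;
  return node;
}

// list_create() returns a new empty list (caller must call list_destroy)
struct llist *list_create(void) {
  struct llist *lst = malloc(sizeof(struct llist));
  lst->front = NULL;
  return lst;
}

// add_front(i, lst) adds i to the front of lst
void add_front(int i, struct llist *lst) {
  assert(lst);
  lst->front = new_node(i, lst->front);
}

// insert(i, slst) inserts i into slst so that it stays sorted
// requires: slst is sorted in ascending order
void insert(int i, struct llist *slst) {
  assert(slst);
  if (slst->front == NULL || i < slst->front->item) {
    add_front(i, slst);
    return;
  }
  // walk to the node after which i belongs
  struct llnode *prevnode = slst->front;
  while (prevnode->next && i > prevnode->next->item) {
    prevnode = prevnode->next;
  }
  prevnode->next = new_node(i, prevnode->next);
}

// list_length(lst) returns the number of nodes in lst
int list_length(const struct llist *lst) {
  assert(lst);
  int len = 0;
  for (const struct llnode *node = lst->front; node; node = node->next) {
    ++len;
  }
  return len;
}

// list_destroy(lst) frees every node and then the list itself
void list_destroy(struct llist *lst) {
  assert(lst);
  struct llnode *curnode = lst->front;
  while (curnode) {
    struct llnode *nextnode = curnode->next;  // save before freeing
    free(curnode);
    curnode = nextnode;
  }
  free(lst);
}
```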
What if the length field does not accurately reflect the true length?
Or a naïve coder may think that the following statement removes all
of the nodes from the list.
lst->length = 0;
Advanced testing methods can often find these types of errors, but
you must exercise caution.
struct queue;
struct llnode {
int item;
struct llnode *next;
};
struct queue {
struct llnode *front;
struct llnode *back; // <--- NEW
};
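The back pointer is what lets a queue add to the back in O(1) time without traversing the list. A minimal sketch follows. The function names are hypothetical, and the sketch keeps the invariant that back is NULL exactly when front is NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct llnode {
  int item;
  struct llnode *next;
};

struct queue {
  struct llnode *front;
  struct llnode *back;
};

// queue_create() returns a new empty queue
struct queue *queue_create(void) {
  struct queue *q = malloc(sizeof(struct queue));
  q->front = NULL;
  q->back = NULL;
  return q;
}

// queue_is_empty(q) determines if q is empty
bool queue_is_empty(const struct queue *q) {
  assert(q);
  return q->front == NULL;
}

// queue_add_back(i, q) adds i to the back of q
// time: O(1)
void queue_add_back(int i, struct queue *q) {
  assert(q);
  struct llnode *node = malloc(sizeof(struct llnode));
  node->item = i;
  node->next = NULL;
  if (q->front == NULL) {
    q->front = node;        // the queue was empty
  } else {
    q->back->next = node;   // link after the current back
  }
  q->back = node;
}

// queue_remove_front(q) removes and returns the front item of q
// requires: q is not empty
// time: O(1)
int queue_remove_front(struct queue *q) {
  assert(q);
  assert(q->front);
  struct llnode *old_front = q->front;
  int retval = old_front->item;
  q->front = old_front->next;
  free(old_front);
  if (q->front == NULL) {
    q->back = NULL;         // the queue is now empty
  }
  return retval;
}

// queue_destroy(q) frees all memory used by q
void queue_destroy(struct queue *q) {
  while (!queue_is_empty(q)) {
    queue_remove_front(q);
  }
  free(q);
}
```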
For example, a dictionary node can contain both a key (item) and a
corresponding value.
Or for a priority queue, each node can additionally store the priority
of the item.
struct bstnode {
int item;
struct bstnode *left;
struct bstnode *right;
};
struct bst {
struct bstnode *root;
};
The worst case is when the tree is unbalanced, and every node in
the tree must be visited.
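Insertion and search both follow a single path from the root, so they take O(h) time, where h is the height of the tree. When the keys are inserted in sorted order, the tree becomes a "stick" and h equals the number of nodes. The following sketch is one possible implementation. The function names are hypothetical, and duplicates are ignored on insert.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct bstnode {
  int item;
  struct bstnode *left;
  struct bstnode *right;
};

struct bst {
  struct bstnode *root;
};

// bst_create() returns a new empty tree
struct bst *bst_create(void) {
  struct bst *t = malloc(sizeof(struct bst));
  t->root = NULL;
  return t;
}

// node_insert(i, node) inserts i into the subtree rooted at node and
//   returns the root of that subtree
static struct bstnode *node_insert(int i, struct bstnode *node) {
  if (node == NULL) {
    struct bstnode *leaf = malloc(sizeof(struct bstnode));
    leaf->item = i;
    leaf->left = NULL;
    leaf->right = NULL;
    return leaf;
  }
  if (i < node->item) {
    node->left = node_insert(i, node->left);
  } else if (i > node->item) {
    node->right = node_insert(i, node->right);
  }  // otherwise i is already in the tree
  return node;
}

// bst_insert(i, t) inserts i into t
// time: O(h)
void bst_insert(int i, struct bst *t) {
  assert(t);
  t->root = node_insert(i, t->root);
}

// bst_contains(i, t) determines if i is in t
// time: O(h); in an unbalanced tree this can visit every node
bool bst_contains(int i, const struct bst *t) {
  assert(t);
  const struct bstnode *node = t->root;
  while (node) {
    if (i == node->item) return true;
    node = (i < node->item) ? node->left : node->right;
  }
  return false;
}

static void free_bstnode(struct bstnode *node) {
  if (node) {
    free_bstnode(node->left);
    free_bstnode(node->right);
    free(node);
  }
}

// bst_destroy(t) frees all memory used by t
void bst_destroy(struct bst *t) {
  assert(t);
  free_bstnode(t->root);
  free(t);
}
```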
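In an array-based representation of a binary tree, the root is stored at index 0, and the children of the node at index i are stored at:
left: 2i+1
right: 2i+2
For example, the tree with root 40, children 20 and 50, where 20 has children 10 and 30 and 50 has only a right child 60, is stored as:
40 20 50 10 30 - 60
where - marks an empty position (the missing left child of 50).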
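The index arithmetic can be checked against the example array. This sketch assumes a hypothetical sentinel EMPTY (-1) for the "-" position, which means -1 itself cannot be stored as a key. Because the example tree is also a BST, it includes a search that follows the 2i+1 / 2i+2 links instead of pointers.

```c
#include <assert.h>
#include <stdbool.h>

// EMPTY marks an unused position (hypothetical sentinel for "-")
#define EMPTY (-1)

// left_index(i) and right_index(i) return the positions of the left
//   and right children of the node stored at position i
int left_index(int i) { return 2 * i + 1; }
int right_index(int i) { return 2 * i + 2; }

// has_node(tree, len, i) determines if position i holds a node
bool has_node(const int tree[], int len, int i) {
  return i >= 0 && i < len && tree[i] != EMPTY;
}

// array_bst_contains(key, tree, len) searches an array-based BST
bool array_bst_contains(int key, const int tree[], int len) {
  assert(len >= 0);
  int i = 0;   // start at the root
  while (has_node(tree, len, i)) {
    if (key == tree[i]) return true;
    i = (key < tree[i]) ? left_index(i) : right_index(i);
  }
  return false;
}
```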
// dictionary.h
struct dictionary;
struct bstnode {
int item; // key
char *value; // additional value (augmentation)
struct bstnode *left;
struct bstnode *right;
};
struct dictionary {
struct bstnode *root;
};
If the client tries to insert a duplicate key, we replace the old value
with the new value.
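Here is a sketch of insert and lookup for this dictionary, including the rule that a duplicate key replaces the old value. The notes do not say who owns the value strings. This sketch assumes the dictionary stores its own copies and frees them. The helper copy_str and all function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct bstnode {
  int item;                 // key
  char *value;              // additional value (augmentation)
  struct bstnode *left;
  struct bstnode *right;
};

struct dictionary {
  struct bstnode *root;
};

struct dictionary *dict_create(void) {
  struct dictionary *d = malloc(sizeof(struct dictionary));
  d->root = NULL;
  return d;
}

// copy_str(s) returns a new copy of s, including its null terminator
static char *copy_str(const char *s) {
  char *copy = malloc((strlen(s) + 1) * sizeof(char));
  strcpy(copy, s);
  return copy;
}

// dict_insert(key, val, d) stores a copy of val under key in d;
//   if key is already in d, the old value is replaced
void dict_insert(int key, const char *val, struct dictionary *d) {
  assert(d);
  assert(val);
  struct bstnode **link = &d->root;   // the link that may need to change
  while (*link) {
    struct bstnode *node = *link;
    if (key == node->item) {
      free(node->value);              // duplicate key: replace old value
      node->value = copy_str(val);
      return;
    }
    link = (key < node->item) ? &node->left : &node->right;
  }
  struct bstnode *leaf = malloc(sizeof(struct bstnode));
  leaf->item = key;
  leaf->value = copy_str(val);
  leaf->left = NULL;
  leaf->right = NULL;
  *link = leaf;                       // attach the new leaf
}

// dict_lookup(key, d) returns the value stored under key,
//   or NULL if key is not in d
const char *dict_lookup(int key, const struct dictionary *d) {
  assert(d);
  const struct bstnode *node = d->root;
  while (node) {
    if (key == node->item) return node->value;
    node = (key < node->item) ? node->left : node->right;
  }
  return NULL;
}

static void free_nodes(struct bstnode *node) {
  if (node) {
    free_nodes(node->left);
    free_nodes(node->right);
    free(node->value);
    free(node);
  }
}

// dict_destroy(d) frees d, including every node and stored value
void dict_destroy(struct dictionary *d) {
  assert(d);
  free_nodes(d->root);
  free(d);
}
```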
• A) If the node with the key (“key node”) is a leaf, we remove it.
• B) If one child of the key node is empty (NULL), the other child is
“promoted” to replace the key node.
• C) Otherwise, we find the node with the next largest key (“next
node”) in the tree (i.e., the smallest key in the right subtree). We
replace the key/value of the key node with the key/value of the
next node, and then remove the next node from the right
subtree.
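The three removal cases can be sketched recursively: each call returns the (possibly new) root of its subtree. To keep the sketch under a reasonable length, values here are ints rather than strings. That simplification, and all function names, are assumptions of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

// values are ints in this sketch (instead of strings) to keep it short
struct bstnode {
  int item;    // key
  int value;   // additional value (augmentation)
  struct bstnode *left;
  struct bstnode *right;
};

struct dictionary {
  struct bstnode *root;
};

struct dictionary *dict_create(void) {
  struct dictionary *d = malloc(sizeof(struct dictionary));
  d->root = NULL;
  return d;
}

// node_insert(key, val, node) inserts key/val into the subtree rooted at
//   node (replacing the value of a duplicate key) and returns its root
static struct bstnode *node_insert(int key, int val, struct bstnode *node) {
  if (node == NULL) {
    node = malloc(sizeof(struct bstnode));
    node->item = key;
    node->value = val;
    node->left = NULL;
    node->right = NULL;
  } else if (key < node->item) {
    node->left = node_insert(key, val, node->left);
  } else if (key > node->item) {
    node->right = node_insert(key, val, node->right);
  } else {
    node->value = val;
  }
  return node;
}

void dict_insert(int key, int val, struct dictionary *d) {
  assert(d);
  d->root = node_insert(key, val, d->root);
}

bool dict_contains(int key, const struct dictionary *d) {
  assert(d);
  const struct bstnode *node = d->root;
  while (node && node->item != key) {
    node = (key < node->item) ? node->left : node->right;
  }
  return node != NULL;
}

// remove_key(key, node) removes key from the subtree rooted at node and
//   returns the (possibly new) root of that subtree
static struct bstnode *remove_key(int key, struct bstnode *node) {
  if (node == NULL) return NULL;   // key is not in the tree
  if (key < node->item) {
    node->left = remove_key(key, node->left);
  } else if (key > node->item) {
    node->right = remove_key(key, node->right);
  } else if (node->left == NULL || node->right == NULL) {
    // A) and B): promote the non-empty child (NULL if node is a leaf)
    struct bstnode *child = node->left ? node->left : node->right;
    free(node);
    return child;
  } else {
    // C) the next node holds the smallest key in the right subtree
    struct bstnode *next = node->right;
    while (next->left) {
      next = next->left;
    }
    node->item = next->item;       // copy the next node's key/value
    node->value = next->value;
    node->right = remove_key(next->item, node->right);
  }
  return node;
}

void dict_remove(int key, struct dictionary *d) {
  assert(d);
  d->root = remove_key(key, d->root);
}

static void free_nodes(struct bstnode *node) {
  if (node) {
    free_nodes(node->left);
    free_nodes(node->right);
    free(node);
  }
}

void dict_destroy(struct dictionary *d) {
  assert(d);
  free_nodes(d->root);
  free(d);
}
```

In case C, the next node has no left child, so the recursive call that removes it always falls into case A or B.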
If the data is sequenced, then a data structure that sorts the data
(e.g., a BST) is likely not an appropriate choice. Arrays and linked
lists are better suited for sequenced data.
∗
A hash table is typically an array of linked lists (more on hash
tables in CS 240).
Integer i;
IntPtr p = &i;
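The declarations above only work if Integer and IntPtr have been defined with typedef. A minimal sketch, assuming they are aliases for int and int *, follows. The function increment is a hypothetical example of using the alias as a parameter type.

```c
#include <assert.h>

typedef int Integer;   // Integer is another name for int
typedef int *IntPtr;   // IntPtr is another name for "pointer to int"

// increment(p) adds one to the Integer that p points to
void increment(IntPtr p) {
  assert(p);
  ++*p;
}
```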
int main(void) {
int arr[6] = {4, 8, 15, 16, 23, 42};
array_map(add1, arr, 6);
MapFn f = add1;
array_map(f, arr, 6);
//...
}
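The example above relies on add1, array_map and the function pointer type MapFn, none of which are defined in the excerpt. The following is one plausible set of definitions, assuming array_map replaces each element in place.

```c
#include <assert.h>

// MapFn is a pointer to a function that consumes an int and produces an int
typedef int (*MapFn)(int);

// add1(n) produces n + 1
int add1(int n) {
  return n + 1;
}

// array_map(f, a, len) replaces each element a[i] with f(a[i])
// requires: len >= 0
void array_map(MapFn f, int a[], int len) {
  assert(f);
  assert(len >= 0);
  for (int i = 0; i < len; ++i) {
    a[i] = f(a[i]);
  }
}
```

Both calls in main behave the same way: a function name used as a value is converted to a pointer to that function, so passing add1 directly or through the variable f is equivalent.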
// operations:
Stack stack_create(void);
or...
// item.h
typedef struct posn ItemType; // for stacks of posns
void pointers can point to “any” type, and are essentially just
memory addresses. They can be converted to any other type of
pointer, but they cannot be directly dereferenced.
int i = 42;
void *vp = &i;
int j = *vp; // INVALID
int *ip = vp;
int k = *ip; // VALID
#include "stack.h"
The ADT would then just call the comparison function whenever a
comparison is necessary.
• negative: a precedes b
• zero: a is equivalent to b
• positive: a follows b
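// a comparison function for integers
int compare_ints(const void *a, const void *b) {
  const int *ia = a;
  const int *ib = b;
  // *ia - *ib also works for small values, but it can overflow;
  //   this expression produces -1, 0, or 1 without overflow
  return (*ia > *ib) - (*ia < *ib);
}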
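struct dictionary;
typedef struct dictionary *Dictionary;
// a pointer to a comparison function for keys (negative, zero, positive)
typedef int (*DictKeyCompare)(const void *, const void *);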
struct bstnode {
void *item; // key
void *value; // additional value (augmentation)
struct bstnode *left;
struct bstnode *right;
};
struct dictionary {
struct bstnode *root;
DictKeyCompare key_compare; // function pointer
};
Dictionary dict_create(DictKeyCompare f) {
Dictionary d = malloc(sizeof(struct dictionary));
d->root = NULL;
d->key_compare = f;
return d;
}
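To show the ADT "just calling the comparison function" whenever it must compare keys, here is a sketch of insert and lookup for this generic dictionary. It assumes the dictionary stores only the key and value pointers (the client owns that memory), and that a duplicate key replaces the value. compare_strings is a hypothetical client-side comparison function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef int (*DictKeyCompare)(const void *, const void *);

struct bstnode {
  void *item;    // key
  void *value;   // additional value (augmentation)
  struct bstnode *left;
  struct bstnode *right;
};

struct dictionary {
  struct bstnode *root;
  DictKeyCompare key_compare;   // function pointer
};

typedef struct dictionary *Dictionary;

Dictionary dict_create(DictKeyCompare f) {
  Dictionary d = malloc(sizeof(struct dictionary));
  d->root = NULL;
  d->key_compare = f;
  return d;
}

// dict_insert(key, val, d) maps key to val; only the pointers are
//   stored, so the client remains responsible for their memory
void dict_insert(void *key, void *val, Dictionary d) {
  assert(d);
  struct bstnode **link = &d->root;
  while (*link) {
    int cmp = d->key_compare(key, (*link)->item);   // ask the client
    if (cmp == 0) {
      (*link)->value = val;   // duplicate key: replace the value
      return;
    }
    link = (cmp < 0) ? &(*link)->left : &(*link)->right;
  }
  struct bstnode *leaf = malloc(sizeof(struct bstnode));
  leaf->item = key;
  leaf->value = val;
  leaf->left = NULL;
  leaf->right = NULL;
  *link = leaf;
}

// dict_lookup(key, d) returns the value for key, or NULL if not found
void *dict_lookup(const void *key, Dictionary d) {
  assert(d);
  const struct bstnode *node = d->root;
  while (node) {
    int cmp = d->key_compare(key, node->item);
    if (cmp == 0) return node->value;
    node = (cmp < 0) ? node->left : node->right;
  }
  return NULL;
}

static void free_nodes(struct bstnode *node) {
  if (node) {
    free_nodes(node->left);
    free_nodes(node->right);
    free(node);   // frees the node only, not the client's key or value
  }
}

void dict_destroy(Dictionary d) {
  assert(d);
  free_nodes(d->root);
  free(d);
}

// a client-side comparison function for string keys
int compare_strings(const void *a, const void *b) {
  return strcmp(a, b);
}
```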
The other parameters of qsort are the array to sort (of any type), the length of the array (the number of elements), and the size of each element (e.g., sizeof(int)).
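Putting the pieces together, here is a small sketch of sorting an int array with the standard library qsort (declared in <stdlib.h>). The order of arguments is array, number of elements, size of each element, comparison function. The wrappers sort_ints and compare_ints_desc are hypothetical. They show that changing only the comparison function changes the order.

```c
#include <assert.h>
#include <stdlib.h>

// compare_ints(a, b) compares the ints that a and b point to
int compare_ints(const void *a, const void *b) {
  const int *ia = a;
  const int *ib = b;
  return (*ia > *ib) - (*ia < *ib);   // -1, 0, or 1
}

// compare_ints_desc(a, b) reverses compare_ints (for descending order)
int compare_ints_desc(const void *a, const void *b) {
  return compare_ints(b, a);
}

// sort_ints(a, len, cmp) sorts a using qsort and the comparison cmp
// requires: len >= 0
void sort_ints(int a[], int len, int (*cmp)(const void *, const void *)) {
  assert(len >= 0);
  // qsort(array, number of elements, size of each element, comparison)
  qsort(a, len, sizeof(int), cmp);
}
```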
int main(void) {
//...
}
55 89 E5 83 EC 10 C7 45 F8 00 00 00 00 C7 45 FC 01 00 00
00 EB 0A 8B 45 FC 01 45 F8 83 45 FC 01 8B 45 FC 3B 45 08
7E EE 8B 45 F8 C9 C3.
• preprocessing
• compilation
• linking
For example, the #include directive “cut and pastes” the contents
of one file into another file.
At the command line, you are always “working” in one directory. This
is also known as your “current” directory or the directory you are “in”.
$ pwd
/u1/username
The full directory name is the path through the tree starting from the
root (/) followed by each “sub-directory”, separated by /’s.
A file name may also include the path to the file, which can be
absolute (from the root) or relative to the current directory.
SSH
SSH (Secure SHell) allows you to use a command-line interface on
a remote computer.
$ ssh [email protected]
One of the easiest text editors for beginners is nano. To start using nano, you only need to remember two commands: to save (write out) your file, press Ctrl-O, and to exit the editor, press Ctrl-X.
#include <stdio.h>

int main(void) {
  printf("Hello, World!\n");
}
$ gcc -c module1.c
$ ls
module1.c module1.o
The default source for stdin is the keyboard, and the default
destination for stdout is the “output window”.
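#include <stdio.h>

// reads characters from stdin until the end of input and prints each
//   one with its case swapped; other characters are printed unchanged
int main(void) {
  char c;
  while (1) {
    if (scanf("%c", &c) != 1) break;
    if (c >= 'a' && c <= 'z') {
      c = c - 'a' + 'A';   // lowercase -> uppercase
    } else if (c >= 'A' && c <= 'Z') {
      c = c - 'A' + 'a';   // uppercase -> lowercase
    }
    printf("%c", c);
  }
}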
To redirect input from a file, use the < symbol (i.e., < filename).
Dave Tompkins