Final Exam: 15-213 Introduction To Computer Systems
May 3, 2006
Problem                  Number  Points
Assembly Language             1      15
Out-of-Order Execution        2      20
Pointer Arithmetic            3      10
Caching                       4      20
Signals                       5      15
Semaphores                    6      30
Servers                       7      20
System-Level I/O              8      20
Total                               150
1. Assembly Language (15 points)
Consider the following declaration of a binary search tree data structure.
struct TREE {
    int data;
    struct TREE* left;
    struct TREE* right;
};
Next we consider the tree as a binary search tree, where elements to the left of a node
are smaller and elements to the right of a node are larger than the data stored in the node.
The following function checks whether a given integer x is stored in the global tree root.
typedef struct TREE tree;
tree* root;
int member(int x) {
    tree* t = root;
    while (t != NULL) {
        if (x == t->data)
            return 1;
        if (x < t->data)
            t = t->left;
        else
            t = t->right;
    }
    return 0;
}
This function might compile to the following piece of assembly code, omitting some
code alignment directives and three lines for you to fill in.
member:
        movq root(%rip), %rax
        testq %rax, %rax
        je .L9
.L14:
        ___________________________
        je .L13
        jle .L5
        movq 8(%rax), %rax
.L11:
        testq %rax, %rax
        jne .L14
.L9:
        ___________________________
        ret
.L5:
        movq 16(%rax), %rax
        jmp .L11
.L13:
        ___________________________
        ret
4. (4 pts) Complete the following table, associating C variables with machine registers
or assembly expressions.
C variable      Register or assembly expression
x               %edi
root            root(%rip)
t               %rax
return value    %eax
member:
        movq root(%rip), %rax
        testq %rax, %rax
        je .L9
.L14:
        cmpl %edi, (%rax)
        je .L13
        jle .L5
        movq 8(%rax), %rax
.L11:
        testq %rax, %rax
        jne .L14
.L9:
        xorl %eax, %eax
        ret
.L5:
        movq 16(%rax), %rax
        jmp .L11
.L13:
        movl $1, %eax
        ret
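The displacements 8 and 16 in the assembly follow from the layout of struct TREE. The sketch below checks that layout, assuming the usual x86-64 Linux conventions (4-byte int, 8-byte pointers aligned to 8 bytes). The helper name check_tree_layout is hypothetical and not part of the exam.

```c
#include <assert.h>
#include <stddef.h>

struct TREE {
    int data;             /* offset 0, followed by 4 bytes of padding */
    struct TREE* left;    /* offset 8:  matches movq 8(%rax), %rax   */
    struct TREE* right;   /* offset 16: matches movq 16(%rax), %rax  */
};

/* Aborts if the compiler's layout differs from the one the assembly assumes.
 * This holds on x86-64 Linux; other ABIs may lay the struct out differently. */
void check_tree_layout(void) {
    assert(offsetof(struct TREE, data) == 0);
    assert(offsetof(struct TREE, left) == 8);
    assert(offsetof(struct TREE, right) == 16);
    assert(sizeof(struct TREE) == 24);
}
```

Since data sits at offset 0, the operand (%rax) in the cmpl instruction reads t->data directly.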
2. Out-of-Order Execution (20 points)
We continue the code from the previous problem.
1. (15 pts) On a machine with pipelining and out-of-order execution, like the machines
used in this course, the efficiency of the inner loop can be improved by the use of
conditional move instructions. Rewrite the code between .L14 and .L9 by using
only instructions from the original program, ordinary move instructions, and one
or more of the following conditional move instructions.
cmovl S, D
cmovle S, D
cmove S, D
cmovge S, D
cmovg S, D
As usual, S stands for the source, D for the destination, and the suffixes l, le, e, ge,
and g have the same meaning as for conditional branches.
.L14:
        cmpl %edi, (%rax)        # compare t->data:x
        je .L13                  # if = then return 1
        cmovl 16(%rax), %rsi     # if < then temp = t->right
        cmovg 8(%rax), %rsi      # if > then temp = t->left
        movq %rsi, %rax          # t = temp
        testq %rax, %rax
        jne .L14
(Moving directly into %rax, without the temporary %rsi, does not work, since the arguments
to a conditional move are fetched from memory even if the condition is false. The cmovg
would therefore lead to a segfault if cmovl stores NULL in %rax.)
2. (5 pts) Explain why the code with conditional moves can be more efficient than the
original code produced by the compiler.
In general, it will be very difficult for the processor to predict whether the
jle instruction in the original program will branch or not. Assuming the
tree is balanced, it will be wrong roughly half the time, paying a high
misprediction penalty. The code with the conditional moves does not pay this
overhead, at the small cost of one additional move instruction.
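At the C level, the same idea can be sketched by loading both children unconditionally and choosing between them with a conditional expression. This is only an illustration: whether a compiler actually emits a conditional move for it depends on the compiler and its flags. The name member_select is hypothetical, and unlike the exam's member it takes the tree as a parameter instead of reading the global root.

```c
#include <assert.h>
#include <stddef.h>

struct TREE {
    int data;
    struct TREE* left;
    struct TREE* right;
};
typedef struct TREE tree;

/* Searches for x. The only data-dependent choice besides the equality test
 * is a select between two already-loaded pointers, which a compiler may
 * (but need not) translate into a conditional move. */
int member_select(const tree* t, int x) {
    while (t != NULL) {
        if (x == t->data)
            return 1;
        const tree* l = t->left;    /* both loads are safe: t is non-NULL */
        const tree* r = t->right;
        t = (x < t->data) ? l : r;
    }
    return 0;
}

/* Small usage example on a three-node tree. */
void member_select_demo(void) {
    tree a = { 1, NULL, NULL };
    tree c = { 9, NULL, NULL };
    tree b = { 5, &a, &c };
    assert(member_select(&b, 9) == 1);
    assert(member_select(&b, 4) == 0);
}
```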
3. Pointer Arithmetic (10 points)
A desperate student decided to write a dynamic memory allocator for an x86-64 machine
in which each block has the following form:
where bp points to the beginning of the payload and is aligned to 0 modulo 8. Circle each
of the following letters A–J for which the macros will correctly print the id string.
4. Caching (20 points)
Assume the following situation.
• A float is 4 bytes.
The function mm_ijk multiplies two N × N arrays A and B and puts the result in R.
For simplicity, we assume R is initialized to all zeros.
1. (6 pts) Consider the executions of mm_ijk with N=2 and N=4 on a 64-byte fully
associative LRU cache with 4-byte lines (the cache holds 16 lines). Fill in the table
below with the number of cache misses caused by accesses to each of the arrays A,
B, and R, assuming that the argument arrays are 16-byte aligned.
N     A     B     R
2     4     4     4
4    16    64    16
2. (6 pts) Now suppose we consider the previous experiment on a 64-byte fully
associative LRU cache with 16-byte lines (the cache holds 4 lines). Fill in the table below
with the number of cache misses due to each array, assuming that the argument
arrays are 16-byte aligned.
N     A     B     R
2     1     1     1
4     4    64     4
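The miss counts in the two tables can be reproduced with a small model of a fully associative LRU cache. The sketch below rests on assumptions that are not stated in the text: each inner iteration touches A[i][k], then B[k][j], then R[i][j] (the update R[i][j] += A[i][k] * B[k][j], with the read and write of R counted as one access), and the arrays sit at made-up, non-overlapping, 16-byte-aligned addresses. All names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_LINES 16

/* Fully associative LRU cache model; tags[0] is the most recently used line. */
struct cache {
    unsigned long tags[MAX_LINES];
    int used;        /* number of valid lines   */
    int num_lines;   /* capacity in lines       */
    int line_size;   /* bytes per line          */
};

/* Touches one address; returns 1 on a miss and 0 on a hit. */
static int touch(struct cache* c, unsigned long addr) {
    unsigned long tag = addr / (unsigned long)c->line_size;
    int pos = -1;
    for (int i = 0; i < c->used; i++)
        if (c->tags[i] == tag) { pos = i; break; }
    int miss = (pos < 0);
    if (miss) {
        if (c->used < c->num_lines)
            c->used++;            /* a free slot is available */
        pos = c->used - 1;        /* free slot, or the LRU line to evict */
    }
    /* Shift younger lines down one place and put this line at the front. */
    memmove(&c->tags[1], &c->tags[0], (size_t)pos * sizeof c->tags[0]);
    c->tags[0] = tag;
    return miss;
}

/* Counts misses per array for an n x n multiply in i-j-k order.
 * misses[0], misses[1], misses[2] receive the counts for A, B, and R. */
void mm_ijk_misses(int n, int line_size, int num_lines, long misses[3]) {
    /* Hypothetical base addresses for A, B, R (16-byte aligned). */
    const unsigned long base[3] = { 0x1000, 0x2000, 0x3000 };
    struct cache c = { .used = 0, .num_lines = num_lines, .line_size = line_size };
    assert(num_lines > 0 && num_lines <= MAX_LINES && line_size > 0);
    misses[0] = misses[1] = misses[2] = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++) {
                misses[0] += touch(&c, base[0] + 4UL * (unsigned long)(i * n + k));
                misses[1] += touch(&c, base[1] + 4UL * (unsigned long)(k * n + j));
                misses[2] += touch(&c, base[2] + 4UL * (unsigned long)(i * n + j));
            }
}
```

Calling mm_ijk_misses(4, 16, 4, m) models the N=4 row of the second table; under the stated assumptions the row of A and the row of R in use stay resident while the four rows of B evict one another.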
3. (5 pts) Even if R is initialized to all zeros, and even if the program is single-threaded
and no signals occur, after the execution of the mm_ijk function, R will not
necessarily contain the product of A and B. Give a concrete counterexample.
If
$$A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}$$
then A · B = A. But if R starts on the second row of B (in C, R = B+1 for
float B[2][2]), writing the results into R[0, 0] and R[0, 1] will overwrite
B[1, 0] and B[1, 1], which eventually yields
$$R = \begin{pmatrix} 2 & 2 \\ 3 & 3 \end{pmatrix}$$
and R ≠ A = A · B.
Since we are tied to a machine, we interpret product as a floating point operation.
If we tried to find a difference from the mathematically correct answer
we would also have to consider overflow, underflow, and rounding error.
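The counterexample can be replayed in C. The sketch assumes that mm_ijk accumulates directly into R with R[i][j] += A[i][k] * B[k][j] in i-j-k order, which is consistent with the result stated above but is not shown in the text. To keep every access in bounds, B and R are carved out of one three-row buffer (a hypothetical arrangement) so that row 0 of R is row 1 of B.

```c
#include <assert.h>

/* Assumed form of mm_ijk: accumulate straight into R, i-j-k loop order. */
void mm_ijk(int n, float A[n][n], float B[n][n], float R[n][n]) {
    assert(n > 0);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++)
                R[i][j] += A[i][k] * B[k][j];
}

/* Runs the aliasing counterexample; out receives R in row-major order. */
void alias_demo(float out[4]) {
    float A[2][2] = { {1, 1}, {1, 1} };
    /* Rows 0-1 are B, rows 1-2 are R: R[0] aliases B[1]. R starts as zeros. */
    float buf[3][2] = { {1, 1}, {0, 0}, {0, 0} };
    float (*B)[2] = buf;
    float (*R)[2] = buf + 1;

    mm_ijk(2, A, B, R);

    out[0] = R[0][0];
    out[1] = R[0][1];
    out[2] = R[1][0];
    out[3] = R[1][1];
}
```

With these inputs the partial sums written to R[0][0] and R[0][1] are read back as B[1][0] and B[1][1] by later iterations, which is how the second row of R ends up as 3 instead of 1.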
5. Signals (15 points)
For each code segment below, give the largest value that could be printed to stdout.
Remember that when the system executes a signal handler, it blocks signals of the type
currently being handled (and no others).
/* Version A */
int i = 0;
void handler(int s) {
    if (!i) {
        kill(getpid(), SIGINT);
    }
    i++;
}
int main() {
    signal(SIGINT, handler);
    kill(getpid(), SIGINT);
    printf("%d\n", i);
    return 0;
}
/* Version B */
int i = 0;
void handler(int s) {
    if (!i) {
        kill(getpid(), SIGINT);
        kill(getpid(), SIGINT);
    }
    i++;
}
int main() {
    signal(SIGINT, handler);
    kill(getpid(), SIGINT);
    printf("%d\n", i);
    return 0;
}
/* Version C */
int i = 0;
void handler(int s) {
    if (!i) {
        kill(getpid(), SIGINT);
        kill(getpid(), SIGUSR1);
    }
    i++;
}
int main() {
    signal(SIGINT, handler);
    signal(SIGUSR1, handler);
    kill(getpid(), SIGUSR1);
    printf("%d\n", i);
    return 0;
}
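A related property worth keeping in mind when reasoning about these versions is that standard signals are not queued: while a signal type is blocked, any number of sends of that type leaves at most one pending. The sketch below shows this in isolation, using sigprocmask to block SIGUSR1 explicitly in place of the implicit blocking inside a handler. It is a separate illustration, not one of the exam's versions, and the function and variable names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t deliveries = 0;

static void count_handler(int sig) {
    (void)sig;
    deliveries++;
}

/* Sends SIGUSR1 to this process `sends` times while the signal is blocked,
 * then unblocks it. Returns how many times the handler ran, or -1 on error.
 * Assumes a single-threaded process with SIGUSR1 not already blocked. */
int deliveries_after_blocked_sends(int sends) {
    struct sigaction sa;
    sigset_t block, old;

    assert(sends >= 0);
    sa.sa_handler = count_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    deliveries = 0;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, &old) != 0)
        return -1;

    for (int i = 0; i < sends; i++)
        kill(getpid(), SIGUSR1);          /* collapses into one pending signal */

    sigprocmask(SIG_SETMASK, &old, NULL); /* the pending signal is delivered here */
    return (int)deliveries;
}
```

Three blocked sends therefore produce a single handler run, which is the effect to account for when a handler sends the signal type it is currently handling.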
6. Semaphores (30 points)
We would now like to use binary search trees with (long) integer data to represent sets of
integers.
struct TREE {
    long int data;
    struct TREE* left;
    struct TREE* right;
};
typedef struct TREE tree;
tree* root = NULL; /* tree is initially empty */
We assume that the functions member and insert are straightforward (as member
in an earlier problem) and are not thread-safe. In this problem we explore how to write
wrappers so they can be used in a thread-safe manner.
We would like to allow any number of concurrent readers (via the member function),
since simultaneously reading the data structure is safe. On the other hand, we would like to
ensure that a writer (via the insert function) has exclusive access to the data structure
(no other writers or readers allowed).
To implement this, we use one global variable, readers, which counts the number of
currently executing readers and two semaphores: r to ensure mutually exclusive access
to the readers variable, and w to ensure that a writer has exclusive access to the data
structure stored in the root variable.
int readers = 0;
sem_t w;
sem_t r;
1. (15 pts) Below is the skeleton of the wrapper functions. Fill in the missing lines from
the following selection. Note that you may need some commands more than once
while others may remain unused.
P(&r);
V(&r);
P(&w);
V(&w);
readers++;
readers--;
if (readers == 1) P(&w);
if (readers == 1) V(&w);
if (readers == 0) P(&w);
if (readers == 0) V(&w);
int member_safe(long int x) {
    int b;
    P(&r);                    /* wait for read lock */
    readers++;                /* increment readers */
    if (readers == 1) P(&w);  /* wait for write lock if first reader */
    V(&r);                    /* release read lock */
    b = member(x);
    P(&r);                    /* wait for read lock */
    if (readers == 1) V(&w);  /* release write lock if last reader */
    readers--;                /* decrement readers */
    V(&r);                    /* release read lock */
    return(b);
}
(Solution lines are commented. There are some other correct solutions with the given
commands.)
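For context, here is a minimal sketch of how the reader wrapper fits together with a writer wrapper, using POSIX semaphores. Several parts are assumptions made only to keep the sketch self-contained: P and V are defined as thin wrappers around sem_wait and sem_post, both semaphores are initialized to 1 as in the usual readers-writers scheme, and member, insert, insert_safe, and init_locks are written here for illustration and are not taken from the exam.

```c
#include <assert.h>
#include <semaphore.h>
#include <stdlib.h>

struct TREE { long int data; struct TREE* left; struct TREE* right; };
typedef struct TREE tree;

static tree* root = NULL;
static int readers = 0;
static sem_t r, w;

static void P(sem_t* s) { sem_wait(s); }
static void V(sem_t* s) { sem_post(s); }

/* Unsynchronized helpers (illustrative stand-ins for member and insert). */
static int member(long int x) {
    for (tree* t = root; t != NULL; t = (x < t->data) ? t->left : t->right)
        if (x == t->data)
            return 1;
    return 0;
}

static void insert(long int x) {
    tree** tp = &root;
    while (*tp != NULL) {
        if (x == (*tp)->data)
            return;                       /* already present */
        tp = (x < (*tp)->data) ? &(*tp)->left : &(*tp)->right;
    }
    *tp = malloc(sizeof(tree));
    assert(*tp != NULL);
    (*tp)->data = x;
    (*tp)->left = NULL;
    (*tp)->right = NULL;
}

/* Assumed initialization: both semaphores start at 1 (unlocked). */
void init_locks(void) {
    sem_init(&r, 0, 1);
    sem_init(&w, 0, 1);
}

int member_safe(long int x) {
    int b;
    P(&r);
    readers++;
    if (readers == 1) P(&w);   /* first reader locks out writers */
    V(&r);
    b = member(x);
    P(&r);
    if (readers == 1) V(&w);   /* last reader lets writers back in */
    readers--;
    V(&r);
    return b;
}

/* Writer wrapper: holds w for the whole update. */
void insert_safe(long int x) {
    P(&w);
    insert(x);
    V(&w);
}
```

A single-threaded run such as init_locks(); insert_safe(5); member_safe(5); exercises every P and V once and leaves readers at 0, which is a quick sanity check that the wrappers are balanced.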
2. (5 pts) Which code is needed to initialize the semaphores? Circle all needed
initializations.
3. (5 pts) Note that it is possible for a write request never to succeed by having certain
read patterns. Precisely explain the circumstances under which this may happen.
If the write request arrives when there is at least one reader, and from then
on there is at least one reader at any point in time, then the write lock is
never released and the writer does not get a turn.
4. (5 pts) Propose a simple solution to avoid this problem so that every write request
is eventually honored (in words—no need to show code).
For example, all requests could be entered into a queue so that read requests
arriving after a write request must wait. Some care must be taken to ensure
that the queued read requests can still execute simultaneously and not in
sequence.
7. Servers (20 points)
We now write a server which maintains a do-not-call list of telephone numbers.
Telemarketers who call numbers in this list are subject to heavy fines. Our protocol is much
simpler than HTTP. A client connects to the server and then sends either
• ?x\n to query whether the number x is in the do-not-call list, whereupon the server
responds with 0\n (not in the list) or 1\n (in the list).
Here, a phone number x is just a 10-digit number (no spaces or parentheses). The server
spawns a separate thread for each connection. We assume that serve(connfd) reads
one line of input from connfd, parses it, and responds if appropriate according to the
protocol above. For conciseness, the code below does not check return codes for system
calls.
int main() {
    int listenfd, connfd;
    pthread_t tid;
    listenfd = open_listenfd(15213);
    while (1) {
        connfd = accept(listenfd, NULL, NULL);
        pthread_create(&tid, NULL, thread, &connfd);
    }
}
1. (10 pts) Even if serve is thread-safe, the code above exhibits a race condition.
Explain this race condition in detail.
The connfd variable resides on the stack frame of the main function. When
the function thread is called, we have a race condition between the next
call to accept (which would overwrite the contents of &connfd) and the
first assignment in the thread which would dereference that location.
2. (10 pts) Correct the code above to avoid the race condition. Your code should not
cast integers to pointers or pointers to integers (which would be in poor taste).
int main() {
    int listenfd;
    int* connfdp;                                    /* new */
    pthread_t tid;
    ...
    listenfd = open_listenfd(15213);
    while (1) {
        connfdp = malloc(sizeof(int));               /* new */
        *connfdp = accept(listenfd, NULL, NULL);     /* modified */
        pthread_create(&tid, NULL, thread, connfdp); /* modified */
    }
}
(Lines that are new or modified in the solution are annotated. What matters is that connfd
is stored on the heap, and that this space is freed by the thread.)
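The thread routine itself is not shown in the text. A minimal sketch of what it might look like follows, assuming it copies the descriptor out of the heap block, frees the block, detaches itself, and closes the connection when done. The serve stub and the last_served variable are hypothetical placeholders so that the sketch compiles on its own.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

static int last_served = -1;

/* Hypothetical stub standing in for the real serve(connfd). */
static void serve(int connfd) {
    last_served = connfd;
}

/* Thread routine: vargp points to a heap-allocated int holding connfd. */
void* thread(void* vargp) {
    assert(vargp != NULL);
    int connfd = *(int*)vargp;        /* private copy of the descriptor */
    free(vargp);                      /* the thread frees what main allocated */
    pthread_detach(pthread_self());   /* no one joins this thread */
    serve(connfd);
    close(connfd);
    return NULL;
}
```

Because each connection gets its own heap block, the next call to accept in main cannot overwrite a descriptor that a thread has not yet read.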
8. System-Level I/O (20 points)
We continue the example from the previous question. To complete the serve function
below, you should use
void rio_readinitb(rio_t* rp, int fd);
ssize_t rio_readlineb(rio_t* rp, char* usrbuf, size_t maxlen);
ssize_t rio_writen(int fd, char* usrbuf, size_t len);
from the robust I/O package. The function long atol(char* s) converts the
beginning of the string s to a long int.
void serve(int connfd) {
    long int phone_number;
    rio_t rio;
    char buf[16]; /* big enough for char, number, newline, 0 */
2. (5 pts) HTTP 1.1 allows for multiple client requests and server responses on a
single connection. Explain why.
3. (5 pts) We would like our protocol to support multiple interactions per connection.
Show how to change the code for the serve function above to handle multiple
requests per connection. [Hint: you don’t have to change much.]
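The general shape of such a change is to wrap the read-parse-respond step in a loop that ends when the client closes the connection. The sketch below illustrates that shape with standard I/O (fdopen and fgets) swapped in for the robust I/O package so that it is self-contained, and it handles only the ?x query described earlier. The names serve_many and in_list, the separate input and output descriptors, and the single hard-coded number are all hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-in for the do-not-call lookup. */
static int in_list(long int number) {
    return number == 4125551212L;
}

/* Answers ?x queries read from infd until end of input, writing each reply
 * to outfd. Returns the number of queries answered, or -1 on error.
 * In a real server infd and outfd would both be connfd. */
int serve_many(int infd, int outfd) {
    char buf[16];   /* big enough for char, number, newline, 0 */
    int answered = 0;
    FILE* in = fdopen(infd, "r");
    if (in == NULL)
        return -1;
    while (fgets(buf, sizeof buf, in) != NULL) {   /* loop until EOF */
        if (buf[0] == '?') {
            const char* reply = in_list(atol(buf + 1)) ? "1\n" : "0\n";
            if (write(outfd, reply, 2) != 2)
                break;
            answered++;
        }
    }
    fclose(in);     /* also closes infd */
    return answered;
}
```

The only structural difference from a one-shot version is the while loop around the line read; everything inside the loop body is the single-request logic.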