CS110 Practice Final 2
Autumn 2014
This is a closed book, closed note, closed computer exam (although you are allowed to
use your two double-sided cheat sheets.) You have 180 minutes to complete all
problems. You don’t need to #include any libraries, and you needn’t guard against any
errors unless specifically instructed to do so. Understand that the majority of points are
awarded for concepts taught in CS110. If you’re taking the exam remotely, you can call
me at 415-205-2242 should you have questions.
Good luck!
I accept the letter and spirit of the honor code. I’ve neither given nor received aid on this exam.
I pledge to write more neatly than I ever have in my entire life.
[signed] __________________________________________________________
                                              Score    Grader
1. SIGCALL and System Call Traces       [10]  ______   ______
2. Multiprocessing Redux                [12]  ______   ______
3. Concurrent and Evaluation            [10]  ______   ______
4. Concurrency and Networking Redux     [18]  ______   ______
Relevant Prototypes
// exceptional control flow and multiprocessing
pid_t fork();
pid_t waitpid(pid_t pid, int *status, int flags);
typedef void (*sighandler_t)(int sig);
sighandler_t signal(int signum, sighandler_t handler); // ignore retval
int open(const char *pathname, int flags); // returns descriptor
int close(int fd); // ignore retval
int dup2(int old, int new); // ignore retval
int execvp(const char *path, char *argv[]); // ignore retval
#define WEXITSTATUS(status) // macro
// vector
template <typename T>
class vector {
public:
  vector();
  vector(size_t count, const T& elem = T());
  size_t size() const;
  const T& operator[](size_t index) const; // shorthand is v[index]
};
// thread
class ThreadPool {
public:
  ThreadPool(size_t size);
  ~ThreadPool();
  void schedule(const function<void(void)>& thunk);
  void wait();
};
class mutex {
public:
  mutex();
  void lock();
  void unlock();
};
class semaphore {
public:
  semaphore(int count = 0);
  void wait();
  void signal();
};
class condition_variable_any {
public:
  template <typename Mutex, typename Pred>
  void wait(Mutex& m, Pred pred);
  void notify_one();
  void notify_all();
};
getsyscall accepts the id of a child process and returns the name of a system call recently
invoked by that child. More specifically, the names of the child’s system calls are queued up,
behind the scenes in FIFO order, and getsyscall can be repeatedly called to surface the
names of all system calls in the order they were invoked. If all of a child’s system calls made
thus far have been surfaced, getsyscall returns NULL.
Further imagine the set of signals (e.g. SIGCHLD, SIGTSTP, etc.) has been extended to include a
new type, SIGCALL. The kernel sends a SIGCALL to a process every time a child process
makes a system call (e.g. fork, execvp, open, read, accept, etc.).
By default, the SIGCALL signal is ignored, but a custom signal handler (e.g. handleSIGCALL,
where you specify the implementation) can be installed. A custom SIGCALL signal handler is
invoked whenever the kernel has SIGCALL’ed the parent one or more times since the handler
last executed.
These new directives (SIGCALL, getsyscall) can be used to implement a program called
trace. In particular, trace takes the name of an executable and the arguments that should be
passed to that executable’s main function and runs that executable in a child process. The child
process runs as it normally would, redirecting STDOUT_FILENO and STDERR_FILENO to a file
called "/dev/null", but the trace executable itself lists out the sequence of system calls
made by the child. (When one wants to discard standard output and/or standard error, one
typically redirects one or both of them to a file called /dev/null. For this problem, we
don’t want the child process’s output to interfere with trace’s.)
Example time: I’m able to run my own Assignment 7 solution this way:
If I’m curious what system calls are made when mr executes (and don’t care to see its output),
I can execute this instead:
jerry> trace mr --config odyssey-partial.cfg
execve
brk
access
mmap
open
stat
open
stat
// output omitted for brevity
futex
munmap
write
exit_group
jerry>
To be clear, the execution of mr, when invoked with --config and odyssey-partial.cfg
as arguments, involves calls to execve, brk, access, mmap, open, stat, and so forth, all of
which are system calls. Not surprisingly, some system calls are invoked several times, so you’d
expect to see open, read, write, accept, and close listed many, many times in this
particular trace’s output.
For this problem, you’re to present the entire main function needed to implement trace. Here
are some assumptions to make and constraints to be met:
• You can assume that trace gets at least one argument, and that argument is the name of
another legitimate executable.
• You may declare a single global variable of type pid_t, which can be used to store a
single process id. (Actually, I declared it for you. You can’t declare any others.)
• You should assume that all system calls succeed (i.e. you needn’t do any error checking.)
• You should assume that the child executable returns normally (e.g. returns some value
from its own main, or calls exit with some legitimate value; there’s no need to check
for abnormal termination.)
• The child process being traced should have its output (both standard and error) redirected
to "/dev/null". ("/dev/null" should be opened as a regular file with the flag
O_WRONLY.)
• trace should wait until the child process terminates, and the child process’s exit status
should be trace’s exit status.
• Your solution should implement a custom SIGCALL handler and install it.
• Your solution may not busy wait anywhere.
Use the next page to present your entire program. An unnecessarily complicated solution will
not get full credit, even if it’s correct.
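The redirection and exit-status bullets above can be sketched in isolation. This is only a minimal sketch under stated assumptions, not the intended solution: runQuietly is a hypothetical helper name, nothing here deals with SIGCALL or getsyscall, and error checking is left out as the problem permits.

```cpp
#include <fcntl.h>      // open, O_WRONLY
#include <sys/wait.h>   // waitpid, WEXITSTATUS
#include <unistd.h>     // fork, dup2, close, execvp, _exit

// Hypothetical helper: runs argv[0] with its arguments in a child whose
// standard output and standard error are discarded, then returns the
// child's exit status. Assumes the child terminates normally.
int runQuietly(char *argv[]) {
  pid_t pid = fork();
  if (pid == 0) {
    int fd = open("/dev/null", O_WRONLY);
    dup2(fd, STDOUT_FILENO);   // stdout now refers to /dev/null
    dup2(fd, STDERR_FILENO);   // stderr now refers to /dev/null
    close(fd);                 // the two duplicates keep the file open
    execvp(argv[0], argv);
    _exit(127);                // reached only if execvp fails
  }
  int status;
  waitpid(pid, &status, 0);    // blocks until the child terminates; no busy waiting
  return WEXITSTATUS(status);
}
```

Returning the result of runQuietly from main would make the child's exit status the parent's exit status.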
/**
 * File: trace.c
 * -------------
 * Implements an executable that traces all of the system calls
 * made by another executable.
 *
 * myth8> ls -lta /usr/class/cs110/repos/assign1
 *   // lists all of the SUNet IDs of those who were
 *   // enrolled when the first assignment went out
 * myth8> trace ls -lta /usr/class/cs110/repos/assign1
 *   // executes ls -lta /usr/class/cs110/repos/assign1 in a
 *   // child process, and lists the sequence of
 *   // system calls ls uses while executing with its two arguments
 */
a. [4 points] Recall that one can route the standard output of one process to the standard input
of a second using | (the vertical bar) on the command line. In fact, we can cascade pipes—
that’s what they’re called—so that the standard output of the first process sources the
standard input of a second, the standard output of the second sources the standard input of
the third, and so forth.
Consider the following program called conduit (this is the entire implementation):
int main(int argc, char *argv[]) {
  while (true) {
    sleep(1); // sleep one second
    int ch = fgetc(stdin); // pulls a single character from stdin
    if (ch == -1) return 0;
    putchar(ch); // presses the char ch to stdout
    fflush(stdout);
  }
}
When I type the following line in at the command prompt on a myth machine, I create a
background job with five processes.
The echo process, which immediately prints and flushes abcdefghij to the standard input
of the first conduit process in the pipeline, has a process id of 20686. The first conduit
process—the one fed by echo—has a process id of 20687, the second has a process id of
20688, and so forth.
• [2 points] Assume I send a SIGTSTP to process id 20687 after two seconds. What
state will the other four processes be in 20 seconds later, assuming I don’t send any
other signals?
• [2 points] Assume I send a SIGTSTP to process id 20690 after two seconds. What
state will the other four processes be in 20 seconds later, assuming I don’t send any
other signals?
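For reference, the plumbing behind a single | can be sketched with pipe, fork, and dup2. This is a minimal sketch and not how any particular shell does it: sendThroughPipe is a hypothetical helper, the child writes its text directly instead of exec'ing a program, and error checking is mostly omitted.

```cpp
#include <string>
#include <sys/wait.h>   // waitpid
#include <unistd.h>     // pipe, fork, dup2, read, write, close, _exit

// Hypothetical helper: the child writes `text` to its standard output, which
// has been rewired to the write end of a pipe; the parent reads everything
// back from the read end and returns it.
std::string sendThroughPipe(const std::string& text) {
  int fds[2];
  if (pipe(fds) != 0) return "";     // fds[0] is the read end, fds[1] the write end
  pid_t pid = fork();
  if (pid == 0) {
    close(fds[0]);                   // the child never reads from the pipe
    dup2(fds[1], STDOUT_FILENO);     // the child's stdout now feeds the pipe
    close(fds[1]);
    ssize_t written = write(STDOUT_FILENO, text.data(), text.size());
    (void)written;                   // result deliberately ignored in this sketch
    _exit(0);
  }
  close(fds[1]);                     // the parent must close its write end to ever see EOF
  std::string result;
  char buffer[64];
  ssize_t count;
  while ((count = read(fds[0], buffer, sizeof(buffer))) > 0) {
    result.append(buffer, count);
  }
  close(fds[0]);
  waitpid(pid, nullptr, 0);
  return result;
}
```

The reader sees end-of-file only once every copy of the write end has been closed, which is why each process closes the descriptors it does not use.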
b. [4 points] Typically, each page of a process’s virtual address space maps to a page in
physical memory that no other virtual address space maps to. However, when two processes
are running the same executable (e.g. you have two instances of emacs running,) some
pages within each of the two processes’ virtual address spaces can map to the same exact
pages in physical memory. Name two segments (the heap is an example of a segment) of a
process’s virtual address space that might be backed by the same pages of physical
memory, and briefly explain why it’s possible.
c. [4 points] Recall that the stack frames for system calls are laid out in a different segment of
memory than the stack frames of normal (i.e. user program) functions. How are the stack
frames for system calls set up? And how are the values passed to the system calls received
when invoked from user functions?
Assume a programming language models the notion of a Boolean expression like this:
class BoolExpression {
public:
bool evaluate() const; // evaluates the expression and returns true or false
// constructor and other methods omitted
private:
// implementation details omitted
};
Use the next page to present your implementation. An unnecessarily complicated solution will
not receive full credit, even if it’s correct.
/**
* Function: concurrentAnd
* -----------------------
* Concurrently evaluates each of the BoolExpressions, and returns
* true if and only if all BoolExpressions evaluate to true. This version
* does not support short circuit evaluation.
*/
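One possible shape for such a function is sketched below. Several things are assumptions: the signature is a guess, since the prototype is not shown here; the BoolExpression class is a stand-in whose constructor simply stores a fixed value; and the sketch uses std::thread and std::mutex from the C++ standard library instead of the ThreadPool and mutex classes listed in the prototypes.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Stand-in for the exam's BoolExpression, whose real implementation is omitted.
class BoolExpression {
public:
  explicit BoolExpression(bool value) : value(value) {}
  bool evaluate() const { return value; }
private:
  bool value;
};

// Assumed signature. Evaluates every expression in its own thread and
// combines the results; every expression is evaluated (no short circuiting).
bool concurrentAnd(const std::vector<BoolExpression>& expressions) {
  bool result = true;
  std::mutex resultLock;
  std::vector<std::thread> threads;
  for (const BoolExpression& expression : expressions) {
    threads.emplace_back([&expression, &result, &resultLock] {
      bool value = expression.evaluate();   // evaluated outside the lock so threads overlap
      std::lock_guard<std::mutex> guard(resultLock);
      result = result && value;             // shared state is touched only under the lock
    });
  }
  for (std::thread& t : threads) t.join();  // wait for every evaluation to finish
  return result;
}
```

Evaluating outside the lock is the design choice that matters: holding the lock across evaluate() would serialize the threads.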
a. [3 points] There are a very limited number of scenarios where busy waiting is a reasonable
approach to guarding a critical region. Briefly describe one such scenario in enough detail
that someone just learning about threading and concurrency would understand why busy
waiting might make more sense.
b. [3 points] Explain why a line as simple as i++, where i is a simple int, might be thread-
safe on some architectures but not thread-safe on others, even if the implementer fails to use
concurrency directives like the mutex or the semaphore.
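The mechanics behind parts a and b can be seen together in one small sketch. It is illustrative only and does not answer either question: SpinLock and countWithSpinLock are hypothetical names, and the code uses std::atomic_flag and std::thread from the C++ standard library, not the course's classes. The lock busy waits, and it guards a counter++ that the language does not promise to perform as one indivisible step on a plain int.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// A minimal spin lock: lock() busy waits until the flag is clear.
class SpinLock {
public:
  void lock() {
    while (flag.test_and_set(std::memory_order_acquire)) {
      // spin: the current holder is expected to release almost immediately
    }
  }
  void unlock() { flag.clear(std::memory_order_release); }
private:
  std::atomic_flag flag = ATOMIC_FLAG_INIT;
};

// Hypothetical driver: each thread performs `perThread` guarded increments.
int countWithSpinLock(int numThreads, int perThread) {
  SpinLock spin;
  int counter = 0;
  std::vector<std::thread> threads;
  for (int i = 0; i < numThreads; i++) {
    threads.emplace_back([&spin, &counter, perThread] {
      for (int j = 0; j < perThread; j++) {
        spin.lock();
        counter++;       // the critical region is a single, very short statement
        spin.unlock();
      }
    });
  }
  for (std::thread& t : threads) t.join();
  return counter;
}
```

With the lock in place, countWithSpinLock(4, 10000) returns 40000; removing the lock and unlock calls would turn the increment into a data race.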
This second, reentrant version is thread-safe, because the client shares the location of a
locally allocated struct hostent via argument 2 where the return value can be
placed, thereby circumventing the caller’s dependence on shared, statically allocated,
global data. Note, however, that the client is expected to pass in a large character buffer
(as with a locally declared char buffer[1 << 16]) and its size via arguments 3 and
4 (e.g. buffer and sizeof(buffer)). What purpose does this buffer serve?
f. [4 points] Your MapReduce server took the responsibility of actually spawning the
workers via a combination of threading, calls to system, and the ssh user program.
This worked for our implementation because we had at most 8 workers at any one time.
In practice, MapReduce implementations manage thousands of workers across
thousands of machines. Why does our implementation not scale to the realm where
there are thousands of workers instead of at most 32 (even if the myth cluster actually
had thousands of machines)? What changes can realistically be made to the
implementation to deal with thousands of workers instead of just 32?