Program 2 - Support For LWP
This assignment, originally by Dr. Nico, is adapted from versions by Dr. Bellardo and Dr. Nico.
What you’re doing is taking the one real stream of control—the one that calls main(), which we will call the
original system thread—and sharing it across an arbitrary number of lightweight threads.
Most of the real work will be in lwp_create(). lwp_create() creates a new thread and sets up its context
so that when it is selected by the scheduler to run and lwp_yield() uses swap_rfiles() to load its
context and returns¹ to it, it will start executing at the very first instruction of the thread’s body function.
Calling lwp_yield() causes a thread to yield control to another thread, and lwp_exit() terminates the
calling thread and switches to another, if any. The whole system is started off by a call to lwp_start() which
adds the original system thread to the thread pool, then yields control to whichever thread the scheduler should
choose.
¹ This is important: none of these thread functions—the ones that are passed to lwp_create() to form the program of
the new thread—are ever called. They are returned to.
The Library Functions
The semantics of the individual library functions are listed in Table 2 with explanatory notes as necessary
below.
• lwp_create()
Creates a new thread and admits it to the current scheduler. The thread’s resources will consist of a
context and stack, both initialized so that when the scheduler chooses this thread and its context is
loaded via swap_rfiles() it will run the given function. This may be called by any thread.
• lwp_start()
Starts the threading system by converting the calling thread—the original system thread—into a LWP
by allocating a context for it and admitting it to the scheduler, and yields control to whichever thread the
scheduler indicates. It is not necessary to allocate a stack for this thread since it already has one.
• lwp_yield()
Yields control to the next thread as indicated by the scheduler. If there is no next thread, calls exit(3)
with the termination status of the calling thread (see below).
• lwp_exit(int status)
Terminates the calling thread. Its termination status becomes the low 8 bits of the passed integer. The
thread’s resources will be deallocated once it is waited for in lwp_wait(). Yields control to the next
thread using lwp_yield().
• lwp_wait()
When lwp_wait() is called, if there exist terminated threads, it will return the oldest one without
blocking. That is, it will return terminated threads in FIFO order and the oldest will be the head of the
queue.
If there are no terminated threads, the caller of lwp_wait() will have to block.
Deschedule it (with sched->remove()) and place it on a queue of waiting threads. When another
thread eventually calls lwp_exit() associate it with the oldest waiting thread—the pointer exited may
be useful for this—remove it from the queue, and reschedule it (with sched->admit()) so it can finish
its call to lwp_wait().
The only exception to this blocking behavior is if there are no more threads that could possibly terminate and
wake the caller. In that case lwp_wait() just returns NO_THREAD. The way it can tell is by using the scheduler’s
qlen() function (see below). Most likely the calling thread will still be in the scheduler at the time of
this check, so you’re testing for whether qlen() is greater than 1.
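The lwp_wait() rules above can be sketched as a FIFO plus a three-way decision. This is only an illustration: struct wq_thread, the wq_* helpers, and enum wait_action are hypothetical names, not part of lwp.h, and the runnable count is assumed to come from the scheduler's qlen() with the caller still included.

```c
#include <stddef.h>

/* Hypothetical, cut-down thread record; the real context lives in lwp.h. */
struct wq_thread {
    int tid;
    struct wq_thread *next;     /* link used by the FIFO below */
};

/* A FIFO of threads: push at the tail, pop from the head (the oldest). */
struct wq_fifo {
    struct wq_thread *head;
    struct wq_thread *tail;
};

void wq_push(struct wq_fifo *q, struct wq_thread *t) {
    t->next = NULL;
    if (q->tail != NULL)
        q->tail->next = t;
    else
        q->head = t;
    q->tail = t;
}

struct wq_thread *wq_pop(struct wq_fifo *q) {
    struct wq_thread *t = q->head;
    if (t != NULL) {
        q->head = t->next;
        if (q->head == NULL)
            q->tail = NULL;
        t->next = NULL;
    }
    return t;
}

enum wait_action { WAIT_REAP, WAIT_BLOCK, WAIT_NO_THREAD };

/* Decide what a call to lwp_wait() should do. `runnable` stands in for
 * the scheduler's qlen(), assumed to still count the calling thread. */
enum wait_action wq_decide(const struct wq_fifo *terminated, int runnable) {
    if (terminated->head != NULL)
        return WAIT_REAP;       /* an exited thread is ready: no blocking */
    if (runnable > 1)
        return WAIT_BLOCK;      /* another thread may still call lwp_exit() */
    return WAIT_NO_THREAD;      /* nobody left who could wake the caller */
}
```

The same FIFO shape would serve for the queue of blocked waiters, since both queues are drained oldest first.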
Thread body functions
The code to be executed by a thread is contained in a function whose address is passed to lwp_create().
The thread will execute until it either calls lwp_exit() or the function returns with a termination status.
This thread function takes a single argument, a pointer to anything, that is also passed to lwp_create().
Termination statuses
A thread’s status consists of a flag indicating whether it is running (LWP_LIVE) or terminated (LWP_TERM)
and an 8-bit integer that can be passed back via lwp_wait().
A thread’s termination value is the low 8 bits either of the argument to lwp_exit() or of the return value of
the thread function. These are combined into a single integer using the macro MKTERMSTAT() which is what
is passed back by lwp_wait().
Macros for dealing with termination statuses are given in Table 3.
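To make the packing concrete, here is a minimal sketch of combining a live/terminated flag with an 8-bit exit value in one integer. The ts_* names and the bit layout (flag above the low 8 bits) are assumptions for illustration; the real macros come from lwp.h and may be laid out differently.

```c
/* Hypothetical layout: low 8 bits hold the exit value, the next bit up
 * holds the live/terminated flag. */
#define TS_OFFSET 8
#define TS_LIVE   0u
#define TS_TERM   1u

/* Combine a flag and an exit value, keeping only the low 8 bits of code. */
unsigned int ts_make(unsigned int flag, int code) {
    return (flag << TS_OFFSET) | ((unsigned int)code & 0xFFu);
}

/* Nonzero if the status says the thread has terminated. */
int ts_terminated(unsigned int status) {
    return ((status >> TS_OFFSET) & TS_TERM) == TS_TERM;
}

/* Extract the 8-bit exit value. */
unsigned int ts_code(unsigned int status) {
    return status & 0xFFu;
}
```

Masking with 0xFF is what makes an exit value such as -1 come back as 255, the same truncation the text describes for lwp_exit().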
Stacks
Every thread needs a stack, and that stack needs to come from somewhere. So far, the way you know to get
memory is malloc(3), which allocates you a chunk of memory in a contiguous heap, meaning that if one
stack overflows, it can overflow into neighboring regions. In this section we will look at using mmap(2) to
create stacks in memory regions that are not connected to each other.
mmap(2) is a versatile system call that allows processes to map regions of memory shared with other
processes, or to map files directly into their memory spaces bypassing the IO system calls. For our purposes,
we’re just going to use mmap(2) to create a region of memory for each of our threads to use as a stack. If a
thread’s stack overflows, this will generate a SEGV when it touches the first unmapped page, but it will not
corrupt its neighbors.
For our stacks, where should be NULL (let mmap(2) choose), fd should be −1 (some implementations
require this), and offset should be zero. We should offer read and write permission (but not execute) and we
should have flags appropriate to a stack:
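One plausible way to make such a call is sketched below. The helper names stk_alloc() and stk_free() are hypothetical, and the flag combination (MAP_PRIVATE, MAP_ANONYMOUS, MAP_STACK) is an assumption about what "appropriate to a stack" means on Linux rather than the handout's own listing.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes MAP_ANONYMOUS and MAP_STACK on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_STACK
#define MAP_STACK 0             /* assumption: treat as a no-op where undefined */
#endif

/* Map `size` bytes of private, anonymous, read/write memory for a stack.
 * The address hint is NULL, fd is -1, and offset is 0, as described above.
 * Returns NULL on failure. `size` should be a multiple of the page size. */
void *stk_alloc(size_t size) {
    void *base = mmap(NULL, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    return base == MAP_FAILED ? NULL : base;
}

/* Release a stack obtained from stk_alloc(); returns 0 on success. */
int stk_free(void *base, size_t size) {
    return munmap(base, size);
}
```

Note that mmap(2) reports failure with MAP_FAILED rather than NULL, so the check has to compare against that value.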
Now, like pthreads(7) we will use the stack size resource limit if it exists. To get the value of a resource limit,
use getrlimit(2). The limit for stack size is RLIMIT_STACK. getrlimit(2) reports both hard and soft
resource limits. Use the soft one.
If RLIMIT_STACK does not exist or if its value is RLIM_INFINITY, choose a reasonable stack size. I use 8MB².
On a sane system, this resource limit will be a multiple of the page size. But what if it’s not? Round up to the
nearest multiple of the page size. Now you’ve got your size. Allocate a stack and get on with it.
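The size calculation might look like the following sketch. The names sz_round_up() and sz_stack_size() are hypothetical; the 8 MB fallback follows the text, while the 4096-byte page fallback is an assumption for the unlikely case that sysconf(3) fails.

```c
#include <stddef.h>
#include <sys/resource.h>
#include <unistd.h>

#define SZ_DEFAULT_STACK (8UL * 1024 * 1024)    /* 8 MB fallback */

/* Round n up to the next multiple of page (page must be nonzero). */
size_t sz_round_up(size_t n, size_t page) {
    size_t rem = n % page;
    return rem == 0 ? n : n + (page - rem);
}

/* Pick a stack size: the soft RLIMIT_STACK if it is usable, otherwise
 * the fallback, rounded up to a whole number of pages. */
size_t sz_stack_size(void) {
    struct rlimit rl;
    size_t size = SZ_DEFAULT_STACK;
    long page = sysconf(_SC_PAGESIZE);

    if (getrlimit(RLIMIT_STACK, &rl) == 0 &&
        rl.rlim_cur != RLIM_INFINITY && rl.rlim_cur > 0)
        size = (size_t)rl.rlim_cur;     /* rlim_cur is the soft limit */
    if (page <= 0)
        page = 4096;                    /* assumption: common page size */
    return sz_round_up(size, (size_t)page);
}
```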
Note: The man page talks about mmap(2) being able to create regions that automatically grow downward to
support stacks. Apparently in current Linux kernels this is... aspirational. Still, many megabytes of stack should
be good enough for our threads.
² Yes, this feels rather large, but a 64-bit address space is huge, so why not?
Things to know
Everything in the rest of this document is intended to provide information needed to implement a lightweight
processing package for a 64-bit Intel x86_64 CPU compiling with gcc³. This is the environment found on the
Linux desktop machines in the CSL and unix[1-5].csc.calpoly.edu.
Before we build a thread support library, we need to consider what defines a thread. Threads exist in the same
memory as each other, so they can share their code and data segments, but each thread needs its own
registers and stack to hold local data, function parameters, and return addresses.
Registers
The x86_64 CPU (doing only integer arithmetic⁴) has sixteen registers of interest, shown in Table 4.
Since C has no way of naming registers, I have provided some useful tools below that will allow you to access
these registers. The assembly language file, magic64.S⁵, contains a function
void swap_rfiles(rfile *old, rfile *new). This does two things:
1. if old != NULL it saves the current values of all 16 registers and the floating point state to the
struct registers pointed to by old.
2. if new != NULL it loads the 16 register values and the floating point state contained in the struct
registers pointed to by new into the registers.
In this assignment it should never be necessary to load or store a context independently. Always do atomic
context switches using swap_rfiles(). To assemble magic64.S, use gcc:
gcc -o magic64.o -c magic64.S
The whole function can be seen in Figure 3.
³ It should work with other compilers, but I’ve tested it with gcc.
⁴ As well as a bunch more for floating point, but we aren’t going to talk about those here. swap_rfiles() saves them,
though.
⁵ For what it’s worth, if an assembly file ends in “.S”, the compiler will run it through the C preprocessor. If it’s “.s”, it won’t.
Floating Point State
As we said above, in addition to the registers, swap_rfiles() also preserves the state of the x87 Floating
Point Unit (FPU). This is stored in the last element of the struct rfile, the struct fxsave called
fxsave. This structure holds all the FPU state.
Important: when you initialize your thread’s register file, you will have to initialize this structure to the
predefined value FPU_INIT like so:
newthread->state.fxsave=FPU_INIT;
Stack structure: The gcc calling convention
In order to build a context in lwp_create() that will do the right thing when loaded and returned-to, you will
need to know the process by which stack frames are built up and torn down.
The extra registers available to the x86_64 allow it to pass some parameters in registers. This makes the
overall calling convention a little more complicated, but, in practice, it will be easier for your program since you
won’t be passing enough parameters to push you out of the registers onto the stack.
This section describes the calling convention which will allow you to both understand and construct the stack
frames you will need. These figures show normal stack development. What you will be developing will be
distinctly abnormal. The steps of the convention are as follows (illustrated in Figures 1a–f).
Note: Intel’s Application Binary Interface specification⁶ requires that all stack frames be aligned on a 16-byte
boundary⁷. The exact wording is:
This means that the address of the bottom (lowest in memory) element of the argument area needs to be
evenly divisible by 16, even if there isn’t an argument area. That is, the address above the frame’s return
address must be evenly divisible by 16 (equivalently, the saved base pointer’s address must be evenly divisible
by 16).
Be aware of this as you build your stacks. If your stack frame is not properly aligned, all you will see is a
SEGV.
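A small helper can make the alignment check explicit while you build a stack. This is a sketch with hypothetical names (al_down16(), al_is16()); it rounds down because stacks grow toward low memory, so rounding down stays inside the mapped region.

```c
#include <stdint.h>

/* Round an address down to the nearest multiple of 16. */
uintptr_t al_down16(uintptr_t addr) {
    return addr & ~(uintptr_t)0xF;
}

/* Nonzero if addr is evenly divisible by 16. */
int al_is16(uintptr_t addr) {
    return (addr & 0xF) == 0;
}
```

Asserting al_is16() on the address just above the return address before the first context switch tends to be cheaper than debugging the SEGV afterwards.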
LWP system architecture
Everything you need is defined in lwp.h, fp.h, and magic64.S, two of which are included in Figures 2 and
3 (for the third, see “Supplied Code” later on).
At the heart of lwp.h is the definition of a struct threadinfo_st which defines a thread’s context. This
contains:
• The thread’s thread ID. This must be a unique integer that stays the same for the lifetime of the thread.
It’s what a thread may use to identify itself. (NO_THREAD is defined to be 0 and is always invalid.) You
may assume that there will never be more than 2^64 − 2 threads, so a counter is just fine.
• A pointer to the base of the thread’s allocated stack space—the pointer originally returned by
mmap(2), see above—so that it can later be unmapped.
• A struct registers that contains a copy of all the thread’s stored registers.
• A status integer that encodes the current status of a thread (running or terminated) and an exit status if
terminated.
• Four pointers:
lib_one and lib_two are reserved for the use of the library internally, for any purpose or no purpose
at all. (Many people find these useful to maintain a global linked list of all threads for implementing
tid2thread() or perhaps for keeping track of threads that are waiting.)
sched_one and sched_two are reserved for use by schedulers, for any purpose or no purpose at all.
Most schedulers need to keep lists of threads, so this makes that convenient.
Neither the scheduler nor the library may make any assumptions about what the other is doing.
These, along with each thread’s stack, hold all the state we need for each thread.
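As an illustration of the thread ID counter and of using a library-reserved pointer for a global list, consider the sketch below. struct reg_ctx is a hypothetical cut-down stand-in for the real context in lwp.h, and reg_register() and reg_tid2thread() are illustrative names, not the library's API.

```c
#include <stddef.h>

typedef unsigned long reg_tid;
#define REG_NO_THREAD 0UL       /* mirrors NO_THREAD being 0 */

/* Hypothetical cut-down context: only the fields this sketch needs. */
struct reg_ctx {
    reg_tid tid;
    struct reg_ctx *lib_one;    /* used here as the link of a global list */
};

static struct reg_ctx *reg_all = NULL;  /* every thread ever registered */
static reg_tid reg_next_tid = 1;        /* start at 1: 0 is never valid */

/* Give the thread a fresh ID and push it on the global list. */
void reg_register(struct reg_ctx *t) {
    t->tid = reg_next_tid++;
    t->lib_one = reg_all;
    reg_all = t;
}

/* Look a thread up by ID; returns NULL if the ID is unknown. */
struct reg_ctx *reg_tid2thread(reg_tid tid) {
    struct reg_ctx *t;
    for (t = reg_all; t != NULL; t = t->lib_one)
        if (t->tid == tid)
            return t;
    return NULL;
}
```

Because the list is threaded through lib_one rather than a scheduler pointer, it keeps working no matter which scheduler is installed.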
Scheduling
The lwp library’s default scheduling policy is round robin—that is, each thread takes its turn then goes to the
back of the line when it yields—but client code can install its own scheduler with lwp_set_scheduler().
The lwp scheduler type is a pointer to a structure that holds pointers to six functions. These are:
• void init(void)
This is to be called before any threads are admitted to the scheduler. It’s to allow the scheduler to set
up. This one is allowed to be NULL, so don’t call it if it is.
• void shutdown(void)
This is to be called when the lwp library is done with a scheduler to allow it to clean up. This, too, is
allowed to be NULL, so don’t call it if it is.
• void admit(thread new)
Add the given thread to the scheduler’s pool of runnable threads.
• void remove(thread victim)
Remove the given thread from the scheduler’s pool.
• thread next()
Return the next thread to be run or NULL if there isn’t one.
• int qlen()
Return the number of runnable threads. This will be useful for lwp_wait() in determining if waiting
makes sense.
Changing schedulers will involve initializing the new one, pulling out all the threads from the old one (using
next() and remove()) and admitting them to the new one (with admit()), then shutting down the old
scheduler.
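A round robin policy of the kind described above can be sketched with a singly linked queue. Everything here is an assumption for illustration: struct rr_thread is a cut-down stand-in for the real context, the rr_* names are hypothetical, and having next() rotate the chosen thread to the back of the line is one possible design, not necessarily the reference one.

```c
#include <stddef.h>

/* Hypothetical cut-down thread record; the real one comes from lwp.h. */
typedef struct rr_thread {
    int tid;
    struct rr_thread *sched_one;        /* next thread in the run queue */
} *rr_thread_t;

static rr_thread_t rr_head = NULL, rr_tail = NULL;
static int rr_count = 0;

/* Add a thread at the back of the line. */
void rr_admit(rr_thread_t t) {
    t->sched_one = NULL;
    if (rr_tail != NULL) rr_tail->sched_one = t; else rr_head = t;
    rr_tail = t;
    rr_count++;
}

/* Unlink a thread; does nothing if it is not queued. */
void rr_remove(rr_thread_t victim) {
    rr_thread_t prev = NULL, cur = rr_head;
    while (cur != NULL && cur != victim) { prev = cur; cur = cur->sched_one; }
    if (cur == NULL) return;
    if (prev != NULL) prev->sched_one = cur->sched_one; else rr_head = cur->sched_one;
    if (rr_tail == cur) rr_tail = prev;
    cur->sched_one = NULL;
    rr_count--;
}

/* Return the thread at the front and move it to the back of the line. */
rr_thread_t rr_next(void) {
    rr_thread_t t = rr_head;
    if (t != NULL && t != rr_tail) {    /* more than one thread: rotate */
        rr_head = t->sched_one;
        t->sched_one = NULL;
        rr_tail->sched_one = t;
        rr_tail = t;
    }
    return t;
}

int rr_qlen(void) { return rr_count; }

/* The six function pointers; init and shutdown are allowed to be NULL. */
struct rr_scheduler {
    void (*init)(void);
    void (*shutdown)(void);
    void (*admit)(rr_thread_t);
    void (*remove)(rr_thread_t);
    rr_thread_t (*next)(void);
    int (*qlen)(void);
};

struct rr_scheduler rr_sched = { NULL, NULL, rr_admit, rr_remove, rr_next, rr_qlen };
```

Only sched_one is used for linkage here, which leaves the library-reserved pointers untouched, in keeping with the rule that neither side assumes anything about the other.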
1. Write the default round robin scheduler. This consists almost entirely of keeping a list, and then you will
have a scheduler, and it feels good to have started.
2. Then, in lwp_create():
(b) Initialize the stack frame and context so that when that context is loaded in swap_rfiles(), it will
properly return to the lwp’s function with the stack and registers arranged as it will expect. This
involves making the stack look as if the thread called swap_rfiles() and was suspended.
How to do this? Figure out where you want to end up, then work backwards through the endgame of
swap_rfiles() to figure out what you need it to look like when it’s loaded.
You know that the end of swap_rfiles() (and every function) is:
leave
ret
and ret means pop the instruction pointer, so the whole thing becomes:
movq %rbp, %rsp ; copy base pointer to stack pointer
popq %rbp ; pop the stack into the base pointer
popq %rip ; pop the stack into the instruction pointer
Consider that what you’re doing, really, is creating a stack frame for swap_rfiles() to tear down (in
lieu of the one it created on the way in, on a different stack) and creating the caller’s half of lwpfun’s
stack frame, since nobody actually calls it.
(c) admit() the new thread to the scheduler.
3. Then, in lwp_start():
(a) Transform the calling thread (the original system thread) into a LWP. Do this by creating a context
for it and admit()ing it to the scheduler, but don’t allocate a stack for it. Use the stack it already has.
Make sure not to deallocate this later (leave it NULL in the context or flag it some other way).
(c) The idea here is that once the original system thread calls lwp_start() it is transformed into just
another thread (other than that you shouldn’t free its stack). From here on out, the system continues
until there are no more runnable threads.
Remember, what you are trying to do is to build a context so that when lwp_yield() selects it, loads its
registers, and returns, it starts executing the thread’s very first instruction with the stack pointer pointing to a
stack that looks like it had just been called. If the arguments fit into registers (and they will in this case), this will
simply be:
But what is this return address? It’s supposed to be the place where the thread function should go “back” to
after it’s done, but it didn’t come from anywhere. You could use lwp_exit(). That way either it calls
lwp_exit() or it returns there, but one way or the other when it’s done, lwp_exit() will be called.
Note: What is this “original TOS”? This is the alleged past of this thread. Of course, it doesn’t have a past, so it
doesn’t exist. This thread came from nowhere.
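Working backwards through leave and ret, the top of a fresh stack could be laid out as in the sketch below. The function fr_build() and its parameters are hypothetical, and the addresses are passed as plain integers so the example stands alone. It assumes the high end of the region (stack + bytes) is 16-byte aligned, which holds for a page-aligned mapping whose size is a multiple of the page size.

```c
#include <stddef.h>
#include <stdint.h>

/* Lay out the top of a new stack so that the `leave; ret` at the end of
 * swap_rfiles() lands on `entry`. Returns the address to store as the
 * saved base pointer (and stack pointer) in the new thread's context.
 *   stack     low end of the region (what mmap(2) returned)
 *   bytes     size of the region
 *   entry     address execution should "return" into
 *   return_to address `entry` goes back to when it finishes */
unsigned long *fr_build(unsigned long *stack, size_t bytes,
                        uintptr_t entry, uintptr_t return_to) {
    size_t words = bytes / sizeof(unsigned long);
    unsigned long *top = stack + words; /* one past the highest word */

    top[-1] = return_to;    /* return address seen by `entry` */
    top[-2] = entry;        /* popped into the instruction pointer by ret */
    top[-3] = 0;            /* popped into the base pointer by leave */
    return &top[-3];
}
```

After leave and ret consume the two lowest slots, the stack pointer is left on the return_to slot, so the address just above the return address is the aligned top of the region. The thread function's argument still has to be placed in the context's saved registers (the first integer argument travels in %rdi under the x86_64 convention).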
• If you want to find out what your compiler is really doing, use the gcc -S switch to dump the
assembly output.
gcc -S foo.c will produce foo.s containing all the assembly.
• Remember that stacks start in high memory and grow towards low memory. You can find the high end
of your stack region through the magic of arithmetic.
• Also remember that pointer arithmetic is done in terms of the size of the thing pointed-to.
• I defined the stack member of the context structure to be an unsigned long * to make it easy to
treat the stack as an array of unsigned longs and index it accordingly.
• Despite the fact that it is possible to load and save contexts independently, don’t do it. The compiler
feels free—rightly—to move the stack pointer to allocate or deallocate local storage on the stack. If you
save your context in one place and load it in another, your thread will go through a time warp and
saved data may be corrupted. Use swap_rfiles to perform an atomic context switch.
• Finally, remember that there doesn’t have to be a next thread. If sched->next() returns NULL,
lwp_yield() will exit as described above.
• Building a library.
To build an archive, the program to do so is ar(1). The r flag means “replace” to insert new files into
the archive:
% ar r libstuff.a obj1.o obj2.o ...objn.o
Supplied Code
There are several pieces of supplied code along with this project, all available on the CSL machines in
~pn-cs453/Given/Asgn2⁸
Note: When linking with libsnakes.a it is also necessary to link with the standard library ncurses using
-lncurses on the link line. Ncurses is a library that supports text terminal manipulation.
Assignment
Turn in this assignment on Canvas. The submitted file must be named project2_submission.tar.
What to turn in
1. Your source files (.c, .h, etc.)
2. Your header file, lwp.h, suitable for inclusion with other programs. This must be compatible with the
distributed one, but you may extend it.
3. A makefile (called Makefile) that will build liblwp.a on unix[1-4] from your source when
invoked with no target or with the appropriate target (liblwp.a). The makefile must also remove all binary
and object files when invoked with the “clean” target. Refer to the example makefile if you need more
guidance on this. [A sample Makefile is provided on Canvas.]
⁸ Choose a directory and move there. Use cp -r ~pn-cs453/Given/Asgn2 . to copy all the files over.
Files shared on Canvas:
-Makefile
-libPLN.a
-libsnakes.a
Sample runs
You can run the demos for the project yourself. The demos are found here:
~pn-cs453/Given/Asgn2/demos
Copy those files into your directory of choice.
LD_LIBRARY_PATH=../lib64/
export LD_LIBRARY_PATH
make nums
./nums
make snakes
./snakes
make hungry
./hungry