Ceng204 w8 Systems Programming, 2024 Spring
Concurrency and Threads
There are two drawbacks to using processes for some
concurrent systems:
1. A new process requires its own resources (its own address
space, file descriptors, and so on), so creating one can be
expensive.
2. Once you have separate processes running, getting them to
cooperate/communicate becomes complicated. There are
ways to do it, such as through the file system or through pipes
(which look like files to your program but have special OS
support). Processes can also send signals to each other. But
even with fairly efficient communication methods, the context
switch from one process to another (and the copying of data
from one address space to another) can be quite expensive.
The desire for parallel execution and low-cost communication
has led to several models for concurrent programming,
including threads, sometimes called lightweight processes.
The principal idea behind threads is shared-memory
multiprocessing, that is, each part of the concurrent system will
share the same memory/address space, thus, theoretically,
reducing communication cost. (Another well-known model
based on message passing is called communicating sequential
processes (CSP).)
Threads are a programming abstraction that is designed to
allow a programmer to control concurrency and asynchrony
within a program.
In some programming languages, like Java, threads are "first-
class citizens" in that they are part of the language definition
itself.
For others, like C and C++, threads are implemented as a
library that can be called from a program but otherwise are not
considered part of the language specification.
Some of the differences between having threads "in the
language" and threads "as a library" are often subtle.
For example, a C compiler need not take into account thread
control while a Java compiler must.
However one obvious difference is that in the library case, it is
possible to use different thread libraries with the same
language.
Linux supports POSIX threads, with all the attendant library
functions and system calls.
Abstractly, for our purposes, a thread is three things:
• a sequential list of instructions that will be executed
• a set of local variables that "belong" to the thread (thread
private)
• a set of shared global variables that all threads can read and
write
It is no accident that this definition corresponds roughly to the C
language sequential execution model and variable scoping
rules.
Operating systems are still, for the most part, written in C and
thus thread libraries for C are easiest to understand and
implement when they conform to C language semantics.
Threads versus Processes
A compiled program becomes a "process" when you run it.
A C program also defines a sequential list of instructions, local
variables, and global variables so you might be asking "What is
the difference between a thread and a process?"
The answer is "not much" as long as there is only one thread.
However, as discussed previously, threads are an abstraction
designed to manage concurrency which means it is possible to
have multiple threads "running" at the same time.
Why threads?
There are many reasons to program with threads. In the
context of this class, there are two important ones:
1. They allow you to deal with asynchronous events
synchronously and efficiently.
2. They allow you to get parallel performance on a shared-
memory multiprocessor.
You'll find threads to be a big help in writing an operating
system.
C Code, No threads
To begin with, consider a simple program that computes the
average over a set of random numbers.
You might also think that you know what the answer will be
ahead of time. For example, if the random number generator
generates numbers on the interval (0,1) then you'd expect the
average to be 0.5.
How true is that statement?
Is it affected by the number of numbers in the set?
This example can also be used to investigate these kinds of
questions but mostly it is designed to introduce the way in which
pthreads and C interact.
The basic program generates an array that it fills with random
numbers from the interval (0,1).
It then sums the values in the array and divides by the number
of values (which is passed as an argument from the command
line).
/* usage: avg-nothread count */
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

char *Usage = "usage: avg-nothread count";
#define RAND() (drand48()) /* basic Linux random number generator */

int main(int argc, char **argv)
{
    int i, count;
    double *data, sum;

    if (argc < 2) {
        fprintf(stderr, "%s\n", Usage);
        exit(1);
    }
    count = atoi(argv[1]); /* count is the first argument */

    /* make an array large enough to hold #count# doubles */
    data = (double *)malloc(count * sizeof(double));

    /* pick a bunch of random numbers */
    for (i = 0; i < count; i++) {
        data[i] = RAND();
    }

    sum = 0;
    for (i = 0; i < count; i++) {
        sum += data[i];
    }

    printf("the average over %d random numbers on (0,1) is %f\n",
           count, sum / (double)count);
    free(data);
    return 0;
}
First, it uses the utility function malloc() to allocate dynamically a
one-dimensional array to hold the list of random numbers.
It also casts the variable count to a double in the print statement
since the variable sum is a double.
Now is a good time to take a moment to make sure that you understand each
line of the program shown above -- each line. If there is something you don't
recognize or understand in this code you will want to speak with the instructor or
the TAs about brushing up on your C programming skills. This code is about as
simple as any C program will be that you will encounter in this class. If it isn't
completely clear to you, it will be important to brush up, because the
assignments will depend on a working knowledge of C.
Computing the Average using One Thread
(These slides showed the full avg-1thread listing, which the following slides walk through.)
In pseudocode form, the logic is as follows.
The main() function does:
• allocate memory for thread arguments
• fill in thread arguments (marshal the arguments)
• spawn the thread
• wait for the thread to complete and get the result (the sum in
this example) computed by the thread
• print out the average
and the thread executes:
• unmarshal the arguments
• compute and marshal the sum so it can be returned
• return the sum and exit
Reading through the Code
The first thing to notice is that the code that computes the sum is
performed in a separate C function called SumThread().
The pthreads standard specifies that threads begin on function
boundaries.
That is, each thread starts with some function call which, in this
example, is SumThread().
This "first" function can call other functions, but it defines the
"body" of the thread.
We'll call this first function the "entry point" for the thread.
The second thing to notice are the types in the prototype for the
thread entry point:
void *SumThread(void *arg)
The standard as implemented for Linux specifies that
• the entry point function takes one argument that is of type (void *)
• the entry point function returns a single value that is of
type (void *)
In C, a (void *) is a pointer that can legally point to any data type.
The key is that your program can make the decision about what it
points to at run time.
You can use a (void *) pointer to point to any type that is
supported by C (e.g. int, double, char, etc.) but it is most useful
when it is used to point to a structure.
In a pthreads program, the assumption that the API designers
make is that you will define your own data type for the input
parameters to a thread and also one for the return values.
This way you can pass whatever arguments you like to your
threads and have them return arbitrary values.
In this example, we define two structures
struct arg_struct
{
    int size;
    double *data;
};
struct result_struct
{
    double sum;
};
The argument structure allows the code that spawns the thread to
pass it two arguments: the size of the array of values and a
pointer to the array. The thread passes back a single value: the
sum.
Notice that the thread entry point function converts its one
argument to a pointer to the argument data structure. The data
type for my_args is
struct arg_struct *my_args;
and the body of the thread assigns the arg pointer passed as an
argument to my_args via a C language cast.
my_args = (struct arg_struct *)arg;
You can think of the thread as receiving a message with its initial
arguments in it as its only parameter.
This message comes in a generic "package" with the type (void
*) and it is the thread's job to unpack the message into a structure
that it understands.
This process is called "unmarshaling" which refers to the process
of translating a set of data types from a generic transport form to
one that can be processed locally.
Thus the line shown above in which the (void *) is cast to a (struct
arg_struct *) is the thread "unmarshaling" its arguments.
When the thread has finished, it needs a data structure to pass
back. The code calls malloc() to allocate the memory necessary
to transmit the results once the thread has completed:
result = (struct result_struct *)malloc(sizeof(struct result_struct));
and when the sum is computed, the thread loads the sum into the
result structure:
result->sum = my_sum;
The marshaling into a (void *) of the (struct result_struct *) takes
place directly in the return call
return((void *)result);
From the my_args variable, the thread can then access the size value and
the data pointer that points to the array of numbers.
Notice that the very next thing the thread does is to call free() on
the arg pointer.
Good practice is to free memory as soon as you know the memory is no longer
needed.
In this example, the code that creates this thread in the main() function
calls malloc() to allocate the memory that is needed to hold the argument
structure.
Notice that the thread has called malloc() to create a result
variable to pass back the sum.
It must be the case that the main() thread calls free() on the
result structure it gets back from the thread.
Look for the free() call in the main() routine to see where this
takes place.
The main() function creates an argument structure, spawns the
thread, waits for it to complete, and uses the result that the
thread passes back to print the average.
Creating and marshaling the arguments for the thread:
args = (struct arg_struct *)malloc(sizeof(struct arg_struct));
args->size = count;
args->data = data;
The thread that computes the sum is created by
the pthread_create() call in the main() function.
err = pthread_create(&thread_id, NULL, SumThread, (void *)args);
The pthread_create() call takes four arguments and returns a
single result. The arguments are:
1. a pointer to a variable of type pthread_t (as an out parameter)
2. a pointer to an attributes structure controlling how the thread is
created and scheduled (NULL means use the defaults)
3. the name of the entry point function
4. a single pointer to the arguments as a (void *)
The return value is an error code, with zero indicating success. If
the return value is zero, the variable pointed to by the first
argument will contain the thread identifier necessary to interact
with the thread (see pthread_join() below).
When does the thread run?
Logically, the thread begins executing as soon as the call
to pthread_create() completes.
However it is up to the implementation as to when the thread is
actually scheduled.
For example, some implementations will allow the spawning
thread to continue executing "for a while" before the spawned
threads begin running.
However, from a logical perspective, the newly created thread and
the thread that created it are running "in parallel."
Notice also that the main() function is acting like a thread even
though it wasn't spawned via pthread_create().
The logical abstraction is that Linux "spawned" this first thread
for you. It is a little different since its entry point, main(), takes two
arguments (argc and argv), but for thread scheduling purposes it
behaves like a thread otherwise.
We'll call this thread "the main thread" from now on to indicate
that it is the "first" thread that gets created when the program
begins to run.
Thus the main thread in this example spawns a single thread to
compute the sum and waits for this thread to complete before
proceeding.
Waiting for the result
After the main thread spawns the thread to compute the sum, it
immediately calls
err = pthread_join(thread_id,(void **)&result);
The first argument to this call is the identifier (filled in by the call
to pthread_create() when the thread was created).
The second argument is an out parameter of type (void **).
That is, pthread_join() takes a pointer to a (void *) so that it can
return the (void *) pointer passed back from the thread on exit.
This "pointer to a pointer" parameter passing method often
confuses those new to pthreads.
The function pthread_join() needs a way to pass back a (void
*) pointer and it can't use the return value.
In C, the way that a function passes back a pointer through an
out parameter is to take a pointer to that kind of pointer as a
parameter.
Notice that the type of result is (struct result_struct *).
Using the & operator, the parameter passed is the address of
result (which is itself a pointer), and that "pointer to a pointer" is
cast as a (void **).
Like with pthread_create(), pthread_join() returns an integer
which is zero on success and non-zero when an error occurs.
The output:
./avg-1thread 100000
main thread forking sum thread
main thread running after sum thread created, about to call join
sum thread running
sum thread done, returning
main thread joined with sum thread
the average over 100000 random numbers on (0,1) is 0.499644
Notice that the main thread continues to run after the Sum thread
is spawned.
Then it blocks in pthread_join() waiting for the Sum thread to
finish.
Then the sum thread runs and finishes.
When it exits, the call to pthread_join() unblocks and the main
thread completes.
Computing the sum in parallel
The previous example is a little contrived in that there is no real
advantage (and probably a very small performance penalty) in
spawning a single thread to compute the sum.
That is, the first non-threaded example does exactly what the
single threaded example does only without the extra work for
marshaling and unmarshaling the arguments and spawning and
joining.
You might ask, then, "why use threads at all?"
The answer is that it is possible to compute some things in
parallel using threads.
In this example, we can modify the sum thread so that it works
on a subregion of the array.
The main thread can spawn multiple threads, one per subregion
(computed in parallel), and then sum the partial sums that come
back to get the full sum.
The following example code does this parallel computation of the
sums.
(These slides showed the full avg-manythread listing.)
In this example, each thread is given a starting index into the array and
a range it needs to cover. It sums the values in that range and returns
the partial sum in the result structure.
The main thread spawns each thread one at a time in a loop, giving
each its own argument structure. Notice that the argument structure is
filled in with the starting index where that thread is supposed to start
and the range of values it needs to cover.
After the main thread spawns all of the sum threads, it goes into
another loop and joins with them, one at a time, in the order that they
were spawned.
Here is a sample output from this multi-threaded program:
./avg-manythread 100000 5
(the per-thread creation and startup messages from this sample run are summarized below)
Notice that the main thread starts 5 threads.
Then it completes the creation of 3 threads.
It calls the create for the 4th thread, but before it prints the "created"
message, sum thread 1 starts running.
Then the message saying that 4 was created prints.
Then sum thread 2 and 3 start, and then the main thread creates thread
5.
This order of execution is not guaranteed.
In fact, there are many legal orderings of thread execution that are
possible.
All that matters is that the main thread not try to access the result
from a thread until after it has joined with that thread.
All of the threads are independent and they can run in any order once
they are created.
They interleave their execution with each other and the main thread (or
execute in parallel if multiple cores are available).
However a call to pthread_join() ensures that the calling thread will not
proceed until the thread being joined with completes.
It is in this way that the main thread "knows" when each thread has
completed computing its partial sum and successfully returned it.
You might wonder "What happens if a sum thread completes before the
main thread calls pthread_join()?" In fact, that occurs in this sample
execution.
Thread 1 returns before the main thread calls pthread_join() on thread
1. The semantics of pthread_join() are that it will immediately unblock if
the thread being joined with has already exited.
Thus pthread_join()
• blocks if the thread being joined with hasn't yet exited
• unblocks immediately if the thread being joined with has already
exited
Either way, the thread calling pthread_join() is guaranteed that the
thread it is joining with has exited and, thus, any work that thread was
doing must have been completed.
Synchronization
The functionality of pthread_join() illustrates an important operating
systems concept: synchronization.
The term synchronization literally means "at the same time." However, in
a computer science context it means
"the state of a concurrent program is consistent across two or more
concurrent events."
In this example, the main thread "synchronizes" with each of the sum
threads using the pthread_join() call. After each call
to pthread_join() you, the programmer, know the state of two threads:
• the main thread, which has received the partial sum from the thread
whose id is contained in the variable thread_ids[t]
• the sum thread whose id is contained in the variable thread_ids[t],
whose state is that it has exited after successfully computing its
partial sum
Thus the main thread and one of the sum threads "synchronize" before
the main thread tries to use the partial sum computed by the sum
thread.
Synchronization is an important concept when concurrent and/or
asynchronous events must be managed.
How much faster is it?
You can experiment with these last two example pieces of code to see
how much of an improvement threading and parallelism make in terms
of the performance of the code:
time ./avg-1thread 100000000
the average over 100000000 random numbers on (0,1) is 0.500023
real 0m1.620s
user 0m1.419s
sys  0m0.195s
time ./avg-manythread 100000000 10
the average over 100000000 random numbers on (0,1) is 0.500023
real 0m1.437s
user 0m1.501s
sys  0m0.207s
That's right -- using 10 threads only speeds it up by about 0.2
seconds. Can you figure out why?
References
• https://cs.wellesley.edu/~cs249/
• https://sites.cs.ucsb.edu/~rich/class/cs170/notes/IntroThreads/