0% found this document useful (0 votes)
133 views24 pages

Threads in C: David Chisnall

This document discusses threads and parallel programming in C. It describes the basic C memory model and thread APIs. It explains how to create threads using pthread_create and join threads using pthread_join. It then gives an example of parallelizing quicksort by splitting the recursive calls across multiple threads. Performance is tested by measuring wall clock time speedup from the parallel version.

Uploaded by

Simo Bouna
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
133 views24 pages

Threads in C: David Chisnall

This document discusses threads and parallel programming in C. It describes the basic C memory model and thread APIs. It explains how to create threads using pthread_create and join threads using pthread_join. It then gives an example of parallelizing quicksort by splitting the recursive calls across multiple threads. Performance is tested by measuring wall clock time speedup from the parallel version.

Uploaded by

Simo Bouna
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 24

Revision Thread Portability Creating and Joining Threads Parallel Quicksort

Threads in C

David Chisnall

February 25, 2011


Revision Thread Portability Creating and Joining Threads Parallel Quicksort

The Basic C Model

One computer
One process
One thread
One stack
Revision Thread Portability Creating and Joining Threads Parallel Quicksort

Multithreaded Memory Layout (Again)


0xffffffff 0xffffffff
stack 1
stack 2
0xf0000000

Heap Memory

0x00140000
Library Code
0x00100000

0x000fe000
Static Data
0x0000f000
Program Code
0x00000000 0x00001000
Revision Thread Portability Creating and Joining Threads Parallel Quicksort

Thread APIs

Threads are not part of C99


They are part of C1X, but there are currently no
implementations of C1X
On UNIX-like platforms, the POSIX Thread APIs are portable
Other platforms there are different APIs
Revision Thread Portability Creating and Joining Threads Parallel Quicksort

Creating Threads


pthread_t thread ;
pthread_create (& thread , NULL , start_function ,
arg ) ;
 

Creates a new stack


Registers it for scheduling
Sets instruction pointer in new thread to start_function
Revision Thread Portability Creating and Joining Threads Parallel Quicksort

Combining Threads


void * ret ;
pthread_join ( thread , & ret ) ;
 

Waits for thread to finish


Sets ret to the return value from the thread start function
Revision Thread Portability Creating and Joining Threads Parallel Quicksort

Aside: Function Pointers


void example ( void ) { printf ( " Stuff goes here \ n " )
; }
void user ( void )
{
// Call example
example () ;
// Store a pointer to example :
void (* funcPtr ) ( void ) = example ;
// Call example via the function pointer
funcPtr () ;
}
 

Revision Thread Portability Creating and Joining Threads Parallel Quicksort

Thread Overhead

Creating the stack


Copying all of the thread-local variables
Context switching if number of threads exceeds number of
cores
Synchronisation costs (including implicit cache coherency
cost)

Sometimes adding threads makes things slower!


Revision Thread Portability Creating and Joining Threads Parallel Quicksort

Example: Parallel Quicksort

Recursive sorting algorithm


Perform sub-sorts in a new thread
Revision Thread Portability Creating and Joining Threads Parallel Quicksort

Revision: Quicksort

1. Pick pivot point


2. Split array into values lower than pivot and values greater
than pivot
3. Recursively sort each sub-array.
Revision Thread Portability Creating and Joining Threads Parallel Quicksort

Serial Quicksort


void quicksort ( int * array , int left , int right )
{
if ( right > left )
{
int pivotIndex = left + ( right - left ) /2;
pivotIndex = partition ( array , left , right ,
pivotIndex ) ;
quicksort ( array , left , pivotIndex -1) ;
quicksort ( array , pivotIndex +1 , right ) ;
}
}
 

Revision Thread Portability Creating and Joining Threads Parallel Quicksort

Making it Parallel

Partition cant (easily) be done in parallel


Recursive calls can
Revision Thread Portability Creating and Joining Threads Parallel Quicksort

Recursive Quicksort

void quicksort ( int * array , int left , int right )
{
if ( right > left )
{
int pivotIndex = left + ( right - left ) /2;
pivotIndex = partition ( array , left , right ,
pivotIndex ) ;
struct qsort_starter arg = { array , left ,
pivotIndex -2};
pthread_t thread ;
// Create a new thread for one subsort
pthread_create (& thread , 0 , qsthread , & arg ) ;
quicksort ( array , pivotIndex +1 , right ) ;
// Wait for both to finish
pthread_join ( thread , NULL ) ;
}
}
 

Revision Thread Portability Creating and Joining Threads Parallel Quicksort

Why The Structure?



struct qsort_starter
{
int * array ;
int left ;
int right ;
};
void * qsthreadthread ( void * init )
{
struct qsort_starter * start = init ;
quicksort ( start - > array , start - > left , start - >
right ) ;
return NULL ;
}
 

Thread function only takes one (void*) argument
If we want to pass more than one, we must pass a struct
pointer
Revision Thread Portability Creating and Joining Threads Parallel Quicksort

Is this Safe?


struct qsort_starter arg = { array , left ,
pivotIndex -2};
pthread_t thread ;
// Create a new thread for one subsort
pthread_create (& thread , 0 , qsthread , & arg ) ;
 

Usually not - passing stack allocation to another thread


Safe here because of the pthread_join() call.
Revision Thread Portability Creating and Joining Threads Parallel Quicksort

Testing Speedup

$ time ./a.out
real 0m30.792s
user 0m30.552s
sys 0m0.222s

real (a.k.a wall clock time) - elapsed time from start to


finish
user CPU time scheduled to the process
sys CPU time scheduled to the kernel to handle system
calls for the process
Revision Thread Portability Creating and Joining Threads Parallel Quicksort

Multithreading Performance

35
Wall Clock Time
CPU Time
30

25

20
Seconds

15

10

0
0 2 4 6 8 10 12
Depth
Revision Thread Portability Creating and Joining Threads Parallel Quicksort

Speedup

T1
Sp =
Tp
T1 time for sequential algorithm
Tp time for parallel algorithm with p processors
Sp speedup with p processors

Wall clock time, not CPU time!


Revision Thread Portability Creating and Joining Threads Parallel Quicksort

Efficiency

Sp
Ep =
p

More useful metric


Can compare Ep values independently of p
Revision Thread Portability Creating and Joining Threads Parallel Quicksort

Parallel Quicksort Speedup

Threads p Tp Sp Ep
1 1 30.792 1 1
2 2 24.044 1.28 0.64
4 2 23.780 1.29 0.65
8 2 24.548 1.25 0.63
16 2 18.784 1.63 0.82

Linear speedup: Sp = p, Ep = 1
Superlinear speedup (very rare!) Ep > 1
Revision Thread Portability Creating and Joining Threads Parallel Quicksort

Why Sublinear?

The partition step is serial


Uneven workload distribution between threads (e.g. one
thread sorting 998 elements, one sorting 2)
Revision Thread Portability Creating and Joining Threads Parallel Quicksort

Amdahls Law

Maximum speedup of a program is limited by the sequential


portion
For quicksort, this is the partition step
Absolute best case, with infinite processors, is O(n) for
quicksort
Revision Thread Portability Creating and Joining Threads Parallel Quicksort

Parallel Quicksort is Easy

No data sharing between threads


No communication between threads, except on exit
Concurrent parts could be in separate processes
Threads just eliminate some copying overhead
Revision Thread Portability Creating and Joining Threads Parallel Quicksort

Questions?

You might also like