Course: Parallel Processing Lab #2 - Multithreading and OpenMP
Thin Nguyen
Goal: This lab helps students revise their knowledge of multithreading and learn how to use OpenMP.
Contents
1 Multithreads
1.1 POSIX Threads - Linux
1.2 Examples
2 Multithread Programming with OpenMP
2.1 Examples
3 Exercises
1 Multithreads
1.1 POSIX Threads - Linux
What is Pthreads?
Historically, hardware vendors implemented their own proprietary versions of threads. These implementations differed substantially from each other, making it difficult for programmers to develop portable threaded applications. For UNIX systems, a standardized C language threads programming interface was therefore specified by the IEEE POSIX 1003.1c standard; implementations that adhere to this standard are referred to as POSIX threads, or Pthreads. The POSIX standard has continued to evolve and undergo revisions, including the Pthreads specification.
Pthreads are defined as a set of C language programming types and procedure calls, implemented
with a pthread.h header/include file and a thread library - though this library may be part of another
library, such as libc, in some implementations.
All threads have access to the same global, shared memory. Threads also have their own private data.
Programmers are responsible for synchronizing (protecting) access to globally shared data.
1.2 Examples
Compiling Threaded Programs: several examples of compile commands used for Pthreads codes are shown below.
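For example, on a typical Linux system with the GNU toolchain (the source file names here are only placeholders):
$ gcc -pthread hello_pthread.c -o hello_pthread      # GNU C compiler
$ g++ -pthread hello_pthread.cpp -o hello_pthread    # GNU C++ compiler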
This simple example code creates 10 threads with the pthread_create() routine. Each thread prints a "Hello World!" message and then terminates with a call to pthread_exit().
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#define NUM_THREADS 10
// user-defined function: each thread prints its ID and then exits
void *user_def_func(void *threadID){
    long TID;
    TID = (long) threadID;
    printf("Hello World! from thread #%ld\n", TID);
    pthread_exit(NULL);
}
int main(int argc, char **argv){
    pthread_t threads[NUM_THREADS];
    long t;
    for(t = 0; t < NUM_THREADS; t++){
        // create one thread per iteration, passing the loop index as its ID
        int rc = pthread_create(&threads[t], NULL, user_def_func, (void *) t);
        if(rc){
            printf("ERROR: pthread_create() returned %d\n", rc);
            exit(-1);
        }
    }
    // free thread
    pthread_exit(NULL);
    return 0;
}
...
/* Thread Argument Passing */
// case-study 1
long taskids[NUM_THREADS];
// case-study 2
...
}
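As a sketch of case-study 1 (reusing user_def_func and NUM_THREADS from the example above; the taskids array holds one argument value per thread), thread creation might look like this:
long taskids[NUM_THREADS];
pthread_t threads[NUM_THREADS];
long t;
for(t = 0; t < NUM_THREADS; t++){
    taskids[t] = t;
    // each thread receives the value stored in its own slot of taskids
    pthread_create(&threads[t], NULL, user_def_func, (void *) taskids[t]);
}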
This example demonstrates how to explicitly create pthreads in a joinable state, for portability purposes. It also shows how to use the pthread_exit() status parameter.
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
// Define CONSTANTS
#define NUM_THREADS 4
#define NUM_LOOPS 1000000
// user-defined function: do some floating-point work, then return a status
void *user_def_func(void *threadID){
    long TID;
    TID = (long) threadID;
    int i;
    double result = 0.0;
    printf("Thread %ld starting...\n", TID);
    for(i = 0; i < NUM_LOOPS; i++){
        result = result + sin(i) * tan(i);
    }
    printf("Thread %ld done. Result = %e\n", TID, result);
    // return the thread ID through the exit status
    pthread_exit((void *) TID);
}
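The main() of this example is not shown in the listing above; a minimal sketch of it (error handling omitted, messages illustrative) that creates the threads in a joinable state and reads back their exit status could be:
int main(int argc, char **argv){
    pthread_t threads[NUM_THREADS];
    pthread_attr_t attr;
    void *status;
    long t;
    /* Explicitly create the threads in a joinable state, for portability */
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
    for(t = 0; t < NUM_THREADS; t++){
        pthread_create(&threads[t], &attr, user_def_func, (void *) t);
    }
    pthread_attr_destroy(&attr);
    /* Wait for each thread and print the status it passed to pthread_exit() */
    for(t = 0; t < NUM_THREADS; t++){
        pthread_join(threads[t], &status);
        printf("Joined thread %ld with status %ld\n", t, (long) status);
    }
    pthread_exit(NULL);
}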
This example uses a mutex variable to protect a global sum while each thread updates it. Race conditions are an important problem in parallel programming.
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
...
pthread_mutex_destroy(&mutexsum);
pthread_exit(NULL);
}
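Only the beginning and the end of that listing are shown above. A minimal self-contained sketch of the same idea, with a mutexsum lock protecting a shared sum (the function name add_to_sum, the thread count, and the loop count are illustrative), is:
#include <pthread.h>
#include <stdio.h>
#define NUM_THREADS 4
#define NUM_LOOPS 100000

double sum = 0.0;              // globally shared data
pthread_mutex_t mutexsum;      // protects sum

void *add_to_sum(void *threadID){
    long TID = (long) threadID;
    for(int i = 0; i < NUM_LOOPS; i++){
        // lock before touching the shared variable, unlock right after
        pthread_mutex_lock(&mutexsum);
        sum = sum + 1.0;
        pthread_mutex_unlock(&mutexsum);
    }
    printf("Thread %ld finished.\n", TID);
    pthread_exit(NULL);
}

int main(void){
    pthread_t threads[NUM_THREADS];
    pthread_mutex_init(&mutexsum, NULL);
    for(long t = 0; t < NUM_THREADS; t++)
        pthread_create(&threads[t], NULL, add_to_sum, (void *) t);
    for(long t = 0; t < NUM_THREADS; t++)
        pthread_join(threads[t], NULL);
    printf("Final sum = %f (expected %f)\n", sum, 1.0 * NUM_THREADS * NUM_LOOPS);
    pthread_mutex_destroy(&mutexsum);
    pthread_exit(NULL);
}
Without the lock, the concurrent read-modify-write of sum is a race condition and the final value is unpredictable.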
2 Multithread Programming with OpenMP
What is OpenMP?
• An Application Program Interface (API) that may be used to explicitly direct multithreaded,
shared memory parallelism.
• Comprised of three primary API components:
– Compiler Directives
– Runtime Library Routines
– Environment Variables
Goals of OpenMP
• Standardization
• Lean and Mean
• Ease of Use
• Portability
Shared Memory Model: OpenMP is designed for multi-processor/core, shared memory machines. The underlying architecture can be shared memory UMA or NUMA.
2.1 Examples
Compiling OpenMP Programs: several examples of compile commands used for OpenMP codes are shown below.
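With the GNU compilers the extra flag is -fopenmp (the file names here are only placeholders):
$ gcc -fopenmp omp_hello.c -o omp_hello      # GNU C compiler
$ g++ -fopenmp omp_hello.cpp -o omp_hello    # GNU C++ compiler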
Example 1: Simple "Hello World" program. Every thread executes all code enclosed in the parallel region. OpenMP library routines are used to obtain thread identifiers and the total number of threads.
#include <omp.h>
#include <stdio.h>
int main(int argc, char **argv){
    int tid;
    /* Fork a team of threads with each thread having a private tid variable */
    #pragma omp parallel private(tid)
    {
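        /* The rest of the listing did not survive the page break; the body
           below is a minimal sketch: each thread reads its own ID, and the
           master thread additionally reports the team size. */
        tid = omp_get_thread_num();
        printf("Hello World from thread = %d\n", tid);
        if (tid == 0){
            printf("Number of threads = %d\n", omp_get_num_threads());
        }
    }   /* all threads join the master thread and terminate */
    return 0;
}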
Example 2: Work-Sharing Constructs - DO / for Directive. The DO / for directive specifies that the iterations of the loop immediately following it must be executed in parallel by the team. This assumes a parallel region has already been initiated; otherwise it executes serially on a single processor.
#include <omp.h>
#include <stdio.h>
/* Define some values */
#define N 1000
#define CHUNKSIZE 10
#define OMP_NUM_THREADS 4   /* number of threads to request (value assumed here) */
/* Global variables */
int main(int argc, char **argv){
    int i, chunk;
    float a[N], b[N], c[N];
    /* Some initializations */
    for(i = 0; i < N; i++){
        a[i] = b[i] = i * 1.0; // values = i with float type
    }
    chunk = CHUNKSIZE;
    // the team size must be set before the parallel region is entered
    omp_set_num_threads(OMP_NUM_THREADS);
    #pragma omp parallel shared(a,b,c,chunk) private(i)
    {
        #pragma omp for schedule(dynamic,chunk) nowait
        for(i = 0; i < N; i++){
            int tid = omp_get_thread_num();
            printf("Iter %d running from thread %d\n", i, tid);
            c[i] = a[i] + b[i];
        }
    }
    /* Validation */
    printf("Vector c: \n");
    for(i = 0; i < 10; i++){
        printf("%f ", c[i]);
    }
    printf("...\n");
    /* Statistic */
    // printf("Num of iter with thread:\n");
    // for(i = 0; i < MAX_THREADS; i++){
    //     if(count[i] != 0)
    //         printf("\tThread %d run %d iter.\n", i, count[i]);
    // }
    return 0;
}
Example 3: Work-Sharing Constructs - SECTIONS Directive. Each SECTION inside the SECTIONS block is executed once by one thread of the team; here one section computes c from a and b while a second section fills d, and the count array records how many iterations each thread executed.
#include <omp.h>
#include <stdio.h>
/* Define some values */
#define N 1000
#define CHUNKSIZE 10
#define MAX_THREADS 48
#define OMP_NUM_THREADS 4   /* number of threads to request (value assumed here) */
/* Global variables */
int count[MAX_THREADS];
int main(int argc, char **argv){
    int i, chunk;
    float a[N], b[N], c[N], d[N];
    /* Some initializations */
    for(i = 0; i < N; i++){
        a[i] = i * 1.0;
        b[i] = i + 2.0;
    }
    for(i = 0; i < OMP_NUM_THREADS; i++){
        count[i] = 0;
    }
    chunk = CHUNKSIZE;
    // the team size must be set before the parallel region is entered
    omp_set_num_threads(OMP_NUM_THREADS);
    #pragma omp parallel shared(a,b,c,d) private(i)
    {
        #pragma omp sections nowait
        {
            #pragma omp section
            for(i = 0; i < N; i++){
                int tid_s1 = omp_get_thread_num();
                printf("\tIter %d running from thread %d\n", i, tid_s1);
                c[i] = a[i] + b[i];
                // Increase count
                count[tid_s1]++;
            }
            #pragma omp section
            /* second section: another element-wise combination of a and b
               (the exact operation is illustrative) */
            for(i = 0; i < N; i++){
                int tid_s2 = omp_get_thread_num();
                d[i] = a[i] * b[i];
                count[tid_s2]++;
            }
        } /* end of sections */
    } /* end of parallel region */
    /* Statistic */
    printf("Num of iter with thread:\n");
    for(i = 0; i < MAX_THREADS; i++){
        if(count[i] != 0)
            printf("\tThread %d run %d iter.\n", i, count[i]);
    }
    return 0;
}
Example 4: The THREADPRIVATE directive. The global variables a and x are made threadprivate, so each thread keeps its own persistent copy whose value survives from one parallel region to the next (dynamic adjustment of the team size is turned off so that both regions use the same threads).
#include <omp.h>
#include <stdio.h>
/* Define some values */
#define N 1000
#define CHUNKSIZE 10
#define MAX_THREADS 48
#define NUM_THREADS 4
/* Global variables */
int count[MAX_THREADS];
int a, b, i, tid;
float x;
#pragma omp threadprivate(a, x)
int main(int argc, char **argv){
    /* Explicitly turn off dynamic threads */
    omp_set_dynamic(0);
    omp_set_num_threads(NUM_THREADS);
    printf("1st Parallel Region:\n");
    #pragma omp parallel private(b, tid)
    {
        tid = omp_get_thread_num();
        // give each thread its own values (chosen for illustration)
        a = tid;
        b = tid;
        x = 1.1 * tid + 1.0;
        printf("Thread %d: a, b, x = %d, %d, %f\n", tid, a, b, x);
    }
    printf("************************************ \n");
    printf("Master thread doing serial work here\n");
    printf("************************************ \n");
    printf("2nd Parallel Region:\n");
    /* a and x are threadprivate, so each thread still sees its own values;
       b is an ordinary global, so its value does not persist per thread */
    #pragma omp parallel private(tid)
    {
        tid = omp_get_thread_num();
        printf("Thread %d: a, b, x = %d, %d, %f\n", tid, a, b, x);
    }
    return 0;
}
3 Exercises
1. Matrix multiplication with Pthreads: implement a parallel version of the given source code with POSIX Threads. Students need to complete the //TODO parts in the source code. When you finish, run the program with matrix sizes 10, 100, 1000, 10000, 20000 (at least up to 10000) and record the execution time with the command:
// For example:
$ time ./mul_mat_pthread_output 1000 1
Finally, plot a performance graph comparing the Serial Version (already provided in graph.py) and the Pthread Version, as in Figure 5. You may modify variables or data types in the source code if you wish. Note: you can plot the graph on your machine with Python; set up Python yourself if needed (the Matplotlib library is recommended).
2. Solving a system of linear equations: consider, for example, the system
2a + b + 2c + 5d = 24
a + 3b + c + 4d = 15
2a + b + 4c + 7d = 28
5a + 4b + 7c + 3d = -21
One solution is the so-called matrix decomposition. In many cases these problems lead to a
symmetric (and positive definite) matrix, which can be efficiently decomposed with the Cholesky
Decomposition algorithm. Note: The parallel versions of this Cholesky implementation do not scale
very well with the number of CPUs.
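For orientation only (the provided source code may organize this differently), here is a serial sketch of the Cholesky factorization A = L * L^T of a symmetric positive definite n-by-n matrix stored row-major:
#include <math.h>
/* Factor A into a lower-triangular L with A = L * L^T.
   A and L are n-by-n matrices stored row-major; the upper triangle of L is not touched. */
void cholesky(int n, const double *A, double *L){
    for(int j = 0; j < n; j++){
        /* diagonal entry of column j */
        double sum = A[j*n + j];
        for(int k = 0; k < j; k++)
            sum -= L[j*n + k] * L[j*n + k];
        L[j*n + j] = sqrt(sum);
        /* entries below the diagonal in column j */
        for(int i = j + 1; i < n; i++){
            double s = A[i*n + j];
            for(int k = 0; k < j; k++)
                s -= L[i*n + k] * L[j*n + k];
            L[i*n + j] = s / L[j*n + j];
        }
    }
}
A common way to parallelize this is to distribute the inner i loop over threads (for example with #pragma omp parallel for); because each column depends on the previous ones and the remaining work shrinks as j grows, the achievable speedup is limited, which is consistent with the note above.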
Note: students only need to modify the given source code. Every exercise comes with a .py file that plots a graph for evaluating performance across thread counts and problem sizes, so record your results and plot the graph. The number of threads and the problem sizes are declared in the .py files.