High Performance Computing
Platforms
Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar
• Thread Basics
#include <pthread.h>

int pthread_create (
    pthread_t *thread_handle,
    const pthread_attr_t *attribute,
    void * (*thread_function)(void *),
    void *arg);

int pthread_join (
    pthread_t thread,
    void **ptr);
#include <pthread.h>
#include <stdlib.h>

int main() {
    ...
    pthread_t p_threads[MAX_THREADS];
    pthread_attr_t attr;
    pthread_attr_init (&attr);
    /* hits[i] carries the seed into thread i and its hit count back out */
    for (i = 0; i < num_threads; i++) {
        hits[i] = i;
        pthread_create(&p_threads[i], &attr, compute_pi,
                       (void *) &hits[i]);
    }
    /* wait for each thread, then accumulate its hit count */
    for (i = 0; i < num_threads; i++) {
        pthread_join(p_threads[i], NULL);
        total_hits += hits[i];
    }
    ...
}
Thread Basics: Creation and Termination (Example)
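void *compute_pi (void *s) {
    unsigned int seed;
    int i, *hit_pointer;
    double rand_no_x, rand_no_y;
    int local_hits;

    hit_pointer = (int *) s;
    seed = *hit_pointer;
    local_hits = 0;
    for (i = 0; i < sample_points_per_thread; i++) {
        /* rand_r returns a value in [0, RAND_MAX]; scale it to [0, 1] */
        rand_no_x = (double) rand_r(&seed) / (double) RAND_MAX;
        rand_no_y = (double) rand_r(&seed) / (double) RAND_MAX;
        /* count points inside the circle of radius 0.5 centered at (0.5, 0.5) */
        if (((rand_no_x - 0.5) * (rand_no_x - 0.5) +
             (rand_no_y - 0.5) * (rand_no_y - 0.5)) < 0.25)
            local_hits++;
        /* rand_r advances seed in place, so no manual reseeding is needed */
    }
    *hit_pointer = local_hits;
    pthread_exit(0);
}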
Programming and Performance Notes
[Figure: execution time ("Time") for the series "optimal", "local", "spaced_1", "spaced_16", and "spaced_32".]
• Consider:
int pthread_mutex_unlock (
    pthread_mutex_t *mutex_lock);

int pthread_mutex_init (
    pthread_mutex_t *mutex_lock,
    const pthread_mutexattr_t *lock_attr);
Mutual Exclusion
• The producer thread must not overwrite the shared buffer when
the previous task has not been picked up by a consumer
thread.
pthread_mutex_t task_queue_lock;
int task_available;
...
int main() {
    ...
    task_available = 0;
    pthread_mutex_init(&task_queue_lock, NULL);
    ...
}
• The type of the mutex can be set in the attributes object before
it is passed at time of initialization.
Overheads of Locking
int pthread_mutex_trylock (
    pthread_mutex_t *mutex_lock);
int main() {
    /* declarations and initializations */
    task_available = 0;
    pthread_cond_init(&cond_queue_empty, NULL);
    pthread_cond_init(&cond_queue_full, NULL);
    pthread_mutex_init(&task_queue_cond_lock, NULL);
    /* create and join producer and consumer threads */
}
Producer-Consumer Using Condition Variables
int pthread_mutexattr_settype_np (
    pthread_mutexattr_t *attr,
    int type);
Here, type specifies the type of the mutex and can take one
of:
– PTHREAD_MUTEX_NORMAL_NP
– PTHREAD_MUTEX_RECURSIVE_NP
– PTHREAD_MUTEX_ERRORCHECK_NP
Composite Synchronization Constructs
• A read lock is granted even if other threads already hold read
locks.
• If there is a write lock on the data (or if there are queued write
locks), the thread performs a condition wait.
typedef struct {
    int readers;
    int writer;
    pthread_cond_t readers_proceed;
    pthread_cond_t writer_proceed;
    int pending_writers;
    pthread_mutex_t read_write_lock;
} mylib_rwlock_t;
• If the count is less than the total number of threads, the threads
execute a condition wait.
• The last thread entering (and setting the count to the number
of threads) wakes up all the threads using a condition
broadcast.
Barriers
typedef struct {
    pthread_mutex_t count_lock;
    pthread_cond_t ok_to_proceed;
    int count;
} mylib_barrier_t;
• Once both threads arrive, one of the two moves on and the other
waits.
• This is also called a log barrier and its runtime grows as O(log p).
[Figure: Barrier. Time (seconds) versus number of threads for "Log Barrier (1000, 32 procs)" and "Linear Barrier (1000, 32 procs)".]
int a, b;
main() {
    // serial segment
    // parallel segment
}
Corresponding Pthreads translation
/* ******************************************************
An OpenMP version of a threaded program to compute PI.
****************************************************** */
/* mutual exclusion */
void omp_init_lock (omp_lock_t *lock);
void omp_destroy_lock (omp_lock_t *lock);
void omp_set_lock (omp_lock_t *lock);
void omp_unset_lock (omp_lock_t *lock);
int omp_test_lock (omp_lock_t *lock);