HPC Unit 5 B
Textbook: Hager & Wellein, Introduction to High Performance Computing for Scientists and
Engineers
Objectives of Chapter 2
/* Original: keeps scanning all N elements even after a hit */
int flag = 0;
for (i=0; i<N; i++) {
    if ( some_function(A[i]) < threshold_value )
        flag = 1;
}
/* Improved: leave the loop as soon as the first hit is found */
int flag = 0;
for (i=0; i<N; i++) {
    if ( some_function(A[i]) < threshold_value ) {
        flag = 1;
        break;
    }
}
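The saving from the early exit can be made visible by counting calls. The following is a minimal sketch, not part of the original slides: `some_function` here is a hypothetical stand-in (it squares its argument), and the global `calls` counter and the names `any_below_full` and `any_below_early` are assumptions introduced for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for some_function(); counts how often it is called. */
size_t calls = 0;
double some_function(double x) { calls++; return x * x; }

/* Full scan: evaluates some_function() for every element. */
int any_below_full(const double *A, size_t N, double threshold_value) {
    assert(A != NULL || N == 0);
    int flag = 0;
    for (size_t i = 0; i < N; i++)
        if (some_function(A[i]) < threshold_value)
            flag = 1;
    return flag;
}

/* Early exit: stops at the first element that satisfies the test. */
int any_below_early(const double *A, size_t N, double threshold_value) {
    assert(A != NULL || N == 0);
    for (size_t i = 0; i < N; i++)
        if (some_function(A[i]) < threshold_value)
            return 1;
    return 0;
}
```

Both functions return the same flag. The early-exit version only pays for the elements up to and including the first hit, so the gain depends on where that hit occurs in the array.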
Do less work; example 2
How many times is the k-indexed loop executed? And how many
times is the j-indexed loop executed?
Do less work; example 2 (cont’d)
Improvement:
t = 0.;
for (j=0; j<ARRAY_SIZE; j++)
    t = t + b[j]*d[j];
/* s + r*sin(x) does not depend on i: evaluate it once, before the loop */
tmp = s + r*sin(x);
for (i=0; i<N; i++)
    A[i] = A[i] + tmp;
Avoid expensive operations!
Special math functions (such as trigonometric, exponential and
logarithmic functions) are usually very costly to compute.
An example from simulating non-equilibrium spins:
/* Precompute the tanh values once; only the even entries are filled */
double tanh_table[13];
for (i=0; i<=12; i+=2)
    tanh_table[i] = 0.5*(1.0+tanh((i-6)/tt));
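A sketch of how such a table might be used inside the simulation kernel. It rests on an assumption that is not stated in the text above: each site has six neighbouring spins, each either +1 or -1, so their sum is an even integer between -6 and 6. That would explain why only every second table entry is filled. The names `table_lookup` and `spins` are hypothetical.

```c
#include <assert.h>

/* Replace the tanh() call in the kernel by a table lookup.
 * Assumption: spins[] holds six neighbour spins, each +1 or -1. */
double table_lookup(const double tanh_table[13], const int spins[6]) {
    int edelz = 0;
    for (int k = 0; k < 6; k++)
        edelz += spins[k];

    /* The sum of six +1/-1 values is even and lies in [-6, 6]. */
    assert(edelz >= -6 && edelz <= 6 && edelz % 2 == 0);

    /* Shift by 6 to match the (i-6) offset used when filling the table. */
    return tanh_table[edelz + 6];
}
```

The kernel then performs a few integer additions and one array access per site instead of one `tanh` evaluation.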
a[0] = b[1]-b[0];
for (i=1; i<n-1; i++)
    a[i] = b[i+1]-b[i-1];
a[n-1] = b[n-1]-b[n-2];
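The code above treats the two boundary elements outside the loop, so the loop body contains no tests. For comparison, here is a sketch of what a branching version could look like. This is an assumed reconstruction, not the original slide code, and the names `diff_branchy` and `diff_peeled` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed starting point: boundary tests are evaluated in every iteration. */
void diff_branchy(double *a, const double *b, size_t n) {
    assert(n >= 2);
    for (size_t i = 0; i < n; i++) {
        if (i == 0)
            a[i] = b[1] - b[0];
        else if (i == n - 1)
            a[i] = b[n-1] - b[n-2];
        else
            a[i] = b[i+1] - b[i-1];
    }
}

/* Boundaries peeled off: the loop body is branch-free. */
void diff_peeled(double *a, const double *b, size_t n) {
    assert(n >= 2);
    a[0] = b[1] - b[0];
    for (size_t i = 1; i < n - 1; i++)
        a[i] = b[i+1] - b[i-1];
    a[n-1] = b[n-1] - b[n-2];
}
```

Both versions produce the same array. The peeled version avoids two comparisons per iteration and leaves a simple loop body that a compiler can pipeline or vectorize more easily.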
Yet another example of avoiding branches
if (j>0)
    for (i=0; i<n; i++)
        x[i] = x[i] + 1;
else
    for (i=0; i<n; i++)
        x[i] = 0;
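Here the test on `j` does not depend on the loop index, so it can be decided once, before the loop. A sketch of both forms follows. The version with the test inside the loop is an assumed starting point, not taken from the slides, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed starting point: the loop-invariant test is repeated n times. */
void update_branchy(double *x, size_t n, int j) {
    assert(x != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (j > 0)
            x[i] = x[i] + 1;
        else
            x[i] = 0;
    }
}

/* Test moved out of the loop: one decision, two branch-free loops. */
void update_hoisted(double *x, size_t n, int j) {
    assert(x != NULL || n == 0);
    if (j > 0)
        for (size_t i = 0; i < n; i++)
            x[i] = x[i] + 1;
    else
        for (size_t i = 0; i < n; i++)
            x[i] = 0;
}
```

An optimizing compiler may perform this transformation on its own, but writing it out makes the intent explicit and does not rely on the optimization level.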
Using SIMD instructions
Example:
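A minimal sketch of a SIMD-friendly loop, offered as an illustration rather than as the original slide example. It adds two single-precision arrays four elements at a time, which mirrors what one 128-bit SIMD addition would do, and handles leftover elements in a remainder loop. The function name `vec_add` is hypothetical, and whether the compiler actually emits SIMD instructions depends on the compiler, its flags, and the target architecture.

```c
#include <assert.h>
#include <stddef.h>

/* r[i] = x[i] + y[i]. The restrict qualifiers promise that the arrays
 * do not overlap, which is what allows the additions to be reordered. */
void vec_add(float *restrict r, const float *restrict x,
             const float *restrict y, size_t n) {
    assert((r != NULL && x != NULL && y != NULL) || n == 0);
    size_t i = 0;

    /* Main part: four independent additions per iteration,
     * a candidate for a single 4-wide SIMD add. */
    for (; i + 4 <= n; i += 4) {
        r[i]   = x[i]   + y[i];
        r[i+1] = x[i+1] + y[i+1];
        r[i+2] = x[i+2] + y[i+2];
        r[i+3] = x[i+3] + y[i+3];
    }

    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; i++)
        r[i] = x[i] + y[i];
}
```

The key requirement is that the iterations are independent: no iteration reads a value that another iteration writes.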
Knowing where the time is spent is the first step. But what is the
actual reason that a code is slow, and which resource limits its
performance?
Modern processors feature a small number of performance
counters, which are special on-chip registers that get incremented
each time a certain event occurs.
Possible events that can be monitored: