By Kelvin Chou
Programming Parallelism
Data Level Parallelism
                 Single Instruction   Multiple Instruction
Single Data      SISD                 MISD
Multiple Data    SIMD                 MIMD
Single Instruction Multiple Data (SIMD): one instruction operates on a whole vector of data in one clock cycle (vectorized).
Question from Fall 2012
What would we want START set to be?
We can see that we want to handle the fringe first! What part does the fringe take care of for us?
Since we are packing 32-bit integers into a 128-bit vector, we can fit 4 integers into one vector.
We would want START set to be (n % 4):
#define START (n % 4)
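A minimal sketch of how START = n % 4 splits the work, assuming n elements are processed in total, the scalar fringe loop handles the first n % 4 of them, and the vector loop handles the rest four at a time. The names LANES, fringe_count and vector_iterations are hypothetical and are not taken from the exam code.

```c
#include <assert.h>

#define LANES 4  /* 128-bit vector / 32-bit int = 4 lanes */

/* Hypothetical helper: how many elements the scalar fringe loop handles. */
int fringe_count(int n) {
    assert(n >= 0);
    return n % LANES;
}

/* Hypothetical helper: how many full 4-wide vector iterations remain. */
int vector_iterations(int n) {
    assert(n >= 0);
    return (n - fringe_count(n)) / LANES;
}
```

For n = 10 the fringe covers 2 elements and the vector loop runs twice, so every element is handled exactly once.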
What would we want J_INIT set to be?
Notice what we are loading into j_on_steroids.
j_on_steroids is being added to a vector filled with the value of START.
We see that j += START. Where else do we notice START being used in a calculation?
There is a START + 1 in the fringe case! That is where we will start. Therefore, J_INIT = 1.
#define J_INIT 1
Next will be STEROIDS_INIT. We see that this is being added to the set1 of START.
Given that we want to start at START, we know that for differentiation we multiply by one greater than the current index.
Since the initial value is added to the set1, we can see that we want each field initialized differently.
Example: if START = 2, we want the vector going into the main loop to contain {3, 4, 5, 6}.
Therefore {2, 2, 2, 2} + {a, b, c, d} = {3, 4, 5, 6}, which gives {a, b, c, d} = {1, 2, 3, 4}.
#define STEROIDS_INIT \
{1, 2, 3, 4}
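The lane-by-lane addition can be sketched with SSE2 intrinsics. This assumes that the "set1" in the question is _mm_set1_epi32, that the vector add is _mm_add_epi32, and that {1, 2, 3, 4} is listed lowest lane first; the function name steroids_for_start is hypothetical and the code only runs on x86.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics, x86 only */
#include <stdint.h>

/* Hypothetical helper: writes start + {1, 2, 3, 4} into out[0..3]. */
void steroids_for_start(int start, int32_t out[4]) {
    assert(out != 0);
    __m128i base = _mm_set1_epi32(start);       /* {start, start, start, start} */
    __m128i init = _mm_setr_epi32(1, 2, 3, 4);  /* lane 0 holds 1, lane 3 holds 4 */
    __m128i sum  = _mm_add_epi32(base, init);   /* four 32-bit adds at once */
    _mm_storeu_si128((__m128i *)out, sum);      /* unaligned store of all 4 lanes */
}
```

With start = 2 the stored lanes are 3, 4, 5, 6, matching the example above.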
The last thing we want to find is where to END the loop.
Keep in mind the constraints of the problem. What is the last index that we need to keep track of?
In differentiation the nth term falls off! So this means END = (n - 1).
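A scalar reference version makes the "multiply by one greater than current" rule and the last index concrete. This is only a sketch under an assumption the slides do not state: n is taken to be the degree of the polynomial, so there are n + 1 input coefficients and n output coefficients. The function name and signature are hypothetical, not the exam's.

```c
#include <assert.h>

/* Scalar reference: differentiate a polynomial of degree n.
 * in holds n + 1 coefficients (in[j] multiplies x^j); out receives n. */
void differentiate(const int *in, int *out, int n) {
    assert(n >= 0);
    for (int j = 0; j <= n - 1; j++) {  /* last output index is n - 1 */
        out[j] = (j + 1) * in[j + 1];   /* multiply by one greater than j */
    }
}
```

For 5 + 3x + 2x^2 + 4x^3 (n = 3) this yields 3 + 4x + 12x^2: the output has no index n, which is why the loop ends at n - 1.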
Thread Level Parallelism
Each thread executes on different data.
Allows things to be run in parallel.
OpenMP is an example.
Question from Spring 2013
Suppose we have int *A that points to the head of an
array of length len. Assume we have n > 1 threads.
#pragma omp parallel for
for (int x = 0; x < len; x++){
*A = x;
A++;
}
Is this always incorrect, sometimes incorrect, always
correct?
Sometimes incorrect. This is due to the fact that the threads have a data race on A: they all write through and increment the same shared pointer. It is possibly correct if the stars align.
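One way to remove the race, sketched here rather than taken from the exam solution, is to compute each address from the loop index so that no thread modifies a shared pointer. The function name fill_indexed is hypothetical.

```c
#include <assert.h>

/* Compile with -fopenmp to enable the pragma; without it the loop runs serially. */
void fill_indexed(int *A, int len) {
    assert(len >= 0);
    #pragma omp parallel for
    for (int x = 0; x < len; x++) {
        A[x] = x;  /* the address depends only on x; A itself is never changed */
    }
}
```

Each iteration now touches a distinct element, so the result does not depend on how iterations are distributed among threads.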
Question from Spring 2013
#pragma omp parallel
{
for (int x = 0; x < len; x++) {
*(A+x) = x;
}
}
Is this always incorrect, sometimes incorrect, always
correct?
Always correct: every thread runs the whole loop and writes the same value x to *(A+x).
Now, is this faster or slower than serial?
Slower, due to duplication of work and false sharing.
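A possible way to avoid the duplicated work while keeping the parallel region is to add a work-sharing `omp for` inside it, so that the iterations are divided among the threads instead of being repeated by each one. This is a sketch, not the exam's answer; the function name fill_shared_work is hypothetical.

```c
#include <assert.h>

/* Compile with -fopenmp to enable the pragmas; without it the loop runs serially. */
void fill_shared_work(int *A, int len) {
    assert(len >= 0);
    #pragma omp parallel
    {
        #pragma omp for  /* split the iterations among the threads in the region */
        for (int x = 0; x < len; x++) {
            *(A + x) = x;
        }
    }
}
```

Each element is now written by exactly one thread. Threads working on neighbouring chunks may still share a cache line at the chunk boundaries.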