ECE1747 Parallel Programming

Shared Memory Multithreading


Pthreads
Shared Memory
• All threads access the same shared memory
data space.

[Figure: processors proc1 … procN all accessing a single shared memory address space]

Shared Memory (continued)
• Concretely, it means that a variable x, a
pointer p, or an array a[] refer to the same
object, no matter what processor the
reference originates from.
• We have more or less implicitly assumed
this to be the case in earlier examples.
[Figure: shared memory accessed by proc1 … procN]

Distributed Memory - Message Passing

The alternative model to shared memory.

[Figure: each processor proc1 … procN has its own memory mem1 … memN, each holding its own copy of a; processors communicate over a network]
Shared Memory vs. Message Passing

• The same terminology is used to distinguish hardware.
• For us: we distinguish programming models, not hardware.
Programming vs. Hardware
• One can implement
– a shared memory programming model
– on shared or distributed memory hardware
– (also in software or in hardware)
• One can implement
– a message passing programming model
– on shared or distributed memory hardware
Portability of programming models

[Figure: both the shared memory programming model and the message passing programming model can be implemented on either a shared memory machine or a distributed memory machine]
Shared Memory Programming:
Important Point to Remember
• No matter what the implementation, it
conceptually looks like shared memory.
• There may be some (important)
performance differences.
Multithreading
• User has explicit control over threads.
• Good: control can be used for performance benefit.
• Bad: user has to deal with it.
Pthreads
• POSIX standard shared-memory
multithreading interface.
• Provides primitives for process
management and synchronization.
What does the user have to do?
• Decide how to decompose the computation
into parallel parts.
• Create (and destroy) processes to support
that decomposition.
• Add synchronization to make sure
dependences are covered.
General Thread Structure
• Typically, a thread is a concurrent
execution of a function or a procedure.
• So, your program needs to be restructured
such that parallel parts form separate
procedures or functions.
Example of Thread Creation (contd.)
[Figure: main() calls pthread_create(func); func() starts running concurrently with main()]
Thread Joining Example
void *func(void *arg) { ... }

pthread_t id; int X;
pthread_create(&id, NULL, func, &X);
...
pthread_join(id, NULL);
...
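A minimal compilable version of the example above, as a sketch (the function body, the printed message, and the value of X are assumptions added for illustration):

#include <pthread.h>
#include <stdio.h>

void *func(void *arg)
{
    int *x = (int *)arg;                  /* main passed &X */
    printf("worker sees X = %d\n", *x);
    return NULL;
}

int main(void)
{
    pthread_t id;
    int X = 42;                           /* value chosen only for illustration */
    pthread_create(&id, NULL, func, &X);  /* start func(&X) in a new thread */
    pthread_join(id, NULL);               /* wait for it to terminate */
    return 0;
}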
Example of Thread Creation (contd.)
[Figure: main() calls pthread_create(func); func() runs and finishes with pthread_exit(); main() calls pthread_join(id) to wait for it]
Sequential SOR
for some number of timesteps/iterations {
  for( i=0; i<n; i++ )
    for( j=1; j<n; j++ )
      temp[i][j] = 0.25 *
        ( grid[i-1][j] + grid[i+1][j] +
          grid[i][j-1] + grid[i][j+1] );
  for( i=0; i<n; i++ )
    for( j=1; j<n; j++ )
      grid[i][j] = temp[i][j];
}
Parallel SOR
• First (i,j) loop nest can be parallelized.
• Second (i,j) loop nest can be parallelized.
• Must wait to start the second loop nest until all processors have finished the first.
• Must wait to start the first loop nest of the next iteration until all processors have finished the second loop nest of the previous iteration.
• Give n/p rows to each processor.
Pthreads SOR: Parallel parts (1)
void* sor_1(void *s)
{
  int slice = (int) s;
  int from = (slice*n)/p;
  int to = ((slice+1)*n)/p;

  for( i=from; i<to; i++ )
    for( j=0; j<n; j++ )
      temp[i][j] = 0.25*( grid[i-1][j] + grid[i+1][j]
                        + grid[i][j-1] + grid[i][j+1] );
}
Pthreads SOR: Parallel parts (2)
void* sor_2(void *s)
{
  int slice = (int) s;
  int from = (slice*n)/p;
  int to = ((slice+1)*n)/p;

  for( i=from; i<to; i++ )
    for( j=0; j<n; j++ )
      grid[i][j] = temp[i][j];
}
Pthreads SOR: main
for some number of timesteps {
  for( i=0; i<p; i++ )
    pthread_create(&thrd[i], NULL, sor_1, (void *)i);
  for( i=0; i<p; i++ )
    pthread_join(thrd[i], NULL);
  for( i=0; i<p; i++ )
    pthread_create(&thrd[i], NULL, sor_2, (void *)i);
  for( i=0; i<p; i++ )
    pthread_join(thrd[i], NULL);
}
Summary: Thread Management
• pthread_create(): creates a parallel thread
executing a given function (and arguments),
returns thread identifier.
• pthread_exit(): terminates thread.
• pthread_join(): waits for thread with
particular thread identifier to terminate.
Summary: Program Structure
• Encapsulate parallel parts in functions.
• Use function arguments to parameterize
what a particular thread does.
• Call pthread_create() with the function and
arguments, save thread identifier returned.
• Call pthread_join() with that thread
identifier.
Pthreads Synchronization
• Create/exit/join
– provide some form of synchronization,
– at a very coarse level,
– requires thread creation/destruction.
• Need for finer-grain synchronization
– mutex locks,
– condition variables.
Use of Mutex Locks
• To implement critical sections.
• Pthreads provides only exclusive locks.
• Some other systems allow shared-read,
exclusive-write locks.
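A minimal sketch of a critical section built with a Pthreads mutex (the shared counter and the names used here are assumptions for illustration):

#include <pthread.h>

pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;  /* assumed shared lock */
int shared_count = 0;                                    /* assumed shared data */

void increment(void)
{
    pthread_mutex_lock(&count_lock);     /* enter the critical section */
    shared_count++;                      /* only one thread at a time runs this */
    pthread_mutex_unlock(&count_lock);   /* leave the critical section */
}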
Condition variables (1 of 5)
pthread_cond_init(
pthread_cond_t *cond,
pthread_condattr_t *attr)
• Creates a new condition variable cond.
• Attribute: ignore for now.
Condition Variables (2 of 5)
pthread_cond_destroy(
pthread_cond_t *cond)
• Destroys the condition variable cond.
Condition Variables (3 of 5)
pthread_cond_wait(
pthread_cond_t *cond,
pthread_mutex_t *mutex)
• Blocks the calling thread, waiting on cond.
• Atomically releases the mutex while waiting; the mutex is re-acquired before the call returns.
Condition Variables (4 of 5)
pthread_cond_signal(
pthread_cond_t *cond)
• Unblocks one thread waiting on cond.
• Which one is determined by scheduler.
• If no thread waiting, then signal is a no-op.
Condition Variables (5 of 5)
pthread_cond_broadcast(
pthread_cond_t *cond)
• Unblocks all threads waiting on cond.
• If no thread waiting, then broadcast is a no-op.
Use of Condition Variables
• To implement signal-wait synchronization
discussed in earlier examples.
• Important note: a signal is “forgotten” if no thread is already waiting when it is issued.
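A sketch of the usual wait pattern that avoids losing a signal: the waiter tests a shared flag under the mutex, so a signal that arrived earlier is still observed (the flag and the names are assumptions; the deck builds the same idea into semaphore_signal()/semaphore_wait() later):

#include <pthread.h>

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;   /* names are assumptions */
pthread_cond_t  c = PTHREAD_COND_INITIALIZER;
int ready = 0;                                   /* shared flag remembers the event */

void signaller(void)
{
    pthread_mutex_lock(&m);
    ready = 1;                      /* record the event in shared state */
    pthread_cond_signal(&c);
    pthread_mutex_unlock(&m);
}

void waiter(void)
{
    pthread_mutex_lock(&m);
    while (!ready)                  /* re-check: handles early signals and spurious wakeups */
        pthread_cond_wait(&c, &m);
    pthread_mutex_unlock(&m);
}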
Barrier Synchronization
• A wait at a barrier causes a thread to wait
until all threads have performed a wait at
the barrier.
• At that point, they all proceed.
Implementing Barriers in Pthreads

• Count the number of arrivals at the barrier.
• Wait if this is not the last arrival.
• Make everyone unblock if this is the last arrival.
• Since the arrival count is a shared variable, enclose the whole operation in a mutex lock-unlock.
Implementing Barriers in Pthreads
void barrier()
{
  pthread_mutex_lock(&mutex_arr);
  arrived++;
  if (arrived < N) {
    /* not the last arrival: wait (releases mutex_arr while blocked) */
    pthread_cond_wait(&cond, &mutex_arr);
  }
  else {
    /* last arrival: wake everyone up */
    pthread_cond_broadcast(&cond);
    arrived = 0; /* be prepared for next barrier */
  }
  pthread_mutex_unlock(&mutex_arr);
}
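The shared state assumed by the barrier above could be declared as follows (the names match the code; the value of N is an assumption):

#define N 4                                       /* number of participating threads, an assumption */

pthread_mutex_t mutex_arr = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  cond      = PTHREAD_COND_INITIALIZER;
int arrived = 0;                                  /* how many threads have reached the barrier */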
Parallel SOR with Barriers (1 of 2)
void* sor (void* arg)
{
  int slice = (int)arg;
  int from = (slice * (n-1))/p + 1;
  int to = ((slice+1) * (n-1))/p + 1;

  for some number of iterations { … }
}
Parallel SOR with Barriers (2 of 2)
for (i=from; i<to; i++)
  for (j=1; j<n; j++)
    temp[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] +
                         grid[i][j-1] + grid[i][j+1]);
barrier();
for (i=from; i<to; i++)
  for (j=1; j<n; j++)
    grid[i][j] = temp[i][j];
barrier();
Parallel SOR with Barriers: main
int main(int argc, char *argv[])
{
  pthread_t thrd[p];
  /* Initialize mutex and condition variables */
  for (i=0; i<p; i++)
    pthread_create(&thrd[i], &attr, sor, (void*)i);
  for (i=0; i<p; i++)
    pthread_join(thrd[i], NULL);
  /* Destroy mutex and condition variables */
}
Note again
• Many shared memory programming
systems (other than Pthreads) have barriers
as basic primitive.
• If they do, you should use it, not construct it
yourself.
• Implementation may be more efficient than
what you can do yourself.
Busy Waiting
• Not an explicit part of the API.
• Available in a general shared memory
programming environment.
Busy Waiting
initially: flag = 0;

P1: produce data;
    flag = 1;

P2: while( !flag ) ;
    consume data;
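In C, a plain int flag is not reliable for this pattern (the compiler may hoist the load and the hardware may reorder the stores); a minimal sketch using C11 atomics, with the produce/consume steps left as comments:

#include <stdatomic.h>

atomic_int flag = 0;                 /* shared flag, initially 0 */

void p1_produce(void)                /* P1 */
{
    /* ... produce data ... */
    atomic_store_explicit(&flag, 1, memory_order_release);
}

void p2_consume(void)                /* P2 */
{
    while (!atomic_load_explicit(&flag, memory_order_acquire))
        ;                            /* spin until P1 publishes the data */
    /* ... consume data ... */
}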
Use of Busy Waiting
• On the surface, simple and efficient.
• In general, not a recommended practice.
• Often leads to messy and unreadable code
(blurs data/synchronization distinction).
• May be inefficient.
Private Data in Pthreads
• To make a variable private in Pthreads, you need to make an array out of it.
• Index the array by thread identifier, which you should keep track of.
• Not very elegant or efficient.
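A sketch of the array-per-thread idiom described above, with one slot per thread indexed by the slice number passed at creation (the array name and the bound on thread count are assumptions):

#include <pthread.h>

#define MAX_THREADS 16                /* assumed upper bound on p */

int my_result[MAX_THREADS];           /* one "private" slot per thread */

void *worker(void *arg)
{
    int me = (int)(long)arg;          /* thread index passed to pthread_create() */
    my_result[me] = 0;                /* each thread touches only its own slot */
    /* ... accumulate into my_result[me] ... */
    return NULL;
}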
Other Primitives in Pthreads
• Set the attributes of a thread.
• Set the attributes of a mutex lock.
• Set scheduling parameters.
ECE 1747 Parallel Programming

Machine-independent
Performance Optimization Techniques
Returning to Sequential vs. Parallel
• Sequential execution time: t seconds.
• Startup overhead of parallel execution: t_st
seconds (depends on architecture)
• (Ideal) parallel execution time: t/p + t_st.
• If t/p + t_st > t, no gain.
General Idea
• Parallelism limited by dependences.
• Restructure code to eliminate or reduce
dependences.
• Sometimes possible by compiler, but good
to know how to do it by hand.
Optimizations: Example 16
for (i = 0; i < 100000; i++)
a[i + 1000] = a[i] + 1;

• Cannot be parallelized as is.
• May be parallelized by applying certain code transformations.
Example Transformation
for( i=1; i <= 100; i++ ) {
  int stride = i * 1000;
  for( j=0; j < 1000; j++ )
    a[stride+j] = a[j] + i;
}
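After the transformation, the iterations of the outer i loop write disjoint regions a[i*1000 .. i*1000+999] and only read a[0..999], so they can run in parallel. A sketch of one way to split them across P Pthreads (the function name chunk, the fixed P, and the array size are assumptions):

#include <pthread.h>

#define P 4                              /* number of threads, an assumption */

int a[101000];                           /* a[0..999] are inputs; a[1000..100999] are written */

void *chunk(void *arg)
{
    int me   = (int)(long)arg;
    int from = 1 + (me * 100) / P;       /* split i = 1..100 into P contiguous pieces */
    int to   = 1 + ((me + 1) * 100) / P;
    for (int i = from; i < to; i++) {
        int stride = i * 1000;
        for (int j = 0; j < 1000; j++)
            a[stride + j] = a[j] + i;    /* reads only a[0..999]; writes only its own region */
    }
    return NULL;
}

/* usage sketch: for (long t = 0; t < P; t++) pthread_create(&thr[t], NULL, chunk, (void *)t); then join all */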
Code Transformations
• Reorganize code such that
– dependences are removed or reduced
– large pieces of parallel work emerge

• Code can become messy … there is a point of diminishing returns.
Flavors of Parallelism
• Data parallelism: all processors do the same
thing on different data.
– Regular
– Irregular
• Task parallelism: processors do different
tasks.
– Task queue
– Pipelines
Task Parallelism
• Each process performs a different task.
• Two principal flavors:
– pipelines
– task queues
• Program Examples: PIPE (pipeline), TSP
(task queue).
Pipeline
• Often occurs with image processing applications, where a number of images undergo a sequence of transformations.
• E.g., rendering, clipping, compression, etc.
Sequential Program
for( i=0; i<num_pic, read(in_pic[i]); i++ ) {
int_pic_1[i] = trans1( in_pic[i] );
int_pic_2[i] = trans2( int_pic_1[i]);
int_pic_3[i] = trans3( int_pic_2[i]);
out_pic[i] = trans4( int_pic_3[i]);
}
Parallelizing a Pipeline
• For simplicity, assume we have 4
processors (i.e., equal to the number of
transformations).
• Furthermore, assume we have a very large
number of pictures (>> 4).
Parallelizing a Pipeline (part 1)
Processor 1:

for( i=0; i<num_pics, read(in_pic[i]); i++ ) {
  int_pic_1[i] = trans1( in_pic[i] );
  signal( event_1_2[i] );
}
Parallelizing a Pipeline (part 2)
Processor 2:

for( i=0; i<num_pics; i++ ) {
  wait( event_1_2[i] );
  int_pic_2[i] = trans2( int_pic_1[i] );
  signal( event_2_3[i] );
}

Same for processor 3.
Parallelizing a Pipeline (part 3)
Processor 4:

for( i=0; i<num_pics; i++ ) {
  wait( event_3_4[i] );
  out_pic[i] = trans4( int_pic_3[i] );
}
Use of Wait/Signal (Pipelining)
[Figure: sequential vs. parallel execution timelines; each pattern is one picture, each horizontal line is one processor]
PIPE
P1:for( i=0; i<num_pics, read(in_pic); i++ ) {
int_pic_1[i] = trans1( in_pic );
signal( event_1_2[i] );
}
P2: for( i=0; i<num_pics; i++ ) {
wait( event_1_2[i] );
int_pic_2[i] = trans2( int_pic_1[i] );
signal( event_2_3[i] );
}
PIPE Using Pthreads
• Replacing the original wait/signal by a
Pthreads condition variable wait/signal will
not work.
– signals before a wait are forgotten.
– we need to remember a signal.
How to remember a signal (1 of 2)
semaphore_signal(i) {
  pthread_mutex_lock(&mutex_rem[i]);
  arrived[i] = 1;
  pthread_cond_signal(&cond[i]);
  pthread_mutex_unlock(&mutex_rem[i]);
}
How to Remember a Signal (2 of 2)
semaphore_wait(i) {
  pthread_mutex_lock(&mutex_rem[i]);
  while( arrived[i] == 0 ) {   /* while guards against spurious wakeups */
    pthread_cond_wait(&cond[i], &mutex_rem[i]);
  }
  arrived[i] = 0;
  pthread_mutex_unlock(&mutex_rem[i]);
}
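The shared state assumed by semaphore_signal()/semaphore_wait() could be declared like this, one slot per pipeline event (MAX_EVENTS and the init helper are assumptions for illustration):

#define MAX_EVENTS 1000    /* e.g., one event per picture; the size is an assumption */

pthread_mutex_t mutex_rem[MAX_EVENTS];
pthread_cond_t  cond[MAX_EVENTS];
int arrived[MAX_EVENTS];   /* 1 if the signal arrived before the wait */

void semaphore_init(int i)
{
    pthread_mutex_init(&mutex_rem[i], NULL);
    pthread_cond_init(&cond[i], NULL);
    arrived[i] = 0;
}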
PIPE with Pthreads
P1:for( i=0; i<num_pics, read(in_pic); i++ ) {
int_pic_1[i] = trans1( in_pic );
semaphore_signal( event_1_2[i] );
}
P2: for( i=0; i<num_pics; i++ ) {
semaphore_wait( event_1_2[i] );
int_pic_2[i] = trans2( int_pic_1[i] );
semaphore_signal( event_2_3[i] );
}
Another Sequential Program
for( i=0; i<num_pic, read(in_pic); i++ ) {
int_pic_1 = trans1( in_pic );
int_pic_2 = trans2( int_pic_1);
int_pic_3 = trans3( int_pic_2);
out_pic = trans4( int_pic_3);
}
Can we use same parallelization?
Processor 2:

for( i=0; i<num_pics; i++ ) {
  wait( event_1_2[i] );
  int_pic_2 = trans2( int_pic_1 );
  signal( event_2_3[i] );
}

Same for processor 3.
Can we use same parallelization?
• No: because of the anti-dependence between stages, there is no parallelism.
• We used privatization to enable pipeline
parallelism.
• Used often to avoid dependences (not only
with pipelines).
• Costly in terms of memory.
In-between Solution
• Use n>1 buffers between stages.
• Block when buffers are full or empty.

[Figure: pipeline stages P1 → P2 → P3 → P4 with buffers between consecutive stages]
Perfect Pipeline ?

[Figure: timeline of a perfectly balanced pipeline; each pattern is one picture, each horizontal line is one processor]
Things are often not that perfect
• One stage takes more time than others.
• Stages take a variable amount of time.
• Extra buffers provide some cushion against
variability.
Task Parallelism
• Each process performs a different task.
• Two principal flavors:
– pipelines
– task queues
• Program Examples: PIPE (pipeline), TSP
(task queue).
TSP (Traveling Salesman)
• Goal:
– given a list of cities, a matrix of distances
between them, and a starting city,
– find the shortest tour in which all cities are
visited exactly once.
• Example of an NP-hard search problem.
• Algorithm: branch-and-bound.
Branching
Initialization:
– go from starting city to each possible city,
– put resulting partial path into priority queue, ordered by its current length.
Further (repeatedly):
– take head element out of priority queue,
– expand by each one of remaining cities,
– put resulting partial path into priority queue.
Finding the Solution
• Eventually, a complete path will be found.
• Remember its length as the current shortest
path.
• Every time a complete path is found, check
if we need to update current best path.
• When priority queue becomes empty, best
path is found.
Using a Simple Bound
• Once a complete path is found, we have a
bound on the length of shortest path.
• No use in exploring a partial path that is already longer than the current bound (the best complete path found so far).
Sequential TSP: Data Structures
• Priority queue of partial paths.
• Current best solution and its length.
• For simplicity, we will ignore bounding.
Sequential TSP: Code Outline
init_q(); init_best();
while( (p = de_queue()) != NULL ) {
  for each expansion by one city {
    q = add_city(p);
    if( complete(q) ) update_best(q);
    else en_queue(q);
  }
}
Parallel TSP: Possibilities
• Have each process do one expansion.
• Have each process do expansion of one
partial path.
• Have each process do expansion of multiple
partial paths.
• Issue of granularity/performance, not an
issue of correctness.
• Assume: process expands one partial path.
Parallel TSP: Synchronization
• True dependence between process that puts
partial path in queue and the one that takes
it out.
• Dependences arise dynamically.
• Required synchronization: need to make
process wait if q is empty.
Parallel TSP: First cut (part 1)
process i:
while( (p = de_queue()) != NULL ) {
  for each expansion by one city {
    q = add_city(p);
    if( complete(q) ) update_best(q);
    else en_queue(q);
  }
}
Parallel TSP: First cut (part 2)
• In de_queue: wait if q is empty
• In en_queue: signal that q is no longer
empty
Parallel TSP: More synchronization

• All processes operate, potentially at the same time, on q and best.
• This race must not be allowed to happen.
• Critical section: only one process can execute in a critical section at once.
Parallel TSP: Critical Sections
• All shared data must be protected by critical
section.
• Update_best must be protected by a critical
section.
• En_queue and de_queue must be protected
by the same critical section.
Termination condition
• How do we know when we are done?
• All processes are waiting inside de_queue.
• Count the number of waiting processes
before waiting.
• If equal to total number of processes, we are
done.
Parallel TSP
process i:
while( (p = de_queue()) != NULL ) {
  for each expansion by one city {
    q = add_city(p);
    if( complete(q) ) update_best(q);
    else en_queue(q);
  }
}
Parallel TSP
• Need critical section
– in update_best,
– in en_queue/de_queue.
• In de_queue
– wait if q is empty,
– terminate if all processes are waiting.
• In en_queue:
– signal q is no longer empty.
Parallel TSP: Mutual Exclusion
en_queue() / de_queue() {
  pthread_mutex_lock(&queue);
  …;
  pthread_mutex_unlock(&queue);
}

update_best() {
  pthread_mutex_lock(&best);
  …;
  pthread_mutex_unlock(&best);
}
Parallel TSP: Condition Synchronization
de_queue() {
  while( (q is empty) and (not done) ) {
    waiting++;
    if( waiting == p ) {
      done = true;
      pthread_cond_broadcast(&empty);
    }
    else {
      pthread_cond_wait(&empty, &queue);
      waiting--;
    }
  }
  if( done )
    return NULL;
  else
    remove and return head of the queue;
}
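A matching sketch of en_queue() in the same pseudocode style, signaling that the queue is no longer empty (the insertion step itself is elided; the signal call is an assumption consistent with the earlier slide "In en_queue: signal q is no longer empty"):

en_queue(q) {
  pthread_mutex_lock(&queue);
  /* insert q into the priority queue ... */
  pthread_cond_signal(&empty);     /* wake one thread blocked in de_queue() */
  pthread_mutex_unlock(&queue);
}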
Parallel TSP
• Complete parallel program will be provided
on the Web.
• Includes wait/signal on empty q.
• Includes critical sections.
• Includes termination condition.
Factors that Determine Speedup
• Characteristics of parallel code
– granularity
– load balance
– locality
– communication and synchronization
Granularity
• Granularity = size of the program unit that
is executed by a single processor.
• May be a single loop iteration, a set of loop
iterations, etc.
• Fine granularity leads to:
– (positive) ability to use lots of processors
– (positive) finer-grain load balancing
– (negative) increased overhead
Granularity and Critical Sections
• Small granularity => more processors =>
more critical section accesses => more
contention.
Issues in Performance of Parallel Parts

• Granularity.
• Load balance.
• Locality.
• Synchronization and communication.
Load Balance
• Load imbalance = difference in execution time across processors between barriers.
• Execution time may not be predictable.
– Regular data parallel: yes.
– Irregular data parallel or pipeline: perhaps.
– Task queue: no.
Static vs. Dynamic
• Static: done once, by the programmer
– block, cyclic, etc.
– fine for regular data parallel
• Dynamic: done at runtime
– task queue
– fine for unpredictable execution times
– usually high overhead
• Semi-static: done once, at run-time
Choice is not inherent
• MM or SOR could be done using task
queues: put all iterations in a queue.
– In heterogeneous environment.
– In multitasked environment.
Static Load Balancing
• Block
– best locality
– possibly poor load balance
• Cyclic
– better load balance
– worse locality
• Block-cyclic
– load balancing advantages of cyclic (mostly)
– better locality
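For n iterations and p processors, the three static distributions can be expressed as follows (a sketch; the helper names and the block size B are assumptions):

/* block: processor k owns the contiguous range [k*n/p, (k+1)*n/p) */
int block_from(int k, int n, int p) { return (k * n) / p; }
int block_to(int k, int n, int p)   { return ((k + 1) * n) / p; }

/* cyclic: iteration i goes to processor i % p */
int owner_cyclic(int i, int p) { return i % p; }

/* block-cyclic with block size B: blocks of B consecutive iterations, dealt out round robin */
int owner_block_cyclic(int i, int p, int B) { return (i / B) % p; }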
Dynamic Load Balancing (1 of 2)
• Centralized: single task queue.
– Easy to program
– Excellent load balance
• Distributed: task queue per processor.
– Less contention during synchronization
Dynamic Load Balancing (2 of 2)
• Task stealing with distributed queues:
– Processes normally remove and insert tasks
from their own queue.
– When queue is empty, remove task(s) from
other queues.
• Extra overhead and programming difficulty.
• Better load balancing.
Semi-static Load Balancing
• Measure the cost of program parts.
• Use measurement to partition computation.
• Done once, done every iteration, done every
n iterations.
Molecular Dynamics (MD)
• Simulation of a set of bodies under the
influence of physical laws.
• Atoms, molecules, celestial bodies, ...
• Have same basic structure.

[Figure: forces F acting between pairs of bodies]
Molecular Dynamics (Skeleton)
for some number of timesteps {
for all molecules i
for all other molecules j
force[i] += f( loc[i], loc[j] );
for all molecules i
loc[i] = g( loc[i], force[i] );
}
Molecular Dynamics
• To reduce amount of computation, account
for interaction only with nearby molecules.
Molecular Dynamics (continued)
for some number of timesteps {
for all molecules i
for all nearby molecules j
force[i] += f( loc[i], loc[j] );
for all molecules i
loc[i] = g( loc[i], force[i] );
}
Molecular Dynamics (continued)
for each molecule i:
  count[i]: number of nearby molecules
  index[j]: indices of nearby molecules ( 0 <= j < count[i] )
Molecular Dynamics (continued)
for some number of timesteps {
  for( i=0; i<num_mol; i++ )
    for( j=0; j<count[i]; j++ )
      force[i] += f( loc[i], loc[index[j]] );
  for( i=0; i<num_mol; i++ )
    loc[i] = g( loc[i], force[i] );
}
Molecular Dynamics (simple)
for some number of timesteps {
parallel for
for( i=0; i<num_mol; i++ )
for( j=0; j<count[i]; j++ )
force[i] += f(loc[i],loc[index[j]]);
parallel for
for( i=0; i<num_mol; i++ )
loc[i] = g( loc[i], force[i] );
}
Molecular Dynamics (simple)
• Simple to program.
• Possibly poor load balance
– block distribution of i iterations (molecules)
– could lead to uneven neighbor distribution
– cyclic does not help
Better Load Balance
• Assign iterations such that each processor
has ~ the same number of neighbors.
• Array of “assign records”
– size: number of processors
– two elements:
• beginning i value (molecule)
• ending i value (molecule)
• Recompute partition periodically
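A sketch of how the "assign records" might be recomputed from the neighbor counts: walk the molecules, cutting a new range whenever a processor has accumulated roughly its share of the total neighbor count (the struct, function, and variable names are assumptions):

struct assign { int begin, end; };           /* first and one-past-last molecule index */

void repartition(struct assign a[], int p, int count[], int num_mol)
{
    long total = 0;
    for (int i = 0; i < num_mol; i++)
        total += count[i];                   /* total interaction work */

    long target = (total + p - 1) / p;       /* roughly equal work per processor */
    int mol = 0;
    for (int k = 0; k < p; k++) {
        long work = 0;
        a[k].begin = mol;
        while (mol < num_mol && (work < target || k == p - 1))
            work += count[mol++];            /* last processor takes whatever is left */
        a[k].end = mol;
    }
}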
Frequency of Balancing
• Every time neighbor list is recomputed.
– once during initialization.
– every iteration.
– every n iterations.
• Extra overhead vs. better approximation
and better load balance.
Summary
• Parallel code optimization
– Granularity
– Load balance
– Locality
– Synchronization
