
CENG479 PARALLEL COMPUTING

Lec.3: Parallel Software

Dr. Hüseyin TEMUÇİN


Gazi Üniversitesi Bilgisayar Mühendisliği Bölümü

Summary - Memory Structures

These slides are adapted from Prof. Dr. Zahran's Parallel Computing lecture notes.

What about memory structure?

● Shared Memory System
● Distributed Memory System

Shared Memory System


Distributed Memory System


An operating system “process”

● An instance of a computer program that is being executed.
● Components of a process:
○ The executable machine language program
○ A block of memory
○ Descriptors of resources the OS has allocated to the process
○ Security information
○ Information about the state of the process


Multitasking

● Gives the illusion that a single-processor system is running multiple programs simultaneously.
● Each process takes turns running for a time slice.
● After its time is up, it waits until it has a turn again.

(Figure: https://www.javatpoint.com/multitasking-operating-system)

Threading

● Threads are contained within processes.
● They allow programmers to divide their programs into (more or less) independent tasks.
● The hope is that when one thread blocks because it is waiting on a resource, another thread will have work to do and can run.

(Figure: https://www.studytonight.com/operating-system/multithreading)
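A minimal sketch of threads within a single process, using POSIX threads (the API choice is an assumption; the slides do not name one):

#include <pthread.h>
#include <stdio.h>

/* Each thread carries out one (more or less) independent task. */
void *task(void *arg) {
    long id = (long) arg;
    printf("thread %ld doing work\n", id);
    return NULL;
}

int main(void) {
    pthread_t t[4];
    /* Fork the threads inside the single process... */
    for (long i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, task, (void *) i);
    /* ...and join: wait until every thread is done. */
    for (long i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return 0;
}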
Hardware to Software mapping…

● In shared memory programs:
○ Start a single process and fork threads.
○ Threads carry out tasks.
● In distributed memory programs:
○ Start multiple processes.
○ Processes carry out tasks.

SPMD – single program multiple data

● The same single executable program works in different contexts
○ The data are different, but the execution is the same
● We have to manage the different data situations using conditional statements (see the sketch below)
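A minimal SPMD sketch in MPI (the library choice is an assumption): every process runs the same executable, and a conditional on the process rank selects its behavior:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Same program text, different behavior per rank. */
    if (rank == 0)
        printf("rank 0: coordinating %d processes\n", size);
    else
        printf("rank %d: working on my share of the data\n", rank);

    MPI_Finalize();
    return 0;
}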

Nondeterminism
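A sketch of the idea (my own example, not the slide's): because the OS scheduler interleaves threads unpredictably, the two lines below can print in either order from run to run.

#include <pthread.h>
#include <stdio.h>

void *hello(void *arg) {
    printf("hello from thread %ld\n", (long) arg);
    return NULL;
}

int main(void) {
    pthread_t a, b;
    /* No synchronization between the threads: the output order
       is nondeterministic. */
    pthread_create(&a, NULL, hello, (void *) 1L);
    pthread_create(&b, NULL, hello, (void *) 2L);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}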

Writing Parallel Programs

● Divide the work among processes/threads
○ Each process/thread gets roughly the same amount of work
○ Communication is minimized
● Arrange for the processes/threads to synchronize.
● Arrange for communication among processes/threads.

The example loop shown on the slide:

double x[], y[];

for (int i = 0; i < n; i++) {
    x[i] = y[i];
}
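A sketch of dividing that loop's work among threads (block partitioning; the names my_rank and thread_count are assumptions, and n is assumed divisible by thread_count):

/* Copy thread my_rank's contiguous block of y into x. */
void copy_block(double x[], const double y[], int n,
                int my_rank, int thread_count) {
    int chunk    = n / thread_count;   /* equal share per thread */
    int my_first = my_rank * chunk;
    int my_last  = my_first + chunk;
    for (int i = my_first; i < my_last; i++)
        x[i] = y[i];
}

Each thread calls copy_block with its own rank, so the work is balanced and no element is touched by two threads, which also keeps communication to a minimum.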

Sequential vs Parallel

● The parallel software must give the same result/output as the sequential version
○ The final result must be reduced onto a single process/thread (a sketch follows below)
● A parallel software implementation is much more costly than the sequential version
● Parallel software has side costs:
○ Synchronization
○ Communication
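A sketch of that reduction step (my own illustration; a per-thread array of partial results is assumed):

/* Combine per-thread partial sums into the single result the
   sequential program would have produced. */
double reduce_sum(const double partial[], int thread_count) {
    double total = 0.0;
    for (int t = 0; t < thread_count; t++)
        total += partial[t];
    return total;
}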
Shared Memory System

Shared Memory

● Dynamic threads
○ A master thread waits for work, forks new threads, and when the threads are done, they terminate
○ Efficient use of resources
○ But thread creation and termination is time consuming
● Static threads
○ A pool of threads is created and allocated work, but the threads do not terminate until cleanup
○ Better performance
○ But potential waste of system resources

Shared memory issues

● Race condition
● Critical section
● Mutual exclusion
● Mutual exclusion lock (mutex, semaphore, ...); a sketch follows below
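A minimal sketch of a critical section protected by a mutual exclusion lock, using a POSIX mutex (the API choice is an assumption):

#include <pthread.h>

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
long counter = 0;   /* shared: updating it is a critical section */

void *increment(void *arg) {
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);     /* enter the critical section */
        counter++;                     /* only one thread at a time */
        pthread_mutex_unlock(&lock);   /* leave the critical section */
    }
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, increment, NULL);
    pthread_create(&b, NULL, increment, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    /* Without the mutex there is a race condition and the final
       value of counter would be nondeterministic. */
    return counter == 200000 ? 0 : 1;
}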

Busy-waiting
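A sketch of the busy-waiting idea (not necessarily the slide's exact code; flag, my_rank, and thread_count are assumed names): each thread spins until it is its turn, then hands the turn on.

int flag = 0;   /* shared: the rank whose turn it is */

/* Spin until flag equals my_rank, run the critical section, then
   pass the turn to the next thread. The empty loop "busy-waits":
   it burns CPU cycles doing nothing but testing the flag. Real
   code would also need the flag to be volatile or atomic. */
void enter_in_turn(int my_rank, int thread_count) {
    while (flag != my_rank)
        ;   /* busy-wait */
    /* ... critical section ... */
    flag = (flag + 1) % thread_count;
}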

Distributed Memory System

Distributed Memory: message-passing
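A minimal message-passing sketch in MPI (assumed library): processes share no memory, so process 0 explicitly sends a value that process 1 explicitly receives.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank;
    double msg;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        msg = 3.14;
        /* Explicit send: the only way to share data across processes. */
        MPI_Send(&msg, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&msg, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("process 1 received %f\n", msg);
    }

    MPI_Finalize();
    return 0;
}

Run with at least two processes, e.g. mpiexec -n 2.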

We want to write a parallel program ... Now what?

● We have a serial program.
● How do we parallelize it?
● We know that we need to divide the work:
○ Load balancing,
○ Manage synchronization,
○ Reduce communication.
● Unfortunately, there is no mechanical process.
● Ian Foster has a nice framework.

Foster’s methodology
(The PCAM methodology: Partitioning, Communication, Agglomeration, Mapping)

Foster Methodology - Partitioning

● Divide the computation to be performed and the data operated on by the computation into small tasks.
● The focus here should be on identifying tasks that can be executed in parallel.
● This step brings out the parallelism in the algorithm.

Foster Methodology - Communication

● Determine what communication needs to be carried out among the tasks identified in the previous step.

Foster Methodology - Aggregation

● Combine tasks and communications identified in the first step into larger tasks.
○ For example, if task A must be executed before task B can be executed, it may make sense to aggregate them into a single composite task.

Foster Methodology - Mapping

● Assign the composite tasks identified in the previous step to processes/threads.
● This should be done so that communication is minimized, and each process/thread gets roughly the same amount of work.

Example - histogram

Serial program - input

● The number of measurements: data_count
● An array of data_count floats: data
● The minimum value for the bin containing the smallest values: min_meas
● The maximum value for the bin containing the largest values: max_meas
● The number of bins: bin_count

Serial program - output

● bin_maxes: an array of bin_count floats storing the upper bound of each bin
● bin_counts: an array of bin_count ints storing the number of elements in each bin

Serial Program

int bin = 0;

for (int i = 0; i < data_count; i++) {
    bin = find_bin(data[i], ...);   /* which bin does data[i] fall in? */
    bin_counts[bin]++;              /* tally it */
}
Adding the local arrays

Better performance?

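The idea, as a sketch (loc_bin_cts and the helper names are assumptions; find_bin is the sketch above): each thread tallies its slice into a private local array, so no locking is needed per increment, and the local arrays are summed into the global bin_counts at the end.

/* Phase 1: each thread counts its own slice into its local array. */
void histogram_local(const float data[], int my_first, int my_last,
                     const float bin_maxes[], int bin_count,
                     int loc_bin_cts[]) {
    for (int i = my_first; i < my_last; i++)
        loc_bin_cts[find_bin(data[i], bin_maxes, bin_count)]++;
}

/* Phase 2: reduce the per-thread local arrays into bin_counts.
   Only this short phase needs synchronization. */
void merge_counts(int bin_counts[], int *loc[],
                  int bin_count, int thread_count) {
    for (int t = 0; t < thread_count; t++)
        for (int b = 0; b < bin_count; b++)
            bin_counts[b] += loc[t][b];
}

Whether this wins depends on bin_count and thread_count: the local arrays cost extra memory and a final pass, but they remove contention on every single increment.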

Questions
