Week 1 - Parallel and Distributed Computing
Distributed Computing
FALL 2021
NATIONAL UNIVERSITY OF COMPUTER AND EMERGING
SCIENCES
Instructor
Muhammad Danish Khan
Lecturer, Department of Computer Science
FAST NUCES Karachi
[email protected]
Week 14: Distributed System Models and Enabling Technologies, Assignment Task(s)
Week 15: Distributed System Models and Enabling Technologies, Quiz-3, Project
Evaluations
Week 16: Distributed System Models and Enabling Technologies, Project Evaluations
LMS: Google Classroom
Section 5C Class Code: eg2u47t
https://classroom.google.com/c/Mzg4NTA1MTMzNDI3?cjc=eg2u47t
Operating System Concepts Revision
Program
◦ Set of instructions and associated data
◦ resides on the disk and is loaded by the operating system to perform some task.
◦ E.g., an executable file or a Python script file.
Process
◦ A program in execution.
◦ In order to run a program, the operating system's kernel is first asked to create a new
process, which is an environment in which a program executes.
◦ Consists of instructions, user-data and system-data segments, plus resources acquired at
runtime such as CPU time, memory, address space, and disk.
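A minimal POSIX-style sketch of the kernel being asked to create a new process, assuming a Unix-like system; the program run in the child ("ls") is an illustrative choice, not from the slides.

#include <unistd.h>     // fork, execlp
#include <sys/wait.h>   // waitpid
#include <cstdio>

int main() {
    pid_t pid = fork();               // ask the kernel to create a new process
    if (pid == 0) {
        // Child process: replace its image with the "ls" program (illustrative choice)
        execlp("ls", "ls", "-l", (char *)nullptr);
        std::perror("execlp");        // only reached if exec fails
        return 1;
    }
    int status = 0;
    waitpid(pid, &status, 0);         // parent waits for the child to finish
    std::printf("child exited with status %d\n", WEXITSTATUS(status));
    return 0;
}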
Thread
◦ the smallest unit of execution in a process.
◦ A thread simply executes instructions serially.
◦ A process can have multiple threads running as part of it.
◦ Processes don't share any resources amongst themselves whereas threads
of a process can share the resources allocated to that particular process,
including memory address space.
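A minimal C++ sketch of the last point: two threads of one process update the same counter because they share the process's address space (the names below are illustrative).

#include <atomic>
#include <iostream>
#include <thread>

int main() {
    std::atomic<int> counter{0};      // lives in the address space shared by both threads

    auto work = [&counter] {
        for (int i = 0; i < 100000; ++i)
            counter.fetch_add(1, std::memory_order_relaxed);
    };

    std::thread t1(work);             // both threads run inside the same process
    std::thread t2(work);
    t1.join();
    t2.join();

    std::cout << "counter = " << counter << '\n';   // prints 200000
    return 0;
}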
Multiprocessing Systems
◦ where multiple processes get scheduled on more than one CPU.
◦ Usually, this requires hardware support where a single system comes with
multiple cores
◦ or the execution takes place in a cluster of machines.
Parallel Execution
Parallelism
The term parallelism means that an application splits its tasks up
into smaller subtasks which can be processed in parallel, for instance
on multiple CPUs at the exact same time.
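A minimal C++ sketch of this idea, assuming a machine with at least two CPUs: the task of summing an array is split into two independent subtasks that can run at the same time.

#include <iostream>
#include <numeric>
#include <thread>
#include <vector>

int main() {
    std::vector<int> data(1000000, 1);
    long long part1 = 0, part2 = 0;
    auto mid = data.begin() + data.size() / 2;

    // Each thread sums one half of the data; the two subtasks are independent.
    std::thread t1([&] { part1 = std::accumulate(data.begin(), mid, 0LL); });
    std::thread t2([&] { part2 = std::accumulate(mid, data.end(), 0LL); });
    t1.join();
    t2.join();

    std::cout << "total = " << (part1 + part2) << '\n';   // prints 1000000
    return 0;
}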
Serial Execution vs. Parallel Execution
Transmission speeds - the speed of a serial computer is directly dependent upon how
fast data can move through hardware.
◦ Absolute limits are the speed of light (30 cm/nanosecond) and the transmission limit of copper wire
(9 cm/nanosecond). Increasing speeds necessitate increasing proximity of processing elements.
◦ A problem can often be solved in less time with multiple compute resources than with a single
compute resource.
LD  $12, (100)   ; load the value at memory address 100 into register $12
ADD $11, $12     ; uses $12 produced by the load
SUB $10, $11     ; uses $11 produced by the add
INC $10          ; uses $10 produced by the subtract
SW  $13, ($10)   ; stores $13 at the address now held in $10; each step depends on the previous one, so the stream executes serially
#include <iostream>   // for std::cin

int sample2();        // forward declaration so sample1() can call it

int sample1()
{
    int x = sample2();
    return x;
}

float sample3()
{
    float pi = 3.14f;
    return pi;
}

int sample2()
{
    int i;
    std::cin >> i;
    return i;
}
Parallel Computing: what for?
Example applications include:
… ..
Flynn's Taxonomy
Based on the number of concurrent instruction streams (single or multiple)
and data streams (single or multiple) available in the architecture
Single Instruction, Single Data (SISD)
It represents the organization of a single computer containing a control unit,
processor unit and a memory unit.
Single instruction: only one instruction stream is being acted on by the CPU
during any one clock cycle
Single data: only one data stream is being used as input during any one
clock cycle
This is the oldest and, until recently, the most prevalent form of computer
◦ Examples: most PCs, single CPU workstations and mainframes
Single Instruction, Multiple Data (SIMD)
Single instruction: All processing units execute the same instruction at any given clock cycle
Multiple data: Each processing unit can operate on a different data element
The processing units are made to operate under the control of a common control unit, thus
providing a single instruction stream and multiple data streams.
◦ Best suited for specialized problems characterized by a high degree of regularity, such as image processing.
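A minimal sketch of the SIMD pattern, assuming an x86 compiler with SSE support (<immintrin.h>): the single instruction behind _mm_add_ps adds four float elements at once.

#include <immintrin.h>
#include <cstdio>

int main() {
    __m128 a = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);    // four data elements in one register
    __m128 b = _mm_set_ps(40.0f, 30.0f, 20.0f, 10.0f);
    __m128 c = _mm_add_ps(a, b);                      // one instruction, four additions

    float out[4];
    _mm_storeu_ps(out, c);                            // copy the result back to memory
    std::printf("%.1f %.1f %.1f %.1f\n", out[0], out[1], out[2], out[3]);   // 11.0 22.0 33.0 44.0
    return 0;
}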
Parallel Task
◦ A task that can be executed by multiple processors safely (yields correct results)
Serial Execution
◦ Execution of a program sequentially, one statement at a time. In the simplest sense, this is what
happens on a one processor machine. However, virtually all parallel tasks will have sections of a
parallel program that must be executed serially.
Parallel Execution
◦ Execution of a program by more than one task, with each task being able to execute the same or
different statement at the same moment in time.
Shared Memory
◦ From a strictly hardware point of view, describes a computer architecture where all processors have
direct (usually bus based) access to common physical memory.
◦ In a programming sense, it describes a model where parallel tasks all have the same "picture" of memory and can directly address and access the same
logical memory locations regardless of where the physical memory actually exists.
Distributed Memory
◦ In hardware, refers to network based memory access for physical memory that is not common. As a
programming model, tasks can only logically "see" local machine memory and must use
communications to access memory on other machines where other tasks are executing.
Communications
◦ Parallel tasks typically need to exchange data. There are several ways this can be accomplished, such as
through a shared memory bus or over a network, however the actual event of data exchange is
commonly referred to as communications regardless of the method employed.
Synchronization
◦ The coordination of parallel tasks in real time, very often associated with communications. Often
implemented by establishing a synchronization point within an application where a task may not
proceed further until another task(s) reaches the same or logically equivalent point.
◦ Synchronization usually involves waiting by at least one task, and can therefore cause a parallel
application's wall clock execution time to increase.
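A minimal C++20 sketch of such a synchronization point (std::barrier; the task names and sleep times are illustrative): neither task continues past arrive_and_wait() until both have reached it, so the faster task spends wall-clock time waiting.

#include <barrier>
#include <chrono>
#include <iostream>
#include <thread>

int main() {
    std::barrier sync_point(2);   // both tasks must arrive before either may continue

    auto task = [&](const char *name, int work_ms) {
        std::this_thread::sleep_for(std::chrono::milliseconds(work_ms));  // simulated work
        std::cout << name << " reached the synchronization point\n";
        sync_point.arrive_and_wait();                                     // wait for the other task
        std::cout << name << " proceeds\n";
    };

    std::thread t1(task, "task 1", 50);
    std::thread t2(task, "task 2", 200);
    t1.join();
    t2.join();
    return 0;
}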
Granularity
◦ In parallel computing, granularity is a qualitative measure of the ratio of computation to communication.
◦ Coarse: relatively large amounts of computational work are done between communication events
◦ Fine: relatively small amounts of computational work are done between communication events
Observed Speedup
◦ Observed speedup of a code which has been parallelized, defined as:
wall-clock time of serial execution / wall-clock time of parallel execution
◦ One of the simplest and most widely used indicators for a parallel program's performance.
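A worked example of the definition above, using hypothetical numbers rather than figures from the slides:

\[
S \;=\; \frac{T_{\text{serial}}}{T_{\text{parallel}}} \;=\; \frac{80\ \text{s}}{20\ \text{s}} \;=\; 4
\]

Here a run that needs 80 s serially and 20 s on 8 processors shows an observed speedup of 4, less than 8 because serial sections and parallel overhead limit the gain.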
Parallel Overhead
◦ The amount of time required to coordinate parallel tasks, as opposed to doing useful work. Parallel
overhead can include factors such as:
◦ Task start-up time
◦ Synchronizations
◦ Data communications
◦ Software overhead imposed by parallel compilers, libraries, tools, operating system, etc.
◦ Task termination time
Massively Parallel
◦ Refers to the hardware that comprises a given parallel system - having many processors. The meaning of
many keeps increasing, but currently BG/L* pushes this number to 6 digits.
*Blue Gene is an IBM project aimed at designing supercomputers that can reach operating
speeds in the petaFLOPS (PFLOPS) range, with low power consumption.
Scalability
◦ Refers to a parallel system's (hardware and/or software) ability to
demonstrate a proportionate increase in parallel speedup with the addition of
more processors.
Shared Memory
Multiple processors can operate independently but share the same memory resources.
Changes in a memory location effected by one processor are visible to all other processors.
Shared memory machines can be divided into two main classes based upon memory access times: UMA and NUMA.
Shared Memory: UMA vs. NUMA
Uniform Memory Access (UMA):
◦ Most commonly represented today by Symmetric Multiprocessor (SMP) machines
◦ Identical processors with equal access and access times to memory
◦ Sometimes called CC-UMA - Cache Coherent UMA.
Disadvantages:
◦ Primary disadvantage is the lack of scalability between memory and CPUs. Adding more CPUs can
geometrically increase traffic on the shared memory-CPU path and, for cache-coherent systems,
geometrically increase traffic associated with cache/memory management.
◦ Programmer responsibility for synchronization constructs that ensure "correct" access of global memory (see the sketch after this list).
◦ Expense: it becomes increasingly difficult and expensive to design and produce shared memory
machines with ever increasing numbers of processors.
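A minimal C++ sketch of such a synchronization construct on a shared-memory machine: a std::mutex guards updates to a shared accumulator (the names are illustrative).

#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

int main() {
    long long total = 0;          // shared accumulator in global memory
    std::mutex m;                 // protects 'total'

    auto add_chunk = [&](int n) {
        long long local = 0;
        for (int i = 0; i < n; ++i) local += i;   // compute privately first
        std::lock_guard<std::mutex> lock(m);      // synchronize the shared update
        total += local;
    };

    std::vector<std::thread> pool;
    for (int t = 0; t < 4; ++t) pool.emplace_back(add_chunk, 1000);
    for (auto &th : pool) th.join();

    std::cout << "total = " << total << '\n';     // prints 1998000
    return 0;
}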
Distributed Memory
Like shared memory systems, distributed memory systems vary widely but share a common characteristic. Distributed
memory systems require a communication network to connect inter-processor memory.
Processors have their own local memory. Memory addresses in one processor do not map to another processor, so
there is no concept of global address space across all processors.
Because each processor has its own local memory, it operates independently. Changes it makes to its local memory have
no effect on the memory of other processors. Hence, the concept of cache coherency does not apply.
When a processor needs access to data in another processor, it is usually the task of the programmer to explicitly define
how and when data is communicated. Synchronization between tasks is likewise the programmer's responsibility.
The network "fabric" used for data transfer varies widely, though it can can be as simple as Ethernet.
Distributed Memory: Pro and Con
Advantages
◦ Memory is scalable with number of processors. Increase the number of processors and the size of
memory increases proportionately.
◦ Each processor can rapidly access its own memory without interference and without the overhead
incurred with trying to maintain cache coherency.
◦ Cost effectiveness: can use commodity, off-the-shelf processors and networking.
Disadvantages
◦ The programmer is responsible for many of the details associated with data communication between
processors.
◦ It may be difficult to map existing data structures, based on global memory, to this memory
organization.
◦ Non-uniform memory access (NUMA) times
Hybrid Distributed-Shared Memory
Comparison of Shared and Distributed Memory Architectures