High Performance Computing
SERIAL COMPUTING
- Program broken down into discrete instructions
- Instructions executed one by one
- One instruction at a time
To increase the speed of processing, one may increase the speed of the PE (Processing Element) by
increasing the clock speed. Clock speeds increased from a few hundred kHz in the 1970s
to about 3 GHz by 2005. It became difficult to increase the clock speed further because the chip
was overheating.
Does that mean we cannot achieve greater performance if we are unable to develop more
efficient processors?
NO, WE CAN
The number of transistors that could be integrated on a chip could, however, still be doubled
every two years. Thus, processor designers placed many processing “cores” inside the
processor chip to increase its effective throughput. The processor retrieves a sequence of
instructions from the main memory and stores them in an on-chip memory. The “cores” can
then cooperate to execute these instructions in parallel. HENCE CAME PARALLELISM.
PARALLEL COMPUTING
- Use multiple cores to perform several operations at once
- Much faster than sequential execution for repetitive calculations on vast amounts of data
- Single computational problem divided into pieces for simultaneous work: Parallel Processing (see the sketch after this list)
- Multiple computational units for multiple unrelated problems: Multiprocessing
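As a rough illustration of parallel processing, here is a minimal sketch (not taken from the notes; the chunking scheme and the worker count of 4 are arbitrary choices). A single summation problem is divided into pieces, several worker processes handle the pieces simultaneously, and the partial results are combined at the end:

```python
# Minimal sketch: one computational problem (summing a list) split into
# pieces that worker processes handle simultaneously.
from concurrent.futures import ProcessPoolExecutor

def partial_sum(chunk):
    # Each worker handles one piece of the single overall problem.
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    k = 4                                    # number of worker processes (arbitrary choice)
    chunks = [data[i::k] for i in range(k)]  # divide the problem into k pieces
    with ProcessPoolExecutor(max_workers=k) as pool:
        partials = pool.map(partial_sum, chunks)
    print(sum(partials))                     # combine the partial results
```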
TEMPORAL PARALLELISM (PIPELINE PROCESSING)
Calculations:
If there are n jobs, p is the time to complete one job, and each job is divided into k subtasks
Assuming the time taken to complete each subtask: p/k
Time taken to complete all jobs without parallelism: np
Time taken to complete all jobs with temporal parallelism: p + (n-1)*p/k
SpeedUp: without/with = np / (p + (n-1)*p/k) = nk / (n + k - 1)
Advantage:
Efficient: each stage in the pipeline can be optimized to do a specific task well.
Temporal parallelism (pipelining) was the main technique used by vector supercomputers such as
CRAY to attain their high speed.
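A quick numeric check of the speedup formula above (a sketch; the parameter values are arbitrary):

```python
# Pipeline (temporal parallelism) speedup: n jobs, each split into k subtasks
# of length p/k. The job time p cancels out of the ratio.
def pipeline_speedup(n, k):
    # SpeedUp = np / (p + (n-1)*p/k) = nk / (n + k - 1)
    return (n * k) / (n + k - 1)

print(pipeline_speedup(n=1000, k=5))   # ~4.98, close to k = 5
print(pipeline_speedup(n=5, k=5))      # ~2.78, pipeline fill time dominates
```

For n much larger than k the speedup approaches k, the number of pipeline stages, as the formula nk / (n + k - 1) suggests.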
DATA PARALLELISM
Calculations:
If there are n jobs, p is the time to complete one job, and there are k teachers (processors)
among whom the jobs are distributed
Assuming the time taken to distribute the jobs among the k processors: kq (proportional to k)
Time taken to complete all jobs without parallelism: np
Time taken to complete all jobs with parallelism: kq + (n/k)*p
SpeedUp: without/with = np / (kq + (n/k)*p) = k / (1 + k^2*q/(n*p))
If k^2*q << np then the speedup is nearly equal to k, the number of teachers working
independently. Observe that this holds when the time to distribute the jobs is small and the
number of processors is not too large (because the distribution time grows with the number of
processors).
Observe that the speedup is not directly proportional to the number of teachers, because the time
to distribute jobs to the teachers (an unavoidable overhead) increases as the number of teachers
increases.
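The same formula as a small sketch (the parameter values are arbitrary, chosen only to illustrate the k^2*q << np condition):

```python
# Data-parallelism speedup: n jobs of time p, k processors, distribution
# overhead kq (proportional to k).
def data_parallel_speedup(n, p, k, q):
    without = n * p
    with_parallelism = k * q + (n / k) * p
    return without / with_parallelism

# Here k*k*q = 1 << n*p = 10000, so the speedup is close to k.
print(data_parallel_speedup(n=10_000, p=1.0, k=10, q=0.01))    # ~10.0

# With many more processors the distribution overhead catches up
# (k*k*q = 10000 = n*p) and the speedup is far less than k.
print(data_parallel_speedup(n=10_000, p=1.0, k=1000, q=0.01))  # 500.0
```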
Advantages:
1. Synchronization: not required; each teacher can check answer books at his/her own pace
2. Bubbles: absent; no teacher needs to sit idle waiting for work
3. Fault tolerance: more fault tolerant; the failure of one teacher does not affect the whole work
4. Inter-task communication: not required (hence no delay); the processors are independent
Disadvantages:
1. No specialization: each teacher must be able to correct all the answers
2. The work must be divisible into mutually independent jobs that take roughly the same time
3. The distribution time must be small, so k should be kept small
4. The assignment of jobs to each teacher is pre-decided. This is called a static
assignment. Thus, if a teacher is slow, the completion time of the total job will be
slowed down, while a teacher who gets many blank answer books will complete the
work early and thereafter be idle. Thus, a static assignment of jobs is not efficient
(see the sketch below).
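A small sketch of that last point (the job times and the greedy dynamic scheme are illustrative assumptions, not from the notes): with a pre-decided split, the worker that happens to receive all the long jobs determines the finish time, whereas letting free workers pull the next job evens out the load.

```python
# Static vs dynamic assignment of unevenly sized jobs to k workers.
import heapq

jobs = [5, 5, 5, 5, 1, 1, 1, 1]   # correction times; the last four books are mostly blank
k = 2

# Static assignment: the split is pre-decided (first half / second half),
# so the worker holding all the long jobs sets the completion time.
static_chunks = [jobs[:len(jobs) // 2], jobs[len(jobs) // 2:]]
static_finish = max(sum(chunk) for chunk in static_chunks)

# Dynamic assignment: whichever worker becomes free takes the next job,
# modelled here with a min-heap of worker finish times.
worker_finish = [0] * k
heapq.heapify(worker_finish)
for job in jobs:
    earliest = heapq.heappop(worker_finish)   # worker that frees up first
    heapq.heappush(worker_finish, earliest + job)
dynamic_finish = max(worker_finish)

print(static_finish)    # 20: one worker got all the long jobs, the other went idle
print(dynamic_finish)   # 12: the load evens out, close to the ideal 24/2
```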
1. FLYNN’S CLASSIFICATION
SISD: A computer with a single processor; a single stream of instructions and a single
stream of data are accessed by the PE from the main memory, processed, and the
results stored back in the main memory.
SIMD: All PEs work synchronously, controlled by a single stream of instructions. An
instruction may be broadcast to all PEs, and they process the data items fed to them
using this instruction. All processors in this structure are given identical instructions to
execute, and they execute them in lock-step fashion (simultaneously) using data in their
own memory. There is no explicit communication among processors, although data paths
between nearest neighbours are used in some structures; SIMD computers have also
been built as a grid with communication between nearest neighbours. If, instead of a
single instruction, the PEs use identical programs to process different data streams, such
a parallel computer is called a Single Program Multiple Data (SPMD) Computer.
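To make the SIMD idea concrete, here is a minimal sketch (my illustration, using NumPy, which is not mentioned in the notes): a single vectorized operation processes many data elements at once, in contrast to a SISD-style loop over individual elements.

```python
# Minimal sketch of the SIMD idea: one (vector) instruction operates on
# many data items at once, instead of a loop handling them one by one.
import numpy as np

a = np.arange(8)        # eight data items, conceptually one per PE
b = np.ones(8)

# SISD style: one instruction, one data item at a time.
c_serial = [float(a[i] + b[i]) for i in range(len(a))]

# SIMD style: a single element-wise add is applied to all items in lock-step.
c_vector = a + b

print(c_serial)
print(c_vector)

# SPMD, by contrast, would run the *same program* (e.g. the same function)
# in several independent processes, each on a different data stream.
```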
DEPENDENCES:
The ability to execute several program segments in parallel requires each segment to be
independent of the other segments.
We use a dependence graph to describe the relations. The nodes of a dependence graph
correspond to the program statements [instructions], and the directed edges with different
labels show the ordered relations among the statements. The analysis of dependence
graphs shows where opportunity exists for parallelization and vectorization.
CONTROL DEPENDENCE: arises when the order in which statements execute cannot be
determined before run time; for example, statements guarded by a conditional branch depend on
the outcome of the branch and cannot be scheduled until it is known.
RESOURCE DEPENDENCE: arises when statements conflict over shared resources such as
functional units, registers, or memory, rather than over data.
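As a small illustration (the statement names and the code fragment are my own, not from the notes), the sketch below shows a short program segment and its dependence graph; statements with no path between them in the graph are candidates for parallel execution.

```python
# A tiny program segment and its dependence graph (nodes = statements,
# directed edges = "must execute before").
#
#   S1: a = b + c
#   S2: d = a * 2        # data dependence on S1 (reads the a written by S1)
#   S3: if d > 0:        # uses the d written by S2
#   S4:     e = d - 1    # control dependence on S3 (runs only if S3 is true)
#   S5: f = b - c        # no dependence on S1-S4
dependence_graph = {
    "S1": ["S2"],
    "S2": ["S3"],
    "S3": ["S4"],
    "S4": [],
    "S5": [],            # no path to or from the other statements
}

# S5 has no ordering constraint with S1-S4, so it could execute in parallel with them.
print(dependence_graph)
```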
PARALLEL ALGORITHMS
A parallel algorithm defines how a given problem can be solved on a given parallel
computer, i.e., how the problem is divided into subproblems, how the processors
communicate, and how the partial solutions are combined to produce the final result.
Parallel algorithms depend on the kind of parallel computer they are designed for. In order to
simplify the design and analysis of parallel algorithms, parallel computers are represented by
various abstract machine models.
Models:
RAM: Random Access Machine
Abstracts the sequential computer
The MAU is the unit that maps the logical address provided by the CPU to the physical
address in main memory.
Any step of an algorithm designed for the RAM model consists of three basic phases:
Read (fetch an operand from memory into a local register), Execute (performed entirely within the
CPU, with no role for memory), and Write (store the result back into memory).
PRAM: Parallel Random Access Machine
Consists of N identical processors that share a common memory.
The shared memory also functions as the communication medium for the processors.
A PRAM can be used to model both SIMD and MIMD machines.
Three basic phases for any step: Read, Compute and Write.
In this model, one memory location can be accessed by more than one processor
simultaneously, which can lead to conflicts. The PRAM model is therefore subdivided into four
categories based on the way simultaneous memory accesses are handled:
- Exclusive Read Exclusive Write (EREW) PRAM: every memory access is exclusive;
provides the least amount of memory concurrency and is hence the weakest model
- Concurrent Read Exclusive Write (CREW) PRAM: reads may be concurrent but writes
are exclusive; the most commonly used model
- Exclusive Read Concurrent Write (ERCW) PRAM: never used
- Concurrent Read Concurrent Write (CRCW) PRAM: allows the most memory
concurrency and is hence the most powerful model
Concurrent Write: There are several protocols that specify the value that is written to a
memory location in such a situation. They are:
- Priority CW: in case of a conflict, the processor with the highest assigned priority is
allowed to write
- Common CW: the write is allowed only if all processors attempt to write the same value
- Combining CW: the value written is the output of a function that combines the multiple
values into a single value
- Arbitrary CW: any one of the processors may succeed in writing, and the others fail
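The following sketch shows how each protocol would resolve several processors attempting to write the same location (the function name, the choice of lower processor id = higher priority, and sum as the combining function are my own illustrative assumptions):

```python
# Sketch of the four concurrent-write protocols for a single PRAM location.
def resolve_concurrent_write(writes, protocol):
    """writes: list of (processor_id, value); lower id = higher priority here."""
    if protocol == "priority":
        return min(writes)[1]              # highest-priority processor wins
    if protocol == "common":
        values = {v for _, v in writes}
        if len(values) != 1:
            raise ValueError("common CW allowed only if all values agree")
        return values.pop()
    if protocol == "combining":
        return sum(v for _, v in writes)   # any combining function; sum chosen here
    if protocol == "arbitrary":
        return writes[0][1]                # any single write may succeed
    raise ValueError(f"unknown protocol: {protocol}")

writes = [(2, 7), (0, 3), (5, 9)]
print(resolve_concurrent_write(writes, "priority"))            # 3 (processor 0 wins)
print(resolve_concurrent_write(writes, "combining"))           # 19 (7 + 3 + 9)
print(resolve_concurrent_write(writes, "arbitrary"))           # 7
print(resolve_concurrent_write([(1, 4), (3, 4)], "common"))    # 4 (both wrote the same value)
```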