CH 1 Intro To Parallel Architecture
In today's rapidly evolving digital landscape, the need for parallel architecture has become
increasingly crucial. Parallel architecture refers to the use of multiple processors working
together simultaneously to perform computational tasks. This approach has gained immense
significance due to the growing demands for enhanced performance, improved efficiency, and
the ability to process large volumes of data swiftly. In this note, we will explore the need for
parallel architecture, providing insights through bullet points, case studies, and examples.
1. Performance Enhancement:
Case Study: The Large Hadron Collider (LHC) at CERN, one of the most significant
scientific experiments, utilizes a massive parallel computing cluster to analyze vast
amounts of data generated from particle collisions. This parallel processing enables the
LHC to make groundbreaking discoveries in particle physics.
Example: Modern gaming consoles, like the Xbox Series X and PlayStation 5, employ
multi-core processors to deliver superior graphics and real-time gameplay.
• Data Analytics: With the exponential growth of data, organizations need to process
and analyze vast datasets swiftly. Parallel architecture is indispensable for distributed
data analytics, allowing organizations to extract valuable insights from their data.
Case Study: Google's BigQuery, a data warehouse that can scan and analyze terabytes
of data within seconds, utilizes parallel processing to handle massive datasets and
deliver real-time queries.
Example: The NVIDIA Tesla V100, a GPU designed for AI and deep learning, employs
thousands of cores to parallelize tasks, significantly reducing the time required to train
complex machine learning models.
Example: Cloud computing platforms like Amazon Web Services (AWS) and Microsoft
Azure offer scalable, parallel computing instances that can be adjusted to meet specific
resource demands.
Case Study: The Human Genome Project used a parallel computing approach to map
the entire human genome. This monumental scientific achievement was made possible
through the parallel processing of DNA sequences.
• Climate Modelling: Climate scientists use parallel architecture to run complex climate
models, simulating various climate scenarios and understanding the impact of climate
change.
Case Study: The European Centre for Medium-Range Weather Forecasts (ECMWF)
employs a supercomputer with parallel architecture to run global weather forecasts and
climate simulations.
Example: The Advanced Encryption Standard (AES), widely used for data encryption,
can be accelerated through parallel processing, making it suitable for securing data
transmissions in real time (see the code sketch after this list).
Example: Intel's Core i9 processors and AMD's Ryzen processors are consumer-grade
CPUs with multiple cores, making parallel processing capabilities readily available for
personal computers.
• Specialized Hardware: The advent of specialized hardware, like GPUs and TPUs, has
enabled parallel processing for specific tasks, such as graphics rendering, AI, and deep
learning.
Example: NVIDIA's GeForce RTX 30 series GPUs are designed for parallel
processing, making them popular choices for gaming, content creation, and AI
applications.
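To make the AES example concrete, here is a minimal Python sketch of parallel AES-CTR
encryption. It assumes the third-party "cryptography" package; the chunk size, the helper
names (encrypt_chunk, parallel_ctr_encrypt), and the use of a thread pool are illustrative
choices rather than any standard API, and whether the threads run truly in parallel depends
on the crypto backend releasing Python's GIL. The point it demonstrates is that in CTR mode
each chunk's starting counter can be computed directly, so independent chunks can be
encrypted concurrently.

    import os
    from concurrent.futures import ThreadPoolExecutor
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    BLOCK = 16            # AES block size in bytes
    CHUNK = 4096 * BLOCK  # bytes per parallel work unit (illustrative choice)

    def encrypt_chunk(key, base_counter, block_offset, chunk):
        # Each chunk gets its own starting counter, so chunks are independent
        # and can be processed by different cores at the same time.
        counter = (base_counter + block_offset) % (1 << 128)
        cipher = Cipher(algorithms.AES(key), modes.CTR(counter.to_bytes(16, "big")))
        return cipher.encryptor().update(chunk)

    def parallel_ctr_encrypt(key, nonce16, data):
        base = int.from_bytes(nonce16, "big")
        pieces = [(i // BLOCK, data[i:i + CHUNK]) for i in range(0, len(data), CHUNK)]
        with ThreadPoolExecutor() as pool:
            parts = pool.map(lambda p: encrypt_chunk(key, base, p[0], p[1]), pieces)
        return b"".join(parts)

    if __name__ == "__main__":
        key, nonce = os.urandom(16), os.urandom(16)
        data = os.urandom(1_000_000)
        ct = parallel_ctr_encrypt(key, nonce, data)
        # Sanity check against a single sequential pass over the whole buffer.
        ref = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor().update(data)
        assert ct == ref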
In conclusion, the need for parallel architecture has grown exponentially as our digital world
becomes more data-driven and reliant on high-performance computing. The examples and case
studies provided underscore the critical role of parallel architecture in various domains, from
scientific research to real-time applications, data analytics, and security. With the continuous
evolution of hardware and technology, parallel architecture is poised to remain a fundamental
component of the computing landscape, enabling us to meet the demands of the future
effectively.
❖ Application Trends
• The need for making applications work faster is something we see in every part of
computing. As computer technology gets better, we can do more things with our
applications. But this also means these applications become more demanding and need
even better technology. So, it's like a never-ending cycle where we keep improving
computer technology to make applications run faster.
• This cycle of improvement is what pushes us to make microprocessors perform better
and better. Microprocessors are like the brains of computers, and we keep making them
more powerful. This also puts extra pressure on parallel architecture because it's used
for the most demanding applications that need a lot of computing power.
• To give you an idea of how much better things are getting, if a regular computer gets
50% faster every year, a computer with a hundred processors working together has
roughly the power that a regular computer will have in about ten years. And with a
thousand processors, it is like having the power of a computer almost twenty years in
the future (the short sketch below checks this arithmetic).
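A quick back-of-the-envelope check of that claim in Python, using the 50%-per-year growth
rate assumed in the text:

    import math

    # Years of 50%-per-year single-processor growth needed to match a
    # P-processor machine available today: solve 1.5**years = P.
    for P in (100, 1000):
        years = math.log(P) / math.log(1.5)
        print(f"{P} processors ~ {years:.1f} years of sequential improvement")
    # -> 100 processors ~ 11.4 years; 1000 processors ~ 17.0 years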
• Because different applications need different levels of performance, computer
companies make different types of computers. Some are not very powerful and are used
by most people, while others are super powerful and are used for the most demanding
applications. This creates what we call a "platform pyramid," where most people use
the less powerful computers, and a smaller group uses the super powerful ones. The
pressure to make computers even more powerful is greatest at the top of the pyramid,
where the most demanding applications are.
• Before microprocessors came into the picture, we used fancy technologies and different
ways of organizing computers to make them faster. But nowadays, the best way to make
computers way faster than what we have now is by using multiple processors. The
applications that need the most power are written to run on multiple processors at the
same time. So, the need for better performance is highest for parallel architectures and
the applications that use them.
• Both architects and application developers want to know how using parallelism makes
applications run faster. We can measure this improvement in performance as the
"speedup on p processors":
Speedup (p processors) = Performance (p processors) / Performance (1 processor)
When you have one specific problem to solve, the machine's performance on that
problem is simply the inverse of how long it takes to solve it. So, in this special
situation, we can say the following:
Speedup (p processors) = Time (1 processor) / Time (p processors)
In the business world, high-end computers now often use parallel technology. They
might not need as much parallel power as in scientific work, but they use it a lot.
Multi-processor systems have been the top choice for business computing since the
1960s. In this area, how fast and powerful a computer is directly affects how big of
a business it can support. We check this using tests, like those for online transaction
processing (OLTP) sponsored by the Transaction Processing Performance Council
(TPC). These tests measure how many transactions a computer can handle in a
minute.
❖ Technology Trends
• The need for parallelism to achieve better performance becomes clearer when we
consider technological advancements. Relying solely on faster single processors may
not suffice, making parallel architectures more attractive. Furthermore, the challenges
in parallel computer architecture resemble those in traditional computers, such as
resource allocation and data handling.
• The big technological change is that computer parts are getting smaller and faster. This
means we can fit more of them in the same space. Also, the area where we can put these
parts is getting larger. So, the speed of computers goes up as the parts get smaller, and
we can add more parts because of the bigger space. In the end, using lots of parts at the
same time (parallelism) will likely make computers faster than just making each part
run faster.
• This idea is confirmed when we look at commercial microprocessors. As Figure 1-5
shows, the clock speed of important microprocessors goes up by about 30% every year,
while the number of transistors (tiny on-off switches) goes up by about 40% each year.
If we take a chip's overall capability to be roughly its clock rate times its transistor
count, then over the past two decades the growth in transistor count has contributed
about ten times more to that capability than the growth in clock speed. This is why
microprocessors improve on standard benchmarks at a much higher rate than clock
speed alone would suggest.
• As technology advances, more parts of a computer can fit onto a single chip, including
memory and support for connecting devices (I/O). Modern high-end microprocessors
for computers and GPUs have surpassed billions of transistors, with some flagship
CPUs and GPUs featuring more than 20 billion transistors.
• The processors need data from memory faster. This is achieved through parallelism by
sending more data at once. Designs across computers, from PCs to servers, are adapting
to this requirement by using wider memory paths and better organization. Advanced
DRAM designs transfer numerous bits in parallel within the chip, then move them
quickly through a narrower path. These designs also retain recent data in fast on-chip
buffers, similar to processor caches, to speed up future data access. Utilizing parallelism
and data locality is essential for advancing memory technology.
❖ Architectural Trends
• Technology progress shapes what's possible, while computer architecture turns that
potential into actual performance and capability. Having more transistors (tiny
switches) can boost performance in two ways: parallelism and locality. Parallelism
means doing multiple things at once, reducing the time it takes to complete tasks. But
it needs resources to support all those simultaneous activities. Locality involves keeping
data close to the processor, which speeds things up, but this also requires resources.
The best performance usually comes from a balance between using parallelism and
maintaining locality.
• The early days of microprocessors benefited from an easy form of parallelism: bit-level
parallelism in every operation. The sharp change in microprocessor growth in the
following figure shows that the widespread use of 32-bit operations, along with the use
of caches, made a big difference.
• During the mid-80s to mid-90s, the focus was on making instructions in computers
work faster. They figured out how to do the basic steps of processing instructions (like
understanding what an instruction means, doing math, and finding data) in a single step.
Thanks to caches (a type of high-speed memory), they could also quickly get the
instructions and data they needed most of the time. The RISC approach showed that,
with careful planning, they could organize the steps of instruction processing so that
they could do an instruction almost every cycle, on average.
• In the mid-80s, microprocessor-based computers used separate chips for different tasks.
As technology improved, these tasks were combined into a single chip for better
communication. This single chip handled math, memory, decisions, and floating-point
operations. They also started working on multiple instructions simultaneously, known
as "superscalar execution," which used the chip's resources more effectively. This
approach involved fetching and processing more instructions at once, making
computers faster and more efficient.
• To boost a processor's speed through instruction-level parallelism, it needs a steady
flow of instructions and data. To meet this demand, larger on-chip caches were added,
using more transistors. But having both the processor and cache on the same chip
allowed for quicker data access. Still, as more instructions were processed, delays
caused by control transfers and cache misses became more significant.
• A pipeline system is like a modern-day assembly line in a factory. For example, in a
car manufacturing plant, huge assembly lines are set up, and at each point there is a
robotic arm that performs a certain task; the car then moves on to the next arm.
• Types of pipelining:
It is divided into 2 categories:
1. Arithmetic Pipeline
2. Instruction Pipeline
1. Arithmetic Pipeline
Arithmetic pipelines are found in most computers. They are used for floating-point
operations, multiplication of fixed-point numbers, etc. For example, the input to a
floating-point adder pipeline is:
X = A * 2^a
Y = B * 2^b
Here A and B are mantissas (the significant digits of the floating-point numbers), while
a and b are the exponents.
Floating-point addition and subtraction is done in 4 parts:
a. Compare the exponents.
b. Align the mantissas.
c. Add or subtract the mantissas.
d. Normalize and produce the result.
Registers are used for storing the intermediate results between the above operations.
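Below is a minimal Python sketch of those four stages. It assumes toy (mantissa, exponent)
pairs rather than real IEEE-754 bit fields, and the stage functions stand in for pipeline
segments whose outputs would be latched in the inter-stage registers just mentioned.

    def stage1_compare(A, a, B, b):
        # Stage 1: compare the exponents; the difference tells us how far
        # to shift the smaller operand's mantissa.
        return (A, a, B, b, a - b)

    def stage2_align(A, a, B, b, diff):
        # Stage 2: align the mantissas so both share the larger exponent.
        if diff >= 0:
            return A, B / (2 ** diff), a
        return A / (2 ** -diff), B, b

    def stage3_add(A_aligned, B_aligned, exp):
        # Stage 3: add (or subtract) the aligned mantissas.
        return A_aligned + B_aligned, exp

    def stage4_normalize(mantissa, exp):
        # Stage 4: renormalize so the mantissa lies in [1, 2).
        while abs(mantissa) >= 2:
            mantissa /= 2
            exp += 1
        while 0 < abs(mantissa) < 1:
            mantissa *= 2
            exp -= 1
        return mantissa, exp

    # X = 1.5 * 2**3 = 12, Y = 1.25 * 2**1 = 2.5; X + Y = 14.5 = 1.8125 * 2**3
    m, e = stage4_normalize(*stage3_add(*stage2_align(*stage1_compare(1.5, 3, 1.25, 1))))
    print(m, e, m * 2 ** e)   # -> 1.8125 3 14.5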
2. Instruction Pipeline
An instruction pipeline reads instructions from memory while previous instructions
are being executed in other segments of the pipeline. Thus, we can execute multiple
instructions simultaneously. The pipeline is most efficient when the instruction cycle
is divided into segments of equal duration.
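The effect of this overlap can be seen in a small timing model. The Python sketch below is a
toy illustration, assuming an ideal pipeline with equal-duration stages and no stalls; the
stage names are illustrative. With k stages and n instructions, it shows the familiar result
that the pipeline finishes in k + n - 1 cycles instead of n * k.

    STAGES = ["Fetch", "Decode", "Execute", "Write"]

    def pipeline_chart(n_instructions):
        # Instruction i occupies stage s during cycle i + s (0-indexed).
        k = len(STAGES)
        total = k + n_instructions - 1
        for i in range(n_instructions):
            row = ["  .  "] * total
            for s, name in enumerate(STAGES):
                row[i + s] = f"{name[:4]:^5}"
            print(f"I{i + 1}: " + "|".join(row))
        print(f"{n_instructions} instructions, {k} stages -> {total} cycles "
              f"(vs {n_instructions * k} without pipelining)")

    pipeline_chart(5)   # 5 instructions, 4 stages -> 8 cycles instead of 20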
Advantages of Pipelining
1. The cycle time of the processor is reduced.
2. It increases the throughput of the system.
3. It makes the system more reliable.
Disadvantages of Pipelining
1. The design of a pipelined processor is complex and costly to manufacture.
2. The latency of an individual instruction increases.
❖ Classification of Pipeline Processors
Pipeline processors can be classified on the basis of:
I. Levels of processing
II. Pipeline configuration
III. Types of instruction and data
I. Based on Levels of Processing (Handler's Classification)
According to the level of processing, Handler proposed three classification schemes:
a. Arithmetic pipeline
b. Processor Pipeline
c. Instruction Pipeline
a. Arithmetic Pipeline
An arithmetic pipeline generally breaks an arithmetic operation into multiple
arithmetic steps that can be executed one by one in segments of the Arithmetic Logic
Unit (ALU). In an arithmetic pipeline, the ALU of a computer is segmented for
pipeline operations in various data formats.
For example,
- 4-stage pipeline in Star-100
- 8-stage pipeline in TI-ASC
- 14-stage pipeline in Cray-1
b. Processor Pipeline
In a processor pipeline, the processing of the same data stream is pipelined by a
cascade of processors, with each processor executing a specific task on the stream.
c. Instruction Pipeline
In an instruction pipeline, the execution of a stream of instructions can be pipelined by
overlapping the execution of the current instruction with the fetch, decode and
operand fetch of subsequent instructions. This technique is also known as instruction
lookahead. Example: Almost all high-performance computers today are equipped
with instruction pipeline processors.
II. Based on Pipeline Configuration
Unifunction Pipeline:
A pipeline with a fixed and dedicated function is called a unifunction pipeline, for
example, a floating-point adder.
The Cray-1 has 12 unifunctional pipeline units for various scalar, vector, fixed point
and floating-point operations.
Multifunction Pipeline:
A pipeline that performs different functions, either at different times or at the same
time, by interconnecting different subsets of stages in the pipeline is called a
multifunction pipeline. For example, the TI-ASC has multifunction pipeline
processors.
Static Pipeline:
• A static pipeline can assume only one functional configuration at a time. Static
pipelines are preferred when instructions of the same type are to be executed
continuously.
Dynamic Pipeline:
• A dynamic pipeline permits several functional configurations to exist
simultaneously, allowing feedforward and feedback connections between stages.
Dynamic pipelines are preferred when instructions of different types are to be
executed.
III. Based on Types of Instruction and Data
a. Scalar Pipelines:
This type of pipeline processes scalar operands under the control of a DO loop.
Instructions in a small DO loop are often prefetched into the instruction buffer. The
required scalar operands for repeated scalar instructions are moved into a data cache in
order to continuously supply the pipeline with operands.
Example: IBM-360
b. Vector Pipelines:
This type of pipeline processes vector instructions over vector operands.
Computers with vector instructions are often called vector processors. The
design of a vector pipeline is expanded from that of a scalar pipeline.
Example: STAR-100, Cray-1
A typical instruction pipeline, as shown in the figure, consists of the following stages:
• The fetch stage (F) fetches instructions from a cache memory, ideally one per cycle.
• The decode stage (D) reveals the instruction function to be performed and identifies the
resources needed. Resources include general-purpose registers, buses, and functional
units.
• The issue stage (I) reserves resources. The operands are also read from registers during
the issue stage.
• The instructions are executed in one or several execute stages (E). Three execute stages
are shown in Fig.
• The last writeback stage (W) is used to write results into the registers. Memory load or
store operations are treated as part of execution.
• Figure (b) illustrates the issue of instructions following the original program order. The
shaded boxes correspond to idle cycles when instruction issues are blocked due to
resource latency or conflicts or due to data dependencies.
• The total time required is 17 clock cycles, measured from cycle 4, when the first
instruction starts execution, to cycle 20, when the last instruction starts execution.
• Figure (c) shows an improved timing after the instruction issuing order is changed to
eliminate unnecessary delays due to dependence.
• The idea is to issue all four load operations in the beginning.
• Both the add and multiply instructions are blocked fewer cycles due to this data
prefetching. The reordering should not change the end results.
• The time required is reduced to 11 cycles, measured from cycle 4 to cycle 14.
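The benefit of reordering can be reproduced with a toy in-order, single-issue model in
Python. The instruction sequence, latencies, and dependence rule below are hypothetical
stand-ins, not the exact program from the figure; the model simply refuses to issue an
instruction until every instruction it depends on has written back its result.

    def schedule(program):
        # program: list of (name, result_latency_in_cycles, dependency_names)
        done, prev_issue = {}, 0
        for name, latency, deps in program:
            ready = max((done[d] for d in deps), default=0)
            issue = max(prev_issue + 1, ready)   # in-order, one issue per cycle
            done[name] = issue + latency
            prev_issue = issue
        return max(done.values())

    LOAD, ALU = 2, 1   # hypothetical result latencies in cycles
    original = [("L1", LOAD, []), ("L2", LOAD, []), ("ADD", ALU, ["L1", "L2"]),
                ("L3", LOAD, []), ("L4", LOAD, []), ("MUL", ALU, ["L3", "L4"])]
    reordered = [("L1", LOAD, []), ("L2", LOAD, []), ("L3", LOAD, []),
                 ("L4", LOAD, []), ("ADD", ALU, ["L1", "L2"]),
                 ("MUL", ALU, ["L3", "L4"])]
    # Issuing all four loads first hides their latency behind later issues.
    print(schedule(original), schedule(reordered))   # -> 9 7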
❖ Pipeline Performance Measures
1. Clock Period
The CPU of a digital computer is driven by a clock with a constant cycle time T (in
nanoseconds).
Clock Rate: The inverse of the cycle time is the clock rate:
f = 1/T in megahertz.
2. Speedup
For a k-stage pipeline processing n tasks with clock period tp, the speedup over an
equivalent non-pipelined processor is:
Sk = (n * k * tp) / ((k + n - 1) * tp)
Example
- 4-stage pipeline (k = 4)
- one sub-operation in each stage; tp = 20 ns
- 100 tasks to be executed (n = 100)
- time for 1 task in a non-pipelined system: 4 * 20 = 80 ns
- Pipelined system:
(k + n - 1) * tp = (4 + 99) * 20 = 2060 ns
- Non-pipelined system:
n * k * tp = 100 * 80 = 8000 ns
- Speedup:
Sk = 8000/2060 = 3.88
- A 4-stage pipeline is thus roughly equivalent to a system with 4 identical function
units operating in parallel.
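The same computation in a short Python helper, using the numbers from the example above:

    def pipeline_speedup(k, n, tp):
        pipelined = (k + n - 1) * tp   # first task fills the pipe, then one per cycle
        non_pipelined = n * k * tp     # every task pays the full k stages
        return non_pipelined / pipelined

    print(pipeline_speedup(4, 100, 20))   # -> 8000 / 2060 = 3.883...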
3. Efficiency:
The efficiency of a pipeline can be measured as the ratio of the busy time span to the
total time span, including idle time. Let c be the clock period of the pipeline; then for
an m-stage pipeline processing n tasks, the efficiency E is:
E = (n * m * c) / (m * [m * c + (n - 1) * c]) = n / (m + n - 1)
As n → ∞, E approaches 1.
4. Throughput:
The throughput of a pipeline is the number of results produced per unit time. It can be
denoted as:
T = n / ([m + (n - 1)] * c) = E / c
Throughput denotes the computing power of the pipeline. The maximum speedup,
efficiency, and throughput are ideal values that real pipelines only approach.
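Continuing the earlier worked example (m = 4 stages, n = 100 tasks, c = 20 ns), a minimal
Python check of these two formulas:

    # Efficiency and throughput for an m-stage pipeline processing n tasks
    # with clock period c, per the formulas above.
    m, n, c = 4, 100, 20e-9          # 20 ns clock period

    E = n / (m + n - 1)              # -> 100/103 ~ 0.9709
    T = E / c                        # -> ~4.85e7 results per second

    print(f"E = {E:.4f}, T = {T:.3e} results/s")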