
PCA: Chapter 1: Introduction to parallel architecture

Chapter 1: Introduction to parallel architecture

Introduction: Need of parallel architecture, application trends, technology trends, architectural trends, architectures and their classification, evolution of parallel processors, current and future trends towards parallel processors. Principles of pipelining and array processing. Classification of pipelined processors, instruction pipeline design, performance evaluation factors.

❖ Need of parallel architecture

In today's rapidly evolving digital landscape, the need for parallel architecture has become
increasingly crucial. Parallel architecture refers to the use of multiple processors working
together simultaneously to perform computational tasks. This approach has gained immense
significance due to the growing demands for enhanced performance, improved efficiency, and
the ability to process large volumes of data swiftly. In this note, we will explore the need for
parallel architecture, providing insights through bullet points, case studies, and examples.

1. Performance Enhancement:

• High-Performance Computing (HPC): Parallel architecture plays a pivotal role in high-performance computing, where complex simulations and data-intensive tasks are prevalent. By employing multiple processors, HPC systems can deliver orders-of-magnitude higher performance than single-processor systems.

Case Study: The Large Hadron Collider (LHC) at CERN, one of the most significant scientific experiments, utilizes a massive parallel computing cluster to analyse vast amounts of data generated by particle collisions. This parallel processing enables the LHC to make groundbreaking discoveries in particle physics.

• Real-time Applications: Applications such as real-time gaming, video rendering, and AI-driven tasks require immediate responses. Parallel architecture allows these applications to distribute processing loads across multiple cores, resulting in smoother and faster user experiences.

*Example: Modern gaming consoles, like the Xbox Series X and PlayStation 5, employ
multi-core processors to deliver superior graphics and real-time gameplay.

1 | Prepared by: Priyanka More



2. Handling Big Data:

• Data Analytics: With the exponential growth of data, organizations need to process and analyse vast datasets swiftly. Parallel architecture is indispensable for distributed data analytics, allowing organizations to extract valuable insights from their data.

Case Study: Google's BigQuery, a data warehouse that can scan and analyse terabytes
of data within seconds, utilizes parallel processing to handle massive datasets and
deliver real-time queries.

• Machine Learning: Training deep learning models, such as neural networks, is computationally intensive. Parallel architecture accelerates the training process by distributing the workload across multiple GPUs or TPUs (Tensor Processing Units).

*Example: The NVIDIA Tesla V100, a GPU designed for AI and deep learning, employs
thousands of cores to parallelize tasks, significantly reducing the time required to train
complex machine learning models.

3. Scalability and Resource Efficiency:

• Scalability: Parallel architecture allows systems to scale up easily by adding more processors. This flexibility is critical for businesses and data centres, as they can adapt to changing workloads without overhauling their infrastructure.

*Example: Cloud computing platforms like Amazon Web Services (AWS) and Microsoft
Azure offer scalable, parallel computing instances that can be adjusted to meet specific
resource demands.

• Energy Efficiency: Parallel architectures can be more energy-efficient than a single, high-power processor. In many cases, parallel processors can distribute tasks efficiently, reducing overall power consumption.

*Example: ARM's big.LITTLE architecture combines high-performance cores with power-efficient cores on a single chip, allowing mobile devices to switch between them as needed, thereby conserving battery life.

4. Parallelism in Scientific Research:


• Genomic Sequencing: Genomic research, like DNA sequencing, generates colossal datasets. Parallel computing accelerates the analysis and comparison of genetic information.

*Case Study: The Human Genome Project used a parallel computing approach to map
the entire human genome. This monumental scientific achievement was made possible
through the parallel processing of DNA sequences.

• Climate Modelling: Climate scientists use parallel architecture to run complex climate
models, simulating various climate scenarios and understanding the impact of climate
change.

*Case Study: The European Centre for Medium-Range Weather Forecasts (ECMWF)
employs a supercomputer with parallel architecture to run global weather forecasts and
climate simulations.

5. Enhanced Security and Cryptography:

• Cryptography: Encryption and decryption workloads often exploit parallelism. Cryptographic algorithms benefit from parallel architecture to process data more quickly without weakening security.

*Example: The Advanced Encryption Standard (AES), widely used for data encryption,
can be accelerated through parallel processing, making it suitable for securing data
transmissions in real-time.

6. Evolution of Hardware and Technology:

• Processor Evolution: The development of multi-core processors, GPUs, and specialized hardware accelerators has made parallel computing more accessible and affordable.

*Example: Intel's Core i9 processors and AMD's Ryzen processors are consumer-grade
CPUs with multiple cores, making parallel processing capabilities readily available for
personal computers.

• Specialized Hardware: The advent of specialized hardware, like GPUs and TPUs, has
enabled parallel processing for specific tasks, such as graphics rendering, AI, and deep
learning.


*Example: NVIDIA's GeForce RTX 30 series GPUs are designed for parallel
processing, making them popular choices for gaming, content creation, and AI
applications.

In conclusion, the need for parallel architecture has grown exponentially as our digital world
becomes more data-driven and reliant on high-performance computing. The examples and case
studies provided underscore the critical role of parallel architecture in various domains, from
scientific research to real-time applications, data analytics, and security. With the continuous
evolution of hardware and technology, parallel architecture is poised to remain a fundamental
component of the computing landscape, enabling us to meet the demands of the future
effectively.

❖ Application Trends

• The need for making applications work faster is something we see in every part of
computing. As computer technology gets better, we can do more things with our
applications. But this also means these applications become more demanding and need
even better technology. So, it's like a never-ending cycle where we keep improving
computer technology to make applications run faster.
• This cycle of improvement is what pushes us to make microprocessors perform better
and better. Microprocessors are like the brains of computers, and we keep making them
more powerful. This also puts extra pressure on parallel architecture because it's used
for the most demanding applications that need a lot of computing power.
• To give you an idea of how much better things are getting, if a regular computer gets
50% faster every year, a computer with a hundred processors working together is like
having the power that regular computers will have in ten years. And if you have a
thousand processors, it's like having the power of computers almost twenty years in the
future.
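The ten- and twenty-year figures above follow from compound growth. A quick sanity check (a sketch, assuming the 50%-per-year single-processor improvement stated in the text):

```python
import math

annual_gain = 1.5  # a regular computer gets 50% faster every year (from the text)

def years_to_match(p):
    # Years of single-processor growth matched by p processors: 1.5**t = p
    return math.log(p) / math.log(annual_gain)

print(round(years_to_match(100), 1))   # ~11.4 years, i.e. "about ten years"
print(round(years_to_match(1000), 1))  # ~17.0 years, i.e. "almost twenty years"
```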
• Because different applications need different levels of performance, computer companies make different types of computers. Some are not very powerful and are used by most people, while others are super powerful and are used for the most demanding applications. This creates what we call a "platform pyramid": most people use the less powerful computers, and a smaller group uses the super powerful ones. The pressure to make computers even more powerful is greatest at the top of the pyramid, where the most demanding applications are.
• Before microprocessors came into the picture, we used fancy technologies and different
ways of organizing computers to make them faster. But nowadays, the best way to make
computers way faster than what we have now is by using multiple processors. The
applications that need the most power are written to run on multiple processors at the
same time. So, the need for better performance is highest for parallel architectures and
the applications that use them.
• Both architects and application developers want to know how using parallelism makes applications run faster. We can measure this improvement in performance as the "speedup on p processors":

Speedup (p processors) = Performance (p processors) / Performance (1 processor)

When you have one specific problem to solve, the machine's performance on that problem is simply the inverse of how long it takes to solve it. So, in this special situation, we can say:

Speedup (p processors) = Time (1 processor) / Time (p processors)

• Scientific and engineering computing
- Numerous fields, particularly in computational science and engineering, rely on
enhanced computer performance. Computers are vital for simulating complex,
expensive-to-investigate real-world phenomena, enabling cost-effective analysis.
As computers become more powerful, we can address increasingly intricate issues
through simulations.
- Parallel computers are essential in various scientific fields like physics, chemistry,
and more. Industries like petroleum, automotive, aeronautics, and pharmaceuticals
rely on them for modelling and simulations. These applications also often need
complex visualizations, which parallel computing can handle. In 1995, a parallel
computer system made the first full-length computer-animated movie, "Toy Story,"
by bringing together affordable technology and powerful processors.
• Commercial computing


In the business world, high-end computers now often use parallel technology. They might not need as much parallel power as in scientific work, but they use it a lot. Multi-processor systems have been the top choice for business computing since the 1960s. In this area, how fast and powerful a computer is directly affects how big a business it can support. We check this using benchmarks, such as those for online transaction processing (OLTP) sponsored by the Transaction Processing Performance Council (TPC). These tests measure how many transactions a computer can handle in a minute.

❖ Technology Trends

• The need for parallelism to achieve better performance becomes clearer when we
consider technological advancements. Relying solely on faster single processors may
not suffice, making parallel architectures more attractive. Furthermore, the challenges
in parallel computer architecture resemble those in traditional computers, such as
resource allocation and data handling.
• The big technological change is that computer parts are getting smaller and faster. This
means we can fit more of them in the same space. Also, the area where we can put these
parts is getting larger. So, the speed of computers goes up as the parts get smaller, and
we can add more parts because of the bigger space. In the end, using lots of parts at the
same time (parallelism) will likely make computers faster than just making each part
run faster.
• This idea is confirmed when we look at commercial microprocessors. As Figure 1-5 shows, the clock speed (how fast a chip works) of important microprocessors goes up by about 30% every year, while the number of transistors (tiny on-off switches) goes up by about 40% each year. So, if we measure how powerful a chip is by how many transistors it switches every second, growth in transistor count has contributed roughly an order of magnitude more than growth in clock speed over the past two decades. This means that microprocessors are improving on standard benchmarks at a much higher rate.


• As technology advances, more parts of a computer can fit onto a single chip, including
memory and support for connecting devices (I/O). Modern high-end microprocessors
for computers and GPUs have surpassed billions of transistors, with some flagship
CPUs and GPUs featuring more than 20 billion transistors.
• The processors need data from memory faster. This is achieved through parallelism by
sending more data at once. Designs across computers, from PCs to servers, are adapting
to this requirement by using wider memory paths and better organization. Advanced
DRAM designs transfer numerous bits in parallel within the chip, then move them
quickly through a narrower path. These designs also retain recent data in fast on-chip
buffers, similar to processor caches, to speed up future data access. Utilizing parallelism
and data locality is essential for advancing memory technology.

❖ Architectural Trends

• Technology progress shapes what's possible, while computer architecture turns that
potential into actual performance and capability. Having more transistors (tiny
switches) can boost performance in two ways: parallelism and locality. Parallelism
means doing multiple things at once, reducing the time it takes to complete tasks. But
it needs resources to support all those simultaneous activities. Locality involves keeping
data close to the processor, which speeds things up, but this also requires resources.
The best performance usually comes from a balance between using parallelism and
maintaining locality.


• The early days of microprocessors benefited from an easy form of parallelism: bit-level parallelism in every operation. The sharp change in microprocessor growth shown in the following figure indicates that the widespread use of 32-bit operations, along with the use of caches, made a big difference.

• During the mid-80s to mid-90s, the focus was on making instructions in computers
work faster. They figured out how to do the basic steps of processing instructions (like
understanding what an instruction means, doing math, and finding data) in a single step.
Thanks to caches (a type of high-speed memory), they could also quickly get the
instructions and data they needed most of the time. The RISC approach showed that,
with careful planning, they could organize the steps of instruction processing so that
they could do an instruction almost every cycle, on average.
• In the mid-80s, microprocessor-based computers used separate chips for different tasks.
As technology improved, these tasks were combined into a single chip for better
communication. This single chip handled math, memory, decisions, and floating-point
operations. They also started working on multiple instructions simultaneously, known
as "superscalar execution," which used the chip's resources more effectively. This
approach involved fetching and processing more instructions at once, making
computers faster and more efficient.
• To boost a processor's speed through instruction-level parallelism, it needs a steady flow of instructions and data. To meet this demand, larger on-chip caches were added, using more transistors. Having both the processor and cache on the same chip allowed for quicker data access. Still, as more instructions were processed at once, delays caused by control transfers and cache misses became more significant.

❖ Principles of pipelining and array processing


Pipelining:
• Pipelining is the process of feeding instructions to the processor through a pipeline. It allows instructions to be stored and executed in an orderly, overlapped process. It is also known as pipeline processing.
• Pipelining is a technique where multiple instructions are overlapped during execution.
Pipeline is divided into stages and these stages are connected with one another to form
a pipe like structure. Instructions enter from one end and exit from another end.
• Pipelining increases the overall instruction throughput.
• In pipeline system, each segment consists of an input register followed by a
combinational circuit. The register is used to hold data and combinational circuit
performs operations on it. The output of combinational circuit is applied to the input
register of the next segment.

• Pipeline system is like the modern day assembly line setup in factories. For example,
in a car manufacturing industry, huge assembly lines are setup and at each point, there
are robotic arms to perform a certain task, and then the car moves on ahead to the next
arm.
• Types of pipelining:
It is divided into 2 categories:
1. Arithmetic Pipeline
2. Instruction Pipeline
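The overlapped execution described above can be visualized as a space-time diagram. A minimal sketch (the four stage names here are illustrative, not tied to any particular machine):

```python
# Print a space-time diagram for an n-instruction stream flowing through a
# 4-stage pipeline: instruction i enters stage s at cycle i + s.
STAGES = ["F", "D", "E", "W"]  # illustrative stage names

def diagram(n_instr):
    rows = []
    for i in range(n_instr):
        cells = ["."] * i + STAGES + ["."] * (n_instr - 1 - i)
        rows.append("I%d: %s" % (i + 1, " ".join(cells)))
    return "\n".join(rows)

print(diagram(3))
# I1: F D E W . .
# I2: . F D E W .
# I3: . . F D E W
```

Each column is one clock cycle; once the pipe is full, one instruction completes per cycle.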


1. Arithmetic Pipeline
Arithmetic pipelines are found in most computers. They are used for floating-point operations, multiplication of fixed-point numbers, etc. For example, the input to a floating-point adder pipeline is:
X = A*2^a
Y = B*2^b
Here A and B are mantissas (significant digit of floating-point numbers), while a and b
are exponents.
The floating-point addition or subtraction is done in four steps:
a. Compare the exponents.
b. Align the mantissas.
c. Add or subtract mantissas
d. Produce the result.

Registers are used for storing the intermediate results between the above operations.
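The four steps above can be sketched in code. A minimal, non-pipelined sketch on (mantissa, exponent) pairs, i.e. X = A*2^a (normalization of the result is omitted):

```python
# Floating-point addition in the four steps from the text, on values
# represented as (mantissa, exponent) pairs: X = A * 2**a, Y = B * 2**b.
def fp_add(A, a, B, b):
    shift = a - b                    # step a: compare the exponents
    if shift >= 0:                   # step b: align the mantissas
        B, b = B / 2 ** shift, a
    else:
        A, a = A * 2 ** shift, b
    mantissa = A + B                 # step c: add the mantissas
    return mantissa, max(a, b)       # step d: produce the result

# X = 0.5 * 2^3 = 4 and Y = 0.5 * 2^1 = 1, so X + Y = 5 = 0.625 * 2^3
print(fp_add(0.5, 3, 0.5, 1))  # (0.625, 3)
```

In an arithmetic pipeline, each of the four steps would be one segment, with a register between segments holding the intermediate result.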

2. Instruction Pipeline

In an instruction pipeline, a stream of instructions is executed by overlapping the fetch, decode, and execute phases of the instruction cycle. This technique is used to increase the throughput of the computer system.

An instruction pipeline reads instruction from the memory while previous instructions
are being executed in other segments of the pipeline. Thus, we can execute multiple
instructions simultaneously. The pipeline will be more efficient if the instruction cycle
is divided into segments of equal duration.

Advantages of Pipelining
1. The cycle time of the processor is reduced.
2. It increases the throughput of the system
3. It makes the system reliable.

Disadvantages of Pipelining
1. The design of a pipelined processor is complex and costly to manufacture.
2. The latency of an individual instruction increases.


Vector (Array) Processing

• A scalar processor is a normal processor, which works on one simple instruction at a time, operating on single data items. In today's world, this technique proves highly inefficient, as the overall processing of instructions is very slow.
• There is a class of computational problems that are beyond the capabilities of a conventional computer. These problems require vast numbers of computations on multiple data items, which would take a conventional computer (with a scalar processor) days or even weeks to complete.
• Such complex instructions, which operate on multiple data items at the same time, require a better way of instruction execution, which was achieved by vector processors.
• Scalar CPUs can manipulate one or two data items at a time, which is not very efficient.
Also, simple instructions like ADD A to B, and store into C are not practically efficient.
• Addresses are used to point to the memory locations where the data to be operated on will be found, which leads to the added overhead of data lookup. Until the data is found, the CPU would be sitting idle, which is a big performance issue.
• Hence, the concept of the instruction pipeline comes into the picture, in which the instruction passes through several sub-units in turn. These sub-units perform various independent functions: the first decodes the instruction, the second fetches the data, and the third performs the arithmetic itself. Therefore, while the data is being fetched for one instruction, the CPU does not sit idle; it works on decoding the next instruction, ending up working like an assembly line.
• A vector processor not only uses an instruction pipeline but also pipelines the data, working on multiple data items at the same time.
• A normal scalar processor instruction would be ADD A, B, which adds two operands. But what if we could instruct the processor to ADD a group of numbers (from memory locations 0 to n) to another group of numbers (say, memory locations n to k)? This can be achieved by vector processors.
• In a vector processor, a single instruction can ask for multiple data operations, which saves time, as the instruction is decoded once and then keeps operating on different data items.
• Applications of Vector Processors


Computers with vector processing capabilities are in demand in specialized applications. The following are some areas where vector processing is used:
i. Petroleum exploration.
ii. Medical diagnosis.
iii. Data analysis.
iv. Weather forecasting.
v. Aerodynamics and space flight simulations.
vi. Image processing.
vii. Artificial intelligence.
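The scalar-versus-vector contrast described above can be sketched in code. A conceptual sketch only (Python stands in for hardware; a real vector unit would do this with a single VADD-style instruction):

```python
# Scalar style: one ADD is issued (and conceptually decoded) per element.
def scalar_add(a, b):
    c = []
    for i in range(len(a)):
        c.append(a[i] + b[i])  # ADD a[i], b[i] -> c[i], repeated n times
    return c

# Vector style: a single "instruction" names whole operands; it is decoded
# once and then streams over all the elements.
def vector_add(a, b):
    return [x + y for x, y in zip(a, b)]

print(vector_add([1, 2, 3], [10, 20, 30]))  # [11, 22, 33]
```

Both produce the same result; the win for the vector form is that instruction fetch and decode happen once for the whole operand group.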

❖ Classification of pipelined processors

The pipelined processors are classified based on the following factors:

I. Levels of processing
II. Pipeline configuration
III. Types of instruction and data
I. Based on Levels of processing (Handler’s Classification)
According to the level of processing Handler proposed three classification schemes:
a. Arithmetic pipeline
b. Processor Pipeline
c. Instruction Pipeline
a. Arithmetic Pipeline
An arithmetic pipeline generally breaks an arithmetic operation into multiple
arithmetic steps that can be executed one by one in segments in Arithmetic Logic
Unit. In arithmetic pipeline, the ALU of a computer is segmented for pipeline
operations in various data formats.

For example,
- 4-stage pipeline in Star-100
- 8-stage pipeline in TI-ASC
- 14-stage pipeline in Cray-1


- 26-stage pipeline in Cyber-305


b. Processor Pipeline
In processor pipelining, the same data stream is processed by a cascade of processors, each performing a specific task. The data stream passes through the first processor, with results stored in a memory block that is also accessible by the second processor. The second processor processes this result and passes it to the third, and so on. This type of pipeline is not very popular, and there is no widely cited practical example of it.

c. Instruction Pipeline
In instruction pipeline, the execution of a stream of instructions can be pipelined by
overlapping the execution of the current instruction with the fetch, decode and
operand fetch of subsequent instructions. This technique is also known as look
ahead. Example: Almost all high-performance computers nowadays are equipped
with instruction pipeline processor.

II. Based on Pipeline configuration (Ramamoorthy and Li's Classification)

According to pipeline configuration and control strategies, Li and Ramamoorthy have proposed the following pipeline classification:
a. Unifunction vs. Multifunction Pipelines
b. Static vs Dynamic Pipelines

a. Unifunction vs. Multifunction Pipelines


Unifunction Pipeline:

A pipeline with a fixed, dedicated function is called a unifunction pipeline, for example, a floating-point adder.

The Cray-1 has 12 unifunctional pipeline units for various scalar, vector, fixed point
and floating-point operations.

Multifunction Pipeline:

A pipeline that performs different functions, either at different times or at the same time, by interconnecting different subsets of its stages is called a multifunction pipeline. For example, the TI-ASC has multifunction pipeline processors.

b. Static vs Dynamic Pipelines

Static Pipeline:

• Static pipeline assumes only one functional configuration at a time.

• It can either be unifunctional or multifunctional.

• Static pipelines are preferred when instructions of same type are to be executed
continuously.

• A unifunction pipeline must be static.

Dynamic Pipeline:

• A dynamic pipeline permits several functional configurations to exist simultaneously. A dynamic pipeline must be multifunctional.

• The dynamic configuration requires more elaborate control and sequencing mechanisms than static pipelining.

III. Based on types of instruction and data

According to the types of instruction and data, the following pipeline types are identified under this classification:
a. Scalar Pipelines:
This type of pipeline processes scalar operands of repeated scalar instructions (i.e., scalar operands under the control of a DO loop). Instructions in a small DO loop are often prefetched into the instruction buffer. The required scalar operands for repeated scalar instructions are moved into a data cache in order to continuously supply the pipeline with operands.
Example: IBM-360
b. Vector Pipelines:
This type of pipeline processes vector instruction over vector operands.
Computers having vector instructions are often called vector processors. The
design of a vector pipeline is expanded from that of a scalar pipeline.
Example: STAR-100, Cray-1

❖ Instruction Pipeline design

• A stream of instructions can be executed by a pipeline in an overlapped manner. A typical instruction execution consists of a sequence of operations, including instruction fetch, decode, operand fetch, execute, and write-back phases. A typical instruction pipeline is shown below.

• The fetch stage (F) fetches instructions from a cache memory, ideally one per cycle.
• The decode stage (D) reveals the instruction function to be performed and identifies the resources needed. Resources include general-purpose registers, buses, and functional units.
• The issue stage (I) reserves resources. The operands are also read from registers during
the issue stage.
• The instructions are executed in one or several execute stages (E). Three execute stages
are shown in Fig.
• The last writeback stage (W) is used to write results into the registers. Memory load or
store operations are treated as part of execution.
• Figure (b) illustrates the issue of instructions following the original program order. The
shaded boxes correspond to idle cycles when instruction issues are blocked due to
resource latency or conflicts or due to data dependencies.


• The total time required is 17 clock cycles, measured from cycle 4, when the first instruction starts execution, to cycle 20, when the last instruction starts execution.

• Figure (c) shows an improved timing after the instruction issuing order is changed to
eliminate unnecessary delays due to dependence.
• The idea is to issue all four load operations in the beginning.
• Both the add and multiply instructions are blocked fewer cycles due to this data
prefetching. The reordering should not change the end results.
• The time required is reduced to 11 cycles, measured from cycle 4 to cycle 14.
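The cycle counts quoted above are inclusive spans from the first issue cycle to the last. A trivial check:

```python
# Cycles elapsed, counted inclusively from the first issue cycle to the last,
# matching the two counts quoted for Figures (b) and (c).
def elapsed(first_cycle, last_cycle):
    return last_cycle - first_cycle + 1

print(elapsed(4, 20))  # 17 cycles: original program order
print(elapsed(4, 14))  # 11 cycles: after reordering the loads
```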

❖ Performance evaluation factors

1. Clock Period
The CPU of a digital computer is driven by a clock with a constant cycle time T (in nanoseconds).
Clock Rate: The inverse of the cycle time is the clock rate:
f = 1/T (in megahertz)


Instruction Count: The size of a program is determined by its instruction count, i.e., the number of machine instructions to be executed in the program.
Cycles Per Instruction (CPI): CPI measures the average number of clock cycles needed to execute an instruction. Different machine instructions may require different numbers of clock cycles.
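Although not stated explicitly above, these three factors combine into the standard CPU-time relation: CPU time = instruction count * CPI * cycle time = IC * CPI / f. A sketch with made-up numbers:

```python
# CPU time = instruction_count * CPI / clock_rate (all values hypothetical).
instruction_count = 2_000_000   # machine instructions executed
cpi = 1.5                       # average clock cycles per instruction
clock_rate_hz = 500e6           # 500 MHz, i.e. a 2 ns cycle time

cpu_time_s = instruction_count * cpi / clock_rate_hz
print(cpu_time_s)  # 0.006 seconds
```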
2. Speedup
How much performance improvement do we get through pipelining?
Here, n: number of tasks to be performed.
In a conventional (non-pipelined) machine:
tn: clock cycle time (time to complete one task)
t1: time required to complete the n tasks
t1 = n * tn
In a pipelined machine (k stages):
tp: clock cycle time (time to complete each suboperation)
tk: time required to complete the n tasks
tk = (k + n - 1) * tp
So, the speedup Sk is calculated as:
Sk = t1 / tk = (n * tn) / [(k + n - 1) * tp]

Example:
- 4-stage pipeline (k = 4)
- suboperation time in each stage: tp = 20 ns
- 100 tasks to be executed (n = 100)
- Time for 1 task in the non-pipelined system: k * tp = 4 * 20 = 80 ns
- Pipelined system: tk = (k + n - 1) * tp = (4 + 99) * 20 = 2060 ns
- Non-pipelined system: t1 = n * k * tp = 100 * 80 = 8000 ns
- Speedup: Sk = 8000 / 2060 = 3.88
- A 4-stage pipeline with this speedup is roughly comparable to a system with 4 identical function units working in parallel.
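The worked example above can be reproduced directly from the formulas:

```python
# Pipeline speedup Sk = t1 / tk for the 4-stage, 100-task example above.
k, n, tp = 4, 100, 20  # stages, tasks, clock period in ns

t1 = n * k * tp          # non-pipelined: 100 * 4 * 20 = 8000 ns
tk = (k + n - 1) * tp    # pipelined:     103 * 20     = 2060 ns
speedup = t1 / tk

print(tk, round(speedup, 2))  # 2060 3.88
```

Note that as n grows, Sk approaches k, the number of stages.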


3. Efficiency:
The efficiency of a pipeline is the ratio of its busy time span to the total time span, including idle time. Let c be the clock period and m the number of stages; for n tasks the efficiency E is:
E = (n * m * c) / (m * [m + (n - 1)] * c) = n / (m + n - 1)
As n → ∞, E approaches 1.

4. Throughput:
The throughput of a pipeline is the number of results completed per unit time:
T = n / ([m + (n - 1)] * c) = E / c
Throughput denotes the computing power of the pipeline. The maximum speedup, efficiency, and throughput are ideal-case values.
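Both formulas can be evaluated for the same 4-stage, 100-task example used under Speedup:

```python
# Efficiency E = n / (m + n - 1) and throughput T = E / c for an m-stage
# pipeline running n tasks with clock period c.
m, n, c = 4, 100, 20e-9  # stages, tasks, 20 ns clock period in seconds

E = n / (m + n - 1)  # fraction of stage-cycles kept busy
T = E / c            # results per second

print(round(E, 3))   # 0.971, and E approaches 1 as n grows
```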
