
Vector Processing and Pipelining

This document discusses parallel processing and pipelining. It describes four types of parallel processing: SISD, SIMD, MISD, and MIMD. SIMD involves a single control unit and multiple processing units operating on different data simultaneously. Pipelining involves dividing a process into sequential stages and executing each stage concurrently across dedicated hardware. Pipelining can increase throughput by allowing new tasks to begin before previous tasks finish. The document provides an example comparing sequential and pipelined laundry processes. It also discusses pipeline performance metrics like speedup.

Uploaded by praveenpin2
Copyright: © Attribution Non-Commercial (BY-NC)

Chapter 9
Pipeline and Vector Processing

Dr. Bernard Chen Ph.D.
University of Central Arkansas
Spring 2009

Parallel processing
• A parallel processing system is able to perform concurrent data processing to achieve faster execution time.
• The system may have two or more ALUs and be able to execute two or more instructions at the same time.
• The goal is to increase the throughput: the amount of processing that can be accomplished during a given interval of time.

Parallel processing classification
• Single instruction stream, single data stream – SISD
• Single instruction stream, multiple data stream – SIMD
• Multiple instruction stream, single data stream – MISD
• Multiple instruction stream, multiple data stream – MIMD

Single instruction stream, single data stream – SISD
• A single computer containing a control unit, a processor unit, and a memory unit.
• Instructions are executed sequentially. Parallel processing may still be achieved by means of multiple functional units or by pipeline processing.

Single instruction stream, multiple data stream – SIMD
• Represents an organization that includes many processing units under the supervision of a common control unit.
• All processors receive the same instruction from the control unit, but operate on different data.
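
The SIMD idea (one instruction, many data items) can be sketched in a few lines of Python. This is only a model of the semantics: the function name `simd_execute` and the lane values are made up for illustration, and the loop runs sequentially, whereas real SIMD hardware would process all lanes at once.

```python
from operator import add


def simd_execute(instruction, data_a, data_b):
    """Apply the same instruction to every lane's own pair of operands."""
    return [instruction(a, b) for a, b in zip(data_a, data_b)]


# Hypothetical operands: one pair per processing unit (lane).
lanes_a = [1, 2, 3, 4]
lanes_b = [10, 20, 30, 40]

# The single "control unit" issues one instruction (add) to all lanes.
print(simd_execute(add, lanes_a, lanes_b))  # [11, 22, 33, 44]
```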

Multiple instruction stream, single data stream – MISD
• Theoretical only.
• Processors receive different instructions, but operate on the same data.

Multiple instruction stream, multiple data stream – MIMD
• A computer system capable of processing several programs at the same time.
• Most multiprocessor and multicomputer systems can be classified in this category.

Pipelining: Laundry Example
• A small laundry has one washer, one dryer, and one operator. It takes 90 minutes to finish one load (the loads are labeled A, B, C, and D):
• The washer takes 30 minutes
• The dryer takes 40 minutes
• "Operator folding" takes 20 minutes

Sequential Laundry
[Figure: timeline from 6 PM to midnight. Loads A, B, C, and D are processed one after another in task order; each load takes 30 + 40 + 20 = 90 min.]
• This operator scheduled his loads to be delivered to the laundry every 90 minutes, which is the time required to finish one load. In other words, he will not start a new task until he has finished the previous one.
• The process is sequential. Sequential laundry takes 6 hours for 4 loads.

Efficiently scheduled laundry: Pipelined Laundry
The operator starts work ASAP.
[Figure: timeline starting at 6 PM. Loads A, B, C, and D overlap in task order; the stage times along the timeline are 30, 40, 40, 40, 40, and 20 min.]
• Another operator asks for loads to be delivered to the laundry every 40 minutes.
• Pipelined laundry takes 3.5 hours for 4 loads.
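
The two totals (6 hours and 3.5 hours) can be checked with a small simulation. This is a sketch under one stated assumption: a load that has finished a stage may wait until the next stage is free. The function names are illustrative; the stage times come from the example.

```python
STAGE_MINUTES = [30, 40, 20]  # washer, dryer, folding (from the example)


def sequential_minutes(num_loads, stage_minutes):
    """Each load runs through every stage before the next load starts."""
    return num_loads * sum(stage_minutes)


def pipelined_minutes(num_loads, stage_minutes):
    """A load enters a stage as soon as both the load and the stage are free."""
    # stage_free[j]: time at which stage j finished its most recent load
    stage_free = [0] * len(stage_minutes)
    for _ in range(num_loads):
        ready = 0  # time at which this load leaves the previous stage
        for j, minutes in enumerate(stage_minutes):
            start = max(ready, stage_free[j])
            stage_free[j] = start + minutes
            ready = stage_free[j]
    return stage_free[-1]


print(sequential_minutes(4, STAGE_MINUTES) / 60)  # 6.0 hours
print(pipelined_minutes(4, STAGE_MINUTES) / 60)   # 3.5 hours
```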

Pipelining Facts
• Multiple tasks operate simultaneously
• Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload
• Pipeline rate is limited by the slowest pipeline stage
• Potential speedup = number of pipe stages
• Unbalanced lengths of pipe stages reduce speedup
• Time to "fill" the pipeline and time to "drain" it reduce speedup
[Figure: pipelined laundry timeline starting at 6 PM, loads A to D in task order, stage times 30, 40, 40, 40, 40, and 20 min. Annotation: the washer waits for the dryer for 10 minutes.]
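
A rough way to see these facts in numbers is to compare the laundry's unbalanced stages with a hypothetical balanced version (30, 30, 30 minutes). The sketch below assumes identical loads that may queue between stages; under that assumption the pipelined time is one full latency plus one slowest-stage time for each additional load. The function name `pipeline_facts` is illustrative.

```python
def pipeline_facts(num_loads, stage_minutes):
    """Closed-form timing for identical loads (assumption: loads may queue)."""
    latency = sum(stage_minutes)   # time for a single load, unchanged by pipelining
    slowest = max(stage_minutes)   # the slowest stage sets the pipeline rate
    sequential = num_loads * latency
    pipelined = latency + (num_loads - 1) * slowest
    return {
        "latency": latency,
        "pipelined": pipelined,
        "speedup": sequential / pipelined,
    }


unbalanced = pipeline_facts(4, [30, 40, 20])  # the laundry example
balanced = pipeline_facts(4, [30, 30, 30])    # hypothetical balanced stages

print(unbalanced)  # pipelined time 210 min, speedup about 1.71
print(balanced)    # pipelined time 180 min, speedup 2.0

# With many loads, fill and drain time matter less and the balanced
# speedup approaches the number of stages (3).
print(pipeline_facts(1000, [30, 30, 30])["speedup"])
```

Both configurations have the same 90-minute latency per load; only the throughput differs.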

9.2 Pipelining
• Decomposes a sequential process into segments.
• Divides the processor into segment processors, each dedicated to a particular segment.
• Each segment is executed in a dedicated segment processor that operates concurrently with all other segments.
• Information flows through these multiple hardware segments.

9.2 Pipelining
• Instruction execution is divided into k segments or stages
• An instruction exits pipe stage k-1 and proceeds into pipe stage k
• All pipe stages take the same amount of time, called one processor cycle
• The length of the processor cycle is determined by the slowest pipe stage
[Figure: a pipeline of k segments]

9.2 Pipelining
• Suppose we want to perform the combined multiply and add operations with a stream of numbers:
• Ai * Bi + Ci for i = 1, 2, 3, …, 7

9.2 Pipelining
• The suboperations performed in each segment of the pipeline are as follows:
• Segment 1: R1 ← Ai, R2 ← Bi
• Segment 2: R3 ← R1 * R2, R4 ← Ci
• Segment 3: R5 ← R3 + R4
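
The register transfers above can be traced clock by clock with a short simulation. This is a minimal sketch, not a hardware description: the input values are made up, and the registers are updated from the last segment to the first so that each segment reads what was latched on the previous clock pulse, which mimics all registers loading simultaneously.

```python
def multiply_add_pipeline(a, b, c):
    """Compute a[i] * b[i] + c[i] with a 3-segment pipeline model."""
    n = len(a)
    r1 = r2 = r3 = r4 = r5 = None
    results = []
    clock = 0
    # n pulses load the operands; 2 more pulses drain segments 2 and 3.
    for clock in range(1, n + 3):
        # Segment 3: R5 <- R3 + R4
        if r3 is not None:
            r5 = r3 + r4
            results.append(r5)
        # Segment 2: R3 <- R1 * R2, R4 <- Ci (Ci belongs to the pair loaded last pulse)
        if r1 is not None:
            r3 = r1 * r2
            r4 = c[clock - 2]
        else:
            r3 = r4 = None
        # Segment 1: R1 <- Ai, R2 <- Bi
        if clock <= n:
            r1, r2 = a[clock - 1], b[clock - 1]
        else:
            r1 = r2 = None
    return results, clock


# Hypothetical data for i = 1..7
A = [1, 2, 3, 4, 5, 6, 7]
B = [2, 2, 2, 2, 2, 2, 2]
C = [10, 10, 10, 10, 10, 10, 10]

results, clocks = multiply_add_pipeline(A, B, C)
print(results)  # [12, 14, 16, 18, 20, 22, 24]
print(clocks)   # 9 clock pulses for 7 results on 3 segments
```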
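
Pipeline Performance
• n: number of instructions (equivalent to the number of loads in the laundry example)
• k: number of stages in the pipeline (washing, drying, and folding in the laundry example)
• τ: clock cycle (the time of the slowest stage)
• Tk: total time on the k-stage pipeline

$$T_k = (k + (n - 1))\,\tau$$

With $T_1 = nk\tau$ as the time without pipelining:

$$\text{Speedup} = \frac{T_1}{T_k} = \frac{nk}{k + (n - 1)}$$

For n ≫ k the speedup approaches k.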

SPEEDUP
• Consider a k-segment pipeline operating on n data sets. (In the above example, k = 3 and n = 4.)
• It takes k clock cycles to fill the pipeline and get the first result from the output of the pipeline.
• After that, the remaining (n - 1) results come out one per clock cycle.
• It therefore takes (k + n - 1) clock cycles to complete the task.

SPEEDUP
• If we execute the same task sequentially in a single processing unit, it takes (k * n) clock cycles.
• The speedup gained by using the pipeline is:
• S = k * n / (k + n - 1)
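
As a worked instance of the formula, plugging in the laundry numbers quoted earlier (k = 3 stages, n = 4 loads) gives the following. Note that this counts clock cycles, so it assumes every stage takes one full cycle:

```latex
S = \frac{k \cdot n}{k + n - 1} = \frac{3 \cdot 4}{3 + 4 - 1} = \frac{12}{6} = 2
```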

SPEEDUP
• S = k * n / (k + n - 1)
• For n >> k (such as 1 million data sets on a 3-stage pipeline), S ~ k
• So for large data sets, the speedup approaches the number of functional units. This is because the functional units work in parallel except during the filling and draining cycles.
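
To see how quickly S approaches k, the formula can be evaluated for growing n. A minimal sketch; the function name `speedup` and the sample values of n are chosen here for illustration.

```python
def speedup(k, n):
    """Speedup of a k-segment pipeline over sequential execution for n data sets."""
    return (k * n) / (k + n - 1)


for n in (4, 100, 10_000, 1_000_000):
    print(f"n = {n:>9}: S = {speedup(3, n):.6f}")
```

For k = 3 the value starts at 2.0 for n = 4 and stays just below 3 however large n becomes.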

Example: 6 tasks, divided into 4 segments

Clock cycle:  1   2   3   4   5   6   7   8   9
Segment 1:    T1  T2  T3  T4  T5  T6
Segment 2:        T1  T2  T3  T4  T5  T6
Segment 3:            T1  T2  T3  T4  T5  T6
Segment 4:                T1  T2  T3  T4  T5  T6
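
A space-time diagram like this one can be generated for any number of tasks and segments. The sketch below assumes one clock cycle per segment and no stalls; the function name `space_time_diagram` is illustrative.

```python
def space_time_diagram(num_tasks, num_segments):
    """Return one row per segment; each cell names the task in that cycle."""
    total_cycles = num_segments + num_tasks - 1
    rows = []
    for segment in range(num_segments):
        row = []
        for cycle in range(total_cycles):
            task = cycle - segment  # 0-based task occupying this segment, if any
            row.append(f"T{task + 1}" if 0 <= task < num_tasks else "")
        rows.append(row)
    return rows


for number, row in enumerate(space_time_diagram(6, 4), start=1):
    print(f"Segment {number}: " + " ".join(f"{cell:>3}" for cell in row))
```

Each row has k + n - 1 = 9 columns, matching the cycle count in the speedup formula.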
