W 04 Parallel Processing
4. Parallel Processing
Ex: 5 + 3
The instruction ADD R1, R2, R3 runs on a single-cycle processor:
add the values stored in register R2 and register R3, then store
the result in register R1. Assume R2 contains the number 5 and
R3 contains the number 3.
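The ADD example can be traced with a tiny register-file model. This is a minimal sketch, assuming a Python dictionary stands in for the register file; the helper name `execute_add` is hypothetical and does not model any real instruction set.

```python
# Toy register file holding the values assumed in the example.
registers = {"R1": 0, "R2": 5, "R3": 3}

def execute_add(regs, dest, src1, src2):
    """Model ADD dest, src1, src2: read both sources, add, write the result."""
    regs[dest] = regs[src1] + regs[src2]
    return regs[dest]

result = execute_add(registers, "R1", "R2", "R3")
print(result)  # 8
```

On a single-cycle processor, the read, the add, and the write-back all complete within one clock cycle, which is why the sketch treats them as one step.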
In the example:
[Figure: execution timing of instr. 1 and instr. 2]
1. Datapath
• Five (or more) instructions are in the path.
2. Instructions may have
• data and control flow dependences
• I.e., units of work are not independent: one may have to
wait for another
3. Control
• Must correspond to multiple instructions
R(i) D( j )
2. Anti-dependence (WAR, write after read)
• A later instruction j cannot write its result until the
earlier instruction i has read its sources
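A dependence check between an earlier instruction i and a later instruction j can be sketched as below. This is an illustrative sketch, not a hazard unit: the `Instr` tuple and the `dependences` helper are hypothetical names, and each instruction is assumed to have one destination register and a set of source registers. Only the WAR case is spelled out in the text above; RAW and WAW are the other two standard data dependences.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    text: str        # assembly text, for display only
    dest: str        # register written by the instruction
    srcs: frozenset  # registers read by the instruction

def dependences(i, j):
    """Return the data dependences of later instruction j on earlier instruction i."""
    found = []
    if i.dest in j.srcs:
        found.append("RAW")  # j reads a register that i writes
    if j.dest in i.srcs:
        found.append("WAR")  # j writes a register that i reads (anti-dependence)
    if i.dest == j.dest:
        found.append("WAW")  # both write the same register
    return found

add = Instr("ADD R1, R2, R3", "R1", frozenset({"R2", "R3"}))
sub = Instr("SUB R2, R4, R5", "R2", frozenset({"R4", "R5"}))
mul = Instr("MUL R6, R1, R4", "R6", frozenset({"R1", "R4"}))

print(dependences(add, sub))  # ['WAR']
print(dependences(add, mul))  # ['RAW']
```

Here SUB must not overwrite R2 before ADD has read it, which is the WAR constraint stated above.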
• Conditional branches
• Branch must execute to determine which instruction to fetch
next
• Instructions following a conditional branch are control
dependent on the branch instruction
[Figure: tasks A, B, C, D in task order]
30+40+20=90
• Pipelining speeds up the average task execution time.
[Figure: pipelining examples labeled "Cola" and "Auto"]
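The 30+40+20=90 figure can be explored with a small timing model. This is a sketch under stated assumptions: the three numbers are taken to be the minutes for three consecutive stages of one task, each stage handles one task at a time, and a task may wait between stages. The function name `pipelined_finish_times` is hypothetical.

```python
def pipelined_finish_times(stage_minutes, num_tasks):
    """Finish time of each task when stages overlap across tasks."""
    stage_free = [0] * len(stage_minutes)  # time at which each stage becomes idle
    finish = []
    for _ in range(num_tasks):
        t = 0  # every task is ready at time 0
        for s, minutes in enumerate(stage_minutes):
            start = max(t, stage_free[s])  # wait for the previous stage and for this stage to be free
            t = start + minutes
            stage_free[s] = t
        finish.append(t)
    return finish

stage_minutes = [30, 40, 20]  # one task takes 30 + 40 + 20 = 90 minutes
num_tasks = 4                 # tasks A, B, C, D

sequential_total = num_tasks * sum(stage_minutes)
finish = pipelined_finish_times(stage_minutes, num_tasks)

print(sequential_total)  # 360
print(finish)            # [90, 130, 170, 210]
```

The first task still takes 90 minutes, so the latency of a single task does not improve. After that, one task completes every 40 minutes (the slowest stage), so the average time per task drops.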
Pipelining
• An implementation technique
whereby multiple instructions are
overlapped in execution.
e.g., B washes while A dries
• Essence: Start executing one
instruction before completing the
previous one.
• Significance: Makes CPUs fast.
Balanced Pipeline
• One task/instruction per 40 mins
• Equal-length pipe stages
e.g., Wash, dry, fold = 40 mins each
unpipelined laundry time = 40x3 mins
3 pipe stages – wash, dry, fold
• Performance
Time per instruction by pipeline =
(Time per instr on unpipelined machine) / (Number of pipe stages)
Speed up by pipeline = Number of pipe stages

      Wash  Dry  Fold   (40 min each)
T1    A
T2    B     A
T3    C     B    A
T4    D     C    B
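The ideal speedup equals the number of pipe stages only once the pipeline is full. A minimal sketch, assuming equal-length stages and no stalls, shows how the speedup for a finite number of tasks approaches that limit; the function names are hypothetical.

```python
def unpipelined_total_time(num_tasks, num_stages, stage_time):
    """Each task runs all stages before the next task starts."""
    return num_tasks * num_stages * stage_time

def pipelined_total_time(num_tasks, num_stages, stage_time):
    """The first task fills the pipeline; each later task finishes one stage time after the previous one."""
    return (num_stages + num_tasks - 1) * stage_time

def speedup(num_tasks, num_stages=3, stage_time=40):
    return (unpipelined_total_time(num_tasks, num_stages, stage_time)
            / pipelined_total_time(num_tasks, num_stages, stage_time))

print(speedup(4))  # 2.0 for tasks A, B, C, D: 480 min vs 240 min
for n in (100, 10_000):
    print(n, round(speedup(n), 3))  # approaches 3, the number of pipe stages
```

With only four loads the speedup is 2, because the time spent filling and draining the pipeline is not negligible; the stage-count limit is reached only for long runs of tasks.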
Single Cycle Processors
Multicycle Processors
• Multicycle implementation:
Cycle:  1  2  3  4  5  6  7  8  9 10 11 12 13
Instr:
i       F  D  X  M  W
i+1                    F  D  X
i+2                             F  D  X  M
i+3                                         F
i+4
• Pipelined implementation:
Cycle:  1  2  3  4  5  6  7  8  9 10 11 12 13
Instr:
i       F  D  X  M  W
i+1        F  D  X  M  W
i+2           F  D  X  M  W
i+3              F  D  X  M  W
i+4                 F  D  X  M  W
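The overlapped schedule in the second diagram, where each instruction starts one cycle after the previous one, can be generated programmatically. This is a sketch assuming five one-cycle stages (F, D, X, M, W) and no stalls; `pipeline_diagram` is a hypothetical helper name.

```python
STAGES = ["F", "D", "X", "M", "W"]

def pipeline_diagram(num_instrs, stages=STAGES):
    """Return one row per instruction; row i places stage s in cycle i + s (0-based)."""
    total_cycles = len(stages) + num_instrs - 1
    rows = []
    for i in range(num_instrs):
        cells = ["."] * total_cycles  # "." marks a cycle in which the instruction is not in the pipeline
        for s, name in enumerate(stages):
            cells[i + s] = name
        rows.append(cells)
    return rows

rows = pipeline_diagram(5)
for i, cells in enumerate(rows):
    print(f"i+{i}: " + " ".join(cells))

# Five instructions finish in 5 + 5 - 1 = 9 cycles when overlapped,
# versus 5 * 5 = 25 cycles if each one ran all five stages alone.
print(len(rows[0]))  # 9
```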