Processor Organization & Pipelining

The document discusses processor organization, focusing on CISC (Complex Instruction Set Computers) and RISC (Reduced Instruction Set Computers) architectures. It outlines the characteristics, advantages, and disadvantages of both architectures, emphasizing the differences in instruction complexity and execution efficiency. Additionally, it introduces pipelining as a technique to improve CPU performance by overlapping instruction execution phases.

www.covenantuniversity.edu.ng

Raising a new Generation of Leaders

CSC 227
Computer Architecture and Organization I

MODULE 5: PROCESSOR ORGANIZATION AND PIPELINING
Introduction
• A processor is the logic circuitry that responds to and processes the
basic instructions that drive a computer.
• CPU instructions are numbers stored in memory.
• An instruction set is the set of instructions a programmer can give to a machine
to perform operations.
• The instructions are specific to CPU architecture.
• Basic operations: read an instruction from memory, decode it, execute it, and
write back the result.
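The fetch, decode, execute, write-back cycle can be made concrete with a minimal Python sketch of a hypothetical toy machine; the opcodes, the (opcode, dest, src1, src2) instruction format, and the four-register file are invented here for illustration and do not correspond to any real ISA:

```python
# Toy fetch-decode-execute-writeback loop (hypothetical ISA, for illustration).
def run(program):
    regs = [0] * 4                     # a tiny register file
    pc = 0                             # program counter
    while pc < len(program):
        op, d, a, b = program[pc]      # fetch: read instruction from "memory"
        pc += 1                        # increment PC
        if op == "li":                 # load immediate a into register d
            regs[d] = a
        elif op == "add":              # add registers a and b, write back into d
            regs[d] = regs[a] + regs[b]
        elif op == "mul":              # multiply registers a and b, write back into d
            regs[d] = regs[a] * regs[b]
    return regs

# Example: compute 5 * 10 into register 0.
result = run([("li", 1, 5, 0), ("li", 2, 10, 0), ("mul", 0, 1, 2)])
print(result[0])  # 50
```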
Fundamentals
From the architecture point of view, the microprocessor chips can be
classified into two categories:

1. Complex Instruction Set Computers (CISC) and


2. Reduced Instruction Set Computers (RISC).
CISC: Complex Instruction Set Computers
• CISC existed close to the beginning of general computing.

• Since the earliest machines were programmed in assembly language and
memory was slow and expensive, IBM designed an instruction set to allow
early programmers to easily program a hundred complex instructions rather
than thousands of individual instructions.

• CISC was developed to make compiler development simpler. It shifts most


of the burden of generating machine instructions to the processor.

• For example, instead of having to make a compiler write long machine


instructions to calculate a square-root, a CISC processor would have a
built-in ability to do this.
CISC Architecture
• Complex instruction set computing is a CPU design where
single instructions can execute several low-level operations (such as a load
from memory, an arithmetic operation, and a memory store) or are capable of
multi-step operations or addressing modes within single instructions.

• Called “complex” because of the complex work performed per instruction.


• Concept: encode the intention directly.
• E.g. "add X and Y and put the result in Z" (for X, Y, Z memory addresses).
• Problem: some instructions take more time than others.
• Examples: x86, s390.
• Small number of general-purpose registers.

cont….
• Desktop and server computers typically use CISC, while tablets, smartphones
and other embedded devices use RISC.

• So, the higher efficiency of the RISC architecture makes it desirable in these
applications where cycles and power are usually in short supply.

• In CISC instructions are executed by microcode.

• A CISC instruction set typically includes many instructions with different sizes
and execution cycles, which makes CISC instructions harder to pipeline.
Characteristic of CISC Processors
• A CISC instruction can be thought of as many different types of
instructions bundled into one single instruction.
• A large number of instructions - typically from 100 to 250 instructions.
• Some instructions that perform specialized tasks and are used infrequently.
• A large variety of addressing modes - typically 5 to 20 different modes.
• Variable-length instruction formats
• Instructions that manipulate operands in memory.

Properties of a CISC Processor
1. Richer instruction set; some instructions simple, some very complex.
2. Instructions generally take more than 1 clock cycle to execute.
3. Instructions of variable size.
4. Instructions interface with memory through multiple mechanisms with
complex addressing modes.
5. No pipelining.
6. Microcode control makes the CISC instruction set possible and flexible.
7. Works well with simpler compilers.
Advantage
• Microprogramming is as easy as assembly language to implement, and
much less expensive than hardwiring a control unit.

• As each instruction became more capable, fewer instructions could be


used to implement a given task. This made more efficient use of the
relatively slow main memory.

• Because micro-program instruction sets can be written to match the


constructs of high-level languages, the compiler does not have to be as
complicated.
Disadvantage
• Complex instructions are infrequently used by programmers and compilers.

• Memory references, loads and stores, are slow and account for a significant
fraction of all instructions.

• Procedure and function calls are a major bottleneck: passing arguments and
storing and retrieving values in registers take time.

• The instruction set and chip hardware become more complex with each
generation of computers.
CISC Instruction Example
A CISC processor could multiply 5 by 10 as follows:

Mov ax, 10    ; load 10 into ax
Mov bx, 5     ; load 5 into bx
Mul bx        ; ax = ax * bx = 50
RISC: Reduced Instruction Set Computer
History
The first RISC projects came from IBM, Stanford, and UC-Berkeley in the late
70s and early 80s.
The IBM 801, Stanford MIPS, and Berkeley RISC 1 and 2 were all designed
with a similar philosophy which has become known as RISC.

When designers create a new generation of processors, improving
performance is the key goal. There are three main factors that affect
performance:
• How fast you can crank up the clock.
• How much work you can do per cycle.
• How many instructions you need to perform a task.
RISC Architecture
• Called “reduced” because of the reduction of work performed by an
instruction.
• It is a type of microprocessor architecture that utilizes a small, highly-
optimized set of instructions, rather than a more specialized set of
instructions often found in other types of architectures.
• RISC's original goal was to limit the number of instructions on the chip so
that each could be allocated enough transistors to make it execute in one
cycle.
• Small set of instructions of a typical RISC processor consists mostly of
register-to-register operations, with only simple load and store operations
for memory access.
cont….
• Thus each operand is brought into a processor register with a load
instruction
• All computations are done among the data stored in processor registers.
• Results are transferred to memory by means of store instructions.
• Concept: break the operation into simpler sub-operations.
• E.g. "add X and Y and put the result in Z" becomes:
• load X,
• load Y,
• add X and Y,
• store the result in Z
Characteristic of RISC Processors
• Simplifies the instruction set.
• The use of only a few addressing modes results from the fact that almost all
instructions have simple register addressing.
• Other addressing modes may be included, such as immediate operands.
• By using a relatively simple instruction format, the instruction length can be
fixed and aligned on word boundaries.
• An important aspect of RISC instruction format is that it is easy to decode.
• Shorter Instructions - Breaking the complex instruction into several short
simpler instructions
cont….
• It has the ability to execute one instruction per clock cycle.
➢ This is done by overlapping the fetch, decode and execute phases of two or
three instructions by using a procedure referred to as pipelining.
• The advantage of register storage as opposed to memory storage is that
registers can transfer information to other registers much faster than the
transfer of information to and from memory.
• Relatively few instructions
• Relatively few addressing modes.
• Memory access limited to load and store instructions
• Can run several instructions simultaneously.
Properties of a RISC Processor
1. Simple primitive instructions and addressing modes.
2. Instructions execute in one clock cycle.
3. Uniform-length instructions and a fixed instruction format.
4. Instructions interface with memory via fixed mechanisms(load/store).
5. Pipelining.
6. Hardwired control.
7. Complexity pushed to the compiler.
Advantage
• Speed: RISC processors often achieve 2 to 4 times the performance of CISC
processors using comparable semiconductor technology and the same
clock rates.

• Simpler hardware: because the instruction set of a RISC processor is so
simple, it uses up much less chip space and has simpler hardware
requirements.

• Shorter design cycle: since RISC processors are simpler than
corresponding CISC processors, they can be designed more quickly, and
their instructions can complete in 1 clock cycle.
Disadvantage
• Code Quality: The performance of a RISC processor depends greatly on the code
that it is executing. If the programmer (or compiler) does a poor job of instruction
scheduling, the processor can spend quite a bit of time stalling (waiting for the
result of one instruction before it can proceed with a subsequent instruction).

• Code expansion: Since CISC machines perform complex actions with a single
instruction, where RISC machines may require multiple instructions for the same
action, code expansion can be a problem.

• System Design: another problem that faces RISC machines is that they require very
fast memory systems to feed them instructions. RISC-based systems typically
contain large memory caches, usually on the chip itself. This is known as a
first-level cache.
RISC Instruction Example
• In RISC the microprocessor's designers might make sure that add executes
in one clock.
• Then a compiler could multiply a and b by adding a to itself b times or b to
itself a times.

Mov ax, 0     ; accumulator for the result
Mov bx, 10
Mov cx, 5     ; loop counter
Begin:
Add ax, bx    ; ax = ax + bx
Loop Begin    ; decrement cx and repeat while cx != 0 (loops cx times)
RISC 5 Stage Pipelining
Five-stage "RISC" load-store architecture
1. Instruction fetch (IF)
• Get instruction from memory, increment PC.
2. Instruction Decode (ID)
• Translate opcode into control signals and read registers.
3. Execute (EX)
• Perform ALU operation, compute jump/branch target
4. Memory (MEM)
• Access memory if needed
5. Writeback (WB)
• Update register file
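Assuming an ideal pipeline with no hazards, the clock cycle in which each instruction occupies each of the five stages can be sketched in Python (instruction i enters IF in cycle i + 1; this is a simplified model, not a full simulator):

```python
# Which clock cycle each instruction spends in each stage of an ideal
# 5-stage pipeline (no stalls): instruction i is in stage j at cycle i + j + 1.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def schedule(n_instructions):
    return {(i, stage): i + j + 1
            for i in range(n_instructions)
            for j, stage in enumerate(STAGES)}

sched = schedule(3)
print(sched[(0, "WB")])  # 5: the first instruction writes back in cycle 5
print(sched[(2, "WB")])  # 7: after the fill, one instruction completes per cycle
```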
RISC 5 Stage Pipelining
Example: CISC and RISC Instructions
Comparisons between CISC and RISC Processors
• Instructions utilize more cycles in CISC than RISC.
• CISC has more complex instructions than RISC.
• CISC typically has more instructions than RISC.
• CISC implementations tend to be slower than RISC implementations.
• RISC design is approximately twice as cost-effective as CISC.
• RISC architectures are designed for a good cost/performance, whereas CISC
architectures are designed for a good performance on slow memories.
Comparisons between CISC and RISC Processors
cont….
• CISC: Emphasis on hardware. RISC: Emphasis on software.
• CISC: Includes multi-clock, complex instructions. RISC: Single-clock, reduced instructions only.
• CISC: Memory-to-memory; "LOAD" and "STORE" are incorporated in instructions. RISC: Register-to-register; "LOAD" and "STORE" are independent instructions.
• CISC: Slower, since an instruction can take more than 1 cycle. RISC: Faster, since instructions usually take 1 cycle.
• CISC: Main objective is less code. RISC: Main objective is speed.


cont….
• CISC: More hardware oriented. RISC: More software oriented, since the compiler deals with translations.
• CISC: Instruction size is mostly varied. RISC: Instruction size is always a set size.
• CISC: Addressing modes can be complex. RISC: Addressing modes are simple.


Examples of CISC and RISC Processors
Improving System Performance
through Pipelining

Pipelining - Introduction
In a typical system, speedup is achieved through parallelism at all
levels:
multi-user, multitasking, multi-processing, multi-programming,
multi-threading, and compiler optimizations.
• Pipelining: a technique for overlapping operations during
execution. Today this is a key feature that makes fast CPUs.
• Different types of pipeline: instruction pipelines, operation
pipelines, multi-issue pipelines.

What is Pipelining? - 1

• Pipeline processing is an implementation technique where


arithmetic sub-operations or the phases of a computer instruction
cycle overlap in execution

• Pipelining exploits parallelism among instructions by overlapping


them - called Instruction Level Parallelism (ILP)

• Pipelining is a technique of decomposing a sequential process


into sub-operations, with each sub-process being executed in a
special dedicated segment that operates concurrently with all other
segments

What is a Pipeline? - 2

• Pipeline is like an automobile assembly line.


• A pipeline has many steps or stages or segments.
• Each stage carries out a different part of instruction or operation.
• The stages are connected to form a pipe.
• An instruction or operation enters through one end and progresses
through the stages to exit at the other end.

Pipeline Characteristics
• Throughput: number of items (cars, instructions, operations) that
exit the pipeline per unit time. Ex: 1 instruction/clock cycle, 10 cars/hour,
10 floating point operations/cycle.
• Stage time: the pipeline designer's goal is to balance the
length of each pipeline stage (a balanced pipeline). In general,
Stage time = Time per instruction on non-pipelined machine /
number of stages
• In many instances, stage time = max(times for all stages).
• CPI: pipelining yields a reduction in cycles per instruction;
at steady state the average time per instruction approaches the stage time.

Pipeline Analogy – The Laundry

• Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold

• Washer takes 30 minutes

• Dryer takes 30 minutes

• “Folder” takes 30 minutes

• “Stasher” takes 30 minutes to put clothes


into drawers

Pipeline Analogy – The Laundry - Sequential Operation
[Figure: sequential laundry timeline from 6 PM to 2 AM; A, B, C and D each run wash, dry, fold and stash back-to-back in 30-minute slots]

• Time Required: 8 hours for 4 loads

Pipeline Analogy – The Laundry - Overlapping Tasks

[Figure: pipelined laundry timeline from 6 PM to 9:30 PM; a new load starts every 30 minutes, so the four loads overlap]

• Time Required: 3.5 Hours for 4 Loads

Pipeline Analogy – The Laundry (cont’d)

• Pipelining doesn’t help latency of single task, it helps throughput of


entire workload
• Pipeline rate is limited by slowest pipeline stage
• Multiple tasks operating simultaneously
• Potential speedup = Number pipe stages
• Unbalanced lengths of pipe stages reduce speedup
• Time to "fill" the pipeline and time to "drain" it reduce speedup

Pipeline Analogy – The Laundry (cont’d)
• Key idea: break a big computation up into pieces.
(1 nanosecond = 10^-9 second; 1 picosecond = 10^-12 second)
• Separate each piece with a pipeline register.

[Figure: a 1ns combinational computation split into five 200ps pieces, with a pipeline register between consecutive pieces]
Pipelining Analogy – Grading of Exam

• Grading the Final exam for a class of 100 students:


▪ 5 problems, five people grading the exam
▪ Each person ONLY grades one problem
▪ Pass the exam to the next person as soon as one finishes his part
▪ Assume each problem takes 12 min to grade
‒ Each individual exam still takes 1 hour to grade
‒ But with 5 people, all exams can be graded five times quicker

Pipelining Analogy – Grading of Exam

• The load instruction has 5 stages:


▪ Five independent functional units to work on each stage
‒Each functional unit is used only once
▪ The 2nd load can start as soon as the 1st finishes its fetch stage
▪ Each load still takes five cycles to complete
▪ The throughput, however, is much higher

Pipelining Analogy – Grading of Exam
[Figure: tasks enter an input buffer and flow through Stage 1, Stage 2, …, Stage k of a k-stage pipeline]

• Let n be the number of tasks or exams (or instructions)
• Let k be the number of stages for each task
• Let T be the time per stage
• Time per task = T.k
• Total time for n tasks, non-pipelined = T.k.n
• Total time for n tasks, pipelined = T.k + T.(n-1)
• Speedup = pipelined performance / non-pipelined performance
= Total time non-pipelined / Total time pipelined
= k.n / (k + n - 1) ≈ k when n >> k
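These timing formulas can be checked with a short Python sketch; the sample numbers (k = 4 stages, T = 20 ns per stage, n = 100 tasks) are illustrative:

```python
# Non-pipelined vs pipelined time for n tasks, k stages, t per stage.
def pipeline_times(k, n, t):
    non_pipelined = t * k * n            # T.k.n
    pipelined = t * k + t * (n - 1)      # T.k + T.(n-1)
    return non_pipelined, pipelined, non_pipelined / pipelined

np_time, p_time, speedup = pipeline_times(4, 100, 20)
print(np_time, p_time, round(speedup, 2))       # 8000 2060 3.88
# As n grows much larger than k, the speedup approaches k:
print(round(pipeline_times(5, 1000, 1)[2], 2))  # 4.98
```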
Single-Cycle vs Pipelined Execution
Non-Pipelined
[Figure: each lw instruction runs Instruction Fetch, REG RD, ALU, MEM, REG WR in sequence; each instruction takes 800ps, and lw $1,100($0), lw $2,200($0), lw $3,300($0) start 800ps apart]

Pipelined
[Figure: the same three lw instructions overlap; every stage takes 200ps, so a new instruction starts every 200ps and one completes each cycle once the pipeline is full]

Performance Issues in Pipelining
• Speedup: how much performance improvement we get through pipelining.
▪ n: number of tasks to be performed

• Conventional Machine (Non-Pipelined)

▪ tn: time to complete one task
▪ t1: time required to complete n tasks
▪ t1 = n * tn

Performance Issues in Pipelining – cont’d
• Pipelined Machine (k stages)
▪ tp: Clock cycle (time to complete each sub operation)
▪ tk: Time required to complete the n tasks
▪ tk = (k + (n - 1)) * tp

• Speedup
▪ Sk: Speedup

• Sk = n*tn / ((k + (n – 1))*tp)

Performance Issue - Example
Example
- 4-stage pipeline
- sub operation in each stage; tp = 20ns
- 100 tasks to be executed
- time for 1 task in the non-pipelined system: tn = 20*4 = 80ns

Pipelined System
(k + (n – 1))*tp = (4 + 99) * 20 = 2060ns

Non-Pipelined System
n*tn = 100 * 80 = 8000ns

Speedup
Sk = 8000 / 2060 = 3.88

A 4-stage pipeline is basically identical to a system with 4 identical functional units

Performance Issue – Example (cont’d)

[Figure: instructions Ii, Ii+1, Ii+2, Ii+3 issued to four identical functional units P1, P2, P3, P4 operating in parallel]

Pipeline Performance – Example2

• Design 1:
• Average instruction execution time = clock cycle time * CPI
• = 10ns * (4*0.4 + 4*0.2 + 5*0.4) = 10 * (1.6 + 0.8 + 2.0)
• = 44ns
• Design 2:
• Average instruction time at steady state is the clock cycle time:
• = 10ns + 1ns (for setup and clock skew) = 11ns
• Speedup = 44/11 = 4

Pipeline Performance – Example3
• Assume times for each functional unit of a pipeline to be: 10ns, 8ns,
10ns,10ns and 7ns.
• Overhead 1ns per stage.Compute the speed of the data path.
• Pipelined:Stage time = MAX(10,8,10,10,7) + overhead
• • = 10 + 1 = 11ns.
• This is the average instruction execution time at steady state.
• Non-pipelined:10+8+10+10+7 = 45ns
• Speedup = 45/11= 4.1 times
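This calculation generalizes to any set of unbalanced stage times: the clock is set by the slowest stage plus the per-stage overhead. A short Python sketch, using the numbers from this example:

```python
# Steady-state speedup of an unbalanced pipeline: the clock period is the
# slowest stage time plus latch/setup overhead per stage.
def unbalanced_speedup(stage_times, overhead):
    clock = max(stage_times) + overhead   # slowest stage sets the clock
    non_pipelined = sum(stage_times)      # all stages in series, no overhead
    return non_pipelined / clock

print(round(unbalanced_speedup([10, 8, 10, 10, 7], 1), 1))  # 4.1
```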

Performance Issue – (cont’d)
• Efficiency: the efficiency of a pipeline can be measured as the ratio
of the busy time span to the total time span, including the idle time.
• Let c be the clock period, m the number of stages and n the number of
tasks; the efficiency E can be written as:
• E = (n * m * c) / (m * [m*c + (n - 1)*c]) = n / (m + n - 1)
• As n → ∞, E approaches 1.

Performance Issue – (cont’d)
• Throughput: Throughput of a pipeline can be defined as the number of results
that have been achieved per unit time.
• It can be denoted as:

▪ T = (n / [m + (n-1)]) / c = E / c

• Throughput denotes the computing power of the pipeline.

• Maximum speedup, efficiency and throughput are the ideal cases.
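Both formulas can be sketched directly in Python, using the same symbols as above (m stages, n tasks, clock period c):

```python
# Pipeline efficiency E = n / (m + n - 1) and throughput T = E / c.
def efficiency(n, m):
    return n / (m + n - 1)

def throughput(n, m, c):
    return efficiency(n, m) / c

print(round(efficiency(100, 4), 3))     # 100 tasks, 4 stages
print(round(efficiency(100000, 4), 3))  # E approaches 1 as n grows
```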

Speedup - Example
• Consider an unpipelined processor. Assume that it has a 1 ns clock cycle and it
uses 4 cycles for ALU operations and branches, and 5 cycles for memory
operations, assume that the relative frequencies of these operations are 40%,
20%, and 40%, respectively. Suppose that due to clock skew and setup,
pipelining the processor adds 0.2ns of overhead to the clock. Ignoring any
latency impact, how much speedup in the instruction execution rate will we gain
from a pipeline?
Average instruction execution time
= 1 ns * ((40% + 20%)*4 + 40%*5)
= 4.4ns
Speedup from pipelining
= Average instruction time unpipelined / Average instruction time pipelined
= 4.4ns / 1.2ns = 3.7
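The arithmetic can be reproduced with a short Python sketch using the operation mix given above:

```python
# Unpipelined average instruction time vs pipelined cycle time with overhead.
freqs  = {"alu": 0.4, "branch": 0.2, "mem": 0.4}   # relative frequencies
cycles = {"alu": 4,   "branch": 4,   "mem": 5}     # cycles per operation
unpipelined = 1.0 * sum(freqs[op] * cycles[op] for op in freqs)  # ns
pipelined = 1.0 + 0.2   # one cycle per instruction plus skew/setup overhead, ns
print(round(unpipelined, 1))               # 4.4
print(round(unpipelined / pipelined, 1))   # 3.7
```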

Pipeline Hazards/Limitations
• Hazards reduce the performance from the ideal speedup gained by
pipelines:
• Structural hazard: Resource conflict.
• Hardware cannot support all possible combinations of instructions in
simultaneous overlapped execution.
• Data hazard:
• When an instruction depends on the results of the previous instruction.
• Control hazard:
• Due to branches and other instructions that affect the PC.

Pipeline Stalls
• A stall is the delay in cycles caused due to any of the hazards
mentioned above.
• Speedup = Number of stages / (1 + pipeline stalls per instruction)
• The number of cycles needed to initially fill up the pipeline can be
included in the computation of average stalls per instruction.

Structural Hazards
• When more than one instruction in the pipeline needs to access a
resource, the datapath is said to have a structural hazard.
• Examples of resources: register file, memory, ALU.
• Solution: Stall the pipeline for one clock cycle when the conflict is
detected. This results in a pipeline bubble.
• Figures 4 & 5 illustrate the memory access conflict and how it is
resolved by stalling an instruction.
• Problem: one memory port.

Structural Hazards and Stalls - Conflicts

Structural Hazards and Stalls - Solution

Structural Hazards and Stalls - Bubble

Structural Hazard - Example
• Machine with a load hazard: data references constitute 40% of the mix.
Ideal CPI is 1. The clock rate of the machine with the hazard is 1.05 times
that of the machine without it. Which machine is faster, the one with the
hazard (machine A) or the one without the hazard (machine B)? Prove it.
• Solution: the hazard affects 40% of machine A's instructions.
• Average instruction time for machine B: CPI * clock cycle time = 1
* x = 1.0x

5959
Structural Hazard - Solution
• Average instruction time for machine A:
1) CPI is extended: 40% of the instructions take 1 more cycle, so CPI = 1.4.
2) The clock rate is 1.05 times faster, so the cycle time is x/1.05.
• Average instruction time for machine A: (1 + 40/100*1) * (clock cycle
time / 1.05)
= 1.4 * x/1.05 ≈ 1.33x
• Since 1.33x > 1.0x, machine B, without the hazard, is faster.
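The two average instruction times can be checked numerically with a short Python sketch (x is the baseline clock cycle time, and the assumption, as in the problem statement, is that the machine with the hazard has the 1.05 times faster clock):

```python
# Machine B: no structural hazard, ideal CPI = 1, cycle time x.
# Machine A: one memory port, so 40% of instructions stall one cycle,
# but its clock runs 1.05 times faster.
x = 1.0
time_b = 1.0 * x                       # CPI 1 at cycle time x
time_a = (1 + 0.40 * 1) * (x / 1.05)   # CPI 1.4 at cycle time x/1.05
print(round(time_a, 2))  # 1.33: slower than 1.0x despite the faster clock
```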

Data Hazard - Example
• Consider the instruction sequence:
• ADD R1,R2,R3 ;result is in R1
• SUB R4,R5,R1
• AND R6,R1,R7
• OR R8,R1,R9
• XOR R10,R1,R11
• All instructions use R1 after the first instruction.

Data Hazard - Solution

• Usually solved by data or register forwarding (bypassing or short-


circuiting).
• How? The data selected is not really used in ID but in the next stage:
ALU.
• Forwarding works as follows:
• ALU result from EX/MEM buffer is always fed back to ALU input
latches.
• If the forwarding hardware detects that its source operand has a
new value, the logic selects the newer result rather than the value read
from the register file.
Data Hazard – Solution (cont’d)
• Results need to be forwarded not only from the immediately
previous instruction but also from any instruction that started up to
three cycles before.
• The results from EX/MEM (1 cycle before) and MEM/WB (2 cycles
before) are forwarded to both ALU inputs.
• Writing into the register file is done in the first half of the cycle and
reading in the second half (this covers the instruction 3 cycles before).
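The forwarding decision described above can be sketched in Python; the function name and the (destination, value) representation of the pipeline latches are illustrative, not a real simulator API:

```python
# Select one ALU source operand, preferring the newest in-flight result
# over the (possibly stale) register-file value.
def select_operand(reg, regfile, ex_mem, mem_wb):
    # ex_mem / mem_wb: (destination_register, value) latches, or None
    if ex_mem is not None and ex_mem[0] == reg:
        return ex_mem[1]       # result produced 1 cycle before (EX/MEM)
    if mem_wb is not None and mem_wb[0] == reg:
        return mem_wb[1]       # result produced 2 cycles before (MEM/WB)
    return regfile[reg]        # no newer value in flight: read register file

regfile = {"R1": -5}
print(select_operand("R1", regfile, ("R1", 8), None))  # 8, forwarded from EX/MEM
```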

Data Hazard – Example

Example 1 :
add $s0, $t0, $t1
sub $t2, $s0, $t3
In the example, the second instruction is dependent on the result in $s0 of the
first instruction:
if $s0 = -5 before add
$s0 = 8 after add
then the value 8 should be used in the second instruction sub.

Draw the multiple clock cycle pipeline diagram for the execution:

Data Hazard – Example (cont'd)

Execution order        CC1    CC2    CC3    CC4    CC5    CC6
add $s0, $t0, $t1      IF     ID     EX     MEM    WB
value in $s0           -5     -5     -5     -5     -5/8   8
sub $t2, $s0, $t3             IF     ID     EX     MEM    WB

In which clock cycle does add write to $s0? Clock cycle 5

In which clock cycle does sub read $s0? Clock cycle 3

For sub instruction, the value in $s0 has to be read in its ID stage (CC3).
However, the value in $s0 in CC3 is still -5 and not the correct value
8. We can only have the correct value in $s0 at the end of clock 5 (CC5).
The dependency goes from CC5 ---> CC3 (backward)

Data Hazard Stalls
• Not all data hazards can be solved by forwarding:
• LW R1,0(R2)
• SUB R4,R1,R5
• AND R6,R1,R7
• OR R8,R1,R9
• Unlike the previous example, the loaded data is not available until MEM/WB.
So the SUB's ALU cycle has to be stalled, introducing a (vertical) bubble.
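This load-use case can be written as a simple hazard check in Python; the tuple instruction format (opcode, destination, sources…) is invented for illustration:

```python
# A load's result is only available after MEM, so an instruction that uses
# the loaded register in the very next cycle must stall one cycle (a bubble),
# even with forwarding.
def needs_stall(first, second):
    op, dest, *_ = first        # opcode and destination of the first instruction
    _, _, *sources = second     # source registers of the following instruction
    return op == "LW" and dest in sources

print(needs_stall(("LW", "R1", "0(R2)"), ("SUB", "R4", "R1", "R5")))   # True
print(needs_stall(("ADD", "R1", "R2", "R3"), ("SUB", "R4", "R1", "R5")))  # False
```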

Data Hazards and Stalls

Data Hazards and Stalls

Data Hazards – Time Stage Diagram

Data Hazard Classification
• RAW – Read After Write. Most common; solved by data forwarding.
• WAW – Write After Write: inst i (a load) comes before inst j (an add), and
both write to the same register, but inst i must do so before inst j. DLX
avoids this by waiting until WB to write to registers, so there is no WAW
hazard in DLX.
• WAR – Write After Read: inst j tries to write a destination before it is
read by inst i, so i incorrectly gets the new value. This cannot happen in DLX,
since all instructions read early (in ID) but write late (in WB).
• But WAR hazards do occur in complex instruction sets that have auto-
increment addressing modes and read some operands late in the cycle.

Data Hazard Stalls
Describe each of the following categories of data hazards: RAW, WAR,
WAW. Using (i – iii) below, state which is RAW, WAR or WAW and indicate
how each occurs using an arrow.
i)  R3 ← R1 op R2    ii) R3 ← R1 op R2    iii) R3 ← R1 op R2
    R5 ← R3 op R4        R1 ← R4 op R5         R3 ← R6 op R7

Limitations to Speedup - 1

• Data dependency between successive tasks: There may be


dependencies between the instructions of two tasks used in the
pipeline.
• For example:
▪ One instruction cannot be started until the previous instruction returns
the results, as both are interdependent.
▪ Another instance of data dependency is when both instructions try to
modify the same data object. These are called data hazards.

Limitations to Speedup - 2

Resource Constraints: When resources are not available at the time of


execution then delays are caused in pipelining.
For example:
1) If one common memory is used for both data and instructions and there
is need to read/write and fetch the instruction at the same time, then only
one can be carried out and the other has to wait.
2) Limited resource like execution unit, which may be busy at the required
time.

Limitations to Speedup - 3
• Branch Instructions and Interrupts in the program:
• A program is not a straight flow of sequential instructions.
• There may be branch instructions that alter the normal flow of the program,
which delays the pipelined execution and affects performance.
• Similarly, there are interrupts that postpone the execution of the next
instruction until the interrupt has been serviced.
• Branches and interrupts have damaging effects on pipelining.

