
Pipelining

Amudhan AN

Course Instructor: Amudhan AN


What is Pipelining

Pipelining is a technique used in computer architecture to increase instruction
throughput by overlapping the execution of multiple instructions.
It involves dividing the instruction processing into separate stages, with each
stage handling a different part of the instruction's execution.
This allows multiple instructions to be processed simultaneously, improving
overall efficiency and performance.
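This overlap can be sketched as a tiny schedule generator. A minimal sketch, not from the slides: it assumes an ideal hazard-free pipeline where each instruction advances one stage per clock cycle (the stage names are the classic five-stage pipeline described below).

```python
# A minimal sketch: compute, for an ideal hazard-free pipeline, which clock
# cycle each instruction occupies each stage in.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def schedule(num_instructions):
    """Return {instruction_index: {stage_name: cycle_number}}."""
    plan = {}
    for i in range(num_instructions):
        # Instruction i enters IF in cycle i + 1 and advances one stage per cycle.
        plan[i] = {stage: i + 1 + s for s, stage in enumerate(STAGES)}
    return plan

plan = schedule(3)
# In cycle 3, I0 is in EX while I1 is in ID and I2 is in IF: overlapped execution.
print(plan[0]["EX"], plan[1]["ID"], plan[2]["IF"])  # 3 3 3
```

In cycle 3 all three instructions are active at once, which is exactly the throughput gain pipelining provides.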



Advantages:
1. Increased Instruction Throughput: Pipelining allows multiple instructions to be processed simultaneously, significantly increasing the number of instructions completed per unit of time. This leads to faster overall execution of programs.
2. Efficient CPU Resource Utilization: Each stage of the pipeline can work on different parts of multiple instructions concurrently, ensuring that the CPU is not idle and its resources are used more efficiently.
3. Improved Performance: By overlapping the execution of instructions, pipelining reduces the effective time per instruction, leading to improved performance and faster execution of programs. This makes systems more responsive and capable of handling complex tasks more effectively.
4. Reduced Overall Latency for Instruction Sequences: Although the latency of an individual instruction does not change significantly, the total time for a sequence of instructions decreases because multiple instructions are in flight at once.
5. Simplified Instruction Execution: Breaking down instruction execution into smaller, manageable stages simplifies the design and implementation of the CPU. This modular approach makes it easier to design, debug, and optimize each stage individually.
Stages of the Pipeline
Fetch:
• The Instruction Fetch (IF) stage is where the CPU retrieves the next instruction
to be executed from memory.
• Program Counter (PC): Holds the address of the next instruction.
• Instruction Memory: The instruction is fetched from this memory location.
• Increment PC: After fetching, the PC is incremented to point to the next instruction.
• Operations:
1.Fetch Instruction: Use the address in the PC to get the instruction from memory.
2.Update PC: Increment the PC to point to the next instruction.



Example:
• PC = 0x0000
• Instruction = Memory[PC]
• PC = PC + 4
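A minimal sketch of this fetch step in Python, with a dict standing in for instruction memory. The addresses and 32-bit encodings are made-up examples, not from the slides; byte addressing with 4-byte instructions is assumed.

```python
# Sketch of the fetch step: Instruction = Memory[PC], then PC = PC + 4.
instruction_memory = {
    0x0000: 0x20090005,  # hypothetical 32-bit instruction word
    0x0004: 0x200A0003,
}

pc = 0x0000
instruction = instruction_memory[pc]  # Fetch: use the PC as the address
pc = pc + 4                           # Update PC to the next instruction
print(hex(instruction), hex(pc))
```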



Decode
• Definition:
• The Instruction Decode (ID) stage interprets the fetched instruction and
prepares the necessary operands for execution.
• Key Points:
• Control Unit: Decodes the instruction to determine what action is needed.
• Register File: Reads the necessary operands from the registers.
• Immediate Values: Extracts any immediate values if present.
• Operations:
1.Decode Instruction: Control unit interprets the opcode.
2.Read Operands: Fetch the necessary operands from the register file.
3.Sign Extend: For immediate values, extend the sign if needed.
Opcode = Instruction[31:26]
Rs = Instruction[25:21]
Rt = Instruction[20:16]
Immediate = SignExtend(Instruction[15:0])
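The field extraction above can be sketched with shifts and masks for a 32-bit MIPS-style I-type word. The encoding 0x214AFFFF is a made-up example (opcode 8, rs = 10, rt = 10, immediate = -1), not taken from the slides.

```python
# Sketch of the decode-stage field extraction for a MIPS-style I-type word.
def sign_extend(value, bits=16):
    """Interpret a `bits`-wide unsigned field as a signed integer."""
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value

instruction = 0x214AFFFF  # hypothetical example encoding

opcode    = (instruction >> 26) & 0x3F          # Instruction[31:26]
rs        = (instruction >> 21) & 0x1F          # Instruction[25:21]
rt        = (instruction >> 16) & 0x1F          # Instruction[20:16]
immediate = sign_extend(instruction & 0xFFFF)   # SignExtend(Instruction[15:0])

print(opcode, rs, rt, immediate)  # 8 10 10 -1
```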

Opcode (short for operation code) is a unique code that specifies the operation to be performed by
a computer's processor. It's essentially the instruction part of a machine language instruction.
For example, in the instruction "ADD A, B", "ADD" is the opcode, specifying the addition operation.
The registers "A" and "B" are the operands, the data on which the operation is performed.
Key points about opcodes:
•Unique Identifier: Each opcode corresponds to a specific operation.
•Binary Representation: Opcodes are typically represented in binary format.
•Part of Machine Language: They are the fundamental building blocks of machine language.
•Instruction Set Architecture: The set of opcodes supported by a processor is defined by its
instruction set architecture (ISA).



This slide covers only the use of registers in the various stages.
A register file is a collection of registers, each capable of storing a fixed number of bits of data. It's a high-speed storage element within a
processor, used to store temporary data during program execution. Think of it as a small, high-speed memory that's directly accessible by
the processor.
Why is it Important in a 5-Stage Pipeline?
In a 5-stage pipeline, the register file plays a crucial role in efficiently passing data between different stages. Here's how:
Instruction Decode (ID) Stage:
The instruction is decoded to determine the source and destination registers.
The register file reads the values from the source registers.
Execute (EX) Stage:
The ALU uses the values from the register file to perform arithmetic or logical operations.
The result of the operation is stored in a temporary register.
Memory Access (MEM) Stage:
For load instructions, the memory address is calculated using values from the register file.
For store instructions, the data to be stored is obtained from the register file.
Write Back (WB) Stage:
The final result of the operation is written back to the destination register in the register file.
Key Points:
High Speed: Register files are designed to provide very fast access to data, significantly impacting overall processor performance.
Organization: They are typically organized as an array of registers, each with its own address.
Read/Write Ports: Multiple read and write ports allow for concurrent access to different registers, enhancing pipeline efficiency.
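The register file described above can be sketched as a small class with two read ports (used by the ID stage) and one write port (used by the WB stage). This is a simplified model, not a hardware description; hard-wiring register 0 to zero is a MIPS-style assumption.

```python
# Sketch of a register file: two read ports for ID, one write port for WB.
class RegisterFile:
    def __init__(self, num_regs=32):
        self.regs = [0] * num_regs

    def read(self, rs, rt):
        """ID stage: read both source operands in the same cycle."""
        return self.regs[rs], self.regs[rt]

    def write(self, rd, value):
        """WB stage: write the result back to the destination register."""
        if rd != 0:  # writes to register 0 are ignored (MIPS-style)
            self.regs[rd] = value

rf = RegisterFile()
rf.write(2, 7)
rf.write(3, 5)
print(rf.read(2, 3))  # (7, 5)
```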
Execute:
• Definition:
• The Execution (EX) stage performs the operation specified by the decoded instruction.
• Key Points:
• Arithmetic Logic Unit (ALU): Performs arithmetic and logical operations.
• Branch Calculations: Determines the branch target address if the instruction is
a branch.
• ALU Control: Selects the appropriate ALU operation based on the instruction type.
• Operations:
1.ALU Operation: Perform the required arithmetic or logical operation.
2.Branch Evaluation: Calculate the branch target if it’s a branch instruction.



ALU Operation: Result = ALU(RsValue, RtValue)
•Context: This takes place in the Execution (EX) stage of the pipeline.
•ALU: Arithmetic Logic Unit, responsible for performing arithmetic and logical operations.
•RsValue: Value from the source register Rs.
•RtValue: Value from the source register Rt.
•Operation: The ALU performs a specified operation (like addition, subtraction, etc.) on
RsValue and RtValue, then stores the result in Result.

Example:
•Instruction: ADD R1, R2, R3
•Rs: Register R2
•Rt: Register R3
•Operation: Result = R2 + R3 (Value from R2 plus value from R3).
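A sketch of the EX-stage ALU for this example. The operation is selected here by a mnemonic string rather than real ALU-control bits, which is a simplification.

```python
# Sketch of the ALU: Result = ALU(RsValue, RtValue).
def alu(op, rs_value, rt_value):
    if op == "ADD":
        return rs_value + rt_value
    if op == "SUB":
        return rs_value - rt_value
    if op == "AND":
        return rs_value & rt_value
    if op == "OR":
        return rs_value | rt_value
    raise ValueError(f"unsupported ALU operation: {op}")

# ADD R1, R2, R3 with R2 holding 6 and R3 holding 4:
print(alu("ADD", 6, 4))  # 10
```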



Branch Address Calculation: BranchAddr = PC + (SignExtend(Immediate) << 2)
•Context: This calculation is used to determine the target address for branch instructions.
•PC (Program Counter): Holds the address of the current instruction.
•Immediate: A value embedded in the instruction, often representing an offset.
•SignExtend(Immediate): Extends the immediate value to match the bit-width of the target address, preserving its sign.
•Shift Left (<< 2): Multiplies the immediate value by 4, which aligns it with the instruction word size (assuming 32-bit instructions).
•Operation: Adds the shifted, sign-extended immediate value to the PC to compute the target address of the branch.

Example:
•Instruction: BEQ R1, R2, offset
•PC: 0x1000
•Immediate (offset): 0x0004
•Sign-Extended Immediate: 0x0004
•Shifted Immediate: 0x0004 << 2 = 0x0010
•Branch Address: PC + 0x0010 = 0x1010
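The calculation above as code, following the slide's formula BranchAddr = PC + (SignExtend(Immediate) << 2). Note that real MIPS uses PC + 4 as the base; the slide's simpler form is kept here.

```python
# Sketch of the branch-target calculation from the slide.
def sign_extend16(value):
    """Sign-extend a 16-bit field to a Python int."""
    return value - 0x10000 if value & 0x8000 else value

def branch_target(pc, immediate):
    # BranchAddr = PC + (SignExtend(Immediate) << 2)
    return pc + (sign_extend16(immediate) << 2)

print(hex(branch_target(0x1000, 0x0004)))  # 0x1010 (matches the example)
```

A negative offset works the same way: an immediate of 0xFFFC (-4) from PC = 0x1010 branches back to 0x1000.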
Memory
• Definition:
• The Memory Access (MEM) stage is used for load and store instructions
to access memory.
• Key Points:
• Load/Store Unit: Handles memory read or write operations.
• Data Memory: Reads or writes data from/to memory.
• Memory Address: The effective address is calculated and used.
• Operations:
1.Memory Read: Load the data from the calculated address.
2.Memory Write: Store the data to the calculated address.
Example:
• If Load instruction: Data = Memory[Address]
• If Store instruction: Memory[Address] = RtValue
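A sketch of the MEM stage for loads and stores, with a dict standing in for data memory. Treating unwritten addresses as holding 0 is an assumption of this sketch.

```python
# Sketch of the MEM stage: loads read Memory[Address], stores write RtValue.
data_memory = {}

def mem_stage(is_load, address, rt_value=None):
    if is_load:
        return data_memory.get(address, 0)   # Data = Memory[Address]
    data_memory[address] = rt_value          # Memory[Address] = RtValue
    return None

mem_stage(False, 0x100, 42)      # store: Memory[0x100] = 42
print(mem_stage(True, 0x100))    # load: 42
```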



Write Back :
• Definition:
• The Write Back (WB) stage writes the result of the instruction back to the
register file.
• Key Points:
• Register File: The destination register is updated with the result.
• Result Selection: Choose between the ALU result and memory data for write-back.
• Final Stage: Completes the instruction execution cycle.
• Operations:
1.Write Result: Write the result to the specified destination register.
Example:
Register[Destination] = Result
Pipeline stages in various ARM architectures



Src: https://www.geeksforgeeks.org/pipelining-in-arm/



Instruction Cycle:
The complete sequence of stages needed to execute an instruction, from fetch to write-back.
Clock Cycle:
The time taken to complete one cycle of the clock, which synchronizes the operations of the CPU.
Role in Pipelining: Each stage of the pipeline typically completes its part of the instruction cycle in one clock cycle.
Parallel Processing: Multiple instructions are processed simultaneously, each at a different stage, thereby utilizing each clock cycle efficiently.
• Example: Pipelining with Clock Cycles
• Instruction 1 (I1):
• Cycle 1: Fetch
• Cycle 2: Decode
• Cycle 3: Execute

• Instruction 2 (I2):
• Cycle 2: Fetch
• Cycle 3: Decode
• Cycle 4: Execute



Disadvantages of pipelining

• Pipeline Hazards: Conditions that interrupt the pipeline, delaying the execution of instructions.
• Increased Complexity: Adding pipeline stages increases the overall design complexity of the processor.
• Stalling: Data dependencies may require instructions to wait, resulting in pipeline stalls.



Types of Data Hazards:
• Read After Write (RAW) - True Dependence:
• Occurs when an instruction needs to read a value that has not yet been written by a previous instruction.

Instruction 1: ADD R2, R3, R4 ; R2 = R3 + R4


Instruction 2: SUB R5, R2, R6 ; R5 = R2 - R6

• If I2 executes before I1 writes to R2, I2 will read the old (stale) value of R2.
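RAW detection between two decoded instructions can be sketched as a set-membership check. The (destination, sources) tuple representation is an assumption of this sketch, not the slides' notation.

```python
# Sketch of RAW-hazard detection: does the later instruction read a
# register that the earlier instruction writes?
def has_raw_hazard(earlier, later):
    """Each instruction is (destination_register, source_registers)."""
    dest, _ = earlier
    _, sources = later
    return dest in sources

i1 = (2, (3, 4))   # ADD R2, R3, R4: writes R2
i2 = (5, (2, 6))   # SUB R5, R2, R6: reads R2
print(has_raw_hazard(i1, i2))  # True
```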



• Write After Read (WAR) - Anti-Dependence:
• Occurs when an instruction writes to a register that a previous instruction needs to read.

I1: MOV R4, R1 ; R4 = R1


I2: ADD R1, R2, R3 ; R1 = R2 + R3

• If I2 executes before I1 reads R1, I1 will read the wrong (new) value of R1.



• Write After Write (WAW) - Output Dependence:
Occurs when two instructions write to the same register, potentially causing the wrong value to be written last.

I1: ADD R1, R2, R3 ; R1 = R2 + R3


I2: SUB R1, R4, R5 ; R1 = R4 - R5

Suppose I2 executes before I1 writes its result to R1.

I2 will write its result to R1.


I1 then writes its result to R1, overwriting the value written by I2.

In program order, R1 should end up holding I2's result; if I1's write happens last, subsequent instructions read the wrong value.
Handling Data Hazards:
• Forwarding (Bypassing):
• Uses hardware to pass the result of an instruction directly to a subsequent instruction that needs it, bypassing the normal write-back stage.
• Example: The result of I1 is forwarded directly to the next instruction I2, which needs the result, without waiting for it to be written back to the register file.
• The forwarding happens through the pipeline (buffer) registers between stages.
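The forwarding decision can be sketched as a multiplexer: if the previous instruction's ALU result targets one of our source registers, use that result instead of the stale register-file value. The function signature here is illustrative, not a real hardware interface.

```python
# Sketch of EX-stage operand forwarding (the bypass mux).
def forward_operand(reg, regfile_value, prev_dest, prev_result):
    if prev_dest is not None and prev_dest == reg:
        return prev_result       # bypass via the pipeline (buffer) register
    return regfile_value

# I1 (ADD R2, R3, R4) produced 9 but has not reached WB yet;
# I2 needs R2, whose register-file copy is still the stale value 0.
print(forward_operand(2, 0, prev_dest=2, prev_result=9))  # 9
```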

• Pipeline Stalls:

• Introduce delays in the pipeline to wait for the necessary data to be available.

• Insert NOP (no-operation) instructions to wait for the ADD to complete before executing the dependent instruction.



• Register Renaming:

• Dynamically assigns different physical registers to eliminate WAR and WAW hazards.

• Example: Instead of using the same logical register, use different physical registers to store intermediate results.



Control Hazards in Pipelining

• Control hazards, also known as branch hazards, occur when the pipeline makes incorrect predictions about the flow of control instructions such as branches, jumps, and calls. These hazards can disrupt the flow of instructions through the pipeline, leading to delays and inefficiencies.



Example: a taken branch in the pipeline

Address   Instruction
2000      I1
2004      I2: BEQ Label    (branch)
2008      I3
...
2050      BI1              (at Label)

          1     2     3     4     5     6     7     8
I1        IF    ID    EX    Mem   WB                      OK
I2              IF    ID    EX    Mem   WB                OK (branch target PC = 2050 known after decode)
I3                    IF    ID    EX    Mem   WB          Problem: fetched before the previous instruction's decode resolved the branch
BI1                         IF    ID    EX    Mem   WB
Solution 1: Stall

          1     2     3     4     5     6     7     8
I1        IF    ID    EX    Mem   WB                      OK
I2              IF    ID    EX    Mem   WB                OK (branch target PC = 2050 resolved in decode)
(stall)               -                                   Introduce a delay until the previous instruction's decode completes
BI1                         IF    ID    EX    Mem   WB

I1: BEQ R0, R1, label   ; Branch if R0 equals R1
I2: NOP                 ; Delay slot (No Operation)


Solution 2: Pipeline Flushing

1. Description: Discard instructions in the pipeline that were fetched based on incorrect predictions.
2. Example: If a branch is taken, flush all subsequent instructions in the pipeline that were fetched assuming the branch was not taken.

          1     2     3     4     5     6     7     8     9
I1        IF    ID    EX    Mem   WB                            OK
I2              IF    ID    EX    Mem   WB                      OK (branch condition resolved in EX, PC = 2050)
I3                    IF    ID    (flushed)                     Problem: fetched assuming the branch was not taken
I4                          IF    (flushed)
BI1                               IF    ID    EX    Mem   WB
Solution 3: Branch Prediction Algorithm
1. Initialize Prediction Table:
Create a table to store branch history and prediction outcomes (e.g., a Branch History Table with 2-bit counters for each branch).
2. Fetch Instruction:
Fetch the next instruction to be executed.
3. Check for Branch:
If the instruction is a branch, proceed to predict its outcome.
If not, execute it as a normal instruction.
4. Predict Branch Outcome:
Use the branch history table to predict whether the branch will be taken or not taken.
Based on the prediction, fetch the next set of instructions.
5. Execute Branch Instruction:
Execute the branch instruction to determine the actual outcome.
6. Update Prediction Table:
If the prediction was correct, strengthen the prediction in the table.
If the prediction was incorrect, update the table to reflect the actual outcome.
7. Handle Mispredictions:
If the prediction was incorrect:
Flush the incorrect instructions from the pipeline.
Fetch the correct set of instructions based on the actual outcome of the branch.
8. Continue Pipeline Execution:
Repeat the process for subsequent instructions and branches.
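The 2-bit saturating-counter scheme mentioned in step 1 can be sketched as follows. Counter values 0-1 predict "not taken" and 2-3 predict "taken"; a correct outcome strengthens the counter, a wrong one weakens it. Starting each branch weakly not-taken (counter = 1) is an assumption of this sketch.

```python
# Sketch of a Branch History Table with 2-bit saturating counters.
class TwoBitPredictor:
    def __init__(self):
        self.table = {}  # branch PC -> 2-bit counter (default 1: weakly not-taken)

    def predict(self, pc):
        """Predict taken when the counter is in states 2 or 3."""
        return self.table.get(pc, 1) >= 2

    def update(self, pc, taken):
        """Saturate the counter toward the actual outcome."""
        counter = self.table.get(pc, 1)
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
        self.table[pc] = counter

bp = TwoBitPredictor()
pc = 0x2004
print(bp.predict(pc))  # False (initially predicts not taken)
bp.update(pc, True)    # counter 1 -> 2
print(bp.predict(pc))  # True (now predicts taken)
```

The 2-bit hysteresis means a single misprediction in a long run of taken branches (e.g., a loop exit) does not flip the prediction immediately.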
Structural Hazards in Pipelining
• Definition:
• Structural hazards occur when hardware resources are insufficient to support all concurrent operations in the pipeline. This can cause conflicts and delays in instruction execution.
• Example:
• If two instructions require the same resource (e.g., memory, ALU) at the same time, a structural hazard occurs.



          1        2        3        4        5     6
I1        IF(Mem)  ID       EX       Mem      WB
I2                 IF(Mem)  ID       EX       Mem   WB
I3                          IF(Mem)  ID       EX    Mem
I4                                   IF(Mem)  ID    EX

In cycle 4, I1's Mem stage and I4's IF stage both need the memory unit at the same time: a structural hazard.

Resolution by stalling I4 until the memory unit is free:

          1        2        3        4     5     6        7     8
I1        IF(Mem)  ID       EX       Mem   WB
I2                 IF(Mem)  ID       EX    Mem   WB
I3                          IF(Mem)  ID    EX    Mem      WB
I4                                   -     -     IF(Mem)  ID    EX


Problem 1

• Consider a five-stage pipeline with a cycle time of 6 ns.


• Calculate the execution time of 100 instructions.
• Calculate the speed up due to pipelining
• Also find the utilization.



1. Execution Time
Non-Pipelined Execution:
Execution Time per Instruction: 5 stages * 6 ns = 30 ns per instruction
Total Time for 100 Instructions: 100 * 30 ns = 3000 ns
Pipelined Execution:
Time for the first instruction to complete (pipeline fill): 5 * 6 ns = 30 ns
Time to complete the remaining 99 instructions (one per cycle): 99 * 6 ns = 594 ns
Total Pipelined Execution Time: (5 + 100 - 1) * 6 ns = 30 ns + 594 ns = 624 ns

2. Speedup due to Pipelining

Speedup = Non-Pipelined Time / Pipelined Time
Speedup = 3000 ns / 624 ns ≈ 4.81
Course Instructor : Amudhan AN 36
3. Utilization of the Pipeline
Pipeline Utilization = (stage-cycles actually used / total available stage-cycles) * 100%

The 100 instructions take 5 + 100 - 1 = 104 cycles in total; each instruction occupies 5 stage-cycles, and the pipeline is only partially full during fill and drain.

Utilization = (100 instructions * 5 stages) / (104 cycles * 5 stages) * 100%

Utilization = (500 / 520) * 100% ≈ 96.2%


Problem 2
Consider a 5-stage pipeline with stage delays of 10, 16, 12, 11, and 14 ns.
Calculate the execution time of 100 instructions and the speedup due to
pipelining.



1. Find the Cycle Time
In a pipelined architecture, the cycle time is determined by the stage with the maximum delay, because every stage must operate within the same cycle time to maintain synchronization.
Stage Delays: 10 ns, 16 ns, 12 ns, 11 ns, 14 ns
Cycle Time: 16 ns (the maximum stage delay)

2. Execution Time for 100 Instructions

Non-Pipelined Execution:
Execution Time per Instruction: Sum of all stage delays = 10 + 16 + 12 + 11 + 14 = 63 ns
Total Time for 100 Instructions: 100 * 63 ns = 6300 ns
Pipelined Execution:
Time for the first instruction to complete (pipeline fill): 5 * 16 ns = 80 ns
Time to complete the remaining 99 instructions (one per cycle): 99 * 16 ns = 1584 ns
Total Pipelined Execution Time: (5 + 100 - 1) * 16 ns = 80 ns + 1584 ns = 1664 ns

3. Speedup due to Pipelining

Speedup = Non-Pipelined Time / Pipelined Time
Speedup = 6300 ns / 1664 ns ≈ 3.79
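These timing formulas can be sketched as a small helper: with k stages, n instructions, and cycle time t equal to the slowest stage delay, pipelined execution takes (k + n - 1) cycles in total. The function name is illustrative, not from the slides.

```python
# Sketch of the pipeline timing formulas used in the two problems.
def pipeline_times(k, n, stage_delays):
    t = max(stage_delays)                     # cycle time = slowest stage
    non_pipelined = n * sum(stage_delays)     # each instruction runs all stages
    pipelined = (k + n - 1) * t               # fill, then one completion per cycle
    return non_pipelined, pipelined, non_pipelined / pipelined

np_time, p_time, speedup = pipeline_times(5, 100, [10, 16, 12, 11, 14])
print(np_time, p_time, round(speedup, 2))  # 6300 1664 3.79
```

Running it with equal 6 ns stages reproduces Problem 1: 3000 ns non-pipelined vs (5 + 100 - 1) * 6 = 624 ns pipelined.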
