Pipelining

Pipelining is a technique that allows multiple stages of instruction execution to occur simultaneously, improving efficiency and throughput in processors. It involves breaking down processes into sub-operations executed in dedicated segments, with specific stages for fetching, decoding, executing, accessing memory, and writing back results. The document also discusses the concepts of speedup, efficiency, hazards, and vector processing in pipelined architectures.

Uploaded by Manoj Kumar Sain

Pipelining

 The term Pipelining refers to a technique of decomposing a sequential process into sub-operations, with each sub-operation being executed in a dedicated segment that operates concurrently with all other segments.

 It allows different stages of instruction execution to operate simultaneously, much like an assembly line in a factory.
Example

Ai * Bi + Ci for i = 1, 2, 3, ......., 7

 The operation to be performed on the numbers is decomposed into sub-operations, with each sub-operation implemented in a segment within a pipeline.
 The sub-operations performed in each segment of the pipeline are defined as:

R1 ← Ai, R2 ← Bi          Input Ai and Bi
R3 ← R1 * R2, R4 ← Ci     Multiply, and input Ci
R5 ← R3 + R4              Add Ci to product
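The three-segment example above can be simulated in Python (a minimal sketch I wrote for illustration; the function name and the latch representation are my own — each inter-stage register holds the item index and the values latched at the previous clock edge):

```python
def pipeline_multiply_add(A, B, C):
    """Simulate the three-segment pipeline computing Ai * Bi + Ci."""
    n = len(A)
    out = [None] * n
    seg1 = seg2 = None   # inter-stage registers: (index, latched values)
    fetched, cycle = 0, 0
    while any(v is None for v in out):
        cycle += 1
        # Segment 3: R5 <- R3 + R4 (consumes last cycle's segment-2 latch)
        if seg2 is not None:
            i, (r3, r4) = seg2
            out[i] = r3 + r4
        # Segment 2: R3 <- R1 * R2, R4 <- Ci
        if seg1 is not None:
            i, (r1, r2) = seg1
            seg2 = (i, (r1 * r2, C[i]))
        else:
            seg2 = None
        # Segment 1: R1 <- Ai, R2 <- Bi
        if fetched < n:
            seg1 = (fetched, (A[fetched], B[fetched]))
            fetched += 1
        else:
            seg1 = None
    return out, cycle

out, cycles = pipeline_multiply_add([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(out, cycles)   # [11, 18, 27] 5  (k + n - 1 = 3 + 3 - 1 cycles)
```

Note how the first result emerges after k = 3 cycles (the pipeline fill time) and the last after k + n − 1 cycles, which is exactly the pipelined-time formula derived later in these slides.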
Stages of Instruction Execution:
In most processors, executing an instruction involves several stages:
•Fetch (F): The instruction is fetched from memory.
•Decode (D): The instruction is decoded to understand the operation and operands.
•Execute (E): The operation is performed (e.g., addition, subtraction).
•Memory Access (M): Data is read from or written to memory, if required.
•Write Back (WB): The result of the operation is written back to the register.
In a non-pipelined processor, these stages are performed sequentially for each instruction.
In a pipelined processor, these stages overlap.
Total Execution Time in Pipelined Processor

For a pipeline with k stages and n instructions:

Total Time (Pipelined) = k + (n - 1)

Explanation:
- First instruction takes k cycles to fill the pipeline.
- Each subsequent instruction takes 1 cycle to complete.
- Total time = k cycles for the first instruction + (n - 1) cycles for the
remaining instructions.
Non-Pipelined Execution Time

In a non-pipelined system:

Total Time (Non-Pipelined) = n * k

Explanation:
- Each instruction takes k cycles to complete (each stage sequentially).
Speedup Formula

Speedup is the ratio of execution time in the non-pipelined system to that in the
pipelined system.

Speedup = (Total Time Non-Pipelined) / (Total Time Pipelined)

Substituting formulas:

Speedup = (n * k) / (k + (n - 1))
Throughput in Pipelined Processor

Throughput is the number of instructions completed per unit time.

Throughput = n / (k + (n - 1))

For large n, throughput approaches 1 instruction per cycle.


Efficiency of Pipeline

Efficiency measures how well the pipeline is utilized.

Efficiency = (Speedup) / (Number of Pipeline Stages)

Substituting formulas:

Efficiency = (n * k) / (k * (k + (n - 1)))

For large n, efficiency approaches 1 (100%).
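The formulas above can be checked with a few helper functions (a quick sketch; the function names are mine):

```python
def pipelined_time(k, n):
    """Cycles for n instructions on a k-stage pipeline: k + (n - 1)."""
    return k + (n - 1)

def speedup(k, n):
    """Non-pipelined time (n * k) over pipelined time."""
    return (n * k) / pipelined_time(k, n)

def throughput(k, n):
    """Instructions completed per cycle."""
    return n / pipelined_time(k, n)

def efficiency(k, n):
    """Fraction of stage-cycles doing useful work."""
    return speedup(k, n) / k

# For large n, throughput and efficiency both approach 1:
print(round(throughput(5, 10_000), 4), round(efficiency(5, 10_000), 4))   # 0.9996 0.9996
```

Algebraically, efficiency simplifies to n / (k + n − 1), which is the same expression as throughput — both tend to 1 as n grows.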


Example Calculation

Given:
- k = 5 stages
- n = 10 instructions

Non-Pipelined Execution Time:


Total Time (Non-Pipelined) = 10 * 5 = 50 cycles

Pipelined Execution Time:


Total Time (Pipelined) = 5 + (10 - 1) = 14 cycles
Speedup:
Speedup = 50 / 14 ≈ 3.57

Throughput:
Throughput = 10 / (5 + (10 - 1)) = 10 / 14 ≈ 0.71 instructions per cycle
8 instructions and 5 stages

•Number of stages (k) = 5
•Number of instructions (n) = 8

1. Total Execution Time (Pipelined)
• Formula: Total Time (Pipelined) = k + (n - 1)
• Given: k = 5, n = 8
• Calculation: Total Time (Pipelined) = 5 + (8 - 1) = 5 + 7 = 12 cycles

2. Total Execution Time (Non-Pipelined)
• Formula: Total Time (Non-Pipelined) = n × k
• Given: k = 5, n = 8
• Calculation: Total Time (Non-Pipelined) = 8 × 5 = 40 cycles

3. Speedup
• Formula: Speedup = Total Time (Non-Pipelined) / Total Time (Pipelined)
• Calculation: Speedup = 40 / 12 ≈ 3.33

4. Efficiency
• Formula: Efficiency = Speedup / Number of Pipeline Stages (k)
• Given: Speedup = 3.33, k = 5
• Calculation: Efficiency = 3.33 / 5 ≈ 0.67 or 67%
Stage delay in pipelining
refers to the time it takes for a specific pipeline stage to complete its operation. Each
stage in a pipeline performs a distinct part of the instruction processing (e.g., instruction
fetch, decode, execution, memory access, write-back), and the delay of each stage
impacts the overall performance of the pipeline.
 The throughput (number of instructions completed per unit time) is reduced because the slowest stage bottlenecks the entire pipeline.
 Efficiency decreases as faster stages spend part of their time waiting for slower stages.

Cycle Time = Maximum (Stage Delay + Register Delay) across all stages
Role of Pipeline Registers
Pipeline registers (also called inter-stage buffers) are placed between consecutive pipeline stages to:
1. Synchronize Data Flow:
   • Pipeline registers store the intermediate results from one stage and pass them to the next stage in the subsequent clock cycle.
   • This synchronization ensures that even if stages have different delays, the pipeline operates in a coordinated manner.
2. Hold Results During Waiting Periods:
   • Faster stages write their outputs into the pipeline registers and wait for the clock edge determined by the slowest stage.
   • This prevents data from being overwritten or lost while waiting for slower stages to complete their tasks.
A 4-stage pipeline has stage delays of 150, 120, 160 and 140 nanoseconds, respectively. The registers used between the stages have a delay of 5 nanoseconds each. Assuming a constant clocking rate, the total time taken to process 1000 data items on this pipeline will be:

Ans: 165.5 microseconds
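The arithmetic behind this answer can be sketched as follows (assuming, per the cycle-time formula above, that the clock period equals the slowest stage delay plus one register delay):

```python
stage_delays = [150, 120, 160, 140]           # ns
register_delay = 5                            # ns
cycle = max(stage_delays) + register_delay    # slowest stage sets the clock: 165 ns
k, n = len(stage_delays), 1000
total_ns = (k + (n - 1)) * cycle              # (4 + 999) * 165
print(total_ns / 1000)                        # 165.495 -> about 165.5 microseconds
```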


Consider a non-pipelined processor with a clock rate of 2.5 GHz and an average of four cycles per instruction. The same processor is upgraded to a pipelined processor with five stages; but due to the internal pipeline delay, the clock speed is reduced to 2 GHz. Assume that there are no stalls in the pipeline. The speedup achieved by this pipelined processor is:
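The slides leave this one unanswered; a sketch of the usual calculation (per-instruction time = CPI / clock rate, with the pipelined CPI assumed to be 1 since there are no stalls):

```python
time_old = 4 / 2.5e9                   # non-pipelined: CPI 4 at 2.5 GHz -> 1.6 ns/instruction
time_new = 1 / 2.0e9                   # pipelined: CPI 1 at 2 GHz -> 0.5 ns/instruction
print(round(time_old / time_new, 2))   # 3.2
```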
Arithmetic Pipeline

An arithmetic pipeline is a specialized form of pipelining used for performing arithmetic operations (e.g., addition, subtraction, multiplication, and division). It splits a complex arithmetic computation into smaller, simpler stages that are executed in parallel, improving the throughput and efficiency of the operation.

An arithmetic pipeline divides an arithmetic problem into various sub-problems for execution in various pipeline segments.
Floating point addition using arithmetic pipeline:
The following sub-operations are performed in this case:
1. Compare the exponents (here the difference is 3 - 2 = 1).
2. Align the mantissas: the mantissa associated with the smaller exponent must be shifted to the right.
3. Add or subtract the mantissas.
4. Normalize the result.

Example: X = 0.9504 * 10^3 and Y = 0.8200 * 10^2
After alignment: Y = 0.08200 * 10^3
Z = X + Y = 1.0324 * 10^3 = 0.10324 * 10^4

Exercise: repeat for X = 0.3214 * 10^3 and Y = 0.4500 * 10^2.
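The four sub-operations can be sketched in Python (my own illustration using base-10 mantissa/exponent pairs, not real hardware floating point; the function name is mine):

```python
def fp_add(mx, ex, my, ey):
    """Add two numbers given as (mantissa, exponent) pairs in base 10."""
    # 1. Compare the exponents; keep the larger one in (mx, ex)
    if ex < ey:
        mx, ex, my, ey = my, ey, mx, ex
    # 2. Align: shift the mantissa of the smaller exponent to the right
    my = my / 10 ** (ex - ey)
    # 3. Add the mantissas
    mz, ez = mx + my, ex
    # 4. Normalize (this sketch only handles mantissa overflow past 1.0)
    while mz >= 1.0:
        mz, ez = mz / 10, ez + 1
    return mz, ez

m, e = fp_add(0.9504, 3, 0.8200, 2)
print(round(m, 5), e)   # 0.10324 4, i.e. Z = 0.10324 * 10^4
```

In an arithmetic pipeline each numbered step is one segment, so four additions at different steps can be in flight at once.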
Instruction Pipeline
 In an instruction pipeline, the instruction cycle is divided into multiple stages.
 Each stage of the pipeline is responsible for a specific task in the overall instruction execution
process.
 The key benefit of pipelining is increased throughput—the ability to process more
instructions per unit of time.
Basic Pipeline Stages
1. Fetch (IF - Instruction Fetch)
   • The processor fetches the instruction from memory.
   • The Program Counter (PC) holds the address of the next instruction to be fetched.
   • The instruction is retrieved from memory and placed in the instruction register (IR).
2. Decode (ID - Instruction Decode)
   • The instruction in the IR is decoded to determine which operation is to be performed.
   • The registers are read (if needed), and the instruction's operands are identified.
   • The control signals are generated for the execution phase, specifying the operations to be performed.
3. Execute (EX - Execute)
   • The operation specified by the instruction is performed. This can involve:
     • Arithmetic or logical operations (e.g., addition, subtraction).
     • Address calculations (for memory operations).
     • Decision making (for branch instructions).
   • The Arithmetic Logic Unit (ALU) is often used in this stage.
4. Memory Access (MEM - Memory Access)
   • If the instruction involves memory (e.g., load or store), this stage accesses the memory.
     • Load: The data is read from memory and placed in a register.
     • Store: Data is written from a register to memory.
5. Write-back (WB - Write Back)
   • The result of the instruction (e.g., data from the ALU or memory) is written back into the destination register.
   • This is the final stage of the instruction cycle.
Example
•I1: ADD R1, R2, R3 (Add R2 and R3, store in R1)
•I2: SUB R4, R5, R6 (Subtract R6 from R5, store in R4)
•I3: LOAD R7, 100(R1) (Load data from memory address (R1+100) into R7)
•I4: MUL R8, R9, R10 (Multiply R9 and R10, store in R8)
•I5: STORE 200(R11), R12 (Store value from R12 into memory address (R11+200))
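The overlap of these five instructions can be visualized with a short helper (my own sketch, assuming an ideal pipeline with no stalls):

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def schedule(num_instructions):
    """One row per instruction; column j shows its stage in cycle j + 1."""
    rows = []
    for i in range(num_instructions):
        # Instruction i enters IF in cycle i + 1, then advances one stage per cycle
        rows.append(["--"] * i + STAGES + ["--"] * (num_instructions - 1 - i))
    return rows

for i, row in enumerate(schedule(5), start=1):
    print(f"I{i}: " + "  ".join(f"{s:>3}" for s in row))
```

The printed diagram is 9 columns wide, matching the k + (n − 1) = 5 + 4 = 9 cycles the formula predicts for five instructions.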
Vector Processing in Pipelining

 Vector Processing refers to the processing of data in parallel, using vector registers and vector instructions to perform operations on entire arrays or vectors of data rather than individual scalar values.

 When integrated into a pipelined architecture, vector processing enhances performance by allowing multiple data elements to be processed simultaneously in different stages of the pipeline.
Scalar Data:
•Definition: A scalar is a single data value or a single element of data. It represents a single
quantity or number, typically a single integer, floating-point value, or character. Scalars are
the most basic form of data.
•Example:
•A single integer: 5
•A single floating-point value: 3.14
•A single character: 'A'
Key Characteristics:
•It represents one value at a time.
•Operations like addition, subtraction, multiplication, etc., are typically applied to scalars in
simple arithmetic or logical operations.
Vector Data:
•Definition: A vector is an ordered collection or sequence of multiple data values, typically of the
same type. It is an array or list that can hold several scalar values.
•Example:
•A vector of integers: A = [1, 2, 3, 4]
•A vector of floating-point numbers: B = [1.1, 2.2, 3.3, 4.4]
Key Characteristics:
•It holds multiple values at once.
•It allows operations to be performed on entire sets of data in a parallel manner, with one
operation affecting multiple elements at once.
•Vectors are typically used to represent data in higher dimensions (e.g., 2D or 3D coordinates,
time series data, or arrays).
Key Concepts in Vector Processing Pipelines:
1. Vector Registers:
   • A vector register is a storage location that can hold a collection of data elements (usually integers or floating-point values) that represent a vector. Each data element in the vector can be processed in parallel.
2. Vector Instructions:
   • A vector instruction operates on an entire vector of data, performing an operation (like addition or multiplication) on each corresponding element in two or more vectors.
Pipeline Stages:
•In a vector processor pipeline, the stages of the pipeline are designed to handle vector
operations. For example, a basic pipeline might have the following stages:
• IF (Instruction Fetch): Fetching vector instructions from memory.
• ID (Instruction Decode): Decoding the vector instructions, identifying the operation,
and determining the operands (vector registers).
• EX (Execution): Performing the vector operation (e.g., adding or multiplying vector
elements).
• MEM (Memory Access): Accessing memory for read/write operations.
• WB (Writeback): Writing the result back to the vector register.
Handling Vector Length:

1. Short Vectors: If the vector length is shorter than the vector register size, there will be unused entries in the vector register.
2. Long Vectors: If the vector is too large to fit into the vector register, the processor must divide the vector into smaller chunks and process them sequentially, with some possible pipeline stalls for loading new chunks.

Vector addition example: two vectors A = [1, 2, 3, 4] and B = [5, 6, 7, 8] are added element-wise, and the result is stored in C = [C[1], C[2], C[3], C[4]].
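Element-wise addition with chunking can be sketched as follows (pure Python stands in for the hardware's lane-parallel execution; the register size of 4 is an assumption of mine):

```python
VECTOR_REGISTER_SIZE = 4   # elements per vector register (assumed)

def vector_add(a, b):
    """Process long vectors in register-sized chunks, as described above."""
    c = []
    for start in range(0, len(a), VECTOR_REGISTER_SIZE):
        # Load one chunk of each operand into "vector registers"
        va = a[start:start + VECTOR_REGISTER_SIZE]
        vb = b[start:start + VECTOR_REGISTER_SIZE]
        # One vector instruction would add all lanes of the chunk at once
        c.extend(x + y for x, y in zip(va, vb))
    return c

print(vector_add([1, 2, 3, 4], [5, 6, 7, 8]))   # [6, 8, 10, 12]
```

A vector longer than 4 elements simply takes additional loop iterations — the sequential chunk processing the "Long Vectors" case describes.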
Vector Processing for Matrix Inner Product
Vectorized Computation (SIMD or SIMT Approach):

In vectorized processing, instead of performing these operations serially, vector processors allow you to:
1. Load the elements of the row of A and the column of B into vector registers.
2. Perform the element-wise multiplication of corresponding elements in parallel.
3. Compute the sum of the products in parallel and store the result.
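The three steps can be sketched for one row-by-column inner product (plain Python for clarity; a real SIMD unit would perform the multiplies across lanes in parallel, and the function name is mine):

```python
def inner_product(row, col):
    # Steps 1-2: "load" the operands and multiply corresponding elements
    products = [x * y for x, y in zip(row, col)]
    # Step 3: reduce the partial products to a single sum
    return sum(products)

print(inner_product([1, 2, 3], [4, 5, 6]))   # 1*4 + 2*5 + 3*6 = 32
```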
Hazards

Hazards are issues that arise in pipelined instruction execution and can lead to incorrect behavior or reduced performance. Hazards occur because of dependencies or conflicts between instructions as they execute concurrently in a pipeline.

Types of Hazards
1. Data Hazard
   1.1 RAW (Read After Write)
   1.2 WAR (Write After Read)
   1.3 WAW (Write After Write)
2. Structural Hazard
3. Control Hazard
Data Hazards
Data hazards occur when instructions that depend on the results of previous instructions are
executed concurrently, causing incorrect results.
Read After Write (RAW) - True Dependency:
•An instruction depends on the result of a previous instruction.
I1: R2 ← R2 + R3
I2: R5 ← R2 + R4
I2 depends on the result of I1 (it reads R2 before I1 has written it back).
Write After Read (WAR) - Anti-dependency:
•An instruction writes to a register that a previous instruction reads from.

Write After Write (WAW) - Output Dependency:
•Two instructions write to the same register, and the order of writes matters.
Solutions:

•Forwarding/Bypassing: Use the output of a stage directly in subsequent stages instead of waiting for it to be written back to the register.
•Pipeline Stalling: Insert NOP (no-operation) instructions or stalls until the hazard is resolved.
•Out-of-Order Execution: Reorder instructions to avoid dependencies.
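A small sketch of RAW-hazard detection — the check that forwarding or stalling hardware must perform. The tuple instruction format (dest, src1, src2) and the 3-deep lookback window are assumptions of mine:

```python
def raw_hazards(instructions, pipeline_depth=3):
    """Return (consumer_index, producer_index) pairs with a RAW dependency."""
    hazards = []
    for i, (_, src1, src2) in enumerate(instructions):
        # Look back at predecessors whose result is not yet written back
        for j in range(max(0, i - pipeline_depth + 1), i):
            dest = instructions[j][0]
            if dest in (src1, src2):
                hazards.append((i, j))
    return hazards

# I1: R2 <- R2 + R3 ; I2: R5 <- R2 + R4  -> I2 reads R2 written by I1
prog = [("R2", "R2", "R3"), ("R5", "R2", "R4")]
print(raw_hazards(prog))   # [(1, 0)]
```

Each reported pair is a point where hardware must either forward the value from a pipeline latch or stall the consumer.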


Structural Hazards
Structural hazards occur when hardware resources required for an instruction are not available
because they are being used by another instruction. This happens if the hardware design does not
provide enough resources to execute multiple instructions concurrently.
Example:
•If a single memory module is shared between instruction fetch (IF) and data memory access
(MEM) stages, both cannot access memory simultaneously, leading to a stall.
Solutions:
•Use duplicated hardware resources (e.g., separate instruction and data memories, also known
as Harvard architecture).
•Use stalling: The pipeline pauses until the resource becomes available.
Control Hazards
Control hazards arise from branch and jump instructions, which affect the flow of instructions in the pipeline. When the processor does not yet know which path a branch will take, it may fetch and execute incorrect instructions.
Solution
•Branch Prediction: Predict the outcome of branches (taken or not taken).
•Use dynamic predictors such as 1-bit or 2-bit predictors, or global history-based predictors.
•Stall the pipeline, or use a prediction circuit that resolves the branch before the next fetch.
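The 2-bit predictor mentioned above can be sketched with a saturating counter (a single counter for illustration; real predictors keep a table of such counters indexed by branch address):

```python
class TwoBitPredictor:
    """2-bit saturating counter: two wrong predictions are needed to flip."""

    def __init__(self):
        self.state = 0          # 0-1 predict not taken, 2-3 predict taken

    def predict(self):
        return self.state >= 2  # True = predict taken

    def update(self, taken):
        # Saturate at the ends of the 0..3 range
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

p = TwoBitPredictor()
hits = 0
for outcome in [True, True, True, False, True, True]:   # loop-like branch pattern
    hits += p.predict() == outcome
    p.update(outcome)
print(hits)   # 3 correct out of 6 (two warm-up misses, one on the not-taken blip)
```

Because the counter saturates, a single not-taken outcome in a long taken streak does not flip the prediction — the advantage of 2-bit over 1-bit predictors for loop branches.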
Thank you.

Any Questions?
