Computer Organization: An Introduction To RISC Hardware: 6.1 An Overview of Pipelining
6.1 An Overview of Pipelining

Pipelining is an implementation technique in which multiple instructions are overlapped in execution. Today, pipelining is key to making processors fast. Pipelining improves performance by increasing instruction throughput rather than by decreasing the execution time of an individual instruction; instruction throughput is the important metric because real programs execute billions of instructions. Many authors separate pipelines into two categories:
- Instruction pipeline, where the different stages of instruction fetch and execution are handled in a pipeline.
- Arithmetic pipeline, where the different stages of an arithmetic operation are handled along the stages of a pipeline.
An instruction pipeline is used in advanced microprocessors, where the microprocessor begins executing a second instruction before the first has been completed. That is, several instructions are in the pipeline simultaneously, each at a different processing stage. The pipeline is divided into stages, and each stage can execute its operation concurrently with the other stages. When a stage completes an operation, it passes the result to the next stage in the pipeline and fetches the next operation from the preceding stage. The final results of each instruction emerge at the end of the pipeline in rapid succession. A RISC processor pipeline operates in much the same way, although the stages in the pipeline are different. While different processors have different numbers of stages, they are basically variations of these five, used in the MIPS R3000 processor:
1. fetch the instruction from memory (F);
2. read registers and decode the instruction (D);
3. execute the instruction or calculate an address (E);
4. access an operand in data memory (M);
5. write the result into a register (W).
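The overlap of the five stages can be sketched as a simple pipeline diagram. The following Python snippet is not from the text; it is a minimal illustration that prints which stage each instruction occupies in each clock cycle of an ideal five-stage pipeline (no hazards, one instruction issued per cycle).

```python
# Sketch (assumption, not from the text): stage occupancy of an ideal
# five-stage pipeline. Instruction i enters stage s at clock cycle i + s.
STAGES = ["F", "D", "E", "M", "W"]

def pipeline_diagram(n_instructions):
    """Return one row per instruction; column c holds the stage name
    occupied in clock cycle c, or '.' if the instruction is idle."""
    total_cycles = n_instructions - 1 + len(STAGES)
    rows = []
    for i in range(n_instructions):
        row = ["."] * total_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = name  # instruction i is in stage s at cycle i + s
        rows.append(" ".join(row))
    return rows

for r in pipeline_diagram(3):
    print(r)
# F D E M W . .
# . F D E M W .
# . . F D E M W
```

Three instructions finish in 7 cycles instead of the 15 a non-overlapped design would need.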
Each stage is separated by a register which stores the results of the previous stage so that the correct values are used in the next stage. The clock cycle of the pipelined processor must be long enough for the slowest stage of the pipeline to complete its function.

Stage 1 - Instruction Fetch

The Instruction Fetch (F) stage is responsible for obtaining the requested instruction from memory. The main components of the F stage can be seen in the figure below. The instruction and the program counter (which is incremented to point to the next instruction) are stored in the IF/ID pipeline register as temporary storage so that they may be used in the next stage at the start of the next clock cycle.
Figure 6.1 - Main Components of Stage 1

Stage 2 - Instruction Decode

The Instruction Decode (D) stage is responsible for decoding the instruction and sending the various control signals to the other parts of the processor. Figure 6.2 shows the structure of the D stage. The instruction is sent to the control unit, where it is decoded, and the source registers are read from the register file. The D stage is also where branching occurs.
Stage 3 - Execution

The Execution (E) stage is where any calculations are performed. The main component in this stage is the ALU, which provides arithmetic, logic, and shifting capabilities.
Figure 6.3 - Main Components of Stage 3

Stage 4 - Memory

The Memory (M) stage is responsible for storing values to and loading values from memory. It is also responsible for any input to or output from the processor. If the current instruction is not a memory-access instruction, the result from the ALU is passed through to the write-back stage.
Figure 6.4 - Main Components of Stage 4 Stage 5 - Write Back The Write Back (W) stage is responsible for writing the result of a calculation, memory access or input into the register file.
6.2 Single-Cycle versus Pipelined Performance

Ideally, each of the stages in a RISC processor pipeline should take one clock cycle, so that the processor finishes an instruction each clock cycle and averages one clock cycle per instruction (a CPI of 1). Let us consider only eight instructions: lw, sw, add, sub, and, or, slt, beq. The operation times for the major functional units are 2 ns for memory access, 2 ns for ALU operation, and 1 ns for register file read or write. As we said in Chapter 5, in the single-cycle model every instruction takes exactly one clock cycle, so the clock cycle period must be stretched to accommodate the slowest instruction. The total time (ns) for each instruction class is calculated in Table 6.1.

Instruction | Instr. fetch | Register read | ALU operation | Memory access | Register write | Total time
lw          | 2            | 1             | 2             | 2             | 1              | 8
sw          | 2            | 1             | 2             | 2             |                | 7
R-format    | 2            | 1             | 2             |               | 1              | 6
beq         | 2            | 1             | 2             |               |                | 5
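The totals in Table 6.1 are just sums of the functional-unit latencies that each instruction class actually uses. As a sketch (Python is my choice here, not the text's), they can be reproduced directly:

```python
# Sketch reproducing Table 6.1: total single-cycle time per instruction
# class, summed from the functional-unit latencies given in the text (ns).
FETCH, REG_READ, ALU, MEM, REG_WRITE = 2, 1, 2, 2, 1

totals = {
    "lw":       FETCH + REG_READ + ALU + MEM + REG_WRITE,  # uses every unit
    "sw":       FETCH + REG_READ + ALU + MEM,              # no register write
    "R-format": FETCH + REG_READ + ALU + REG_WRITE,        # no memory access
    "beq":      FETCH + REG_READ + ALU,                    # compare only
}
print(totals)  # {'lw': 8, 'sw': 7, 'R-format': 6, 'beq': 5}
```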
We can see that the longest instruction is lw, so the time required for every instruction is 8 ns. Figure 6.6 compares non-pipelined and pipelined execution of three lw instructions. Thus, the time between the first and the fourth instructions in the non-pipelined design is 3 x 8 = 24 ns. All pipeline stages take a single clock cycle, so the clock cycle must be long enough to accommodate the slowest operation. Just as the single-cycle design must take the worst-case clock cycle of 8 ns even though some instructions can be as fast as 5 ns, the pipelined execution must take the worst-case clock cycle of 2 ns even though some stages take only 1 ns. Pipelining still offers a fourfold performance improvement: the time between the first and the fourth instructions is 3 x 2 = 6 ns. So, under ideal conditions, we can write the formula:

Time between instructions (pipelined) = Time between instructions (non-pipelined) / Number of pipe stages
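The comparison above can be checked with a few lines of arithmetic. This sketch (not part of the original text) plugs the chapter's numbers into the formula:

```python
# Sketch of the timing comparison behind Figure 6.6, using the text's numbers:
# 8 ns per instruction non-pipelined, 2 ns clock cycle pipelined.
def time_between_instructions(n_instr, cycle_ns):
    """Time from the start of the first instruction to the start of
    instruction n_instr, when one instruction starts per clock cycle."""
    return (n_instr - 1) * cycle_ns

single_cycle = time_between_instructions(4, 8)  # 24 ns, non-pipelined
pipelined    = time_between_instructions(4, 2)  # 6 ns, pipelined
print(single_cycle / pipelined)                 # 4.0, a fourfold improvement
```

Note the speedup is 4 rather than the ideal 5 (the number of stages) precisely because the stages are not balanced: the 2 ns worst-case stage sets the clock, even though two stages need only 1 ns.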
[Figure 6.6 timing diagram: in the non-pipelined execution (top), each instruction runs F (2 ns), D (1 ns), E (2 ns), M (2 ns), W (1 ns) to completion before the next instruction begins; in the pipelined execution (bottom), a new instruction starts every 2 ns, its stages overlapping those of its neighbors.]
Figure 6.6 - Non-pipelined execution (top) versus pipelined execution (bottom)

The pipeline designer's goal is to balance the length of each pipeline stage. If the stages are perfectly balanced, then the time per instruction on the pipelined machine is equal to

Time per instruction on the non-pipelined machine / Number of pipe stages

Under these conditions, the speedup from pipelining equals the number of pipe stages. Usually, however, the stages will not be perfectly balanced; besides, pipelining itself involves some overhead. There are two disadvantages of a pipelined architecture. The first is complexity. The second is the inability to continuously run the pipeline at full speed, i.e. the pipeline stalls.

6.3 Pipeline Hazards

Let us examine why the pipeline cannot run at full speed. There are phenomena called pipeline hazards which disrupt the smooth execution of the pipeline. The resulting delays in the pipeline flow are called bubbles. These pipeline hazards include:
- data hazards arising from data dependencies;
- structural hazards arising from hardware conflicts;
- control hazards that come about from branch, jump, and other control flow changes.
Data Hazards

Data hazards occur when an instruction attempts to use a register whose value depends on the result of a previous instruction that has not yet finished. In Figure 6.7, the result produced in stage W of the addition must be used in stage D of the subtraction:

add $1, $2, $3
sub $4, $1, $5

[Figure 6.7 timing diagram: the two instructions proceed F-D-E-M-W one cycle apart, so sub reads $1 in its D stage before add has written it in its W stage.]
Figure 6.7 - Example of a Data Hazard

There are two main ways of dealing with such hazards: stalling and forwarding.

Stalling

Stalling involves halting the flow of instructions until the required result is ready to be used. It is the simplest way to resolve a data hazard. However, as can be seen in Figure 6.8, stalling wastes processor time by doing nothing while waiting for the result.

add $1, $2, $3
stall
stall
stall
sub $4, $1, $5

[Figure 6.8 timing diagram: three stall cycles (bubbles) are inserted between add and sub, so sub does not read $1 until add has written it back.]
Figure 6.8 - Stalling the Pipeline

Forwarding

The forwarding method is best described through an example. Figure 6.9 shows two instructions in the pipeline. The sub instruction needs the result of the add instruction in sub's E stage, but add does not write the result until add's W stage. However, the result of the add instruction is actually computed before the sub instruction needs it, so the result is forwarded from the E/M pipeline register back to the E stage of the sub instruction:

add $1, $2, $3
sub $4, $1, $5

[Figure 6.9 timing diagram: the ALU result of add is forwarded from the end of its E stage directly into the E stage of sub, removing the need to stall.]
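The condition that triggers forwarding is simply a register-name match between a producer's destination and a consumer's sources. The following Python sketch (an illustration of the idea, not hardware from the text) checks that condition for the add/sub pair above:

```python
# Sketch (assumption, not from the text): detecting when the E/M result
# of one instruction must be forwarded to the ALU input of the next.
# Instructions are (op, dest, src1, src2) tuples, as in the add/sub example.

def needs_forwarding(producer, consumer):
    """True if `consumer` reads a register that `producer` writes,
    so the value must be forwarded from the E/M pipeline register."""
    _, dest, *_ = producer
    _, _, src1, src2 = consumer
    return dest in (src1, src2)

add = ("add", "$1", "$2", "$3")
sub = ("sub", "$4", "$1", "$5")
print(needs_forwarding(add, sub))  # True: sub reads $1 before add writes it back
```

A real forwarding unit compares register numbers held in the pipeline registers in exactly this way, every cycle, in hardware.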
Reordering Code to Avoid Pipeline Stalls

Example: Find the hazard in this code from the body of the swap procedure:

# reg $5 has k; reg $4 has the base address of v
sll $2, $5, 2      # $2 = k * 4
add $2, $4, $2     # $2 = v + (k * 4)
lw  $15, 0($2)     # $15 = v[k]
lw  $16, 4($2)     # $16 = v[k+1]
sw  $16, 0($2)     # v[k] gets $16, which is v[k+1]
sw  $15, 4($2)     # v[k+1] gets $15 (temp, which is v[k])

The hazard occurs on register $16 between the second lw and the first sw. Swapping the two sw instructions removes this hazard:

sll $2, $5, 2      # $2 = k * 4
add $2, $4, $2     # $2 = v + (k * 4)
lw  $15, 0($2)     # $15 = v[k]
lw  $16, 4($2)     # $16 = v[k+1]
sw  $15, 4($2)     # v[k+1] gets $15, which is v[k]
sw  $16, 0($2)     # v[k] gets $16 (temp, which is v[k+1])

Note that we do not create a new hazard, because there is still one instruction between the write of register $15 by the load and the read of register $15 in the store. Thus, on a machine with forwarding, the reordered sequence takes 4 clock cycles.

Branch Prediction

Branch instructions are those that tell the processor to make a decision about what the next instruction to be executed should be, based on the result of another instruction. Branch instructions can be troublesome in a pipeline if a branch is conditional on the result of an instruction which has not yet finished its path through the pipeline. For example:

Loop: add $r3, $r2, $r1
      sub $r6, $r5, $r4
      beq $r3, $r6, Loop
The example above instructs the processor to add r1 and r2 and put the result in r3, then subtract r4 from r5, storing the difference in r6. In the third instruction, beq stands for "branch if equal". If the contents of r3 and r6 are equal, the processor should execute the instruction labeled "Loop"; otherwise, it should continue to the next instruction. In this example, the processor cannot decide which path to take because neither the value of r3 nor that of r6 has been written into the registers yet. The processor could stall, but a more sophisticated method of dealing with branch instructions is branch prediction. The processor makes a guess about which path to take; if the guess is wrong, anything written into the registers must be cleared, and the pipeline must be restarted with the correct instruction. Some methods of branch prediction depend on stereotypical behavior. Branches pointing backward are taken about 90% of the time, since backward-pointing branches are often found at the bottom of loops. On the other hand, branches pointing forward are taken only approximately 50% of the time. Thus, it is logical for processors to always follow the branch when it points backward, but not when it points forward. Other methods of branch prediction are less static: processors that use dynamic prediction keep a history for each branch and use it to predict future branches. These processors are correct in their predictions about 90% of the time. Still other processors forgo branch prediction entirely: the RISC System/6000 fetches and starts decoding instructions from both sides of the branch. When it determines which path should be followed, it sends the correct instructions down the pipeline to be executed.
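One common form of the dynamic prediction mentioned above is a two-bit saturating counter kept per branch. The sketch below is illustrative (the counter scheme is a standard technique, not one the text attributes to a specific processor): counter values 0-1 predict "not taken", 2-3 predict "taken", and each actual outcome nudges the counter up or down.

```python
# Sketch (assumption): a two-bit saturating-counter branch predictor.
# States 0-1 predict "not taken"; states 2-3 predict "taken".
class TwoBitPredictor:
    def __init__(self):
        self.counter = 2  # start in weakly-taken state

    def predict(self):
        return self.counter >= 2  # True means "predict taken"

    def update(self, taken):
        # Saturate at 0 and 3 so one surprise does not flip the prediction.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)

# A backward loop branch taken 90% of the time, as in the text's statistics.
p = TwoBitPredictor()
correct = 0
for taken in [True] * 9 + [False]:
    if p.predict() == taken:
        correct += 1
    p.update(taken)
print(correct)  # 9 of 10 predictions correct
```

The two-bit hysteresis is what makes loop branches cheap: the single not-taken outcome at loop exit costs one misprediction, but does not disturb the prediction for the next run of the loop.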
Control Hazards

A control hazard occurs whenever there is a change in the normal execution flow of the program, caused by events such as branches, interrupts, exceptions, and returns from interrupts. The hazard arises because branches, interrupts, etc. are not detected until the instruction is decoded in the second stage. By the time the instruction is decoded, the following instruction has already entered the pipeline and, left unchecked, an unwanted instruction would remain in the pipeline. There is really only one solution to this type of hazard: a hardware stall, which simply flushes the offending instruction from the pipeline.

Structural Hazards

A structural hazard occurs when the hardware is unable to handle certain combinations of instructions simultaneously. For example, the IF stage under normal conditions accesses memory on every clock cycle. When a load or store
word instruction is used, the MEM stage also tries to access memory, and because of the single-memory architecture a conflict occurs. There are a couple of ways of dealing with such conflicts: stalling and prefetching.

Stalling

This method works in the same way as stalling for data hazards. Instead of the IF stage accessing memory, the load/store instruction is allowed to use memory, and the processor is simply stalled until the load/store instruction is finished. The problem with this method, as with data hazards, is that it can take a long time if there are multiple load/store instructions in a row.

Prefetching

Prefetching involves fetching two instructions in the F stage and storing them in a small buffer. The buffer size we have used is 4 instructions, to save on the hardware required. Because two instructions are fetched in the F stage, when a load/store instruction is used it is allowed to access memory; the instruction for the F stage is taken from the buffer, and no instruction is fetched from memory. This prefetching method is used to solve this particular hazard.

NOTE: Prefetching works better than stalling only if the memory used is fast enough to do two accesses in one clock cycle.

6.4 Pipeline Control

Figure 6.25 shows the control lines on the pipelined data-path. We borrow as much as we can from the control for the simple data-path in Chapter 5. To specify control for the pipeline, we need only set the control values during each pipeline stage. Since pipelining the data-path leaves the meaning of the control lines unchanged, we can use the same control values as before. Figure 6.10 has the same values as in Chapter 5, but now the nine control lines are grouped by pipeline stage. They have been shuffled into three groups corresponding to the last three pipeline stages.

Instruction | Ex/Addr. calculation stage         | Mem. stage | WR stage
            | RegDst ALUOp1 ALUOp0 ALUSrc Branch | MR  MW     | RW  MtoR
R-type      | 1      1      0      0      0      | 0   0      | 1   0
lw          | 0      0      0      1      0      | 1   0      | 1   1
sw          | X      0      0      1      0      | 0   1      | 0   X
beq         | X      0      1      0      1      | 0   0      | 0   X
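Since the control values depend only on the instruction class, the whole table reduces to a lookup. The following Python sketch (my own rendering of the table, not part of the text) expresses it that way, with "x" standing for don't-care values:

```python
# Sketch: the pipeline control table as a lookup keyed by instruction
# class. "x" marks don't-care values, as in the table above.
CONTROL = {
    "R-type": dict(RegDst=1, ALUOp1=1, ALUOp0=0, ALUSrc=0, Branch=0,
                   MemRead=0, MemWrite=0, RegWrite=1, MemtoReg=0),
    "lw":     dict(RegDst=0, ALUOp1=0, ALUOp0=0, ALUSrc=1, Branch=0,
                   MemRead=1, MemWrite=0, RegWrite=1, MemtoReg=1),
    "sw":     dict(RegDst="x", ALUOp1=0, ALUOp0=0, ALUSrc=1, Branch=0,
                   MemRead=0, MemWrite=1, RegWrite=0, MemtoReg="x"),
    "beq":    dict(RegDst="x", ALUOp1=0, ALUOp0=1, ALUSrc=0, Branch=1,
                   MemRead=0, MemWrite=0, RegWrite=0, MemtoReg="x"),
}
print(CONTROL["lw"]["MemRead"])  # 1: lw is the only class that reads data memory
```

In hardware, this lookup is exactly what the control unit computes during instruction decode, after which the values ride down the pipeline registers to the stages that consume them.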
Implementing control means setting the nine control lines to these values in each stage for each instruction. The simplest way to do this is to extend the pipeline registers to include the control information. Since the control lines are first needed in the EX stage, we can create the control information during instruction decode.
Figure 6.29 shows that these control signals are then used in the appropriate pipeline stage as the instruction moves down the pipeline. Figure 6.30 shows the full data-path with the extended pipeline registers and with the control lines connected to the proper stages.