Question 1. An instruction requires four stages to execute: stage 1 (instruction fetch) requires 30 ns, stage 2 (instruction decode) = 9 ns, stage 3 (instruction execute) = 20 ns and stage 4 (store results) = 10 ns. An instruction must proceed through the stages in sequence. What is the minimum asynchronous time for any single instruction to complete? 30 + 9 + 20 + 10 = 69 ns.
Question 2. We want to set this up as a pipelined operation. How many stages should we have and at what rate should we clock the pipeline? We have 4 natural stages given and no information on how we might further subdivide them, so we use 4 stages in our pipeline. We have a choice of what clock rate to use. The simplest choice would be to use a clock cycle that accommodates the longest stage in our pipe: 30 ns. This would allow us to initiate a new instruction every 30 ns with a latency through the pipe of 30 ns x 4 stages = 120 ns. We could also pick a finer clock cycle that more closely matches the shortest stage (9 ns) but divides integrally into the other stages. A clock of 10 ns would be a good match and would require 3 clocks for the first stage, 1 clock for the second, 2 clocks for the third, and 1 clock for the fourth. This would allow us to initiate a new instruction every 30 ns but provide a latency of 70 ns rather than 120 ns. Either 30 ns or 10 ns is acceptable.
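The two clocking options can be checked with a short sketch. The stage times and the 10 ns clock are from the problem; mapping each stage onto a whole number of clocks via ceiling division is the assumption:

```python
stage_ns = [30, 9, 20, 10]  # fetch, decode, execute, store

# Option 1: one clock per stage, period set by the slowest stage.
clock1 = max(stage_ns)                        # 30 ns
latency1 = clock1 * len(stage_ns)             # 4 stages x 30 ns = 120 ns

# Option 2: a 10 ns clock; each stage occupies ceil(stage/10) clocks.
clock2 = 10
clocks = [-(-s // clock2) for s in stage_ns]  # ceiling division: [3, 1, 2, 1]
latency2 = clock2 * sum(clocks)               # 7 clocks x 10 ns = 70 ns
initiation = clock2 * max(clocks)             # limited by the 3-clock fetch

print(latency1, latency2, initiation)  # 120 70 30
```

Both options initiate a new instruction every 30 ns; the finer clock only shortens the latency.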
Question 3. For the pipeline in question 2, how frequently can we initiate the execution of a new instruction, and what is the latency? See answer to question 2.
Question 4. What is the speedup of the pipeline in question 2? Speedup per Stone's preferred definition is (30 + 9 + 20 + 10)/30 = 2.3. Speedup per the best clocked definition is (30 + 10 + 20 + 10)/30 = 2.33.
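The two figures can be reproduced directly from the stage times:

```python
stage_ns = [30, 9, 20, 10]
# Stone's preferred definition: total asynchronous time over the 30 ns
# initiation interval.
speedup_stone = sum(stage_ns) / max(stage_ns)        # 69/30 = 2.3

# Best clocked definition: stage times rounded up to whole 10 ns clocks.
clocked_ns = [30, 10, 20, 10]
speedup_clocked = sum(clocked_ns) / max(clocked_ns)  # 70/30 = 2.33...

print(round(speedup_stone, 2), round(speedup_clocked, 2))  # 2.3 2.33
```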
Question 5. Draw the reduced state-diagram and show the maximum-rate cycle using the following collision vector: 1 0 0 0 1 1
States are collision vectors; each arc is labeled with a permissible latency, and a latency of 7 or more returns any state to the initial state:

   100011 (initial) --2--> 101111
   100011 --3--> 111011
   100011 --4--> 110011
   101111 --2--> 111111
   111011 --4--> 110011
   110011 --3--> 111011
   110011 --4--> 110011 (self-loop)
   (from every state, latency >= 7 returns to 100011)
The maximum-rate cycle is the sequence 3, 4, 3, 4, 3, ... giving two operations initiated every seven cycles, or 0.29 ops/cycle. The greedy cycle is 2, 2, 7, 2, 2, 7, 2, 2, ... giving three operations initiated every 11 cycles, or 0.27 ops/cycle. This is a case where the greedy cycle is not the optimum.
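The state transitions and cycle rates can be verified with a small simulation. The shift-and-OR rule is the standard collision-vector update; the tuple representation (index = latency, 1 = forbidden) is my own encoding:

```python
# Initial collision vector 1 0 0 0 1 1: latencies 1, 5, 6 are forbidden.
INIT = (1, 0, 0, 0, 1, 1)
N = len(INIT)

def step(state, k):
    """Shift the state by latency k and OR in the initial collision vector."""
    shifted = (state[k:] + (0,) * k)[:N]
    return tuple(a | b for a, b in zip(shifted, INIT))

def cycle_rate(latencies):
    """Initiations per clock for a repeating latency cycle, checking that
    each latency is actually permitted in the state where it is used."""
    state = INIT
    for k in latencies:
        if k <= N and state[k - 1]:
            raise ValueError(f"latency {k} collides")
        state = step(state, k) if k <= N else INIT  # k > N resets the pipe
    return len(latencies) / sum(latencies)

print(round(cycle_rate([3, 4]), 2))     # max-rate cycle: 2/7  -> 0.29
print(round(cycle_rate([2, 2, 7]), 2))  # greedy cycle:  3/11 -> 0.27
```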
Question 6. We have a RISC processor with register-register arithmetic instructions that have the format R1 R2 op R3. The pipeline for these instructions runs with a 100 MHz clock with the following stages: instruction fetch = 2 clocks, instruction decode = 1 clock, fetch operands = 1 clock, execute = 2 clocks, and store result = 1 clock. a) At what rate (in MIPS) can we execute register-register instructions that have no data dependencies with other instructions? b) At what rate can we execute the instructions when every instruction depends on the results of the previous instruction? c) We implement internal forwarding. At what rate can we now execute the instructions when every instruction depends on the results of the previous instruction?
Reservation table for (a), with a new instruction initiated every 2 clocks (instruction numbers shown in each stage):

  Clock  IF  ID  OF  EX  OS
    1     1
    2     1
    3     2   1
    4     2       1
    5     3   2       1
    6     3       2   1
    7     4   3       2   1
    8     4       3   2
    9     5   4       3   2
   10     5       4   3
   11     6   5       4   3
   12     6       5   4

a) No dependencies: rate = 1 inst/2 cycles = 50 MIPS. The 2-clock Instruction Fetch and Execute stages limit us to initiating one new instruction every 2 clocks; at a 100 MHz clock, that is 50 MIPS.
Reservation table for (b); Operand Fetch for each instruction must wait until the previous instruction's Operand Store completes, and the instructions behind it hold in earlier stages as the pipeline backs up:

         IF    ID   OF              EX     OS
  #1:    1-2   3    4               5-6    7
  #2:    3-4   5    wait 6-7, 8     9-10   11
  #3:    5-6   7    wait, 12        13-14  15
  #4:    7-8        wait, 16        17-18  19
b) Dependencies rate = 1 inst/4 cycles = 25 MIPS. The reservation table shows that, although we begin fetching instructions every two cycles, the Operand Fetch unit must wait until the prior instruction stores its result before it can retrieve one of its operands (e.g. Op Fetch for #2 must wait until Op Store for #1 completes). As a result, things begin backing up in the pipeline, and we produce one instruction output only every 4 cycles.
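The 4-cycle recurrence can be made explicit in a minimal sketch (stage timings as in the reservation table):

```python
def store_clock(n):
    """Clock at which instruction n (1-based) completes Operand Store,
    when every instruction needs the previous instruction's result."""
    if n == 1:
        return 7  # IF 1-2, ID 3, OF 4, EX 5-6, OS 7
    # Operand Fetch must wait for the previous store; then OF (1 clock),
    # EX (2 clocks), and OS (1 clock) take four more clocks.
    return store_clock(n - 1) + 4

print([store_clock(n) for n in range(1, 5)])  # [7, 11, 15, 19]
```

One result every 4 cycles at 100 MHz gives the 25 MIPS figure above.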
Reservation table for (c); "fwd" marks an operand obtained by internal forwarding from the previous instruction's Execute result:

         IF    ID   OF      EX     OS
  #1:    1-2   3    4       5-6    7
  #2:    3-4   5    6 fwd   7-8    9
  #3:    5-6   7    8 fwd   9-10   11
c) Dependencies with internal forwarding rate = 1 inst/2 cycles = 50 MIPS. If we implement internal forwarding, the operand fetch unit can bypass fetching the dependent operand and just rename the dependent operand input register to be the result of instruction 1. The result is available in time for the next calculation; we just have to point one of the inputs for instruction 2 execution to the internal register that receives the output of instruction 1 in order to get it. We can then proceed without waiting.
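With forwarding, the recurrence shortens to the 2-clock Execute stage (a sketch under the same stage timings):

```python
def exec_done(n):
    """Clock at which instruction n finishes Execute when its dependent
    operand is forwarded from the previous instruction's Execute output."""
    if n == 1:
        return 6  # IF 1-2, ID 3, OF 4, EX 5-6
    # EX of instruction n can start as soon as EX of n-1 ends.
    return exec_done(n - 1) + 2

print([exec_done(n) for n in range(1, 5)])  # [6, 8, 10, 12]
```

One instruction every 2 cycles at 100 MHz gives the 50 MIPS figure above.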
Question 7. Conditional branches are a problem with instruction pipelines. For the RISC processor described in question 6, we decide to implement branches by always assuming the branch will not be taken rather than implementing some form of branch prediction or speculative execution, and we do not implement internal forwarding. We don't know that the instruction is a branch until stage 2 (decode), we don't know the condition code setting (for instructions that set the condition code) until stage 5 (operand store) is complete, and we can't provide the target address (of a branch taken) to stage 1 until the end of stage 5. Assume a sequence of instructions where the condition code setting instruction immediately precedes the conditional branch. a) What penalty in lost cycles do we incur for the branch not taken? b) What penalty in lost cycles do we incur for the branch taken? c) We implement delayed branching and the conditional branch is a delayed conditional branch. What penalty in lost cycles do we incur for the delayed branch taken? d) We implement internal forwarding along with the delayed branch. What penalty in lost cycles do we incur for the delayed branch taken with internal forwarding?
Reservation table for (a); the branch (BR) depends on the condition-code-setting instruction (CC), and NSI is the next sequential instruction:

         IF     ID   OF              EX     OS
  CC:    1-2    3    4               5-6    7
  BR:    3-4    5    wait 6-7, 8     9-10   11
  NSI:   5-6    7    (holds behind BR)
  2SI:   7-8
  3SI:   9-10
a) We have a data dependency between the CC instruction and the branch instruction. The operand fetch unit must wait 2 cycles until the CC is stored by the operand store unit before fetching it for use by the branch instruction. The penalty depends on how we implement the pipeline. If we propagate the operand fetch unit's 2-cycle delay back up the pipeline, we introduce a two-cycle delay even when the branch is not taken. Penalty of 2 cycles for a branch not taken.
However, we have two cycles of buffering in the pipeline: one cycle in the instruction decode unit (it waits every other cycle) and one cycle in the operand fetch unit (it also waits every other cycle). If these units can each hold onto their results for a cycle until the next stage is available, as shown in the reservation table, we take no penalty for this particular instruction pair. However, we will end up taking the 2-cycle penalty when the next two (and every succeeding) instruction pairs with data dependencies come along. Penalty of 2 cycles for a branch not taken.
Reservation table for (b), branch taken; the branch outcome is known at the end of clock 10, NSI, 2SI, and 3SI are squashed, and the branch-target instruction (BT) cannot be fetched until clock 12:

         IF      ID   OF              EX     OS
  CC:    1-2     3    4               5-6    7
  BR:    3-4     5    wait 6-7, 8     9-10   11 (outcome known at 10)
  NSI:   5-6     7    squashed
  2SI:   7-8          squashed
  3SI:   9-10         squashed
  BT:    12-13   ...
b) The operand fetch unit must still wait 2 cycles until the CC is available from the operand store unit. By the time we know the outcome of the branch instruction (clock 10), we have fetched the next three sequential instructions. We must stop the execution of NSI, dump the 2SI and 3SI instructions, and stop the instruction fetch unit from fetching the fourth sequential instruction after the branch, for 6 wasted clocks. Since the instruction fetch unit can't get the new program counter address for the branch target (BT) instruction until clock 11, it can't begin fetching the instruction at the target address until clock 12, so we make it wait one more cycle. The total penalty for the branch taken is 7 cycles. If you assumed that the instruction fetch unit could not be stopped after fetching 3SI and proceeded to fetch 4SI, the total penalty is 8 cycles.
Reservation table for (c), delayed branch taken; NSI (the delay-slot instruction) runs to completion, while 2SI and 3SI are squashed:

         IF      ID   OF              EX     OS
  CC:    1-2     3    4               5-6    7
  BR:    3-4     5    wait 6-7, 8     9-10   11
  NSI:   5-6     7    (completes behind BR)
  2SI:   7-8          squashed
  3SI:   9-10         squashed
  BT:    12-13   ...
c) The difference here is that we do not need to stop the execution of NSI on a delayed branch. It can continue to completion, but we still need to dump the 2SI and 3SI instructions, and stop the instruction fetch unit from fetching the fourth sequential instruction after the branch. The instruction fetch unit still can't get the new program counter address until clock 11, and it can't begin fetching the instruction at the target address until clock 12, so the total penalty for the branch taken is only 5 cycles: the four we lost by fetching the 2SI and 3SI instructions, and the one it had to wait before proceeding with the new PC. Again, if you assumed that 4SI was fetched, it would be 6 cycles.
Reservation table for (d); the condition code is forwarded from CC Execute (done at clock 6) to BR Execute (clock 7), and the branch-target address is forwarded from the BR Execute output (clock 8) to Instruction Fetch:

         IF      ID   OF    EX     OS
  CC:    1-2     3    4     5-6    7
  BR:    3-4     5    6     7-8    9
  NSI:   5-6     7    8     9-10   11 (delay slot, completes)
  2SI:   7-8          squashed
  BT:    9-10    11   12    13-14  15
  2T:    11-12   13   ...
  3T:    13-14
d) Internal forwarding allows us to forward the condition code result directly from the CC Execute stage (clock 6) to the branch Execute stage (clock 7), so we don't delay the branch. We can also forward the Branch Target address directly from the output of the Branch Execute stage (clock 8) to the Instruction Fetch unit so we don't lose the branch Operand Store cycle in clock 9. We still need to dump the 2SI instruction that we pre-fetched, so the total penalty for the delayed branch taken with internal forwarding is only 2 cycles.
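Collecting the four penalties derived above (assuming, as in each part, that the fetch of 4SI is suppressed):

```python
# Branch penalties in lost cycles for the question 7 pipeline.
branch_penalty_cycles = {
    "not taken": 2,                   # a) data-dependency stall on the CC
    "taken": 7,                       # b) squash NSI/2SI/3SI + 1 wait cycle
    "delayed, taken": 5,              # c) NSI completes in the delay slot
    "delayed, taken, forwarding": 2,  # d) only the 2SI fetch is lost
}
print(branch_penalty_cycles)
```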
Question 8. What is a greedy cycle? The greedy cycle arises from initiating a new instruction into the pipeline at the first opportunity in each state. The greedy cycle is also the maximum-rate cycle in many cases, but not necessarily.
Question 9. Why would you implement a branch history table in a pipelined computer? A branch history table gives you a better guess than random on whether or not a conditional branch will be taken. The assumption is that recent history is a good predictor of the near future, the same idea that the LRU cache replacement algorithm is based on. If we have a long instruction pipeline, a good guess will reduce the number of times we have to discard instructions that we prefetch and start into the pipeline following a conditional branch.
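One common realization of a branch history table (the text does not specify a scheme, so this is an assumption) is a table of 2-bit saturating counters indexed by branch address; a counter value of 2 or 3 predicts taken:

```python
class BranchHistoryTable:
    """Branch history table of 2-bit saturating counters."""

    def __init__(self, size=1024):
        self.size = size
        self.counters = [1] * size  # start in "weakly not taken"

    def _index(self, pc):
        # Hypothetical indexing: low-order bits of the branch address.
        return pc % self.size

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2  # True = predict taken

    def update(self, pc, taken):
        # Saturate at 0 and 3 so one anomalous outcome flips the
        # prediction only after two misses in the same direction.
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)

bht = BranchHistoryTable()
for outcome in [True, True, True, False, True]:  # a mostly-taken branch
    bht.update(0x40, outcome)
print(bht.predict(0x40))  # True
```

This captures the "recent history predicts the near future" idea: a loop-closing branch that is taken hundreds of times in a row is mispredicted only on loop exit.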
Question 10. What do we mean when we say a computer is superscalar? A superscalar computer executes more than one instruction per clock tick. This is achieved by having more than one pipeline and allowing instructions without dependencies on one another to proceed in parallel through the separate pipelines.
Question 11. What problem is speculative execution trying to solve? Speculative execution is another strategy used to reduce the effects of conditional branches. Rather than guessing which way a branch will go and fetching instructions only along one path, we proceed to fetch, decode, and begin execution of instructions along both paths. Results from both instruction streams are tentative until we know which way the branch goes. When the outcome of the branch is known, the tentative results from the path not taken are discarded and the results from the path taken are made permanent.