No. of Cycles IF ID EXE MEM WB

The document contains the pipeline stages of different instructions over multiple cycles in tabular form. It also contains code snippets of a loop with instructions before and after optimizations like loop unrolling and scheduling instructions to reduce stalls. Finally, it contains questions related to cache organization, memory hierarchy performance, and estimating execution time based on cache hit rates and memory access latencies.

Q1

(a)
No. of Cycles IF ID EXE MEM WB
1. 1, 2
2. 3, 4 1, 2
3. 5, 6 3, 4 1, 2
4. 3, 4 1, 2
5. 3, 4 2 1
6. 4, 5 3 2 1
7. 4, 5 3 2
8. 5, 6 4 3
9. 5, 6 4 3
10. 6 5 4
11. 6 5 4
12. 6 5
13. 1, 2 6 5
14. 3, 4 1, 2 6
15. 5, 6 3, 4 1, 2 6
16. 3, 4 1, 2
17. 3, 4 2 1
18. 4, 5 3 2 1
19. 4, 5 3 2
20. 5, 6 4 3
21. 5, 6 4 3
22. 6 5 4
23. 6 5 4
24. 6 5
25. 6 5
26. 6
27. 6

(b)
Instructions after 2-level loop unrolling (without false dependencies):
Loop: LD R3, 40(R5)
      DIV R2, R2, R5
      ADD R2, R2, R3
      ST R2, 20(R5)
      SUB R8, R5, 2
      LD R6, 38(R5)
      DIV R7, R2, R8
      ADD R2, R7, R6
      ST R2, 18(R5)
      SUB R5, R5, 4
      BEQ R5, R0, Loop

With stalls:
Loop: LD R3, 40(R5)
      DIV R2, R2, R5
      (2 stalls)
      ADD R2, R2, R3
      (1 stall)
      ST R2, 20(R5)
      SUB R8, R5, 2
      LD R6, 38(R5)
      DIV R7, R2, R8
      (2 stalls)
      ADD R2, R7, R6
      (1 stall)
      ST R2, 18(R5)
      SUB R5, R5, 4
      (2 stalls)
      BEQ R5, R0, Loop
      (1 stall)

Optimized schedule of instructions:
Loop: DIV R2, R2, R5
      LD R3, 40(R5)
      SUB R8, R5, 2
      LD R6, 38(R5)
      ADD R2, R2, R3
      DIV R7, R7, R8
      ST R2, 20(R5)
      SUB R5, R5, 4
      ADD R7, R7, R6
      (1 stall)
      BEQ R5, R0, Loop
      ST R7, 18(R5)
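
As a quick check on the benefit of rescheduling, the stall annotations in the listings above can be tallied. The sketch below only sums the stall counts as written; it does not model the pipeline itself, and the constant and function names are illustrative.

```python
# Stall cycles transcribed from the two annotated listings, in program order.
STALLS_UNOPTIMIZED = [2, 1, 2, 1, 2, 1]  # after DIV, ADD, DIV, ADD, SUB, BEQ
STALLS_OPTIMIZED = [1]                   # single stall between ADD and BEQ


def total_stalls(stalls):
    """Return the total number of stall cycles in one unrolled iteration."""
    return sum(stalls)


if __name__ == "__main__":
    before = total_stalls(STALLS_UNOPTIMIZED)
    after = total_stalls(STALLS_OPTIMIZED)
    print(f"stall cycles per unrolled iteration: {before} -> {after}")
```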

(c)
No. of Cycles IF ID EXE MEM WB
1. 1, 2
2. 3, 4 1, 2
3. 5, 6 3, 4 1, 2
4. 3, 4 1, 2
5. 3, 4 2 1
6. 6, 7 4, 5 3 2 1
7. 4, 5 3 2
8. 7, 8 5, 6 4 3
9. 5, 6 4 3
10. 8, 9 6, 7 5 4
11. 6, 7 5 4
12. 10, 11 8, 9 6, 7 5
13. 8, 9 6, 7 5
14. 8, 9 7 6
15. 9, 10 8 7 6
16. 9, 10 8 7
17. 10, 11 9 8
18. 10, 11 9 8
19. 11 10 9
20. 11 10 9
21. 11 10
22. 11 10
23. 11
24. 11

Q2
L1 cache:

Cache size = 128KB

Block size = 16B -> offset bits = 4 bits
Total blocks in cache = 128KB / 16B = 8K
Direct-mapped cache, total sets = 8K -> index bits = 13 bits
Tag bits = 36 - (4 + 13) = 19 bits
Size of tag array = 19 * 8K = 152K bits = 19KB

L2 cache:

Cache size = 4MB

Block size = 16B -> offset bits = 4 bits
Total blocks in cache = 4MB / 16B = 256K
4-way set associative cache, total sets = 256K / 4 = 64K -> index bits = 16 bits
Tag bits = 36 - (4 + 16) = 16 bits
Size of tag array = 16 * 64K * 4 = 4M bits = 512KB
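
The same bit-field arithmetic can be written as a small helper. This is a minimal sketch that assumes the 36-bit address used in the calculations above and power-of-two sizes; the function name and return layout are illustrative.

```python
KB = 1024
MB = 1024 * KB


def tag_array(cache_bytes, block_bytes, ways, address_bits=36):
    """Return (offset_bits, index_bits, tag_bits, tag_array_bytes).

    Assumes cache size, block size and set count are all powers of two.
    """
    blocks = cache_bytes // block_bytes
    sets = blocks // ways
    offset_bits = block_bytes.bit_length() - 1  # log2 of a power of two
    index_bits = sets.bit_length() - 1
    tag_bits = address_bits - offset_bits - index_bits
    tag_array_bytes = tag_bits * blocks // 8    # one tag per cache block
    return offset_bits, index_bits, tag_bits, tag_array_bytes


l1 = tag_array(128 * KB, 16, ways=1)  # direct-mapped L1
l2 = tag_array(4 * MB, 16, ways=4)    # 4-way set associative L2

if __name__ == "__main__":
    print("L1:", l1)
    print("L2:", l2)
```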

Q3
Assume the processor has a base CPI of 1.

a) Design one
Instruction miss cycles = 3% * 200 * I = 6I cycles
Data miss cycles = 8% * 25% * 200 * I = 4I cycles
Total cycles per instruction = 1 + 4 + 6 = 11 cycles

Design two
Instruction miss cycles = 5% * 200 * I = 10I cycles
Data miss cycles = 5% * 25% * 200 * I = 2.5I cycles
Total cycles per instruction = 1 + 2.5 + 10 = 13.5 cycles

Design one is faster than design two by about 22.73 percent (13.5 / 11 = 1.2273).
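
The two designs can be compared with one formula: base CPI plus instruction-miss stalls plus data-miss stalls. The sketch below uses the miss rates, the 25% data-access fraction and the 200-cycle miss penalty from the calculation above; the function and parameter names are illustrative.

```python
def effective_cpi(base_cpi, instr_miss_rate, data_miss_rate,
                  data_access_fraction, miss_penalty):
    """Base CPI plus memory stall cycles per instruction."""
    instr_stalls = instr_miss_rate * miss_penalty
    data_stalls = data_miss_rate * data_access_fraction * miss_penalty
    return base_cpi + instr_stalls + data_stalls


design_one = effective_cpi(1, 0.03, 0.08, 0.25, 200)
design_two = effective_cpi(1, 0.05, 0.05, 0.25, 200)
speedup = design_two / design_one  # how much faster design one is

if __name__ == "__main__":
    print(f"design one CPI = {design_one:.2f}")
    print(f"design two CPI = {design_two:.2f}")
    print(f"speedup of design one = {speedup:.4f}")
```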

b) Instruction miss cycles for L1 cache = 5% * 20 * I = 1I

Instruction miss cycles for L2 cache = 50% * 5% * 200 * I = 5I
Total instruction miss cycles = 6I
Data miss cycles for L1 cache = 5% * 25% * 20 * I = 0.25I
Data miss cycles for L2 cache = 50% * 5% * 25% * 200 * I = 1.25I
Total data miss cycles = 1.5I
Total cycles per instruction = 1 + 6 + 1.5 = 8.5 cycles
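
For the two-level case, each L1 miss pays the L2 access penalty and the fraction that also misses in L2 pays the memory penalty. The sketch below follows the terms listed above (5% L1 miss rate, 20-cycle L2 penalty, 50% L2 local miss rate, 200-cycle memory penalty, 25% data accesses); the function and parameter names are illustrative.

```python
def two_level_cpi(base_cpi, l1_miss_rate, data_access_fraction,
                  l2_penalty, l2_local_miss_rate, memory_penalty):
    """Return (instruction stalls, data stalls, total CPI) per instruction."""
    # Stall cycles caused by one memory access, averaged over all accesses.
    per_access = (l1_miss_rate * l2_penalty
                  + l2_local_miss_rate * l1_miss_rate * memory_penalty)
    instr_stalls = per_access                        # one fetch per instruction
    data_stalls = data_access_fraction * per_access  # only some access data
    return instr_stalls, data_stalls, base_cpi + instr_stalls + data_stalls


instr_stalls, data_stalls, total_cpi = two_level_cpi(
    base_cpi=1, l1_miss_rate=0.05, data_access_fraction=0.25,
    l2_penalty=20, l2_local_miss_rate=0.5, memory_penalty=200)

if __name__ == "__main__":
    print(f"instruction stalls = {instr_stalls:.2f}")
    print(f"data stalls = {data_stalls:.2f}")
    print(f"total CPI = {total_cpi:.2f}")
```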
Q4
a->2, b->2, c->2, d->1, f->3, g->5
a) (5*30% + 5*20% + 4*30% + 4*10% + 20*10%) * 10^6 = 6.1M cycles = 6100000 cycles

b) Clock speed = 4GHz -> clock cycle time = 1/4G = 0.25ns

10^6 instruction fetches to L1 -> 2 * 10^6 cycles (L1 hit time)
8% miss rate -> 0.08 * 10^6 * 250ns = 20 * 10^6 ns -> (20 * 10^6 / 0.25) = 80 * 10^6 cycles
Total time = 2 * 10^6 + 80 * 10^6 = 82 * 10^6 cycles = 82000000 cycles
c) Clock speed = 4GHz -> clock cycle time = 1/4G = 0.25ns
50% data instructions -> 5 * 10^5 accesses to L1 -> 2 * 5 * 10^5 = 10^6 cycles (L1 hit time)
8% miss rate -> 0.08 * 5 * 10^5 * 250ns = 10^7 ns -> (10^7 / 0.25) = 40 * 10^6 cycles
Total time = 10^6 + 40 * 10^6 = 41 * 10^6 cycles = 41000000 cycles

d) Total memory access cycles = instruction fetch cycles + data access cycles

= 82000000 + 41000000
= 123 * 10^6 cycles = 123000000 cycles
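
Parts b) to d) apply the same rule to two access streams: every access pays the L1 hit time, and the 8% that miss also pay the 250 ns memory latency converted to cycles. A minimal sketch using those numbers (constant and function names are illustrative):

```python
CYCLE_NS = 0.25       # 4 GHz clock
L1_HIT_CYCLES = 2
L1_MISS_RATE = 0.08
MEMORY_NS = 250       # main memory latency on an L1 miss
INSTRUCTIONS = 10**6


def access_cycles(accesses):
    """Cycles spent on `accesses` memory accesses with an L1 cache only."""
    hit_cycles = accesses * L1_HIT_CYCLES
    miss_cycles = L1_MISS_RATE * accesses * (MEMORY_NS / CYCLE_NS)
    return hit_cycles + miss_cycles


fetch_cycles = access_cycles(INSTRUCTIONS)      # one fetch per instruction
data_cycles = access_cycles(INSTRUCTIONS // 2)  # 50% access data
total_cycles = fetch_cycles + data_cycles

if __name__ == "__main__":
    print(f"fetch = {fetch_cycles:.0f} cycles")
    print(f"data  = {data_cycles:.0f} cycles")
    print(f"total = {total_cycles:.0f} cycles")
```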

f) 3% of 10^6 = 37.5% of 8% of 10^6, and 5% of 10^6 = 62.5% of 8% of 10^6

1) Large off-chip
L2 hit = 15ns = 15/0.25 cycles = 60 cycles
Fetch instruction cycles = L1 hit + L2 hit + memory access
= 2 * 10^6 + 0.08 * 10^6 * 60 + 0.375 * 0.08 * 10^6 * 1000
= (2 + 4.8 + 30) * 10^6 = 36800000 cycles
50% data instruction cycles = 2 * 0.5 * 10^6 + 0.08 * 0.5 * 10^6 * 60 + 0.375 * 0.08 * 0.5 * 10^6 * 1000
= (1 + 2.4 + 15) * 10^6 = 18400000 cycles
Total memory access cycles = 18400000 + 36800000 = 55200000 cycles

2) Small on-chip
L2 hit = 3ns = 3/0.25 cycles = 12 cycles
Fetch instruction cycles = L1 hit + L2 hit + memory access
= 2 * 10^6 + 0.08 * 10^6 * 12 + 0.625 * 0.08 * 10^6 * 1000
= (2 + 0.96 + 50) * 10^6 = 52960000 cycles
50% data instruction cycles = 2 * 0.5 * 10^6 + 0.08 * 0.5 * 10^6 * 12 + 0.625 * 0.08 * 0.5 * 10^6 * 1000
= (1 + 0.48 + 25) * 10^6 = 26480000 cycles
Total memory access cycles = 26480000 + 52960000 = 79440000 cycles
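
Both L2 options follow the same three-term pattern (L1 hit, L2 access on an L1 miss, memory access on an L2 miss), so they can be compared with a single function. This sketch uses the hit times and local miss rates from the calculation above; the names are illustrative.

```python
L1_HIT_CYCLES = 2
L1_MISS_RATE = 0.08
MEMORY_CYCLES = 1000  # memory penalty used in the calculation above
INSTRUCTIONS = 10**6


def memory_access_cycles(l2_hit_cycles, l2_local_miss_rate):
    """Total cycles for instruction fetches plus 50% data accesses."""
    per_access = (L1_HIT_CYCLES
                  + L1_MISS_RATE * l2_hit_cycles
                  + L1_MISS_RATE * l2_local_miss_rate * MEMORY_CYCLES)
    fetch = INSTRUCTIONS * per_access
    data = 0.5 * INSTRUCTIONS * per_access
    return fetch + data


large_off_chip = memory_access_cycles(l2_hit_cycles=60, l2_local_miss_rate=0.375)
small_on_chip = memory_access_cycles(l2_hit_cycles=12, l2_local_miss_rate=0.625)

if __name__ == "__main__":
    print(f"large off-chip L2: {large_off_chip:.0f} cycles")
    print(f"small on-chip L2:  {small_on_chip:.0f} cycles")
```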

g) 1) Large off-chip
For LD -> (7*0.3 + 2*60*0.08*0.3 + 1000*0.375*0.08*0.3) * 10^6 = 13.98 * 10^6 cycles
For ST -> (7*0.2 + 2*60*0.08*0.2 + 1000*0.375*0.08*0.2) * 10^6 = 9.32 * 10^6 cycles
For INT -> (5*0.3 + 60*0.08*0.3 + 1000*0.375*0.08*0.3) * 10^6 = 11.94 * 10^6 cycles
For BR -> (5*0.1 + 60*0.08*0.1 + 1000*0.375*0.08*0.1) * 10^6 = 3.98 * 10^6 cycles
For FL -> (21*0.1 + 60*0.08*0.1 + 1000*0.375*0.08*0.1) * 10^6 = 5.58 * 10^6 cycles
Total cycles = 44.8 * 10^6 = 44800000
Total execution time = 44800000 * 0.25 * 10^-9 s = 0.0112 s

2) Small on-chip
For LD -> (7*0.3 + 2*12*0.08*0.3 + 1000*0.625*0.08*0.3) * 10^6 = 17.676 * 10^6 cycles
For ST -> (7*0.2 + 2*12*0.08*0.2 + 1000*0.625*0.08*0.2) * 10^6 = 11.784 * 10^6 cycles
For INT -> (5*0.3 + 12*0.08*0.3 + 1000*0.625*0.08*0.3) * 10^6 = 16.788 * 10^6 cycles
For BR -> (5*0.1 + 12*0.08*0.1 + 1000*0.625*0.08*0.1) * 10^6 = 5.596 * 10^6 cycles
For FL -> (21*0.1 + 12*0.08*0.1 + 1000*0.625*0.08*0.1) * 10^6 = 7.196 * 10^6 cycles
Total cycles = 59.04 * 10^6 = 59040000
Total execution time = 59040000 * 0.25 * 10^-9 s = 0.01476 s

Option 1 (large off-chip) is 1.318 times faster than option 2 (59.04 / 44.8 = 1.318), i.e. it takes about 24% less time.
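
The per-class sums in part g) can be reproduced with a short script. The instruction mix, per-class cycle counts and the factor of 2 on the L2 term for LD and ST are transcribed from the expressions above; the table layout and names are illustrative, not part of the original solution.

```python
CYCLE_S = 0.25e-9     # 4 GHz clock
L1_MISS_RATE = 0.08
MEMORY_CYCLES = 1000
INSTRUCTIONS = 10**6

# class: (fraction of instructions, cycles per instruction, L2 accesses counted)
MIX = {
    "LD": (0.3, 7, 2),
    "ST": (0.2, 7, 2),
    "INT": (0.3, 5, 1),
    "BR": (0.1, 5, 1),
    "FL": (0.1, 21, 1),
}


def total_cycles(l2_hit_cycles, l2_local_miss_rate):
    """Sum the per-class cycle counts for one L2 option."""
    cpi = 0.0
    for fraction, base, l2_accesses in MIX.values():
        l2_term = l2_accesses * l2_hit_cycles * L1_MISS_RATE
        memory_term = MEMORY_CYCLES * l2_local_miss_rate * L1_MISS_RATE
        cpi += fraction * (base + l2_term + memory_term)
    return cpi * INSTRUCTIONS


large_off_chip = total_cycles(l2_hit_cycles=60, l2_local_miss_rate=0.375)
small_on_chip = total_cycles(l2_hit_cycles=12, l2_local_miss_rate=0.625)
ratio = small_on_chip / large_off_chip

if __name__ == "__main__":
    for name, cycles in (("large off-chip", large_off_chip),
                         ("small on-chip", small_on_chip)):
        print(f"{name}: {cycles:.0f} cycles, {cycles * CYCLE_S:.5f} s")
    print(f"ratio of totals = {ratio:.3f}")
```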
