CompArch Cheatsheet

The document discusses various aspects of computer architecture, including pipeline hazards, cache memory types, and branch prediction mechanisms. It explains solutions for structural hazards, such as interlocking and forwarding, as well as the concepts of spatial and temporal locality in cache memory. Additionally, it covers cache miss types, the memory wall problem, and the differences between bimodal and correlating branch predictors.
Multicycle instruction timings (cycles): LW 5, SW 4, R-type 4, BEQ 3, ADDI 4, J 3. CPI > 1, but the cycle time T is less than in the single-cycle design.
Single-cycle processor: throughput = 1 instruction per 950 ps.
Pipelined processor: pipeline stage length is 250 ps; instruction latency = 5 * 250 = 1250 ps; throughput = 1 instruction per 250 ps.

Solutions for structural hazards: incorporate more resources; stall the operation; arbitration with interlocking.

RAW hazard solutions:
- Interlocking (a simple solution): detect the hazard and stall the pipeline; this degrades the speedup.
- Forwarding (a sophisticated solution): the ALU output of Inst1 in the EX stage is forwarded immediately back to the ALU input of the EX stage as an operand for Inst2.

Dependence types (assume Inst1 is fetched prior to Inst2):
- Data dependence: Inst2 is data dependent on Inst1 if Inst1 writes its output to a register Reg (or memory location) that Inst2 reads as its input.
- Anti-dependence: Inst2 is anti-dependent on Inst1 if Inst1 reads data from a register Reg (or memory location) that is subsequently overwritten by Inst2.
- Output dependence: Inst2 is output dependent on Inst1 if both write to the same register Reg (or memory location), with Inst2 writing its output after Inst1.
- Control dependence: Inst2 is control dependent on Inst1 if Inst1 must complete before a decision can be made whether or not to execute Inst2.

Load-use hazard detection:
lwstall = ((rsD == rtE) OR (rtD == rtE)) AND MemtoRegE
StallF = lwstall, StallD = lwstall, FlushE = lwstall (or set all control signals to 0)

Enable in the datapath (stalling): in a load-use hazard, lwstall stalls the IF and ID pipeline stages by setting their enable signals to 0, freezing the registers in those stages.
Clear in the datapath (flushing): on a branch misprediction, FlushE clears the EX stage to avoid executing the wrong instruction.
By detecting this dependency (lwstall = 1), the processor stalls the pipeline for one cycle so the lw instruction can finish, ensuring correct data is available for the dependent instruction.

Spatial locality means that instructions stored near a recently executed instruction have a high chance of being executed.
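The load-use stall logic can be modeled as a small function (a minimal sketch, assuming the signal names rsD, rtD, rtE, and MemtoRegE used in the equations above; register numbers follow the MIPS convention where $t1 is register 9):

```python
def hazard_unit(rsD, rtD, rtE, mem_to_reg_E):
    """Model of load-use hazard detection.

    rsD, rtD: source register numbers of the instruction in Decode.
    rtE: destination register of the load in Execute.
    mem_to_reg_E: True if the Execute-stage instruction is a load (lw).
    Returns (StallF, StallD, FlushE).
    """
    lwstall = (rsD == rtE or rtD == rtE) and mem_to_reg_E
    return lwstall, lwstall, lwstall  # StallF, StallD, FlushE

# lw $t1, 0($t0) followed by add $t2, $t1, $t3: rtE = 9 ($t1), rsD = 9
print(hazard_unit(rsD=9, rtD=11, rtE=9, mem_to_reg_E=True))   # stall: (True, True, True)
print(hazard_unit(rsD=9, rtD=11, rtE=9, mem_to_reg_E=False))  # no stall: not a load
```

Note that the register comparison alone is not enough: MemtoRegE gates the stall so that only loads (not ALU results, which can be forwarded) trigger it.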
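The timing figures above can be double-checked with quick arithmetic (using the 950 ps single-cycle time and 250 ps stage length quoted in the text):

```python
single_cycle_time = 950  # ps per instruction in the single-cycle processor
stage_time = 250         # ps per pipeline stage
stages = 5

latency = stages * stage_time              # per-instruction latency in the pipeline
speedup = single_cycle_time / stage_time   # steady-state throughput improvement

print(latency)  # 1250 ps
print(speedup)  # 3.8x (one instruction completes every 250 ps vs every 950 ps)
```

Latency per instruction goes up (1250 ps vs 950 ps), but throughput improves because a new instruction completes every stage time.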
It refers to the use of data elements (instructions) that are relatively close together in storage locations.
Temporal locality means that a recently executed instruction has a high chance of being executed again, so the instruction is kept in cache memory where it can be fetched quickly without a long search.

Inclusive cache hierarchy: the L2 cache always contains all the data stored in L1. Inclusion property: whatever is present in L1 must also be present in L2. This ensures consistency between cache levels and simplifies eviction policies.
Exclusive cache hierarchy: L1 and L2 hold distinct content, which maximizes cache space utilization across levels. Blocks evicted from L1 move to L2, and data fetched into L1 is removed from L2.

Branch hazard detection:
Branchstall = BranchD AND RegWriteE AND (WriteRegE == rsD OR WriteRegE == rtD)
OR BranchD AND MemtoRegM AND (WriteRegM == rsD OR WriteRegM == rtD)

According to the CPI equation, the total execution time is T3 = (100 x 10^9 instructions) x (1.15 cycles/instruction) x (550 x 10^-12 s/cycle) = 63.3 seconds.

The branch-target buffer (BTB), or branch-target (address) cache (BTAC), is a branch-prediction cache that stores the predicted address of the next instruction after a branch. For a loop with N iterations, the prediction accuracy is (N-2)/N.

Two-level predictor update:
- Update the local history table (LHT): push the decision into the MSB of the LHT entry.
- Update the global history register (GHR): push the decision into the MSB of the GHR.
- Update the local and global prediction tables: step the n-bit saturating counters.
- Update the choice table: follow the state table.
LPT: the branch is predicted taken if the value of the 3-bit saturating counter >= 2. GPT: predicted taken if the value of the 2-bit counter >= 2. CPT: chooses the GPT outcome if the value of its 2-bit counter >= 2, otherwise the LPT outcome.

Cache geometry (A = address bits, b = block size in words, S = number of blocks, N = ways):
Capacity C = b x S, so S = C/b.
Set associative: tag bits = A - log2(b) - log2(S/N), where S/N is the number of sets.
Fully associative: N = S, so tag bits = A - log2(b).
Direct mapped: N = 1, so tag bits = A - log2(b) - log2(S).

Types of cache misses:
- Compulsory miss: occurs when data is accessed for the first time and is not yet in the cache; unavoidable on first access.
- Conflict miss: happens when multiple memory blocks map to the same cache line, causing evictions and reloads due to cache associativity limitations.
- Capacity miss: occurs when the cache is too small to hold all the required data, leading to evictions even if associativity is not an issue.
Direct-mapped cache: simple search, but high conflict-miss rate. Fully associative cache: complex search, low conflict misses. Set-associative cache: less complex search, reduced conflict-miss rate.

Units: 1 KB = 10^3 (or 2^10) bytes; 1 MB = 10^6 (or 2^20); 1 GB = 10^9 (or 2^30).

FIFO replacement: does it always match the temporal-locality characteristic of the program? No -- some memory locations, such as global variables, can be accessed continuously.

Read architectures: in a look-aside design, the cache unit sits in parallel with main memory; both the main memory and the cache see a bus cycle at the same time (hence "look aside"). In a look-through design, the cache sees the processor's bus cycle before allowing it to pass on to the system bus (hence "look through").

Write architectures: write-back (WB) -- when the processor starts a write cycle, the cache receives the data and terminates the cycle; the cache then writes the data back to main memory when the system bus is available. Write-through (WT) -- the processor writes through the cache to main memory; the cache may update its contents, but the write cycle does not end until the data is stored in main memory.
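The execution-time example can be reproduced directly from T = instruction count x CPI x cycle time:

```python
instructions = 100e9   # 100 x 10^9 instructions
cpi = 1.15             # cycles per instruction
cycle_time = 550e-12   # 550 ps per cycle

T3 = instructions * cpi * cycle_time
print(T3)  # ~63.25 s (quoted as 63.3 s in the text)
```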
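The tag-bit formula unifies the three cache organizations; a small sketch using the parameters above (A address bits, b block size, S blocks, N ways; the example values 32-bit word addresses, 4-word blocks, 1024 blocks are illustrative assumptions):

```python
import math

def tag_bits(A, b, S, N):
    """Tag bits = A - log2(b) - log2(S/N), where S/N is the number of sets."""
    return A - int(math.log2(b)) - int(math.log2(S // N))

A, b, S = 32, 4, 1024  # 32-bit addresses, 4-word blocks, 1024 blocks (C = 4096 words)
print(tag_bits(A, b, S, N=1))  # direct mapped:       32 - 2 - 10 = 20
print(tag_bits(A, b, S, N=4))  # 4-way set assoc.:    32 - 2 - 8  = 22
print(tag_bits(A, b, S, N=S))  # fully associative:   32 - 2 - 0  = 30
```

Setting N = 1 or N = S recovers the direct-mapped and fully associative special cases, as the formulas in the text state.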
Write-through is less complex and therefore less expensive to implement.

Non-blocking cache:
- Accepts more than one cache request: it continues to accept accesses to a block even if a miss to that block is in progress.
- The first miss is the primary miss; the rest are secondary.
- The cache tracks such requests and later delivers the data to the secondary accesses, using MSHRs, early restart, and critical word first.

Miss status holding register (MSHR): each entry contains a bit indicating whether it is free or busy, plus information about which missing line is attached to it.
On a cache miss: search the MSHRs for a pending access to the same block. Found: allocate a load/store entry in that MSHR entry. Not found: allocate a new MSHR. No free entry: stall.
When a block returns from the next level of memory: check which loads/stores are waiting for it, forward the data to the load/store unit, deallocate the load/store entry in the MSHR, and write the block into the cache; if it is the last outstanding block, deallocate the MSHR (after writing the block into the cache).

Memory wall problem: the growing performance gap between the speed of processors and the speed of memory access. As processors become faster, memory access times remain relatively slow, causing the processor to stall while waiting for data and creating a bottleneck in overall system performance.

Bimodal predictor structure: uses a small table of 2-bit counters indexed by the branch address.
Each counter tracks recent branch outcomes.
Key differences (bimodal vs. correlating predictor):
- History: the bimodal predictor tracks individual branch history; the correlator tracks the global history of multiple branches.
- Prediction logic: the bimodal predictor uses simple counters for each branch; the correlator uses a history-based table that tracks correlations between branches.
- Complexity: the bimodal predictor is simple, low-cost, and fast but less effective for complex patterns; the correlator is more complex and resource-intensive but better for programs with correlated branches.

Tags are not needed in the MIPS register file: tags are used in cache memory to identify which memory block is stored in a cache line. The MIPS register file stores data values directly and does not need tags, because registers are accessed using fixed register indices (like $t0, $t1) rather than memory addresses; the control logic knows exactly which register to access, so no tag is required.

Table sizes: the LHT is 2^k x m (indexed by k program-counter bits, with an m-bit history per entry); the LPT is 2^m x n (indexed by the m-bit history, with an n-bit predictor per entry).
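The (N-2)/N loop-accuracy figure quoted above is the classic result for a single-bit predictor, which simply remembers the last outcome: it mispredicts the loop exit and then the first iteration of the next run. A minimal simulation confirms it (the stream shape is an illustrative assumption: a loop branch taken N-1 times, then not taken, repeated):

```python
def simulate_1bit(outcomes, state=False):
    """1-bit branch predictor: always predict the last observed outcome.

    outcomes: list of booleans (True = taken). Returns number of correct predictions.
    """
    correct = 0
    for taken in outcomes:
        correct += (state == taken)
        state = taken  # remember the most recent outcome
    return correct

N = 10
stream = ([True] * (N - 1) + [False]) * 100  # loop branch, 100 executions of the loop
acc = simulate_1bit(stream) / len(stream)
print(acc)  # 0.8 == (N - 2) / N: two mispredictions per loop execution
```

A 2-bit counter (the bimodal building block) does better on this pattern, because one not-taken outcome only weakens, not flips, a strongly-taken state.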
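Returning to the non-blocking cache, the MSHR allocation policy described earlier (search for a pending miss to the same block; attach, allocate, or stall) can be sketched as a toy model. This is an illustrative sketch only; entry counts, names, and the dict-based bookkeeping are assumptions, not a real design:

```python
class MSHRFile:
    """Toy model of miss-status holding registers for a non-blocking cache."""

    def __init__(self, num_entries=4):
        self.num_entries = num_entries
        self.entries = {}  # block address -> list of waiting load/store ids

    def on_miss(self, block_addr, access_id):
        """Handle a cache miss; returns 'secondary', 'primary', or 'stall'."""
        if block_addr in self.entries:            # pending miss to the same block
            self.entries[block_addr].append(access_id)
            return "secondary"
        if len(self.entries) < self.num_entries:  # free MSHR available
            self.entries[block_addr] = [access_id]
            return "primary"
        return "stall"                            # no free entry: stall

    def on_fill(self, block_addr):
        """Block returned from the next level: forward to waiters, free the MSHR."""
        return self.entries.pop(block_addr, [])

m = MSHRFile(num_entries=2)
print(m.on_miss(0x100, "ld1"))  # primary
print(m.on_miss(0x100, "ld2"))  # secondary (same block, same MSHR)
print(m.on_miss(0x200, "st1"))  # primary
print(m.on_miss(0x300, "ld3"))  # stall (no free MSHR)
print(m.on_fill(0x100))         # ['ld1', 'ld2'] forwarded, MSHR freed
```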