CompArch Cheatsheet

The document discusses various aspects of computer architecture, including pipeline hazards, cache memory types, and branch prediction mechanisms. It explains solutions for structural hazards, such as interlocking and forwarding, as well as the concepts of spatial and temporal locality in cache memory. Additionally, it covers cache miss types, the memory wall problem, and the differences between bimodal and correlating branch predictors.
Multicycle instruction timings (cycles): LW 5, SW 4, R-type 4, BEQ 3, ADDI 4, J 3. CPI > 1, but the cycle time T is less than in the single-cycle design.
Single-cycle processor: throughput = 1 instruction per 950 ps.
Pipelined processor: pipeline stage length is 250 ps; instruction latency = 5 * 250 = 1250 ps; throughput = 1 instruction per 250 ps.

Solutions for structural hazards: incorporate more resources; stall the operation; arbitration with interlocking.

RAW hazard solutions:
- Interlocking (a simple solution): detect the hazard and stall the pipeline; this degrades the speedup.
- Forwarding (a sophisticated solution): the ALU output of Inst1 in the EX stage is forwarded immediately back to the ALU input of the EX stage as an operand for Inst2.

Dependence types (assume Inst1 is fetched prior to Inst2):
- Data dependence: Inst2 is data dependent on Inst1 if Inst1 writes its output to a register Reg (or memory location) that Inst2 reads as its input.
- Anti-dependence: Inst2 is anti-dependent on Inst1 if Inst1 reads data from a register Reg (or memory location) that is subsequently overwritten by Inst2.
- Output dependence: Inst2 is output dependent on Inst1 if both write to the same register Reg (or memory location), with Inst2 writing its output after Inst1.
- Control dependence: Inst2 is control dependent on Inst1 if Inst1 must complete before a decision can be made whether or not to execute Inst2.

Load-use hazard detection:
lwstall = ((rsD == rtE) OR (rtD == rtE)) AND MemtoRegE
StallF = lwstall, StallD = lwstall, FlushE = lwstall (or set all control signals to 0)

Enable in the datapath (stalling): in a load-use hazard, lwstall stalls the IF and ID pipeline stages by setting their enable signals to 0, freezing the registers in those stages.
Clear in the datapath (flushing): on a branch misprediction, FlushE clears the EX stage to avoid executing the wrong instruction.
By detecting this dependency (lwstall = 1), the processor stalls the pipeline for one cycle so the lw instruction can finish, ensuring correct data is available for the dependent instruction.

Spatial locality means that instructions stored near a recently executed instruction have a high chance of being executed.
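The load-use stall logic can be modeled as a small function (a minimal sketch, assuming the signal names rsD, rtD, rtE, and MemtoRegE used in the equations above; register numbers follow the MIPS convention where $t1 is register 9):

```python
def hazard_unit(rsD, rtD, rtE, mem_to_reg_E):
    """Model of load-use hazard detection.

    rsD, rtD: source register numbers of the instruction in Decode.
    rtE: destination register of the load in Execute.
    mem_to_reg_E: True if the Execute-stage instruction is a load (lw).
    Returns (StallF, StallD, FlushE).
    """
    lwstall = (rsD == rtE or rtD == rtE) and mem_to_reg_E
    return lwstall, lwstall, lwstall  # StallF, StallD, FlushE

# lw $t1, 0($t0) followed by add $t2, $t1, $t3: rtE = 9 ($t1), rsD = 9
print(hazard_unit(rsD=9, rtD=11, rtE=9, mem_to_reg_E=True))   # stall: (True, True, True)
print(hazard_unit(rsD=9, rtD=11, rtE=9, mem_to_reg_E=False))  # no stall: not a load
```

Note that the register comparison alone is not enough: MemtoRegE gates the stall so that only loads (not ALU results, which can be forwarded) trigger it.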
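The timing figures above can be double-checked with quick arithmetic (using the 950 ps single-cycle time and 250 ps stage length quoted in the text):

```python
single_cycle_time = 950  # ps per instruction in the single-cycle processor
stage_time = 250         # ps per pipeline stage
stages = 5

latency = stages * stage_time              # per-instruction latency in the pipeline
speedup = single_cycle_time / stage_time   # steady-state throughput improvement

print(latency)  # 1250 ps
print(speedup)  # 3.8x (one instruction completes every 250 ps vs every 950 ps)
```

Latency per instruction goes up (1250 ps vs 950 ps), but throughput improves because a new instruction completes every stage time.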
It refers to the use of data elements (instructions) that are relatively close together in storage locations.
Temporal locality means that a recently executed instruction has a high chance of being executed again, so the instruction is kept in cache memory where it can be fetched quickly without a long search.

Inclusive cache hierarchy: the L2 cache always contains all the data stored in L1. Inclusion property: whatever is present in L1 must also be present in L2. This ensures consistency between cache levels and simplifies eviction policies.
Exclusive cache hierarchy: L1 and L2 hold distinct content, which maximizes cache space utilization across levels. Blocks evicted from L1 move to L2, and data fetched into L1 is removed from L2.

Branch hazard detection:
Branchstall = BranchD AND RegWriteE AND (WriteRegE == rsD OR WriteRegE == rtD)
OR BranchD AND MemtoRegM AND (WriteRegM == rsD OR WriteRegM == rtD)

According to the CPI equation, the total execution time is T3 = (100 x 10^9 instructions) x (1.15 cycles/instruction) x (550 x 10^-12 s/cycle) = 63.3 seconds.

The branch-target buffer (BTB), or branch-target (address) cache (BTAC), is a branch-prediction cache that stores the predicted address of the next instruction after a branch. For a loop with N iterations, the prediction accuracy is (N-2)/N.

Two-level predictor update:
- Update the local history table (LHT): push the decision into the MSB of the LHT entry.
- Update the global history register (GHR): push the decision into the MSB of the GHR.
- Update the local and global prediction tables: step the n-bit saturating counters.
- Update the choice table: follow the state table.
LPT: the branch is predicted taken if the value of the 3-bit saturating counter >= 2. GPT: predicted taken if the value of the 2-bit counter >= 2. CPT: chooses the GPT outcome if the value of its 2-bit counter >= 2, otherwise the LPT outcome.

Cache geometry (A = address bits, b = block size in words, S = number of blocks, N = ways):
Capacity C = b x S, so S = C/b.
Set associative: tag bits = A - log2(b) - log2(S/N), where S/N is the number of sets.
Fully associative: N = S, so tag bits = A - log2(b).
Direct mapped: N = 1, so tag bits = A - log2(b) - log2(S).

Types of cache misses:
- Compulsory miss: occurs when data is accessed for the first time and is not yet in the cache; unavoidable on first access.
- Conflict miss: happens when multiple memory blocks map to the same cache line, causing evictions and reloads due to cache associativity limitations.
- Capacity miss: occurs when the cache is too small to hold all the required data, leading to evictions even if associativity is not an issue.
Direct-mapped cache: simple search, but high conflict-miss rate. Fully associative cache: complex search, low conflict misses. Set-associative cache: less complex search, reduced conflict-miss rate.

Units: 1 KB = 10^3 (or 2^10) bytes; 1 MB = 10^6 (or 2^20); 1 GB = 10^9 (or 2^30).

FIFO replacement: does it always match the temporal-locality characteristic of the program? No -- some memory locations, such as global variables, can be accessed continuously.

Read architectures: in a look-aside design, the cache unit sits in parallel with main memory; both the main memory and the cache see a bus cycle at the same time (hence "look aside"). In a look-through design, the cache sees the processor's bus cycle before allowing it to pass on to the system bus (hence "look through").

Write architectures: write-back (WB) -- when the processor starts a write cycle, the cache receives the data and terminates the cycle; the cache then writes the data back to main memory when the system bus is available. Write-through (WT) -- the processor writes through the cache to main memory; the cache may update its contents, but the write cycle does not end until the data is stored in main memory.
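The execution-time example can be reproduced directly from T = instruction count x CPI x cycle time:

```python
instructions = 100e9   # 100 x 10^9 instructions
cpi = 1.15             # cycles per instruction
cycle_time = 550e-12   # 550 ps per cycle

T3 = instructions * cpi * cycle_time
print(T3)  # ~63.25 s (quoted as 63.3 s in the text)
```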
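The tag-bit formula unifies the three cache organizations; a small sketch using the parameters above (A address bits, b block size, S blocks, N ways; the example values 32-bit word addresses, 4-word blocks, 1024 blocks are illustrative assumptions):

```python
import math

def tag_bits(A, b, S, N):
    """Tag bits = A - log2(b) - log2(S/N), where S/N is the number of sets."""
    return A - int(math.log2(b)) - int(math.log2(S // N))

A, b, S = 32, 4, 1024  # 32-bit addresses, 4-word blocks, 1024 blocks (C = 4096 words)
print(tag_bits(A, b, S, N=1))  # direct mapped:       32 - 2 - 10 = 20
print(tag_bits(A, b, S, N=4))  # 4-way set assoc.:    32 - 2 - 8  = 22
print(tag_bits(A, b, S, N=S))  # fully associative:   32 - 2 - 0  = 30
```

Setting N = 1 or N = S recovers the direct-mapped and fully associative special cases, as the formulas in the text state.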
Write-through is less complex and therefore less expensive to implement.

Non-blocking cache:
- Accepts more than one cache request: it continues to accept accesses to a block even if a miss to that block is in progress.
- The first miss is the primary miss; the rest are secondary.
- The cache tracks such requests and later delivers the data to the secondary accesses, using MSHRs, early restart, and critical word first.

Miss status holding register (MSHR): each entry contains a bit indicating whether it is free or busy, plus information about which missing line is attached to it.
On a cache miss: search the MSHRs for a pending access to the same block. Found: allocate a load/store entry in that MSHR entry. Not found: allocate a new MSHR. No free entry: stall.
When a block returns from the next level of memory: check which loads/stores are waiting for it, forward the data to the load/store unit, deallocate the load/store entry in the MSHR, and write the block into the cache; if it is the last outstanding block, deallocate the MSHR (after writing the block into the cache).

Memory wall problem: the growing performance gap between the speed of processors and the speed of memory access. As processors become faster, memory access times remain relatively slow, causing the processor to stall while waiting for data and creating a bottleneck in overall system performance.

Bimodal predictor structure: uses a small table of 2-bit counters indexed by the branch address.
Each counter tracks recent branch outcomes.
Key differences (bimodal vs. correlating predictor):
- History: the bimodal predictor tracks individual branch history; the correlator tracks the global history of multiple branches.
- Prediction logic: the bimodal predictor uses simple counters for each branch; the correlator uses a history-based table that tracks correlations between branches.
- Complexity: the bimodal predictor is simple, low-cost, and fast but less effective for complex patterns; the correlator is more complex and resource-intensive but better for programs with correlated branches.

Tags are not needed in the MIPS register file: tags are used in cache memory to identify which memory block is stored in a cache line. The MIPS register file stores data values directly and does not need tags, because registers are accessed using fixed register indices (like $t0, $t1) rather than memory addresses; the control logic knows exactly which register to access, so no tag is required.

Table sizes: the LHT is 2^k x m (indexed by k program-counter bits, with an m-bit history per entry); the LPT is 2^m x n (indexed by the m-bit history, with an n-bit predictor per entry).
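The (N-2)/N loop-accuracy figure quoted above is the classic result for a single-bit predictor, which simply remembers the last outcome: it mispredicts the loop exit and then the first iteration of the next run. A minimal simulation confirms it (the stream shape is an illustrative assumption: a loop branch taken N-1 times, then not taken, repeated):

```python
def simulate_1bit(outcomes, state=False):
    """1-bit branch predictor: always predict the last observed outcome.

    outcomes: list of booleans (True = taken). Returns number of correct predictions.
    """
    correct = 0
    for taken in outcomes:
        correct += (state == taken)
        state = taken  # remember the most recent outcome
    return correct

N = 10
stream = ([True] * (N - 1) + [False]) * 100  # loop branch, 100 executions of the loop
acc = simulate_1bit(stream) / len(stream)
print(acc)  # 0.8 == (N - 2) / N: two mispredictions per loop execution
```

A 2-bit counter (the bimodal building block) does better on this pattern, because one not-taken outcome only weakens, not flips, a strongly-taken state.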
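Returning to the non-blocking cache, the MSHR allocation policy described earlier (search for a pending miss to the same block; attach, allocate, or stall) can be sketched as a toy model. This is an illustrative sketch only; entry counts, names, and the dict-based bookkeeping are assumptions, not a real design:

```python
class MSHRFile:
    """Toy model of miss-status holding registers for a non-blocking cache."""

    def __init__(self, num_entries=4):
        self.num_entries = num_entries
        self.entries = {}  # block address -> list of waiting load/store ids

    def on_miss(self, block_addr, access_id):
        """Handle a cache miss; returns 'secondary', 'primary', or 'stall'."""
        if block_addr in self.entries:            # pending miss to the same block
            self.entries[block_addr].append(access_id)
            return "secondary"
        if len(self.entries) < self.num_entries:  # free MSHR available
            self.entries[block_addr] = [access_id]
            return "primary"
        return "stall"                            # no free entry: stall

    def on_fill(self, block_addr):
        """Block returned from the next level: forward to waiters, free the MSHR."""
        return self.entries.pop(block_addr, [])

m = MSHRFile(num_entries=2)
print(m.on_miss(0x100, "ld1"))  # primary
print(m.on_miss(0x100, "ld2"))  # secondary (same block, same MSHR)
print(m.on_miss(0x200, "st1"))  # primary
print(m.on_miss(0x300, "ld3"))  # stall (no free MSHR)
print(m.on_fill(0x100))         # ['ld1', 'ld2'] forwarded, MSHR freed
```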