

CS2100 Finals Cheatsheet

Computer Organisation (National University of Singapore)




Hazards and Resolution

Structural Hazards
- Simultaneous use of a hardware resource (e.g. the memory unit used by both a load and an instruction fetch)
- Not an issue for MIPS, as data memory and instruction memory are separate

Data Hazards
RAW (Read After Write)
- Register writes happen first, then reads (within the same cycle)
- Without data forwarding: if the dependent instruction is immediately after: 2-cycle delay; 2 instructions after: 1-cycle delay
- With data forwarding: if the dependent instruction depends on a lw: 1-cycle delay; otherwise: no delay
- Detect a load-use hazard when ID/EX.instruction == Load && (ID/EX.rt == IF/ID.rs || ID/EX.rt == IF/ID.rt)

Data Forwarding
- Resolves all RAW hazards except those on a lw (which needs one stall)
- sw after lw might not need to stall at all
- Forward from EX/MEM to the ALU for a distance-1 dependency
- Forward from MEM/WB to the ALU for a distance-2 dependency
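A minimal Python sketch of the load-use hazard check and the forwarding choice above; the field and register names are illustrative, not from the module:

# Sketch (assumed field names): load-use hazard detection and forwarding source,
# mirroring the conditions listed above. RegWrite/$zero checks are omitted.
def load_use_hazard(id_ex_is_load, id_ex_rt, if_id_rs, if_id_rt):
    """Stall one cycle when the instruction in ID needs the result of a lw in EX."""
    return id_ex_is_load and (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt)

def forward_source(ex_mem_rd, mem_wb_rd, needed_reg):
    """Prefer the newest value: EX/MEM (distance 1) over MEM/WB (distance 2)."""
    if needed_reg == ex_mem_rd:
        return "EX/MEM"
    if needed_reg == mem_wb_rd:
        return "MEM/WB"
    return "REGFILE"

# lw $t0, 0($s0) followed by add $t1, $t0, $t2 -> must stall
print(load_use_hazard(True, "$t0", "$t0", "$t2"))   # True
print(forward_source("$t3", "$t0", "$t0"))          # MEM/WB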
Control Hazards (Branching/Jumping)
- Without any control measures: 3-cycle delay
- Early branch resolution: move the branch decision from the EX/MEM stage to the ID stage – stall 1 cycle instead of 3 (may cause further stalls if the register is written by a previous instruction)
o Involved in a RAW with the previous instruction (not lw): stall 2 cycles
o Involved in a RAW with the previous instruction (lw): stall 3 cycles
o Not involved in any RAW: stall 1 cycle
- Branch prediction (not taken): guess the outcome and speculatively execute instructions; if the guess is wrong, flush the pipeline
o With early branching: 1 cycle occurs before instructions get flushed / not flushed
o Without early branching: 3 cycles occur before instructions get flushed / not flushed
- Delayed branch: X instructions following a branch are always executed regardless of the outcome (requires the compiler to re-order instructions into the branch-delay slot(s), or to add nop instructions). Try to find independent instructions from before the branch.
o With early branching: shift 1 instruction
o Without early branching: shift 3 instructions
Performance

Single Cycle
- One instruction = 1 clock cycle
- Clock cycle time: longest latency amongst all instructions (usually lw)
- Total execution time = Number of Instructions x Clock Cycle Time

Multi Cycle
- One stage = 1 clock cycle
- Cycle time decreases, clock frequency increases
- Different instructions take a variable number of clock cycles (since not all stages are needed)
- Clock cycle time: longest latency amongst all stages
- Total execution time = I x Average CPI x Clock Cycle Time

Pipeline
- One stage = 1 clock cycle
- Clock cycle time: longest latency amongst all stages + Td (time needed to write into the pipeline registers)
- Cycles needed for I instructions: I + N – 1 (N = number of stages)
- Total execution time = (I + N – 1) x Clock Cycle Time
- If N(instructions) >> N(stages): Speedup(pipeline) = Time(single cycle) / Time(pipeline) ≈ N
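A quick numeric sketch of the execution-time formulas above; the stage latencies and counts are made up:

# Assumed stage latencies in ns, purely for illustration.
stage_latency = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}
I = 1_000_000                 # instructions
N = len(stage_latency)        # pipeline stages
Td = 20                       # assumed pipeline-register overhead (ns)

single_cycle_ct = sum(stage_latency.values())    # lw needs every stage
pipeline_ct = max(stage_latency.values()) + Td   # longest stage + register overhead

t_single = I * single_cycle_ct                   # I x cycle time
t_pipe = (I + N - 1) * pipeline_ct               # (I + N - 1) x cycle time
print(t_single / t_pipe)                         # ~3.6x here; bounded above by N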
Pipelining (Pipeline register contents)
- IF/ID: instruction from memory & PC + 4
- ID/EX: data read from the register file, 32-bit sign-extended Imm, & PC + 4
- EX/MEM: (PC + 4) + (Imm * 4), ALU result, isZero signal & RD2 from the register file
- MEM/WB: ALU result, memory read data & write register data (passed through all pipeline stages)

Performance
Performance = 1 / Response Time
Speedup n of X over Y: n = Performance_X / Performance_Y = Execution Time_Y / Execution Time_X
CPU Time = (Instructions / Program) x (Cycles / Instruction) x (Seconds / Cycle)
Factors affecting performance: a different compiler (affects instructions per program), a different ISA (affects CPI)
Cannot use CPI alone to determine performance/time – use total time!
Amdahl's Law (performance is limited by the portion of the program that is not sped up)
P: fraction of program time that can be improved; if that portion is sped up by a factor k, Speedup_overall = 1 / ((1 – P) + P/k)
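A quick sketch plugging made-up numbers into the CPU-time equation and Amdahl's Law:

# Amdahl's Law: overall speedup when a fraction P of the time is sped up by k.
def amdahl(P, k):
    return 1 / ((1 - P) + P / k)

print(amdahl(0.8, 10))   # ~3.57x, even though 80% of the program runs 10x faster
print(amdahl(0.8, 1e9))  # ~5x ceiling: limited by the untouched 20%

# CPU time = instruction count x CPI x clock period (assumed values).
insts, cpi, clock_hz = 2_000_000, 1.5, 1e9
print(insts * cpi / clock_hz)   # 0.003 seconds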
Cache (1 GiB = 2^30 bytes, 1 KiB = 2^10 bytes)
- Temporal locality: the same item tends to be referenced again soon
- Spatial locality: nearby items tend to be referenced soon
- Hit rate: fraction of memory accesses that hit in the cache
- Average access time = (hit rate) x (hit time) + (1 – hit rate) x (miss penalty)
- Cache block/line: the smallest unit of transfer between main memory and the cache
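The average access time formula as a one-line sketch, with assumed timings:

# Assumed: 1 ns hit time, 50 ns miss penalty.
def avg_access_time(hit_rate, hit_time, miss_penalty):
    return hit_rate * hit_time + (1 - hit_rate) * miss_penalty

print(avg_access_time(0.95, 1, 50))   # 3.45 ns: a 5% miss rate more than triples the 1 ns hit time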
Types of misses
- Cold/Compulsory: the block has never been accessed before
- Conflict: the same index gets overwritten (direct-mapped & set-associative)
- Capacity: the cache cannot contain all the blocks needed (fully associative)

Write policy
- Write-through: write data to both the cache and main memory, using a write buffer to queue the memory writes
- Write-back: write data to the cache only; write to main memory when the block is evicted, tracked with a "dirty bit" on each cache block

Write miss policy
- Write allocate: load the block into the cache, then follow the write policy
- Write around: write directly to main memory
Direct Mapped Cache
- Per-block overhead: valid flag (1 bit) + tag length (initially, all valid flags are unset)
- Blocks in cache: 2^M, bytes per block: 2^N
- For each memory address val:
o Block (set) index = (val mod 2^(N+M)) // 2^N
o Word index (offset) = (val mod 2^N) // bytes_per_word
o Tag = val // 2^(N+M)
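A minimal sketch of that address split, assuming M = 7 index bits, N = 5 offset bits and 4-byte words:

# Split a byte address into tag / block index / word offset for a direct-mapped
# cache with 2**M blocks of 2**N bytes (all sizes here are assumed).
M, N, BYTES_PER_WORD = 7, 5, 4

def split(addr):
    index = (addr % 2 ** (N + M)) // 2 ** N
    word = (addr % 2 ** N) // BYTES_PER_WORD
    tag = addr // 2 ** (N + M)
    return tag, index, word

print(split(0x0000ABCD))                  # (10, 94, 3)
print(split(0x0000ABCD + 2 ** (N + M)))   # (11, 94, 3): same index, different tag -> conflict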
Set-Associative Cache
- A block maps to a unique set of N possible cache locations
- N-way SAC → N cache blocks per set
- Bytes per block: 2^M
- Cache blocks = Size_cache / Size_block
- Sets = Cache Blocks / N (a power of two, which gives the number of set-index bits)

Fully-Associative Cache
- A block can be placed anywhere, but all blocks must be searched
- No conflict misses anymore; capacity misses = total misses – cold misses

Cache Performance
Larger block trade-off:
- Spatial locality advantage (hit rate increases)
- Miss penalty increases due to loading more data
- Temporal locality disadvantage beyond a certain limit (miss rate increases)
Rule of thumb: a direct-mapped cache of size N has almost the same miss rate as a 2-way set-associative cache of size N/2
- Cold/compulsory misses do not depend on size or associativity
- For the same cache size, conflict misses decrease with increasing associativity
- Conflict misses are 0 for a FA cache
- For the same cache size, capacity misses do not depend on associativity
- Capacity misses decrease with increasing cache size

Block replacement policy
- Least recently used (LRU): the usual policy, but hard to track
- First in first out (FIFO) – with a second-chance variant
- Random replacement (RR)
- Least frequently used (LFU)
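A tiny sketch of LRU replacement for a single 2-way set (the tags are arbitrary):

# LRU within one set: keep tags ordered from least to most recently used.
from collections import OrderedDict

class LRUSet:
    def __init__(self, ways=2):
        self.ways, self.blocks = ways, OrderedDict()

    def access(self, tag):
        if tag in self.blocks:                # hit: mark as most recently used
            self.blocks.move_to_end(tag)
            return "hit"
        if len(self.blocks) == self.ways:     # miss on a full set: evict the LRU tag
            self.blocks.popitem(last=False)
        self.blocks[tag] = None
        return "miss"

s = LRUSet()
print([s.access(t) for t in (1, 2, 1, 3, 2)])   # ['miss', 'miss', 'hit', 'miss', 'miss']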
Boolean Algebra
Precedence: NOT > AND > OR
- Identity: A + 0 = A and A · 1 = A
- Complement: A + A' = 1 and A · A' = 0
- Commutative: A + B = B + A and A · B = B · A
- Associative: A + (B + C) = (A + B) + C and A · (B · C) = (A · B) · C
- Distributive: A + (B · C) = (A + B) · (A + C) and A · (B + C) = (A · B) + (A · C)
- Duality (not a real law): if we flip the AND/OR operators and flip the operands (0 and 1), the Boolean equation still holds
- Idempotency: X + X = X and X · X = X
- One/Zero Element: X + 1 = 1 and X · 0 = 0
- Involution: (X')' = X
- Absorption: X + (X · Y) = X and X · (X + Y) = X
- Absorption (variant): X + (X' · Y) = X + Y and X · (X' + Y) = X · Y
- De Morgan's (can be used on > 2 variables): (X · Y)' = X' + Y' and (X + Y)' = X' · Y'
- Consensus: (X · Y) + (X' · Z) + (Y · Z) = (X · Y) + (X' · Z) and (X + Y) · (X' + Z) · (Y + Z) = (X + Y) · (X' + Z)
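Any of these identities can be brute-force checked over all input combinations; a short sketch for the consensus theorem:

# Brute-force check of the consensus theorem over every assignment of X, Y, Z.
from itertools import product

for x, y, z in product([0, 1], repeat=3):
    lhs = (x & y) | ((1 - x) & z) | (y & z)   # X·Y + X'·Z + Y·Z
    rhs = (x & y) | ((1 - x) & z)             # X·Y + X'·Z
    assert lhs == rhs
print("consensus theorem holds for all 8 cases")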
Logic Gates
Complete set of logic: any set of gates sufficient for building any Boolean function.
- e.g. {AND, OR, NOT}
- e.g. {NAND} (self-sufficient / universal gate) = {Negative-OR}
- e.g. {NOR} (self-sufficient / universal gate) – the output is 1 only when both inputs are 0
With negated outputs, use NAND to simulate OR and NOR to simulate AND.

Logic Circuits
Combinational circuit: each output depends entirely on the present inputs
Sequential circuit: each output depends on both the present inputs and the state
• Half-adder: C = X·Y, S = X ⊕ Y
• Full-adder: Cout = X·Y + (X ⊕ Y)·Cin, S = X ⊕ (Y ⊕ Z) = (X ⊕ Y) ⊕ Z (Z = Cin)
• 4-bit parallel adder: built by cascading 4 full-adders via their carries
• Adder-cum-subtractor: XOR each Y input with S (S = 0/1 for add/subtract) and pass S in as C-in (X – Y = X + (1s-complement of Y) + 1)
• Magnitude comparator: input: 2 unsigned values A and B; output: "A > B", "A = B", "A < B"
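A bit-level sketch of the full-adder equations above and the 4-bit ripple-carry cascade:

# Full adder per the equations above, then a 4-bit ripple-carry adder.
def full_adder(x, y, cin):
    s = x ^ y ^ cin                      # S = (X xor Y) xor Cin
    cout = (x & y) | ((x ^ y) & cin)     # Cout = X·Y + (X xor Y)·Cin
    return s, cout

def ripple_add4(a_bits, b_bits, cin=0):  # bit lists are LSB first
    out = []
    for a, b in zip(a_bits, b_bits):
        s, cin = full_adder(a, b, cin)
        out.append(s)
    return out, cin                      # 4 sum bits + carry out

print(ripple_add4([1, 0, 1, 0], [1, 1, 0, 0]))   # 5 + 3 = ([0, 0, 0, 1], 0), i.e. 8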
Circuit Delays
• For each component, time = max(∀ t_input) + t_current_component
• Propagation delay of a ripple-carry parallel adder ∝ number of bits
ALU Build
SOP expression – implement using a 2-level AND-OR circuit or a 2-level NAND circuit
POS expression – implement using a 2-level OR-AND circuit or a 2-level NOR circuit
Minterms & Maxterms
- A minterm/maxterm of n variables is a product/sum term that contains n literals, one from each variable → n variables → 2^n minterms and 2^n maxterms
- Minterm: m0 = X'·Y'·Z'
- Maxterm: M0 = X + Y + Z
- m0' = M0
- Functions can be written as a sum of minterms or a product of maxterms
- The sum of 2 distinct maxterms is 1
- The product of 2 distinct minterms is 0
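A small sketch that lists the minterms of an arbitrary example function F(X, Y, Z) = X·Y + Z, i.e. its canonical sum-of-minterms form:

# Enumerate the minterms of an example function F(X, Y, Z) = X·Y + Z.
from itertools import product

def F(x, y, z):
    return (x & y) | z

minterms = [i for i, (x, y, z) in enumerate(product([0, 1], repeat=3)) if F(x, y, z)]
print(minterms)   # [1, 3, 5, 6, 7] -> F = sum of minterms m(1, 3, 5, 6, 7)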
K-map
- Implicant: a product term covering cells that are all '1' or 'X', with at least one '1'
- Prime implicant: an implicant that is not a subset of any other implicant
- Essential prime implicant: a prime implicant with at least one '1' that is not in any other prime implicant (must appear in the final equation)
- Simplified SOP expression – group the '1's on the K-map
- Simplified POS expression – find the SOP expression using the '0's on the K-map, then negate the resulting expression
- Grouping 2^n cells (only power-of-two sizes are allowed) eliminates n variables
- EPIs are counted only by checking 1s, not Xs
- K-maps help to obtain a canonical SOP, but might not give the simplest possible expression (use Boolean algebra for that)

MSI Components
(Input 0 is the least significant input!)
• Decoder (n-to-m-line decoder): converts binary data from n input lines to one of the m ≤ 2^n output lines (e.g. 2 x 4)
- Each output line represents a minterm
- Active high – generate the minterms and use OR on minterms to form a function; alternatively, use NOR on the maxterms
- Active low – AND the maxterms or NAND the minterms
- Can add an Enable signal
- Larger decoders can be constructed from smaller ones with an inverter (e.g. a 3 x 8 decoder built from 2 x 4 decoders)
• Encoder: the opposite of a decoder
- Exactly ONE input should be '1'
- If more than one input is switched on, the outputs are X (don't-care values)
- The position of the single active input line among the 2^n possibilities is coded as an n-bit code
- A priority encoder can deal with the garbage inputs by assigning priorities to the inputs
- Add a valid bit to deal with the case where nothing is switched on
• Demultiplexer:
- One input data line, N selection lines
- Directs data from the input to a selected output line among the 2^N possibilities
- Demultiplexer ≡ decoder with enable
• Multiplexer:
- Selects one of 2^n inputs onto a single output line, using n selection lines
- To implement a function of n variables, pass the variables to the n-bit selector and set the 2^n inputs to the appropriate constants from the truth table
- To implement a function of n + 1 variables, pass the first n variables to the n-bit selector and set each input appropriately to '0', '1', Z, or Z' (Z is the last variable)

Larger Components
- Remove a decoder that gives duplicate outputs (w.r.t. another decoder) by using an OR gate with the outputs from the first decoder and the enable input of the second.
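A behavioural sketch of a 2-to-4 decoder with enable and of a function realised on a 4-to-1 multiplexer; the truth-table constants come from an arbitrary example F(X, Y) = Σm(1, 2, 3):

# 2-to-4 decoder with enable: exactly one (minterm) output goes high.
def decoder2to4(s1, s0, enable=1):
    outputs = [0, 0, 0, 0]
    if enable:
        outputs[(s1 << 1) | s0] = 1
    return outputs

# 4-to-1 multiplexer: route input d[s] to the output.
def mux4to1(d, s1, s0):
    return d[(s1 << 1) | s0]

F_column = [0, 1, 1, 1]                  # truth-table column of F = Σm(1, 2, 3)
print(decoder2to4(1, 0))                 # [0, 0, 1, 0] -> minterm m2 is active
print([mux4to1(F_column, x, y) for x in (0, 1) for y in (0, 1)])   # [0, 1, 1, 1]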

Sequential Circuits
Self-correcting: any unused state can transit to a used state after a finite number of cycles
Synchronous: outputs change at specific times (governed by a clock)
Asynchronous: outputs change at any time

Multivibrator: a sequential circuit that operates/swings between the HIGH and LOW states
- Bistable: 2 stable states (e.g. latch, flip-flop)
- Monostable / one-shot: 1 stable state
- Astable: no stable state (e.g. clock)
Memory element: a device that can remember a value indefinitely, or change its value on command from its inputs. The same input does not always give the same output!
- Pulse-triggered: activated by +ve/−ve pulses (e.g. latch)
- Edge-triggered: activated by a rising/falling edge (e.g. flip-flop)

S-R latch ("Set-Reset"): the active-high version uses 2 cross-coupled NOR gates; the active-low version uses NAND gates

Gated S-R latch: outputs change only when EN is HIGH (input AND gates); the value is memorised when EN is LOW

Gated D latch ("Data"): can be built from a gated S-R latch (no invalid inputs)

• S-R flip-flop: Similar to gated S-R latch


• D (data) flip-flop: similar to the gated D latch (no invalid inputs)
• J-K flip-flop: J = "Set", K = "Reset"; toggles if both are HIGH
• T flip-flop (“Toggle”): J-K flip-flop with tied inputs

J-K Flip Flop: Q and Q’ fed back to NAND gates

T Flip Flop: Tie both inputs of J-K together
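A behavioural sketch of the J-K flip-flop's next-state rule on the active clock edge, with D and T as special cases:

# Next state of a J-K flip-flop on the active clock edge:
# hold, reset, set or toggle depending on (J, K); D and T are special cases.
def jk_next(q, j, k):
    if (j, k) == (0, 0):
        return q          # hold
    if (j, k) == (0, 1):
        return 0          # reset
    if (j, k) == (1, 0):
        return 1          # set
    return 1 - q          # toggle (J = K = 1)

def d_next(q, d):
    return d              # D flip-flop: output follows the data input

def t_next(q, t):
    return jk_next(q, t, t)   # T flip-flop: J-K with tied inputs

q = 0
for j, k in [(1, 0), (0, 0), (1, 1), (1, 1), (0, 1)]:
    q = jk_next(q, j, k)
print(q)   # 0, after set, hold, toggle, toggle, reset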

