Microprocessor System Design: Error Correcting Codes, Principle of Locality, Cache Architecture
Zeshan Chishti
Electrical and Computer Engineering Dept
Maseeh College of Engineering and Computer Science
ECE 485/585
Error Correction
Motivation
Failures/time proportional to number of bits
As DRAM cell sizes & voltages shrink, cells become more vulnerable
Why was/is this not an issue on your PC?
Failure rate was low
Few consumers would know what to do anyway
DRAM capacity so large that an error is unlikely to land in data actually being used
Servers (almost always) correct memory system errors (i.e., they use ECC)
Sources
Alpha particles (from radioactive impurities introduced in IC manufacturing/packaging)
Cosmic rays (vary with altitude)
Bigger problem in Denver and on space-bound electronics
Noise
Need to handle failures throughout memory subsystem
DRAM chips, module, bus
DRAM chips don’t incorporate ECC
Store the ECC bits in DRAM alongside the data bits
Chipset (or integrated controller) handles ECC
Error Detection: Parity
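A minimal C sketch of the parity idea (an illustration, not taken from the slides): one check bit makes the total number of 1s even, so a single flipped bit is detected, while two flips cancel and go unnoticed.

#include <stdint.h>
#include <stdio.h>

/* Even parity over a 64-bit data word: the check bit makes the total
   number of 1s (data plus check bit) even. A single flipped bit is
   detected because the parity no longer matches; two flips cancel. */
int parity_bit(uint64_t data) {
    int ones = 0;
    for (int i = 0; i < 64; i++)
        ones += (data >> i) & 1;
    return ones & 1;            /* check bit = XOR of all data bits */
}

int main(void) {
    uint64_t word = 0xDEADBEEF12345678ULL;
    int check = parity_bit(word);

    uint64_t received = word ^ (1ULL << 13);   /* flip one bit in transit */
    printf("error detected: %s\n",
           (parity_bit(received) != check) ? "yes" : "no");
    return 0;
}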
Error Correction Codes (ECC)
Single bit error correction
requires n+1 check bits for 2^n data bits
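As an aside, this rule follows from the standard single-error-correction bound 2^c >= d + c + 1 (c check bits must name any of the d + c possible error positions, plus "no error"). A small C sketch, not from the slides, that computes the minimum check bits for common data widths:

#include <stdio.h>

/* Minimum check bits c for single-error correction over d data bits:
   the c-bit syndrome must distinguish "no error" plus one position
   among the d + c stored bits, i.e. 2^c >= d + c + 1. */
int sec_check_bits(int d) {
    int c = 1;
    while ((1 << c) < d + c + 1)
        c++;
    return c;
}

int main(void) {
    int widths[] = { 8, 16, 32, 64, 128 };
    for (int i = 0; i < 5; i++)
        printf("%3d data bits -> %d check bits (SEC), %d (SECDED)\n",
               widths[i], sec_check_bits(widths[i]),
               sec_check_bits(widths[i]) + 1);
    return 0;
}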
Error Correction Codes (ECC)
[Worked example: each check bit is computed as the even parity (XOR) of a subset of the data bits, e.g. 1^0^0^0 = 1.]
Error Correction Codes (ECC)
An example: decoding and verifying
[Worked example: a codeword is sent, one bit is flipped in transit, and recomputing the check bits over the received word (e.g. 1^0^0^0 = 1) yields a nonzero syndrome that identifies the position of the flipped bit.]
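A hedged C sketch of the same decode-and-verify flow, using the classic Hamming(7,4) code (4 data bits, 3 check bits); the data pattern and the flipped bit position below are chosen for illustration and are not taken from the slide's example.

#include <stdio.h>

/* Hamming(7,4): 4 data bits, 3 check bits. Check bit p_k sits at
   codeword position k (k = 1, 2, 4) and covers every position whose
   index has bit k set. The recomputed checks (the syndrome) spell out
   the position of a single flipped bit, or 0 if there is none. */

unsigned encode(unsigned d /* 4 data bits */) {
    unsigned c = 0;
    /* place data bits at codeword positions 3, 5, 6, 7 (1-indexed) */
    c |= ((d >> 0) & 1) << 3;
    c |= ((d >> 1) & 1) << 5;
    c |= ((d >> 2) & 1) << 6;
    c |= ((d >> 3) & 1) << 7;
    /* each check bit is the parity of the positions it covers */
    unsigned p1 = ((c >> 3) ^ (c >> 5) ^ (c >> 7)) & 1;
    unsigned p2 = ((c >> 3) ^ (c >> 6) ^ (c >> 7)) & 1;
    unsigned p4 = ((c >> 5) ^ (c >> 6) ^ (c >> 7)) & 1;
    return c | (p1 << 1) | (p2 << 2) | (p4 << 4);
}

unsigned syndrome(unsigned cw) {
    unsigned s1 = ((cw >> 1) ^ (cw >> 3) ^ (cw >> 5) ^ (cw >> 7)) & 1;
    unsigned s2 = ((cw >> 2) ^ (cw >> 3) ^ (cw >> 6) ^ (cw >> 7)) & 1;
    unsigned s4 = ((cw >> 4) ^ (cw >> 5) ^ (cw >> 6) ^ (cw >> 7)) & 1;
    return s1 | (s2 << 1) | (s4 << 2);   /* = position of the bad bit */
}

int main(void) {
    unsigned sent = encode(0xB);          /* data = 1011 (arbitrary)     */
    unsigned recv = sent ^ (1u << 6);     /* flip codeword bit 6 in transit */
    unsigned s = syndrome(recv);
    printf("syndrome = %u\n", s);         /* points at position 6        */
    if (s) recv ^= 1u << s;               /* correct the flipped bit     */
    printf("corrected %s\n", recv == sent ? "ok" : "FAILED");
    return 0;
}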
Error Correction Codes (ECC)
Add another check bit – SECDED
Single Error Correction Double Error Detection
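A small C sketch of the standard SECDED decision logic (an illustration, not the slides' circuit): the SEC syndrome plus an overall parity bit distinguish no error, a correctable single error, an error in the extra parity bit itself, and an uncorrectable double error.

#include <stdio.h>

/* SECDED decision: an overall parity bit is added on top of the SEC code.
   'syndrome' is the recomputed Hamming syndrome and 'overall_parity' is
   the recomputed parity over the entire codeword. */
typedef enum { NO_ERROR, CORRECTABLE, PARITY_BIT_ERROR, DOUBLE_ERROR } ecc_status;

ecc_status classify(unsigned syndrome, unsigned overall_parity) {
    if (syndrome == 0 && overall_parity == 0)
        return NO_ERROR;            /* nothing flipped                      */
    if (syndrome != 0 && overall_parity != 0)
        return CORRECTABLE;         /* single data/check bit: fix it        */
    if (syndrome == 0 && overall_parity != 0)
        return PARITY_BIT_ERROR;    /* the extra parity bit itself flipped  */
    return DOUBLE_ERROR;            /* syndrome != 0, parity even: 2 flips  */
}

int main(void) {
    printf("%d %d %d %d\n",
           classify(0, 0), classify(6, 1), classify(0, 1), classify(6, 0));
    return 0;
}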
Error Correction Codes (ECC)
Cache Topics
Cache Basics
Memory vs. Processor Performance
The Memory Hierarchy
Registers, SRAM, DRAM, Disk
Spatial and Temporal Locality
Cache Hits and Misses
Cache Organization
Direct Mapped Caches
Two-Way, Four-Way Caches
Fully Associative (N-Way) Caches
Sector-mapped caches
Cache Line Replacement Algorithms
Cache Performance and Performance improvements
Cache Coherence
Intel Cache Evolution
Multicore Caches
Cache Design Issues
The Problem: Memory Wall
[Figure: processor vs. memory performance over time, plotted on a log scale. CPU performance grows far faster than memory (DRAM) performance, opening the "memory wall" gap. From Hennessy & Patterson, Computer Architecture: A Quantitative Approach (4th edition).]
Memory System Design Tradeoffs
SRAM
Complex basic cell circuit => fast access, but high cost per bit
DRAM
Simpler basic cell circuit => less cost per bit, but slower than SRAMs
Flash memory and Magnetic disks
DRAMs provide more storage than SRAM but less than what is necessary
Disks provide a large amount of storage, but are much slower than DRAMs
No single memory technology can provide both large capacity and fast speed at an affordable cost
A Solution: Memory Hierarchy
[Figure: the memory hierarchy, from Hennessy & Patterson, Computer Architecture: A Quantitative Approach (4th edition): processor registers, datapath, and control at the top, then on-chip cache (SRAM), second- and third-level caches (SRAM), main memory (DRAM), secondary storage (disk), and tertiary storage (tape).]
Intel Pentium 4 3.2 GHz Server
How is the Hierarchy Managed?
Registers <-> Memory: Compiler, Programmer
Cache <-> Memory: Hardware
Memory <-> Disk: Operating System (Virtual Memory: Paging), Programmer (File System)
Principle of Locality
Analysis of programs indicates that many instructions in localized areas of a program are executed repeatedly during some period of time, while other instructions are executed relatively less frequently
These frequently executed instructions may be the ones in a loop, nested loops, or a few procedures calling each other repeatedly
This is called "locality of reference"
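For a concrete picture of both kinds of locality (an illustration, not from the slides), consider summing a 2-D array in C: the row-major loop reuses the accumulator and loop variables every iteration (temporal locality) and walks consecutive addresses (spatial locality), while the column-major version strides across cache lines.

#include <stddef.h>

#define N 1024

/* Temporal locality: 'sum' and the loop variables are touched on every
   iteration. Spatial locality: a[i][j] walks consecutive addresses, so
   one 64-byte line fetch serves the next several elements. */
long sum_row_major(int a[N][N]) {
    long sum = 0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            sum += a[i][j];          /* stride-1: good spatial locality */
    return sum;
}

/* Same arithmetic, but column-major traversal strides N*sizeof(int)
   bytes between accesses, touching a new cache line almost every time. */
long sum_col_major(int a[N][N]) {
    long sum = 0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            sum += a[i][j];          /* large stride: poor spatial locality */
    return sum;
}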
Use of a Cache Memory
• A cache is a small but fast SRAM inserted between the processor and main memory
• Data in a cache is organized at the granularity of cache blocks
• When the processor issues a request for a memory address, an entire block (e.g., 64 bytes) is transferred from main memory to the cache
• Later references to the same address can be serviced by the cache (temporal locality)
• References to other addresses in this block can also be serviced by the cache (spatial locality)
Higher locality => More requests serviced by the cache
Caching – Student Advising Analogy
Cache Organization
How is the Cache laid out?
The cache is made up of a number of cache lines (sometimes called blocks)
Data is hauled into the cache from memory in "chunks" (may be smaller than a line)
If the CPU requests 4 bytes of data, the cache gets the entire line (32/64/128 bytes)
Spatial locality says you're likely to need that data anyway
Incur the cost only once rather than each time the CPU needs a piece of data
Ex: The Pentium 4 Xeon's Level 1 Data Cache:
Contains 8K bytes
The cache lines are each 64 bytes
This gives 8192 bytes / 64 bytes = 128 cache lines
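A small C sketch (assuming a 32-bit address; the sizes match the 8 KB / 64 B example above) that derives the line count and the offset, index, and tag field widths:

#include <stdio.h>

/* Direct-mapped cache geometry, assuming a 32-bit physical address.
   Cache size and line size here match the 8 KB / 64 B example above. */
int main(void) {
    const unsigned cache_bytes = 8 * 1024;
    const unsigned line_bytes  = 64;
    const unsigned addr_bits   = 32;

    unsigned lines       = cache_bytes / line_bytes;          /* 128 lines */
    unsigned offset_bits = 0, index_bits = 0;
    while ((1u << offset_bits) < line_bytes) offset_bits++;   /* 6 bits    */
    while ((1u << index_bits)  < lines)      index_bits++;    /* 7 bits    */
    unsigned tag_bits = addr_bits - index_bits - offset_bits; /* 19 bits   */

    printf("lines=%u offset=%u index=%u tag=%u\n",
           lines, offset_bits, index_bits, tag_bits);
    return 0;
}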
Simple Direct Mapped Cache
[Figure: a 32-bit address whose least significant 4 bits (bits 3:0) form the index, selecting one of 16 cache lines ("sets"), each of which holds a block of data.]
Use least significant 4 bits to determine which slot to cache data in
But…2^28 different addresses could have their data cached in the same spot
Simple Direct Mapped Cache (cont’d)
[Figure: the 32-bit address is split into a tag (bits 31:4) and a 4-bit index (bits 3:0); each of the 16 sets now stores a valid bit, a tag, and the data.]
Need to store tag to be sure the data is for this address and not another
(Only need to store the address minus the index bits – 28 bits in this example)
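A hedged C sketch of the lookup implied by the figure above: 4 index bits select one of 16 sets, the remaining 28 address bits are compared against the stored tag, and the valid bit gates the hit. (A real cache would also carve out block-offset bits below the index; they are omitted here because the example indexes whole sets.)

#include <stdbool.h>
#include <stdint.h>

/* Direct-mapped lookup, following the 16-set example above:
   4-bit index (address bits 3:0 select the set), 28-bit tag. */
#define NUM_SETS   16
#define INDEX_BITS 4

struct cache_line {
    bool     valid;
    uint32_t tag;
    uint8_t  data[64];    /* assumed 64-byte line */
};

static struct cache_line cache[NUM_SETS];

/* Returns true on a hit: the selected line is valid and its stored tag
   matches the tag bits of the requested address. */
bool lookup(uint32_t addr) {
    uint32_t index = addr & (NUM_SETS - 1);        /* bits 3:0  */
    uint32_t tag   = addr >> INDEX_BITS;           /* bits 31:4 */
    struct cache_line *line = &cache[index];
    return line->valid && line->tag == tag;
}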
Cache Hits and Misses
• When the processor needs to access some data, that data may or may not be found in the cache
• If the data is found in the cache, it is called a cache hit
• Read hit:
• The processor reads data from the cache and does not need to go to memory
• Write hit:
The cache holds a replica of the contents of main memory, so both the cache and main memory (eventually) need to be updated
Cache Behavior – Reads
Read behavior
Cache Behavior - Writes
Policy decisions for all writes
Write Through
Write data to both the cache and memory
Requires a write buffer to be effective
Allows the CPU to continue w/o waiting for DRAM
Write Back
Write data to the cache only
Requires addition of a "dirty" bit in the tag/valid memory
Write back to memory occurs when:
A cache flush is performed
The line becomes a victim and is cast out
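A toy C sketch contrasting the two policies on a write hit (the names and the single-line "cache" are illustrative assumptions, not the slides' design):

#include <stdbool.h>
#include <stdint.h>

/* Toy single-line cache and flat memory, used only to contrast the two
   write-hit policies. */
#define LINE_BYTES 64
static uint8_t memory[1 << 16];        /* toy 64 KB "DRAM" */

struct line { bool valid, dirty; uint32_t tag; uint8_t data[LINE_BYTES]; };

/* Write-through: every store updates the cache and main memory (a real
   design posts the memory update to a write buffer so the CPU can continue). */
void store_write_through(struct line *l, uint32_t addr, uint8_t byte) {
    l->data[addr % LINE_BYTES] = byte;
    memory[addr] = byte;
}

/* Write-back: the store updates only the cache and sets the dirty bit;
   memory is updated later, on a flush or when the line is cast out. */
void store_write_back(struct line *l, uint32_t addr, uint8_t byte) {
    l->data[addr % LINE_BYTES] = byte;
    l->dirty = true;
}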
Write Buffer for Write-Through
A Write Buffer is needed between the cache and memory if using a Write Through policy, to avoid having the processor wait
The processor writes data into the cache and the write buffer
The memory controller writes the contents of the buffer to memory
The Write Buffer is just a FIFO
Intel: "posted write buffer" (PWB)
Small depth
Store frequency << 1/DRAM write cycle
[Figure: the processor writes into both the cache and the write buffer; the write buffer drains to DRAM.]
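A minimal C sketch of a write buffer as a shallow FIFO (the 4-entry depth and function names are assumptions for illustration): the CPU posts stores, and the memory controller drains them to DRAM when it is idle.

#include <stdbool.h>
#include <stdint.h>

/* A write buffer is a small FIFO of pending (address, data) stores sitting
   between the cache and DRAM. The depth can stay small because the store
   rate is much lower than one store per DRAM write cycle. */
#define WB_DEPTH 4

struct wb_entry { uint32_t addr; uint64_t data; };

static struct wb_entry wb[WB_DEPTH];
static unsigned head, tail, count;

/* CPU side: post a store; returns false (stall) if the buffer is full. */
bool wb_post(uint32_t addr, uint64_t data) {
    if (count == WB_DEPTH)
        return false;
    wb[tail] = (struct wb_entry){ addr, data };
    tail = (tail + 1) % WB_DEPTH;
    count++;
    return true;
}

/* Memory-controller side: drain the oldest entry to DRAM when idle. */
bool wb_drain(struct wb_entry *out) {
    if (count == 0)
        return false;
    *out = wb[head];
    head = (head + 1) % WB_DEPTH;
    count--;
    return true;
}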
Cache Behavior – Writes
[Figure: write behavior. On a write hit, the bytes to be written are merged into the selected cache line, whose entry carries dirty (D) and valid (V) bits and a tag.]
Casting Out a Victim
Depends upon policies
Write Through
Data in the cache isn't the only current copy (memory is up to date)
Just overwrite the victim cache line with the new cache line (change the tag bits)
Write Back
Must check the dirty bit to see if the victim cache line is modified
If so, must write the victim cache line back to memory
Can lead to interesting behavior
A CPU "read" can cause a memory "write" followed by a "read"
Write back the dirty cache line (victim)
Read the new cache line
A CPU "write" can cause a memory "write" followed by a "read"
Write back the dirty cache line (victim)
Read the new cache line into which data will be written in the cache
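A hedged C sketch of this cast-out-then-fill sequence for a write-back cache; the DRAM helper functions are hypothetical stand-ins, not a real controller interface.

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Miss handling with a write-back cache: if the victim line is dirty, it
   must be written to memory before the new line is read in, which is why a
   single CPU read or write can trigger a memory write followed by a read. */
#define LINE_BYTES 64

struct line { bool valid, dirty; uint32_t tag; uint8_t data[LINE_BYTES]; };

static void write_line_to_dram(uint32_t tag, const uint8_t *data)
{ (void)tag; (void)data; /* stand-in for a real DRAM write */ }

static void read_line_from_dram(uint32_t tag, uint8_t *data)
{ (void)tag; memset(data, 0, LINE_BYTES); /* stand-in for a real DRAM read */ }

void fill_on_miss(struct line *victim, uint32_t new_tag) {
    if (victim->valid && victim->dirty)                   /* cast out the victim */
        write_line_to_dram(victim->tag, victim->data);    /* memory "write"      */
    read_line_from_dram(new_tag, victim->data);           /* memory "read"       */
    victim->tag   = new_tag;
    victim->valid = true;
    victim->dirty = false;   /* a subsequent CPU store will set it again */
}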