2024 CSC 541 2.0 Memory Architecture
Example: Memory of a 32-bit computer
Example: A movie record represented in a 32-bit system
Title (string): The Matrix
Playing time in minutes (integer): 136
Year (integer): 1999
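The sketch below shows one way the movie record could be laid out in 4-byte words. The layout is an assumption, not taken from the slides: the title is stored as NUL-terminated ASCII padded with zeros to a word boundary, and both integers are 32-bit little-endian values. The function name `pack_movie` is hypothetical.

```python
import struct

WORD_SIZE = 4  # bytes per word on a 32-bit machine

def pack_movie(title: str, minutes: int, year: int) -> bytes:
    # Assumed layout: NUL-terminated ASCII title, zero-padded to a word boundary,
    # followed by two 32-bit little-endian signed integers.
    raw = title.encode("ascii") + b"\x00"
    padding = (-len(raw)) % WORD_SIZE
    return raw + b"\x00" * padding + struct.pack("<ii", minutes, year)

record = pack_movie("The Matrix", 136, 1999)
print(len(record), "bytes =", len(record) // WORD_SIZE, "words")
for offset in range(0, len(record), WORD_SIZE):
    # Show each 4-byte word of the record in hexadecimal
    print(f"word at byte {offset:2d}: {record[offset:offset + WORD_SIZE].hex()}")
```

Under this layout the record occupies 20 bytes (five words): 12 bytes for the padded title and 4 bytes for each integer.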
SRAM
• Unlike DRAM, no refresh is needed
• Cluster of 6 transistors for each bit of storage
• Very fast, low density, expensive
• Contents are retained as long as power is kept on
• Used to construct cache memory
DDR (Double Data Rate) SDRAM
• An upgrade to standard SDRAM
• Performs 2 transfers per clock cycle without doubling the clock rate
• Lower power consumption (2.5 V) than standard SDRAM
• Available in dual-channel mode
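The effect of 2 transfers per clock cycle (and of dual-channel mode) on peak bandwidth can be shown with simple arithmetic. This is a sketch under assumed numbers: a 200 MHz memory clock, a 64-bit (8-byte) module bus, and MB meaning 10^6 bytes. It gives theoretical peak rates, not sustained throughput.

```python
def peak_bandwidth_mb_s(clock_mhz: int, transfers_per_clock: int,
                        bus_bytes: int = 8, channels: int = 1) -> int:
    """Peak rate in MB/s = clock (MHz) x transfers per clock x bus width (bytes) x channels."""
    return clock_mhz * transfers_per_clock * bus_bytes * channels

sdr = peak_bandwidth_mb_s(200, 1)                   # standard SDRAM: 1 transfer per clock
ddr = peak_bandwidth_mb_s(200, 2)                   # DDR: 2 transfers per clock, same clock
ddr_dual = peak_bandwidth_mb_s(200, 2, channels=2)  # DDR in dual-channel mode
print(sdr, ddr, ddr_dual)
```

At the same clock rate, DDR doubles the peak rate (1600 to 3200 MB/s here), and dual-channel mode doubles it again.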
• The best way to deal with soft errors is to increase the system's fault tolerance (implement ways of detecting and correcting errors)
Techniques used for fault tolerance
• Parity
• ECC (Error Correcting Code)
Parity checking
• 9 bits are used in the memory chip to store 1 byte (12.5% increase in cost)
• The extra bit (parity bit) keeps tabs on the other 8 bits
• Parity can only detect errors; it cannot correct them
Odd parity standard for error checking
• The parity generator/checker is part of the CPU or located in a special chip on the motherboard
• The parity checker evaluates the 8 data bits by adding up the no. of 1s in the byte
• If an even no. of 1s is found, the parity generator creates a 1 and stores it as the parity bit
• If the sum is odd, the parity bit is 0
• If a (9-bit) byte has an even no. of 1s, that byte must have an error
• Cannot tell which bit or bits have changed
• If 2 bits change, the bad byte could pass unnoticed
• Multiple bit errors in a single byte are very rare
Example (even parity):
Consider the 8-bit number: 1 0 0 1 1 0 1 0
No. of 1s = 4 (even)
Parity bit = 0
9-bit number = 1 0 0 1 1 0 1 0 0
ECC
• Successor to parity checking
• Can detect and correct memory errors
• Only a single-bit error can be corrected, though double-bit errors can be detected
• Most standard PC processors, chipsets and memory modules do not support ECC
• Used in server systems
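Parity generation and checking can be sketched in a few lines. The function names are hypothetical; odd parity is the default, and the even-parity case reproduces the 1 0 0 1 1 0 1 0 example. The last line illustrates why parity catches a single flipped bit but lets a double flip pass.

```python
def parity_bit(byte: int, odd: bool = True) -> int:
    """Parity bit for an 8-bit value under odd (default) or even parity."""
    ones = bin(byte & 0xFF).count("1")
    if odd:
        return 1 if ones % 2 == 0 else 0  # make the 9-bit total of 1s odd
    return ones % 2                       # make the 9-bit total of 1s even

def passes_check(byte: int, stored_parity: int, odd: bool = True) -> bool:
    """True if the 8 data bits plus the stored parity bit have the expected parity."""
    total = bin(byte & 0xFF).count("1") + stored_parity
    return total % 2 == (1 if odd else 0)

data = 0b10011010
print(parity_bit(data, odd=False))  # even parity, as in the example: 0
print(parity_bit(data))             # odd parity: 1

p = parity_bit(data)
single_flip = data ^ 0b00000100  # one bit changed
double_flip = data ^ 0b00000110  # two bits changed
print(passes_check(data, p), passes_check(single_flip, p), passes_check(double_flip, p))
```

The final print shows `True False True`: the single-bit error is detected, while the double-bit error goes unnoticed.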
The Hamming Single Error Correcting Code:
• Invented by Richard Hamming
• Additional parity (check) bits are required for single error correction
• 4 check bits for a single byte
• 7 check bits for a 4-byte word (6 for single error correction plus 1 for double-error detection)
• 8 check bits for an 8-byte word (7 for single error correction plus 1 for double-error detection)
Steps to calculate Hamming ECC for a byte (4 parity bits):
• Start numbering bits from 1 on the left
• Mark all bit positions that are powers of 2 as parity bits (1, 2, 4, 8)
• All other bit positions are used for the data (3, 5, 6, 7, 9, 10, 11, 12)
• Position of parity bits (for a byte, only positions up to 12 are used):
• Bit 1 (0001) checks positions 1, 3, 5, 7, 9, 11, 13, 15 (rightmost bit of the address is 1, i.e. 0001, 0011, 0101, 0111, 1001, 1011, 1101, 1111)
• Bit 2 (0010) checks positions 2, 3, 6, 7, 10, 11, 14, 15 (second bit from the right is 1)
• Bit 4 (0100) checks positions 4-7, 12-15 (third bit from the right is 1)
• Bit 8 (1000) checks positions 8-15 (fourth bit from the right is 1)
• Set the parity bits to create even parity for each group
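The steps above can be sketched as a small Hamming encoder for one byte, plus a syndrome check that locates a single flipped bit. The function names are hypothetical, and the data byte reuses the 1 0 0 1 1 0 1 0 value from the parity example. The syndrome is the sum of the parity-bit positions whose groups fail the even-parity check, which equals the position of a single flipped bit.

```python
PARITY_POSITIONS = (1, 2, 4, 8)
DATA_POSITIONS = (3, 5, 6, 7, 9, 10, 11, 12)
N = 12  # 8 data bits + 4 parity bits

def hamming_encode(data_bits):
    """Encode 8 data bits (listed left to right) into a 12-bit codeword (positions 1..12)."""
    code = [0] * (N + 1)  # index 0 unused so list index == bit position
    for pos, bit in zip(DATA_POSITIONS, data_bits):
        code[pos] = bit
    for p in PARITY_POSITIONS:
        # Parity bit p covers every position whose binary address has bit p set.
        group_sum = sum(code[i] for i in range(1, N + 1) if i & p and i != p)
        code[p] = group_sum % 2  # even parity over the group
    return code[1:]

def syndrome(codeword):
    """0 if every group has even parity, else the position of a single flipped bit."""
    code = [0] + list(codeword)
    s = 0
    for p in PARITY_POSITIONS:
        if sum(code[i] for i in range(1, N + 1) if i & p) % 2:
            s |= p
    return s

def correct(codeword):
    """Flip the bit named by the syndrome (single error correction)."""
    fixed = list(codeword)
    s = syndrome(fixed)
    if 1 <= s <= N:  # a syndrome of 13-15 cannot come from a single error in 12 bits
        fixed[s - 1] ^= 1
    return fixed, s

word = hamming_encode([1, 0, 0, 1, 1, 0, 1, 0])
print("".join(map(str, word)))  # 011100101010
damaged = list(word)
damaged[6 - 1] ^= 1             # flip bit position 6
fixed, pos = correct(damaged)
print(pos, fixed == word)       # 6 True
```

With only these 4 check bits, a double-bit error still produces a nonzero syndrome that can point at the wrong bit. Reliable double-error detection needs one more overall parity bit (SEC-DED).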
Answer:
AMAT = cache access time + miss rate x miss penalty
= 1 + 0.05 x 80
= 5 clock cycles
= 5 x 2 ns = 10 ns (with a 2 ns clock cycle)
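The AMAT formula is easy to check in code. This is a sketch: the 2 ns cycle time is an assumption inferred from 10 ns / 5 cycles, since the original question is not part of this section.

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time = hit time + miss rate x miss penalty (in cycles)."""
    return hit_time + miss_rate * miss_penalty

cycles = amat(hit_time=1, miss_rate=0.05, miss_penalty=80)
cycle_time_ns = 2  # assumed: the 2 ns cycle implied by 10 ns / 5 cycles
print(cycles, cycles * cycle_time_ns)
```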
Answer:
TAG: 2 bits, LINE: 3 bits
• Cache miss for memory access 11010
• Cache miss for memory access 10000
• Cache miss for memory access 00011
• Cache miss for memory access 10010
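A direct-mapped cache with a 2-bit TAG and a 3-bit LINE field can be simulated directly. This sketch assumes 5-bit addresses, one word per block, and an initially empty cache. The trace is the four accesses listed above plus two hypothetical repeats, added to show a hit and a conflict miss.

```python
INDEX_BITS = 3  # LINE field: 2**3 = 8 cache lines
TAG_BITS = 2    # remaining high-order bits of a 5-bit address

def split_address(addr: int):
    """Return (tag, line index) for a 5-bit address."""
    assert 0 <= addr < 1 << (TAG_BITS + INDEX_BITS)
    return addr >> INDEX_BITS, addr & ((1 << INDEX_BITS) - 1)

def simulate_direct_mapped(addresses):
    lines = [None] * (1 << INDEX_BITS)  # tag stored in each line; None = invalid
    results = []
    for addr in addresses:
        tag, index = split_address(addr)
        hit = lines[index] == tag
        if not hit:
            lines[index] = tag  # fetch the block, evicting any previous occupant
        results.append((format(addr, "05b"), "hit" if hit else "miss"))
    return results, lines

# The four accesses from the answer, plus two hypothetical repeats.
trace = [0b11010, 0b10000, 0b00011, 0b10010, 0b10000, 0b11010]
results, lines = simulate_direct_mapped(trace)
for addr, outcome in results:
    print(addr, outcome)
```

The repeat of 10000 hits. The repeat of 11010 misses again because 10010 maps to the same line (010) and replaced it.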
Handling writes:
Write-through
• Data are written to both the cache and main memory at the same time
• Memory is always up to date, but writes are time consuming
Write-back
• Data are written to main memory only when the modified block is forced out of the cache
Set associative caches
• A cache with n possible entries for each address is called an n-way set associative cache
• Figure: 4-way set associative cache (2048 addresses)
• More complicated than a direct-mapped cache
• All n entries in the selected set have to be checked
• 2-way and 4-way caches perform well enough to make this extra circuitry worthwhile
• Direct-mapped cache: one-way set associative
• A block is placed in exactly one location in the cache
• Fully associative cache: N-way set associative
• N = total no. of blocks in the cache
• A block can be placed in any location in the cache
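The cost difference between the two write policies can be illustrated by counting the writes that reach main memory. The trace format and function name are hypothetical, and the sketch assumes every written block is already in the cache, so only hits are modelled.

```python
def memory_writes(trace, policy: str) -> int:
    """Count writes that reach main memory for a trace of ("write", block) and
    ("evict", block) events, assuming every written block is already cached."""
    if policy not in ("write-through", "write-back"):
        raise ValueError(f"unknown policy: {policy}")
    dirty = set()  # blocks modified in the cache but not yet in memory
    writes = 0
    for event, block in trace:
        if event == "write":
            if policy == "write-through":
                writes += 1       # cache and memory are updated together
            else:
                dirty.add(block)  # write-back: only the cached copy changes
        elif event == "evict" and block in dirty:
            writes += 1           # modified block is copied back when forced out
            dirty.discard(block)
    return writes

trace = [("write", 7)] * 10 + [("evict", 7)]
print(memory_writes(trace, "write-through"), memory_writes(trace, "write-back"))
```

Ten writes to the same block cost 10 memory writes under write-through, but only 1 under write-back, when the block is finally evicted.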
When a new entry is brought into the cache, which of the present items should be discarded?
The LRU (Least Recently Used) algorithm is used
• The block that has been unused for the longest time is replaced
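LRU replacement and the direct-mapped, set associative, and fully associative organisations can be compared with one small simulator. This is a sketch with hypothetical names and an assumed trace: a cache of 4 one-word blocks, where the set index is the block address modulo the number of sets. Each set is an `OrderedDict` kept in least-to-most recently used order.

```python
from collections import OrderedDict

def count_misses(block_addresses, num_blocks: int, ways: int) -> int:
    """Simulate an LRU set-associative cache and return the number of misses.
    ways=1 is direct-mapped; ways=num_blocks is fully associative."""
    num_sets = num_blocks // ways
    sets = [OrderedDict() for _ in range(num_sets)]  # per set: blocks ordered LRU -> MRU
    misses = 0
    for block in block_addresses:
        cache_set = sets[block % num_sets]  # set index = block address mod no. of sets
        if block in cache_set:
            cache_set.move_to_end(block)  # hit: mark as most recently used
        else:
            misses += 1
            if len(cache_set) == ways:
                cache_set.popitem(last=False)  # evict the least recently used block
            cache_set[block] = True
    return misses

trace = [0, 8, 0, 6, 8]  # hypothetical block-address trace
for ways in (1, 2, 4):
    print(f"{ways}-way: {count_misses(trace, num_blocks=4, ways=ways)} misses")
```

For this trace the miss count falls from 5 (direct-mapped) to 4 (2-way) to 3 (fully associative), since higher associativity avoids conflicts between blocks 0 and 8.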
CPU time = (CPU clock cycles + Memory stall cycles) x Clock cycle time
Memory stall cycles = No. of misses x Miss penalty
= IC x Misses per instruction x Miss penalty (IC = instruction count)
Example:
Miss rate of instruction cache = 2%
Miss rate of data cache = 4%
Processor has a CPI of 2 without any memory stalls
Miss penalty = 100 cycles for all misses
Frequency of data accesses = 36% of instructions
How much faster would the processor run with a perfect cache that never misses?
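Because IC and the clock cycle time appear on both sides of the comparison, the speedup reduces to a ratio of cycles per instruction. The sketch below computes this ratio from the example's parameters. The function name is hypothetical.

```python
def cycles_per_instruction(base_cpi, i_miss_rate, d_miss_rate, data_access_freq, miss_penalty):
    """Return (CPI including memory stalls, CPI with a perfect cache)."""
    instr_stalls = i_miss_rate * miss_penalty                    # every instruction is fetched
    data_stalls = data_access_freq * d_miss_rate * miss_penalty  # only data-accessing instructions
    return base_cpi + instr_stalls + data_stalls, base_cpi

real_cpi, perfect_cpi = cycles_per_instruction(2, 0.02, 0.04, 0.36, 100)
speedup = real_cpi / perfect_cpi  # IC and cycle time cancel out
print(round(real_cpi, 2), round(speedup, 2))
```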
Answer:
Instruction miss cycles = IC x 0.02 x 100 = 2 x IC
Data miss cycles = IC x 0.36 x 0.04 x 100 = 1.44 x IC
Memory stall cycles = 2 x IC + 1.44 x IC = 3.44 x IC
CPU time (with cache) = (2 + 3.44) x IC x Cycle time = 5.44 x IC x Cycle time
CPU time (perfect cache) = 2 x IC x Cycle time
Speedup = 5.44 / 2 = 2.72
Secondary memory
• Nonvolatile memory used to store programs / data between runs
• Magnetic disk (and SSD) in desktop computers and servers
• Flash memory in mobile devices
Magnetic disks
• Disk platters are divided into concentric rings called tracks
• Tracks are divided into sectors
• Cylinder: the set of tracks at the same position on each platter
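Disk capacity follows from the track, sector, and cylinder geometry. This sketch uses a hypothetical geometry: 4 platters recorded on both sides (8 surfaces, so 8 tracks per cylinder), 10,000 cylinders, 500 sectors per track, and 512-byte sectors. It also assumes a uniform number of sectors per track. Real drives use zoned recording, with more sectors on the outer tracks.

```python
def disk_capacity_bytes(cylinders: int, surfaces: int, sectors_per_track: int,
                        bytes_per_sector: int = 512) -> int:
    """Capacity = cylinders x tracks per cylinder x sectors per track x sector size.
    Tracks per cylinder equals the number of recording surfaces (one track per surface)."""
    tracks = cylinders * surfaces
    return tracks * sectors_per_track * bytes_per_sector

# Hypothetical geometry, not from the slides
capacity = disk_capacity_bytes(cylinders=10_000, surfaces=8, sectors_per_track=500)
print(capacity, "bytes =", capacity / 1e9, "GB")
```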