
EECS 112 (Spring 2024)

Organization of Digital Computers

Chapter 05-01
Memory System
Memory Hierarchy and Cache Memory

Hyoukjun Kwon
[email protected]

Section 1. Locality And Memory Hierarchy

Key Trade-off in Memory Technology

On-chip memory: small and fast, but costly (e.g., cache memory)
Off-chip memory/storage: large and slow, but cost-effective (e.g., DRAM and SSD)

How can we enable an illusion of fast and large memory in a cost-effective way?
Exploit common memory access patterns in programs!

Principle of Locality
§ Key Observation
• Programs access a small proportion of their address space at any time

§ Temporal locality
• Items accessed recently are likely to be accessed again soon
• e.g., instructions in a loop, induction variables

§ Spatial locality
• Items near those accessed recently are likely to be accessed soon
• e.g., sequential instruction access, array data

Example C Code
int array_sum(int *ary, int len) {
    int sum = 0;
    for (int idx = 0; idx < len; idx++) {
        sum += ary[idx];  // spatial locality: ary is accessed sequentially
    }
    return sum;           // temporal locality: sum is accessed every iteration
}

1) Spatial locality: “ary” is accessed sequentially from ary[0] to ary[len-1]
   (e.g., ary[0] and ary[1] are adjacent in the memory space)
2) Temporal locality: “sum” is accessed many times over time

Both spatial and temporal locality patterns are easily found in many programs
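
For contrast, here is a sketch (not from the slides; the function name is illustrative) of an access pattern with poor spatial locality: striding down a column of a row-major 2D array touches addresses n × sizeof(int) bytes apart.

/* Illustrative counterexample: consecutive accesses are far apart in
   memory, so each may touch a different cache block (poor spatial locality). */
int column_sum(int n, int a[n][n], int col) {
    int sum = 0;                      /* sum still has temporal locality  */
    for (int row = 0; row < n; row++)
        sum += a[row][col];           /* stride of n * sizeof(int) bytes  */
    return sum;
}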

Strategy: Memory Hierarchy

Hierarchy (fast/small to slow/large): Cache (on-chip memory) → DRAM → Flash memory/SSD (off-chip)

Strategy 1: Use small and fast memories for frequently accessed data
Strategy 2: Use large and slow memories as backing storage (for infrequently accessed data)

Utilizing Memory Hierarchy
§ If accessed data is present in upper level
• Hit: access satisfied by upper level
o Hit ratio: hits/accesses

§ If accessed data is absent
• Miss: block copied from the lower level (e.g., from DRAM into the cache)
o Time taken for copying missing data: miss penalty
o Miss ratio: misses/accesses = 1 – hit ratio
• Then accessed data supplied from the upper level

Section 2. Cache Memory – Introduction

Cache Memory
§ Cache memory
• A small on-chip memory based on SRAM technology
• Closest memory element to the CPU (1 to a few cycles for access) other than the register file

§ Example: Direct-Mapped Cache with four sets (rows)

Each row (index 2’b00 – 2’b11) holds a Valid bit, a Tag, and Data.
The row with index 2’b00 is shared across all data whose address ends with 2’b00, and likewise for 2’b01, 2’b10, and 2’b11.

Terminologies

Cache Set: A row in a cache (refers to everything in a row)
Cache Block (== Cache Line): The basic unit of data storage in a cache; refers to the data payload
Valid bit: A one-bit signal showing whether the data stored in a block is valid or not
Tag: A unique identifier for a group of data stored in a cache block

Terminologies in Addressing
Address layout: Tag (t bits) | Index (k bits) | Offset (b bits)
(the tag and index together form the block address; the offset is the block offset)
For a cache with K sets (index 0 to K−1) and B-byte blocks:
t = 32 − (k + b),  k = log₂ K,  b = log₂ B

§ Offset: Which byte within a cache block contains the data?
§ Index: Which row should we utilize?
§ Tag: The remaining address bits; used for distinguishing data with the same index and offset

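To make the split concrete, here is a minimal C sketch (not from the slides; the function and struct names are illustrative) that extracts the three fields from a 32-bit address given k and b.

#include <stdint.h>

/* Split a 32-bit address into tag, index, and offset fields,
   given k index bits and b offset bits (so t = 32 - k - b). */
typedef struct { uint32_t tag, index, offset; } addr_fields_t;

static addr_fields_t split_address(uint32_t addr, unsigned k, unsigned b) {
    addr_fields_t f;
    f.offset = addr & ((1u << b) - 1);         /* low b bits       */
    f.index  = (addr >> b) & ((1u << k) - 1);  /* next k bits      */
    f.tag    = addr >> (k + b);                /* remaining t bits */
    return f;
}

With k = 2 and b = 5, the example address on the next slide (0x07A2D8DC) yields offset 5’b11100, index 2’b10, and the 25-bit tag, matching the slide.
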
Example

Cache: four sets (2’b00 – 2’b11), 32-byte blocks (eight words)
Address layout: Tag (t bits) | Index (k bits) | Offset (b bits)
b = log₂ 32 = 5
k = log₂ 4 = 2
t = 32 − (2 + 5) = 25

Address: 32’b 0000 0111 1010 0010 1101 1000 1101 1100
Offset: 5’b11100 = 28th byte within the block = 7th word within the block (note: numbering starts from 0)
Index: 2’b10 = row 2 in the cache (note: numbering starts from 0)
Tag: 25’b0000011110100010110110001

Direct Mapped Cache
§ Location determined by address
§ Direct mapped: only one choice
• Location = (Index) = (Block address) modulo (#Blocks in cache)
Address layout: Tag (t bits) | Index (k bits) | Offset (b bits)

• #Blocks is a power of 2
• Block address: the upper bits of an address, excluding the block offset

Direct-Mapped Cache Example
Address layout: Tag (2 bits) | Index (3 bits) | Offset (2 bits)

§ Assumptions: 8 blocks (slots), 1 word/block, direct mapped, 7-bit address

Cache State:
Index  V  Tag  Data (1 word = 4 bytes)
000    N
001    N
010    N
011    N
100    N
101    N
110    Y  10   Mem[1011000] - Mem[1011011]
111    N

Access History:
Block Address    Binary (Tag|Index)    Hit/Miss    Cache Index
22 (=5’b10110)   10|110                Miss        110

Direct-Mapped Cache Example
Address layout: Tag (2 bits) | Index (3 bits) | Offset (2 bits)

§ Assumptions: 8 blocks (slots), 1 word/block, direct mapped, 7-bit address

Cache State:
Index  V  Tag  Data (1 word = 4 bytes)
000    N
001    N
010    Y  11   Mem[1101000] - Mem[1101011]
011    N
100    N
101    N
110    Y  10   Mem[1011000] - Mem[1011011]
111    N

Access History:
Block Address    Binary (Tag|Index)    Hit/Miss    Cache Index
22 (=5’b10110)   10|110                Miss        110
26 (=5’b11010)   11|010                Miss        010

Direct-Mapped Cache Example
Address layout: Tag (2 bits) | Index (3 bits) | Offset (2 bits)

§ Assumptions: 8 blocks (slots), 1 word/block, direct mapped, 7-bit address

Cache State:
Index  V  Tag  Data (1 word = 4 bytes)
000    N
001    N
010    Y  11   Mem[1101000] - Mem[1101011]
011    N
100    N
101    N
110    Y  10   Mem[1011000] - Mem[1011011]
111    N

Access History:
Block Address    Binary (Tag|Index)    Hit/Miss    Cache Index
22 (=5’b10110)   10|110                Miss        110
26 (=5’b11010)   11|010                Miss        010
22 (=5’b10110)   10|110                Hit         110

How to check a cache hit:
(1) Compute the cache index (110)
(2) Check Cache[110]’s valid bit
(3) If valid, check the tag (10)
(4) If the tag matches, it’s a hit (if not, it’s a miss)

Direct-Mapped Cache Example
Address layout: Tag (2 bits) | Index (3 bits) | Offset (2 bits)

§ Assumptions: 8 blocks (slots), 1 word/block, direct mapped, 7-bit address

Cache State:
Index  V  Tag  Data (1 word = 4 bytes)
000    N
001    N
010    Y  11   Mem[1101000] - Mem[1101011]
011    N
100    N
101    N
110    Y  10   Mem[1011000] - Mem[1011011]
111    N

Access History:
Block Address    Binary (Tag|Index)    Hit/Miss    Cache Index
22 (=5’b10110)   10|110                Miss        110
26 (=5’b11010)   11|010                Miss        010
22 (=5’b10110)   10|110                Hit         110
18 (=5’b10010)   10|010                Miss        010

It’s a miss because the tag doesn’t match: Cache[010] is already occupied!
Default policy: evict the old value and keep the new value

Direct-Mapped Cache Example
Address layout: Tag (2 bits) | Index (3 bits) | Offset (2 bits)

§ Assumptions: 8 blocks (slots), 1 word/block, direct mapped, 7-bit address

Cache State:
Index  V  Tag  Data (1 word = 4 bytes)
000    N
001    N
010    Y  10   Mem[1001000] - Mem[1001011]
011    N
100    N
101    N
110    Y  10   Mem[1011000] - Mem[1011011]
111    N

Access History:
Block Address    Binary (Tag|Index)    Hit/Miss    Cache Index
22 (=5’b10110)   10|110                Miss        110
26 (=5’b11010)   11|010                Miss        010
22 (=5’b10110)   10|110                Hit         110
18 (=5’b10010)   10|010                Miss        010

Cache[010] is replaced with the new data

Cache Data Path
§ Address decoding
• Extract tag and index

§ Hit/Miss check logic
• Check the following conditions:
o Valid bit is 1
o Cache[idx].tag == address.tag

§ Data access logic
• Read the entire cache block
o Even if you need only one word within a block, you must read the entire block first
o After reading a block, you access the word you need within it

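As a sketch of the hit/miss check described above (the types and names are illustrative, assuming the 8-set, one-word-per-block example):

#include <stdbool.h>
#include <stdint.h>

#define NUM_SETS 8   /* matches the 8-block example above */

typedef struct {
    bool     valid;
    uint32_t tag;
    uint32_t data;   /* 1 word per block in this example */
} cache_line_t;

/* Steps (1)-(4) from the earlier slide: select the set by index,
   check the valid bit, then compare the stored tag with the address tag. */
static bool is_hit(const cache_line_t cache[NUM_SETS],
                   uint32_t index, uint32_t tag) {
    const cache_line_t *line = &cache[index];   /* (1) select by index      */
    if (!line->valid)                           /* (2) valid bit check      */
        return false;
    return line->tag == tag;                    /* (3)+(4) tag comparison   */
}
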
Cache Size
Address layout: Tag (18 bits) | Index (10 bits) | Offset (4 bits)

Example: Total Size of a Cache
§ We have a direct-mapped cache with 16 KiB data and four-word blocks.
Then, what is the total size of the cache (data + all metadata)?
§ Valid bit size
• Because the cache has 16KiB data and four-word blocks (16 Bytes), the cache has 16KiB / 16B = 1024 rows.
• We need one valid bit for each row
• Therefore, the total size of valid bits is 1024 bits = 128 B = 0.125 KiB
§ Tag bit size
• Because the cache has four-word blocks (16 bytes), the block offset is log₂ 16 = 4 bits.
• Because the cache has 1024 rows, the index is log₂ 1024 = 10 bits.
• Because the address is 32 bits in RV32I, the tag is 32 − (10 + 4) = 18 bits.
• Because the cache has 1024 rows, the total tag size is 1024 × 18 = 18432 bits = 2304 bytes = 2.25 KiB.

Total size = total valid bit size + total tag bit size + total data size
           = 0.125 KiB + 2.25 KiB + 16 KiB = 18.375 KiB
<Note> KiB = 1024 bytes; KB = 1000 bytes

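An illustrative helper (not from the slides) that reproduces this computation for a direct-mapped cache; with data_bytes = 16 KiB, block_bytes = 16, and addr_bits = 32 it returns 150528 bits = 18.375 KiB, as above.

/* Total size in bits of a direct-mapped cache: valid + tag + data. */
static unsigned long cache_total_bits(unsigned long data_bytes,
                                      unsigned long block_bytes,
                                      unsigned addr_bits) {
    unsigned long rows = data_bytes / block_bytes;
    unsigned b = 0, k = 0;
    while ((1ul << b) < block_bytes) b++;   /* b = log2(block_bytes) */
    while ((1ul << k) < rows)        k++;   /* k = log2(rows)        */
    unsigned t = addr_bits - (k + b);       /* tag bits per row      */
    return rows * (1 + t + 8ul * block_bytes);  /* valid + tag + data */
}
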
Trade-off of Large Block Size
§ Pros: Large blocks can reduce miss rate
• Due to spatial locality
§ Cons 1: Cache Pollution
• The total cache size is fixed => fewer rows in a cache with large blocks
o More competition for each set => can increase the miss rate
• Accessing small data can evict a large amount of useful data (cache pollution)

§ Cons 2: Large Cache Miss Penalty
• Large blocks require more cycles than small blocks to fill
• Optimizations
• Early restart: Do not wait for the entire cache block; as soon as the target word arrives, resume execution
• Critical word first: Load the target word first and then the other words in the corresponding cache block

Operations on Cache Misses
§ On cache hit, CPU proceeds normally
• Memory stage can operate within one cycle (or a few)

§ On cache miss
• Step 1) Stall the CPU pipeline and wait for memory
• Step 2) Fetch block from next level of memory hierarchy

• Step 3 – Case 1) Instruction Cache Miss (Cache miss at the IF stage)


o After fetching instructions into the instruction cache, restart instruction execution (IF stage)

• Step 3 – Case 2) Data cache miss (Cache miss at the MEM stage)
o After fetching data into the data cache, complete data access in the MEM stage and resume the execution

Section 3. Cache Memory – Write Policy

Data Cache Read and Write Hits
§ Read Hit
• When the processor is executing a load instruction
• The data exists in the data cache

§ Write Hit
• When the processor is executing a store instruction
• The data exists in the data cache
• Update the existing data with a newer version

Potential issue on a write hit: inconsistent values across the cache and main memory

Write Policies on Cache Hit
§ Write-through
• Write the new value to both the cache and main memory
• Pros: No value inconsistency across the cache and main memory
• Cons: All writes involve costly main memory accesses
o Solution: Use a “write buffer.” The processor resumes execution while data in the write buffer is being written to main memory

§ Write-back
• Only update the data cache
• Update the main memory value when a “dirty” cache line is evicted
o “Dirty”: indicates that new data has been written to the block
• Pros: Fast; only accesses the cache
• Cons: Inconsistent values across the cache and memory

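A hedged sketch of the two write-hit policies (the line_t type and the flat mem_word pointer are illustrative simplifications, not a real cache API):

#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     valid, dirty;
    uint32_t tag;
    uint32_t data;     /* 1-word block, for simplicity */
} line_t;

/* Write-through: update the cache and main memory together. */
void write_hit_through(line_t *line, uint32_t *mem_word, uint32_t value) {
    line->data = value;
    *mem_word  = value;   /* every write goes to memory (hence the write buffer) */
}

/* Write-back: update only the cache and mark the line dirty;
   memory is updated later, when this dirty line is evicted. */
void write_hit_back(line_t *line, uint32_t value) {
    line->data  = value;
    line->dirty = true;
}
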
Write Policies on Cache Miss
§ Write-allocate
• Fetch the data into the cache first, then perform the write (follow the write-hit policy afterwards)
o Works well with the write-back policy

§ Write-no-allocate
• When a cache miss occurs on a write, write directly to main memory
o Write-back is not well-aligned with this approach

Cache Example: Intrinsity FastMATH
16 KiB: 256 blocks × 16 words/block

Example: Intrinsity FastMATH
§ Embedded MIPS processor
• 12-stage pipeline
• Instruction and data access on each cycle

§ Split cache: separate I-cache and D-cache


• Each 16 KiB: 256 blocks × 16 words/block
• D-cache: write-through or write-back

§ SPEC2000 miss rates


• I-cache: 0.4%
• D-cache: 11.4%
• Weighted average: 3.2%

Section 4. Set Associative Cache

Problem of the Direct-Mapped Cache
§ Assumptions: 8 blocks (slots), 1 word/block, direct mapped, 7-bit address
Address layout: Tag (2 bits) | Index (3 bits) | Offset (2 bits)

Access History:
Block Address    Binary (Tag|Index)    Hit/Miss    Cache Index
6 (=5’b00110)    00|110                Miss        110
14 (=5’b01110)   01|110                Miss        110   (evict the old data, write the new data)
22 (=5’b10110)   10|110                Miss        110   (evict the old data, write the new data)

Cache State after the three accesses:
Index  V  Tag  Data (1 word = 4 bytes)
000    N
001    N
010    N
011    N
100    N
101    N
110    Y  10   Mem[1011000] - Mem[1011011]
111    N

Even if the cache has many empty slots, we cannot utilize them.
What is the source of the problem?



Associative Caches
<Figure: a direct-mapped cache with eight sets (index 3’b000 – 3’b111) vs. a 2-way cache with four sets (index 2’b00 – 2’b11), where each set holds two entries: Way 0 and Way 1>

§ Main Idea
• Employ multiple entries for each cache index (row)
• Reduce the number of sets (cache rows) to increase the number of ways, keeping the same data size in the cache
§ N-way Associative Cache
• The number of ways = N
• The number of sets (rows) = 1/N of that of a direct-mapped cache with the same data size

2-Way Set Associative Cache
§ Assumptions: 4 blocks (slots), 1 word/block, 2-way associative, 7-bit address
Address layout: Tag (3 bits) | Index (2 bits) | Offset (2 bits)

Access History:
Block Address    Binary (Tag|Index)    Hit/Miss    Cache Index
6 (=5’b00110)    001|10                Miss        10   (fills Way 0)
14 (=5’b01110)   011|10                Miss        10   (fills Way 1)
22 (=5’b10110)   101|10                Miss        10   (evicts Way 0, writes the new data)

Cache State after the three accesses (Way 0):
Index  V  Tag  Data (1 word = 4 bytes)
00     N
01     N
10     Y  101  Mem[1011000] - Mem[1011011]
11     N

Cache State after the three accesses (Way 1):
Index  V  Tag  Data (1 word = 4 bytes)
00     N
01     N
10     Y  011  Mem[0111000] - Mem[0111011]
11     N

Cache utilization: 2X compared to the direct-mapped cache
How can we improve this further?

Set Associative Cache
§ Idea: Reduce cache misses by more flexible placement of blocks

§ n-way set associative


• Each set contains n entries (ways)
• Search all n entries in a set at once
o Requires n comparators for valid-bit checks and tag matching (lightweight in terms of silicon area and power)
o Typically, n is small (e.g., 8, 16, or 32)

§ Fully associative (when n == number of cache lines)
• Full flexibility; allows a given block to go in any cache entry
• Requires all entries to be searched at once
o Requires a comparator for each entry (expensive); thousands of comparators when the cache is large

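A minimal C sketch of the per-set search (hardware does this with n parallel comparators; the loop below is a sequential stand-in, and the types and names are illustrative):

#include <stdbool.h>
#include <stdint.h>

#define NUM_WAYS 2   /* n = 2 for the example above */

typedef struct { bool valid; uint32_t tag; uint32_t data; } way_t;

/* Search every way of one set; returns the matching way on a hit,
   or -1 on a miss. */
static int find_way(const way_t set[NUM_WAYS], uint32_t tag) {
    for (int w = 0; w < NUM_WAYS; w++)
        if (set[w].valid && set[w].tag == tag)
            return w;
    return -1;
}
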
Direct-Mapped vs Set-Associative Caches

Spectrum of Associativity
§ For a cache with 8 entries

Given an area (cache data size), change the aspect ratio (# sets × # ways)

Associativity Example
§ Compare 4-block caches
• Direct mapped, 2-way set associative, fully associative
• Block access sequence: 0, 8, 0, 6, 8

§ Direct mapped
Block addr  Cache index  Hit/miss  Content[0]  Content[1]  Content[2]  Content[3]
0           0            miss      Mem[0]
8           0            miss      Mem[8]
0           0            miss      Mem[0]
6           2            miss      Mem[0]                  Mem[6]
8           0            miss      Mem[8]                  Mem[6]

Associativity Example
§ 2-way set associative
Block addr  Cache index  Hit/miss  Set 0, way 0  Set 0, way 1
0           0            miss      Mem[0]
8           0            miss      Mem[0]        Mem[8]
0           0            hit       Mem[0]        Mem[8]
6           0            miss      Mem[0]        Mem[6]
8           0            miss      Mem[8]        Mem[6]

§ Fully associative
Block addr  Hit/miss  Cache content after access
0           miss      Mem[0]
8           miss      Mem[0], Mem[8]
0           hit       Mem[0], Mem[8]
6           miss      Mem[0], Mem[8], Mem[6]
8           hit       Mem[0], Mem[8], Mem[6]

How Much Associativity Is Desired?
§ Increased associativity decreases miss rate
• But with diminishing returns
§ Simulation of a system with a 64 KB D-cache, 16-word blocks, SPEC2000
• 1-way: 10.3%
• 2-way: 8.6%
• 4-way: 8.3%
• 8-way: 8.1%

Set Associative Cache Datapath

Replacement Policy
§ Direct mapped: no choice
§ Set associative
• Prefer a non-valid entry, if there is one
• Otherwise, choose among entries in the set
§ Least-recently used (LRU)
• Choose the one unused for the longest time
o Simple for 2-way, manageable for 4-way, too hard beyond that
§ Random
• Gives approximately the same performance as LRU for high
associativity

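For 2-way associativity, LRU state is a single bit per set. A sketch under that assumption (types and names illustrative):

#include <stdbool.h>

typedef struct { bool valid; } entry_t;   /* only the field LRU needs */

/* 2-way LRU keeps one bit per set: the index of the least-recently-
   used way. Prefer a non-valid entry if one exists. */
static int choose_victim_2way(const entry_t set[2], int lru_way) {
    if (!set[0].valid) return 0;
    if (!set[1].valid) return 1;
    return lru_way;
}

/* After an access to way w, the other way becomes least recently used. */
static int update_lru_2way(int w) { return 1 - w; }
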
Section 5. Performance with Cache

Main Memory Supporting Caches
§ Use DRAMs for main memory
• Fixed width (e.g., 1 word)
• Connected by fixed-width clocked bus
o Bus clock is typically slower than CPU clock
§ Example cache block read
• 1 bus cycle for address transfer
• 15 bus cycles per DRAM access
• 1 bus cycle per data transfer
§ For 4-word block, 1-word-wide DRAM
• Miss penalty = 1 + 4×15 + 4×1 = 65 bus cycles
• Bandwidth = 16 bytes / 65 cycles = 0.25 B/cycle

Measuring Cache Performance
§ Components of CPU time
• Program execution cycles
o Includes cache hit time
• Memory stall cycles
o Mainly from cache misses
§ With simplifying assumptions:

Memory stall cycles = (Memory accesses / Program) × Miss rate × Miss penalty
                    = (Instructions / Program) × (Misses / Instruction) × Miss penalty

Cache Performance Example
§ Given
• I-cache miss rate = 2%
• D-cache miss rate = 4%
• Miss penalty = 100 cycles
• Base CPI (ideal cache) = 2
• Loads & stores are 36% of instructions
§ Average miss cycles per instruction
• I-cache: 0.02 × 100 = 2
• D-cache: 0.36 × 0.04 × 100 = 1.44
§ Actual CPI = 2 + 2 + 1.44 = 5.44
• The CPU with an ideal cache would be 5.44/2 = 2.72 times faster

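The same arithmetic as a runnable C check (parameter values taken from the slide):

#include <stdio.h>

int main(void) {
    double base_cpi  = 2.0;     /* ideal-cache CPI              */
    double icache_mr = 0.02;    /* I-cache miss rate            */
    double dcache_mr = 0.04;    /* D-cache miss rate            */
    double mem_frac  = 0.36;    /* loads/stores per instruction */
    double penalty   = 100.0;   /* miss penalty in cycles       */

    double stall_cpi = icache_mr * penalty + mem_frac * dcache_mr * penalty;
    printf("Actual CPI = %.2f\n", base_cpi + stall_cpi);   /* prints 5.44 */
    return 0;
}
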
Average Memory Access Time
§ Hit time is also important for performance
§ Average memory access time (AMAT)
• AMAT = Hit time + Miss rate × Miss penalty

§ Example
• CPU with a 1 ns clock, hit time = 1 cycle, miss penalty = 20 cycles, I-cache miss rate = 5%
• AMAT = 1 + 0.05 × 20 = 2 ns
o i.e., 2 cycles per memory access

AMAT Example
§ A CPU with a 1 GHz clock has two levels of cache: L1 and L2
• L1: miss rate 3%, 1-cycle access time
• L2: local miss rate 1%, 18-cycle access, miss penalty = 200 cycles

§ AMAT = 1 + 0.03 × (18 + 0.01 × 200) = 1.6 cycles = 1.6 ns

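The computation as a small C helper (illustrative; this treats the 1% as the local L2 miss rate, as the slide's arithmetic does):

/* Two-level AMAT, in cycles (multiply by the clock period for ns). */
static double amat_two_level(double l1_hit, double l1_miss_rate,
                             double l2_access, double l2_local_miss_rate,
                             double mem_penalty) {
    return l1_hit + l1_miss_rate * (l2_access + l2_local_miss_rate * mem_penalty);
}
/* amat_two_level(1, 0.03, 18, 0.01, 200) == 1.6 cycles -> 1.6 ns at 1 GHz */
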
Performance Summary
§ As CPU performance increases
• The miss penalty becomes more significant
§ Decreasing the base CPI
• A greater proportion of time is spent on memory stalls
§ Increasing the clock rate
• Memory stalls account for more CPU cycles
§ We can’t neglect cache behavior when evaluating system performance

Section 6. DRAM and Storage

DRAM Technology
§ Data stored as a charge in a capacitor
• A single transistor (acting as a switch) is used to access the charge
• Must periodically be refreshed
o Read contents and write back
o Performed on a DRAM “row”

<Figure: DRAM cell architecture, with row and column address inputs, input-output data lines, and one access transistor per capacitor>

Advanced DRAM Organization
§ Bits in a DRAM are organized as a rectangular array
• DRAM accesses an entire row
• Burst mode: supply successive words from a row with reduced latency
§ Double data rate (DDR) DRAM
• Transfer on rising and falling clock edges
§ Quad data rate (QDR) DRAM
• Separate DDR inputs and outputs

DRAM Generations
Year   Capacity      $/GB
1980   64 Kibibit    $6,480,000
1983   256 Kibibit   $1,980,000
1985   1 Mebibit     $720,000
1989   4 Mebibit     $128,000
1992   16 Mebibit    $30,000
1996   64 Mebibit    $9,000
1998   128 Mebibit   $900
2000   256 Mebibit   $840
2004   512 Mebibit   $150
2007   1 Gibibit     $40
2010   2 Gibibit     $13
2012   4 Gibibit     $5
2015   8 Gibibit     $7
2018   16 Gibibit    $6

t_RAC: random access time, the time required to read any random single memory cell
t_CAC: column (page) access time, the time required to get data from an already-open row

DRAM Performance Factors
§ Row Buffer
• A small buffer that temporarily stores the most recently accessed row
• The row buffer holds multiple words (the size of a DRAM row)

§ Burst Access
• Allows reading consecutive data without sending individual addresses
• Improves bandwidth

§ DRAM Banking
• Deploy multiple DRAM chips (banks) and read/write them simultaneously
• Improves bandwidth

4 Words × 1 Bank vs 1 Word × 4 Banks

Assumptions:
• 1 cycle: send address to RAM
• 15 cycles: RAM access latency
• 1 cycle: return data from RAM

§ 4-word-wide memory
• Cache miss penalty = 1 + 15 + 1 = 17 bus cycles
• Bandwidth = 16 bytes / 17 cycles = 0.94 B/cycle
• Disadvantage: cost of wider buses

§ 4-bank interleaved memory
• Cache miss penalty = 1 + 15 + 4×1 = 20 bus cycles
• Bandwidth = 16 bytes / 20 cycles = 0.8 B/cycle
• Benefit: overlaps the latencies of accessing each word

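A runnable C check of the miss-penalty and bandwidth arithmetic on this slide and the earlier one-word-wide example (assumptions as listed above):

#include <stdio.h>

int main(void) {
    /* 1 cycle to send the address, 15 cycles per RAM access,
       1 cycle per bus transfer, 4-word (16-byte) cache blocks. */
    int one_word_wide  = 1 + 4 * 15 + 4 * 1;  /* 65 cycles (earlier slide) */
    int four_word_wide = 1 + 15 + 1;          /* 17 cycles                 */
    int four_banks     = 1 + 15 + 4 * 1;      /* 20 cycles (overlapped)    */

    printf("1-word-wide: %2d cycles, %.2f B/cycle\n", one_word_wide,  16.0 / one_word_wide);
    printf("4-word-wide: %2d cycles, %.2f B/cycle\n", four_word_wide, 16.0 / four_word_wide);
    printf("4 banks:     %2d cycles, %.2f B/cycle\n", four_banks,     16.0 / four_banks);
    return 0;
}
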
Flash Storage
§ Nonvolatile semiconductor storage
• 100× – 1000× faster than disk
• Smaller, lower power, more robust
• But more $/GB (between disk and DRAM)

Flash Types
§ NOR flash: bit cell like a NOR gate
• Random read/write access
• Used for instruction memory in embedded systems
§ NAND flash: bit cell like a NAND gate
• Denser (bits/area), but block-at-a-time access
• Cheaper per GB
• Used for USB keys, media storage, …
§ Flash bits wear out after 1000s to 100,000s of accesses
• Not suitable as a direct RAM or disk replacement
• Wear leveling: remap data to less-used blocks

Disk Storage

§ Nonvolatile, rotating magnetic storage
