Computer Organization and Architecture Chapter 7 Large and Fast Exploiting
Chapter 7
Large and Fast: Exploiting Memory
Hierarchy
Yu-Lun Kuo
Computer Sciences and Information Engineering
University of Tunghai, Taiwan
[email protected]
Major Components of a Computer
• Processor
– Control
– Datapath
• Memory
• Devices
– Input
– Output
Processor-Memory Performance Gap
Figure: performance (log scale, 1 to 10000) versus year. µProc performance grows 55%/year; DRAM performance grows 7%/year (2X/10yrs); the resulting processor-memory performance gap grows 50%/year.
Introduction
Memory Hierarchy
• Memory Hierarchy
– A structure that uses multiple levels of memories; as the distance from the CPU increases, both the size of the memories and the access time increase
– Locality + smaller HW is faster = memory hierarchy
• Levels
– Each level is smaller, faster, and more expensive per byte than the level below it
• Inclusive
– Data found in an upper level is also found in every level below it
Three Primary Technologies
Introduction
• Cache memory
– Built from SRAM (static RAM)
– Small amount of fast memory
– Sits between main memory and the CPU
– May be located on the CPU chip or on a module
Introduction
• Cache memory
A Typical Memory Hierarchy c.2008
A Typical Memory Hierarchy
By taking advantage of the principle of locality, the hierarchy can present the user with as much memory as is available in the cheapest technology, at the speed offered by the fastest technology.
Figure: on-chip components (control, datapath, register file, ITLB and DTLB, instruction cache, data cache), eDRAM, second-level cache (SRAM), main memory (DRAM), and secondary memory (disk).
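Figure: levels ordered by increasing distance from the processor in access time: Processor, L1$, L2$, Main Memory, Secondary Memory. Transfer units: 4-8 bytes (word) between the processor and L1$; 8-32 bytes (block) between L1$ and L2$; 1 to 4 blocks between L2$ and main memory.
• Inclusive: what is in L1$ is a subset of what is in L2$, which is a subset of what is in main memory (MM), which is a subset of what is in secondary memory (SM)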
Memory Hierarchy List
• Registers
• L1 Cache
• L2 Cache
• L3 cache
• Main memory
• Disk cache
• Disk (RAID)
• Optical (DVD)
• Tape
Why Do We Need Separate Instruction and Data Caches (IC and DC)?
The Memory Hierarchy: Terminology
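Figure: the upper-level memory (holding Blk X) is closest to the processor and exchanges data with it (to processor, from processor); the lower-level memory (holding Blk Y) lies below the upper level.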
How is the Hierarchy Managed?
• Registers ↔ memory
– by the compiler (and possibly the programmer)
• Cache ↔ main memory
– by the cache controller hardware
• Main memory ↔ disks
– by the operating system (virtual memory)
– virtual-to-physical address mapping assisted by the hardware (TLB)
– by the programmer (files)
7.2 The Basics of Caches
• Simple cache
– Each processor request is for one word
– The block size is one word of data
Caches
• Direct Mapped
– Assign the cache location based on the address of
the word in memory
– Address mapping:
(block address) modulo (# of blocks in the cache)
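The modulo mapping can be sketched in Python. The function names are hypothetical, and the sketch assumes one-word blocks identified by their block (word) address and a power-of-two number of cache blocks. Under that assumption the modulo simply keeps the low-order log2(# of blocks) bits of the address, and the remaining upper bits are what the cache must remember to tell apart the memory blocks that share a cache location.

```python
"""Direct-mapped placement: (block address) modulo (# of blocks in the cache)."""


def cache_index(block_address: int, num_blocks: int) -> int:
    """Cache block (index) that a memory block maps to."""
    return block_address % num_blocks


def cache_tag(block_address: int, num_blocks: int) -> int:
    """Upper address bits left over after removing the index."""
    return block_address // num_blocks


def index_from_low_bits(block_address: int, num_blocks: int) -> int:
    """Same index computed by masking; valid only for power-of-two sizes."""
    if num_blocks & (num_blocks - 1):
        raise ValueError("num_blocks must be a power of two")
    return block_address & (num_blocks - 1)


if __name__ == "__main__":
    # Eight memory blocks whose addresses end in binary 01 land on only two
    # of the eight cache blocks (index 001 or 101) of an 8-block cache.
    for addr in (1, 5, 9, 13, 17, 21, 25, 29):
        idx, tag = cache_index(addr, 8), cache_tag(addr, 8)
        print(f"block {addr:05b} -> index {idx:03b}, tag {tag:02b}")
```

Masking and modulo agree only because the cache size is a power of two, which is why hardware can take the index bits directly from the address without dividing.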
Direct Mapped (Mapping) Cache
Caches
• Tag
– Contains the address information required to identify whether a word in the cache corresponds to the requested word
• Valid bit
– After executing many instructions, some of the cache entries may still be empty
– Indicates whether an entry contains a valid address
» If valid bit = 0, there cannot be a match for this block
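The hit condition combines both fields: the entry at the mapped index must be valid, and its tag must equal the upper bits of the requested address. A minimal sketch, assuming the modulo mapping with one-word blocks and a hypothetical layout of two parallel Python lists (`valid`, `tags`) standing in for the hardware arrays:

```python
def is_hit(valid, tags, block_address):
    """Direct-mapped hit check: valid bit set AND stored tag matches."""
    num_blocks = len(valid)
    index = block_address % num_blocks
    tag = block_address // num_blocks
    # With the valid bit at 0 the entry can never match, whatever stale
    # value its tag field happens to hold.
    return valid[index] and tags[index] == tag


if __name__ == "__main__":
    valid = [False] * 4  # empty cache: every entry starts out invalid
    tags = [0] * 4       # tag fields hold meaningless zeros at start-up
    print(is_hit(valid, tags, 0))  # False: tag 0 "matches" but entry is invalid
    valid[0], tags[0] = True, 0    # entry 0 now holds memory block 0
    print(is_hit(valid, tags, 0))  # True
    print(is_hit(valid, tags, 4))  # False: same index 0, but tag 1 != 0
```

The first lookup shows why the valid bit is needed: without it, an empty entry whose tag field happens to be 0 would falsely match block 0.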
Direct Mapped Cache
• Consider the main memory word reference string
0 1 2 3 4 3 4 15
Start with an empty cache: all blocks are initially marked as not valid. Cache contents (tag, data) after each of the first four references:
Reference   Block 0     Block 1     Block 2     Block 3
0 (miss)    00 Mem(0)
1 (miss)    00 Mem(0)   00 Mem(1)
2 (miss)    00 Mem(0)   00 Mem(1)   00 Mem(2)
3 (miss)    00 Mem(0)   00 Mem(1)   00 Mem(2)   00 Mem(3)
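The whole reference string can be traced with a short simulation. This is a sketch that assumes a four-block direct-mapped cache with one-word blocks (consistent with the four blocks and the 2-bit tag 00 in the example), starting with every entry invalid; the function name is hypothetical.

```python
def simulate_direct_mapped(refs, num_blocks=4):
    """Return (address, "hit"/"miss") for each word reference in `refs`."""
    valid = [False] * num_blocks  # empty cache: nothing is valid yet
    tags = [0] * num_blocks
    outcomes = []
    for addr in refs:
        index, tag = addr % num_blocks, addr // num_blocks
        if valid[index] and tags[index] == tag:
            outcomes.append((addr, "hit"))
        else:
            # Miss: fetch Mem(addr) and overwrite whatever occupied this block.
            valid[index], tags[index] = True, tag
            outcomes.append((addr, "miss"))
    return outcomes


if __name__ == "__main__":
    trace = simulate_direct_mapped([0, 1, 2, 3, 4, 3, 4, 15])
    for addr, outcome in trace:
        print(f"{addr:2d} {outcome}")
    misses = sum(outcome == "miss" for _, outcome in trace)
    print(f"{misses} misses in {len(trace)} references")
```

Reference 4 maps to block 0 with tag 01 and evicts Mem(0); references 3 and 4 then hit; reference 15 maps to block 3 with tag 11 and evicts Mem(3), for 6 misses in 8 references.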
What happens on a write?
Write Buffer for Write Through
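Figure: the processor writes into the cache and into a write buffer; the write buffer sits between the processor and DRAM and forwards the buffered writes to DRAM.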
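A write buffer lets a write-through cache avoid waiting for DRAM on every store: the processor deposits the write in the buffer and continues, while the memory side retires buffered writes to DRAM, so the processor stalls only when the buffer is full. A minimal sketch of that FIFO behavior, with a hypothetical class name and an arbitrary capacity; it deliberately ignores reads, which in a real design would also have to check the buffer for pending writes.

```python
from collections import deque


class WriteBuffer:
    """Hypothetical FIFO write buffer between a write-through cache and DRAM."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pending = deque()  # (address, value) pairs, oldest first

    def write(self, address, value):
        """Processor side. Returns False when the buffer is full, i.e. the
        processor would have to stall and retry."""
        if len(self.pending) >= self.capacity:
            return False
        self.pending.append((address, value))
        return True

    def drain_one(self, dram):
        """Memory side: retire the oldest pending write into DRAM (a dict)."""
        if self.pending:
            address, value = self.pending.popleft()
            dram[address] = value


if __name__ == "__main__":
    buf, dram = WriteBuffer(capacity=2), {}
    print(buf.write(0, 10), buf.write(4, 20), buf.write(8, 30))  # True True False
    buf.drain_one(dram)            # DRAM receives the oldest write, address 0
    print(dram, buf.write(8, 30))  # {0: 10} True
```

The FIFO order matters: DRAM must see the writes in program order, so the oldest entry is always retired first.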
What happens on a write?
• Write Through
– All writes go to main memory as well as cache
– Multiple CPUs can monitor main memory traffic to
keep local (to CPU) cache up to date
– Generates lots of memory traffic
– Slows down writes
• Write Back
– Updates are initially made in the cache only
– An update (dirty) bit for the cache slot is set when an update occurs
– If a block is to be replaced, it is written to main memory only if its update bit is set
– Other caches can get out of sync
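The traffic difference between the two policies can be made concrete by counting how many words reach main memory. This sketch assumes a direct-mapped cache with one-word blocks that allocates a block on every miss (reads and writes alike); `dirty` plays the role of the update bit, and words still dirty when the trace ends are not counted. The function name and trace format are illustrative.

```python
def memory_writes(trace, num_blocks, policy):
    """Count words written to main memory.

    trace: list of ("r" | "w", word_address)
    policy: "write-through" or "write-back"
    """
    if policy not in ("write-through", "write-back"):
        raise ValueError(f"unknown policy: {policy}")
    valid = [False] * num_blocks
    tags = [0] * num_blocks
    dirty = [False] * num_blocks  # the "update bit" of each cache slot
    count = 0
    for op, addr in trace:
        index, tag = addr % num_blocks, addr // num_blocks
        if not (valid[index] and tags[index] == tag):
            # Miss: write back pays only when it evicts an updated block.
            if policy == "write-back" and valid[index] and dirty[index]:
                count += 1
            valid[index], tags[index], dirty[index] = True, tag, False
        if op == "w":
            if policy == "write-through":
                count += 1           # every write also goes to main memory
            else:
                dirty[index] = True  # cache only; remember the slot changed
    return count


if __name__ == "__main__":
    # Ten writes to word 0, then a read of word 4, which maps to the same slot.
    trace = [("w", 0)] * 10 + [("r", 4)]
    print(memory_writes(trace, 4, "write-through"))  # 10
    print(memory_writes(trace, 4, "write-back"))     # 1
```

Repeated writes to the same word are where write back saves the most: the ten updates collapse into a single memory write when the block is finally evicted.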
Memory System to Support Caches
Figure: three memory organizations, each connecting a cache to memory over a bus: (a) one-word-wide memory; (b) wide memory with a multiplexor; (c) interleaved memory with four banks (bank 0 to bank 3).
One-word-wide memory organization
Assume
1. A cache block of 4 words
2. 1 memory bus clock cycle to send the address
Wide memory organization
Assume
1. A cache block of 4 words
2. 1 memory bus clock cycle to send the address
3. 15 clock cycles for each DRAM access initiated
4. 1 memory bus clock cycle to return a word of data
– Two words wide
» 1 + 2 x 15 + 2 x 1 = 33 clock cycles
» Bandwidth: 4 x 4 / 33 = 0.48 bytes per clock cycle
– Four words wide
» 1 + 1 x 15 + 1 x 1 = 17 clock cycles
» Bandwidth: 4 x 4 / 17 = 0.94 bytes per clock cycle
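Both calculations follow one formula: send the address once, then for each of the (block words / memory width) transfers pay one DRAM access and one bus transfer. A sketch parameterized by the four listed assumptions, with hypothetical function names, assuming the block size is a multiple of the memory width. Setting the width to one word applies the same assumptions to the one-word-wide organization.

```python
def wide_miss_penalty(block_words=4, width_words=1, addr_cycles=1,
                      dram_cycles=15, bus_cycles_per_transfer=1):
    """Memory bus clock cycles to fetch one block from a memory and bus
    that are `width_words` words wide."""
    if block_words % width_words:
        raise ValueError("block size must be a multiple of the memory width")
    transfers = block_words // width_words
    return (addr_cycles
            + transfers * dram_cycles
            + transfers * bus_cycles_per_transfer)


def bytes_per_cycle(block_words, cycles, bytes_per_word=4):
    """Bandwidth: bytes delivered per memory bus clock cycle."""
    return block_words * bytes_per_word / cycles


if __name__ == "__main__":
    for width in (1, 2, 4):
        cycles = wide_miss_penalty(width_words=width)
        print(f"{width} word(s) wide: {cycles} cycles, "
              f"{bytes_per_cycle(4, cycles):.2f} bytes/cycle")
```

For a one-word-wide memory this gives 1 + 4 x 15 + 4 x 1 = 65 cycles (0.25 bytes per cycle), which shows how much of the gain from widening comes from paying the 15-cycle DRAM access fewer times.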
Interleaved memory organization
Assume
1. A cache block of 4 words
2. 1 memory bus clock cycle to send the address
3. 15 clock cycles for each DRAM access initiated
4. 1 memory bus clock cycle to return a word of data
5. Each memory bank is 1 word wide
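Under these assumptions interleaving changes only the DRAM term. A sketch assuming the address reaches all four banks at once and the banks perform their accesses in parallel, after which the four words return one at a time over the one-word-wide bus; the function name is hypothetical.

```python
def interleaved_miss_penalty(block_words=4, addr_cycles=1, dram_cycles=15,
                             bus_cycles_per_word=1):
    """Memory bus clock cycles to fetch a block from one-word-wide banks,
    assuming at least `block_words` banks accessed in parallel."""
    # One address send, one overlapped DRAM access time, then one bus
    # transfer per word because the bus is only one word wide.
    return addr_cycles + dram_cycles + block_words * bus_cycles_per_word


if __name__ == "__main__":
    cycles = interleaved_miss_penalty()
    print(cycles, f"{4 * 4 / cycles:.2f} bytes/cycle")
```

This comes to 1 + 15 + 4 x 1 = 20 cycles, or 0.80 bytes per clock cycle, close to the four-word-wide memory's 17 cycles while keeping a one-word-wide bus.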