Welcome To Part 3: Memory Systems and I/O
We've already seen how to make a fast processor. How can we supply the CPU with enough data to keep it busy? We will now focus on memory issues, which are frequently bottlenecks that limit the performance of a system. We'll start off by looking at memory systems for the next two weeks.
[Diagram: a computer consists of a processor, memory, and input/output.]
Cache introduction
Today we'll answer the following questions. What are the challenges of building big, fast memory systems? What is a cache? Why do caches work? (answer: locality) How are caches organized?
Where do we put things, and how do we find them?
Small or slow
Unfortunately there is a tradeoff between speed, cost and capacity.
Storage       Speed     Cost        Capacity
Static RAM    Fastest   Expensive   Smallest
Dynamic RAM   Slow      Cheap       Large
Hard disks    Slowest   Cheapest    Largest
Fast memory is too expensive for most people to buy a lot of. But dynamic memory has a much longer delay than other functional units in a datapath: if every lw or sw accessed dynamic memory, we'd have to either increase the cycle time or stall frequently.
[Figure: a memory hierarchy, from small, fast Level 1 storage down to large, slow Level n storage.]
Introducing caches
Introducing a cache: a small amount of fast, expensive memory. The cache goes between the processor and the slower, dynamic main memory. It keeps a copy of the most frequently used data from the main memory. Memory access speed increases overall, because we've made the common case faster. Reads and writes to the most frequently used addresses will be serviced by the cache. We only need to access the slower main memory for less frequently used data.
Loops are a prime example of temporal locality in code: each instruction will be fetched over and over again, once on every loop iteration.
Commonly-accessed variables can sometimes be kept in registers, but this is not always possible. There are a limited number of registers. There are situations where the data must be kept in memory, as is the case with shared or dynamically-allocated memory.
Nearly every program exhibits spatial locality, because instructions are usually executed in sequence: if we execute an instruction at memory location i, then we will probably also execute the next instruction, at memory location i+1. Code fragments such as loops exhibit both temporal and spatial locality.
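To make the two kinds of locality concrete, here is a minimal sketch (not from the lecture) of the memory-access trace produced by a simple array-summing loop. All addresses and the loop size are hypothetical.

```python
CODE_START = 0x400   # hypothetical address of the loop's first instruction
DATA_START = 0x1000  # hypothetical address of the array
N = 4                # number of loop iterations

instruction_fetches = []
data_reads = []
for i in range(N):
    # The same three instructions are fetched on every iteration: temporal locality.
    for offset in (0, 4, 8):
        instruction_fetches.append(CODE_START + offset)
    # Successive iterations read adjacent words: spatial locality.
    data_reads.append(DATA_START + 4 * i)

# Every instruction address repeats N times (temporal locality).
assert instruction_fetches.count(CODE_START) == N
# Data addresses are consecutive words (spatial locality).
assert data_reads == [0x1000, 0x1004, 0x1008, 0x100C]
```

A cache exploits both patterns: temporal locality means a fetched word is worth keeping, and spatial locality means its neighbors are worth fetching along with it.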
An equivalent way to find the placement of a memory address in the cache is to look at the least significant k bits of the address. With our four-byte cache we would inspect the two least significant bits of our memory addresses. Again, you can see that address 14 (1110 in binary) maps to cache block 2 (10 in binary). Taking the least significant k bits of a binary value is the same as computing that value mod 2^k.
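The equivalence of the two methods can be checked directly. This is a small sketch for the four-block (k = 2) cache of the example:

```python
# For a direct-mapped cache with 2**k blocks, the block index of an address
# is the address mod 2**k, which equals its least significant k bits.
k = 2                      # four-block cache, as in the example above
for address in range(16):  # all 4-bit addresses, 0000 through 1111
    assert address % (2 ** k) == address & ((1 << k) - 1)

# Address 14 (0b1110) maps to cache block 2 (0b10), matching the example.
assert 14 % (2 ** k) == 0b10
```

The bitmask form is what real hardware does: selecting the low k bits of an address requires no division circuit at all.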
Adding tags
We need to add tags to the cache, which supply the rest of the address bits to let us distinguish between different memory locations that map to the same cache block.
Index   Tag   Data
00      00
01      ??
10      01
11      01
With the missing tag filled in, the cache contents are:

Index   Tag   Data
00      00
01      11
10      01
11      01
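Splitting an address into its tag and index can be sketched in a few lines. This example (not from the lecture) uses the 4-bit addresses and four-block cache of the tables above:

```python
# Split a 4-bit address into a 2-bit index (low bits) and a 2-bit tag (the rest).
k = 2  # index bits

def split(address):
    index = address & ((1 << k) - 1)  # low k bits select the cache block
    tag = address >> k                # remaining bits are stored as the tag
    return tag, index

# Address 0b0111 (7): tag 01, index 11 -- matches the table row for index 11.
assert split(0b0111) == (0b01, 0b11)
# Address 0b1101 (13): tag 11, index 01 -- the tag filled in above.
assert split(0b1101) == (0b11, 0b01)
```

Concatenating the stored tag with a block's index reconstructs exactly one memory address, which is why the tag is enough to tell mapped addresses apart.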
So the cache contains more than just copies of the data in memory; it also has bits to help us find data within the cache and verify its validity.
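Putting the valid bit, tag, and data together, a direct-mapped cache read can be sketched as follows. This is a minimal model, not the lecture's hardware design; the sizes and the toy memory contents are assumptions.

```python
# A direct-mapped cache read: a reference hits only if the indexed block is
# valid and its stored tag matches the address's tag.
k = 2  # index bits -> four blocks

blocks = [{"valid": False, "tag": 0, "data": None} for _ in range(2 ** k)]

def read(address, memory):
    index = address & ((1 << k) - 1)
    tag = address >> k
    b = blocks[index]
    if b["valid"] and b["tag"] == tag:
        return "hit", b["data"]
    # Miss: fetch from slow main memory and fill the block.
    b["valid"], b["tag"], b["data"] = True, tag, memory[address]
    return "miss", b["data"]

memory = {addr: addr * 10 for addr in range(16)}  # toy main memory
assert read(14, memory) == ("miss", 140)  # cold miss: valid bit was clear
assert read(14, memory) == ("hit", 140)   # same address now hits
assert read(6, memory)[0] == "miss"       # 0110 also maps to block 2: tags differ
```

The last access shows why the tag is needed: addresses 14 (1110) and 6 (0110) share index 10, and only the stored tag distinguishes them.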
[Figure: on a cache read, the stored tag is compared with the address's tag; a match produces a Hit signal and the data is sent to the CPU.]
Summary
Today we studied the basic ideas of caches. By taking advantage of spatial and temporal locality, we can use a small amount of fast but expensive memory to dramatically speed up the average memory access time. A cache is divided into many blocks, each of which contains a valid bit, a tag for matching memory addresses to cache contents, and the data itself. Next week we'll look at some more advanced cache organizations and see how to measure the performance of memory systems.