Onur 740 Fall11 Lecture25 Mainmemory
Computer Architecture
Lecture 25: Main Memory
Main Memory in the System
[Figure: chip layout with CORE 0 to CORE 3, L2 CACHE 0 to L2 CACHE 3, a SHARED L3 CACHE, the DRAM MEMORY CONTROLLER, the DRAM INTERFACE, and the DRAM BANKS]
Memory Bank Organization
• Read access sequence:
4. Decode column address & select subset of row
   • Send to output
5. Precharge bit-lines
   • For next access
SRAM (Static Random Access Memory)
Read Sequence
1. address decode
2. drive row select
3. selected bit-cells drive bitlines
[Figure: SRAM bit-cell with row select, bitline, and _bitline]
Memory subsystem organization
• Memory subsystem organization
– Channel
– DIMM
– Rank
– Chip
– Bank
– Row/Column
Memory subsystem
[Figure: a Processor connected through a memory channel ("Channel") to DIMMs (dual in-line memory modules)]
DIMM & Rank (from JEDEC)
[Figure: two side views of a DIMM, each labeled with a 64-bit data interface <0:63>]
Breaking down a Rank
[Figure: Rank 0 consists of Chip 0, Chip 1, ..., Chip 7. Chip 0 drives data bits <0:7>, Chip 1 drives <8:15>, ..., Chip 7 drives <56:63>. Together they form Data <0:63>.]
Breaking down a Chip
[Figure: Chip 0 contains Bank 0 and further banks; each bank connects to the chip's 8-bit interface <0:7>]
Breaking down a Bank
[Figure: Bank 0 holds row 0 through row 16k-1. Each row is 2kB wide and each column is 1B. The Row-buffer holds one row, and 1B columns leave it over <0:7>.]
Example: Transferring a cache block
[Figure sequence: the physical memory space runs from 0x00 to 0xFFFF…F. The 64B cache block at 0x00 to 0x40 is mapped to Channel 0, DIMM 0, Rank 0. Within Rank 0, Chip 0 drives data bits <0:7>, Chip 1 drives <8:15>, ..., and Chip 7 drives <56:63> of Data <0:63>. Reading Row 0, Col 0 transfers the first 8B of the block; reading Row 0, Col 1 transfers the next 8B; and so on.]
A 64B cache block takes 8 I/O cycles to transfer.
During the process, 8 columns are read sequentially.
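The arithmetic behind the 8 I/O cycles can be sketched as follows. This is an illustrative sketch, not part of the original slides: the function name `transfer_plan` is hypothetical, and the byte-to-chip ordering (byte 0 of each 8B word comes from Chip 0) is an assumption based on Chip 0 driving bits <0:7>.

```python
CHIPS_PER_RANK = 8   # Chip 0 .. Chip 7, as in the rank breakdown
BITS_PER_CHIP = 8    # each chip drives 8 data bits, e.g. <0:7>

def transfer_plan(block_bytes=64):
    """Return (I/O cycles, [(column, chip) for each byte of the block]).

    Assumption: bytes are laid out in bus-bit order, so byte i of each
    8B word is supplied by chip i. Columns are counted from the first
    column of the block.
    """
    bytes_per_cycle = CHIPS_PER_RANK * BITS_PER_CHIP // 8  # 64 bits = 8B
    cycles = block_bytes // bytes_per_cycle
    plan = [(b // bytes_per_cycle, b % bytes_per_cycle)
            for b in range(block_bytes)]
    return cycles, plan

cycles, plan = transfer_plan()
print(cycles)    # number of I/O cycles for a 64B block
print(plan[9])   # (column, chip) that supplies byte 9
```

Each I/O cycle reads one column from all eight chips at once, which is why the column count and the cycle count coincide.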
Page Mode DRAM
• A DRAM bank is a 2D array of cells: rows x columns
• A “DRAM row” is also called a “DRAM page”
• The “sense amplifiers” are also called the “row buffer”
DRAM Bank Operation
[Figure: a bank drawn as a 2D array of Rows and Columns, with a Row decoder, a Row Buffer, and a Column mux that outputs Data. Access address sequence: (Row 0, Column 0) finds the row buffer empty; (Row 0, Column 1) and (Row 0, Column 85) are row-buffer hits; (Row 1, Column 0) is a row-buffer conflict.]
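The access sequence in the figure can be replayed with a small model of the row buffer. This is a minimal sketch that assumes the row stays open after each access (an open-row policy); the function name `classify_accesses` and the outcome labels are illustrative, not from the slides.

```python
def classify_accesses(accesses):
    """Classify each (row, column) access against the row buffer state.

    Assumption: the accessed row is left open in the row buffer after
    every access, and the bank starts out with an empty row buffer.
    """
    open_row = None  # None models an empty (precharged) row buffer
    outcomes = []
    for row, _column in accesses:
        if open_row is None:
            outcomes.append("empty")      # row must be activated first
        elif open_row == row:
            outcomes.append("hit")        # column access only
        else:
            outcomes.append("conflict")   # close old row, open new one
        open_row = row
    return outcomes

sequence = [(0, 0), (0, 1), (0, 85), (1, 0)]
print(classify_accesses(sequence))
```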
Latency Components: Basic DRAM Operation
• CPU → controller transfer time
• Controller latency
– Queuing & scheduling delay at the controller
– Access converted to basic commands
• Controller → DRAM transfer time
• DRAM bank latency
– Simple CAS if row is “open” OR
– RAS + CAS if array precharged OR
– PRE + RAS + CAS (worst case)
• DRAM → CPU transfer time (through controller)
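The three DRAM bank latency cases can be written as a small function. This is a sketch under stated assumptions: the state names and the parameters `t_cas`, `t_ras`, and `t_pre` are hypothetical stand-ins for the CAS, RAS, and PRE command latencies, and the values in the usage lines are placeholders rather than figures from any datasheet.

```python
def bank_latency(state, t_cas, t_ras, t_pre):
    """DRAM bank latency for one access, by row buffer state.

    state: "open"       -> requested row is already in the row buffer
           "precharged" -> no row is open
           "conflict"   -> a different row is open (worst case)
    """
    if state == "open":
        return t_cas
    if state == "precharged":
        return t_ras + t_cas
    if state == "conflict":
        return t_pre + t_ras + t_cas
    raise ValueError(f"unknown row buffer state: {state}")

# Placeholder timings in arbitrary units, chosen only for illustration.
for state in ("open", "precharged", "conflict"):
    print(state, bank_latency(state, t_cas=15, t_ras=15, t_pre=15))
```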
A DRAM Chip and DIMM
• Chip: Consists of multiple banks (2-16 in Synchronous DRAM)
• Banks share command/address/data buses
• The chip itself has a narrow interface (4-16 bits per read)
A 64-bit Wide DIMM
[Figure: 64-bit wide DIMM; labels: Command, Data]
• Advantages:
– Acts like a high-capacity DRAM chip with a wide interface
– Flexibility: memory controller does not need to deal with individual chips
• Disadvantages:
– Granularity: Accesses cannot be smaller than the interface width
Multiple DIMMs
• Advantages:
– Enables even higher capacity
• Disadvantages:
– Interconnect complexity and energy consumption can be high
DRAM Channels
Multiple Banks (Interleaving) and Channels
• Multiple banks
– Enable concurrent DRAM accesses
– Bits in address determine which bank an address resides in
• Multiple independent channels serve the same purpose
– But they are even better because they have separate data buses
– Increased bus bandwidth
Multiple Channels
• Advantages
– Increased bandwidth
– Multiple concurrent accesses (if independent channels)
• Disadvantages
– Higher cost than a single channel: more board wires, more pins (if on-chip memory controller)
Address Mapping (Single Channel)
• Single-channel system with 8-byte memory bus
– 2GB memory, 8 banks, 16K rows & 2K columns per bank
• Row interleaving
– Consecutive rows of memory in consecutive banks
Row (14 bits) | Bank (3 bits) | Column (11 bits) | Byte in bus (3 bits)
Row (14 bits) | High Column (8 bits) | Bank (3 bits) | Low Col. (3 bits) | Byte in bus (3 bits)
[Figure: address bits combined by XOR to form the Bank index (3 bits)]
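The two field layouts above can be decoded with a few shifts and masks. This is a minimal sketch: the helper names are hypothetical, and the label "block interleaved" for the second layout (bank bits between the high and low column bits) is the common term for that arrangement, used here as an assumption since the extracted slide text does not name it.

```python
def split_fields(addr, layout):
    """Split addr into fields; layout lists (name, width) from MSB to LSB."""
    fields = {}
    shift = sum(width for _, width in layout)
    for name, width in layout:
        shift -= width
        fields[name] = (addr >> shift) & ((1 << width) - 1)
    return fields

# 14 + 3 + 11 + 3 = 31 address bits = 2GB
ROW_INTERLEAVED = [("row", 14), ("bank", 3), ("column", 11), ("byte", 3)]
# 14 + 8 + 3 + 3 + 3 = 31 address bits = 2GB
BLOCK_INTERLEAVED = [("row", 14), ("high_col", 8), ("bank", 3),
                     ("low_col", 3), ("byte", 3)]

def decode_row_interleaved(addr):
    return split_fields(addr, ROW_INTERLEAVED)

def decode_block_interleaved(addr):
    f = split_fields(addr, BLOCK_INTERLEAVED)
    # Reassemble the 11-bit column from its high 8 and low 3 bits.
    f["column"] = (f.pop("high_col") << 3) | f.pop("low_col")
    return f

# Address 0x40 is the second 64B cache block.
print(decode_row_interleaved(0x40))    # same bank, later column
print(decode_block_interleaved(0x40))  # next bank, column 0
```

With the first layout, a whole 16KB row (2K columns of 8 bytes) fills one bank before the next bank is used. With the second, the bank index changes every 64 bytes.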
Address Mapping (Multiple Channels)
C | Row (14 bits) | Bank (3 bits) | Column (11 bits) | Byte in bus (3 bits)
Row (14 bits) | C | Bank (3 bits) | Column (11 bits) | Byte in bus (3 bits)
Row (14 bits) | Bank (3 bits) | C | Column (11 bits) | Byte in bus (3 bits)
Row (14 bits) | Bank (3 bits) | Column (11 bits) | C | Byte in bus (3 bits)
C | Row (14 bits) | High Column (8 bits) | Bank (3 bits) | Low Col. (3 bits) | Byte in bus (3 bits)
Row (14 bits) | C | High Column (8 bits) | Bank (3 bits) | Low Col. (3 bits) | Byte in bus (3 bits)
Row (14 bits) | High Column (8 bits) | C | Bank (3 bits) | Low Col. (3 bits) | Byte in bus (3 bits)
Row (14 bits) | High Column (8 bits) | Bank (3 bits) | C | Low Col. (3 bits) | Byte in bus (3 bits)
Row (14 bits) | High Column (8 bits) | Bank (3 bits) | Low Col. (3 bits) | C | Byte in bus (3 bits)
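Where the C field sits decides how consecutive cache blocks spread across channels. The sketch below assumes C is a single channel-select bit (two channels) and 64B cache blocks; the layout names and helper functions are illustrative only.

```python
def field_lsb(layout, name):
    """Bit position of the lowest bit of `name`; layout is MSB to LSB."""
    pos = sum(width for _, width in layout)
    for field, width in layout:
        pos -= width
        if field == name:
            return pos
    raise KeyError(name)

def channel_of(addr, layout):
    """Channel selected by the (assumed 1-bit) C field."""
    return (addr >> field_lsb(layout, "C")) & 1

def channels_of_blocks(layout, n_blocks, block_bytes=64):
    """Channel of each of the first n_blocks consecutive cache blocks."""
    return [channel_of(i * block_bytes, layout) for i in range(n_blocks)]

C_ABOVE_LOW_COL = [("row", 14), ("high_col", 8), ("bank", 3), ("C", 1),
                   ("low_col", 3), ("byte", 3)]
C_ABOVE_BANK = [("row", 14), ("high_col", 8), ("C", 1), ("bank", 3),
                ("low_col", 3), ("byte", 3)]
C_AT_TOP = [("C", 1), ("row", 14), ("bank", 3), ("column", 11), ("byte", 3)]

print(channels_of_blocks(C_ABOVE_LOW_COL, 4))  # alternates every block
print(channels_of_blocks(C_ABOVE_BANK, 16))    # switches after 8 blocks
print(channels_of_blocks(C_AT_TOP, 4))         # low addresses share a channel
```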
Interaction with Virtual→Physical Mapping
• Operating System influences where an address maps to in DRAM
VA: Virtual Page number (52 bits) | Page offset (12 bits)
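One way to see the operating system's influence: translation leaves the 12-bit page offset unchanged, so only address fields above bit 11 depend on which physical frame the OS picks. The sketch below counts how many bits of a DRAM address field fall in that OS-chosen part. The function name is hypothetical, and the field positions in the usage lines assume the 31-bit single-channel layouts from the address mapping slide.

```python
def bits_set_by_os(field_lsb, field_width, page_offset_bits=12):
    """Count the bits of an address field that lie above the page offset.

    Bits below page_offset_bits are copied from the virtual address, so
    the OS cannot change them when it chooses a physical frame.
    """
    field_end = field_lsb + field_width  # exclusive upper bit position
    return max(0, field_end - max(field_lsb, page_offset_bits))

# Row (14) | Bank (3) | Column (11) | Byte in bus (3): bank is bits 14..16
print(bits_set_by_os(field_lsb=14, field_width=3))
# Row | High Column | Bank (3) | Low Col. (3) | Byte (3): bank is bits 6..8
print(bits_set_by_os(field_lsb=6, field_width=3))
```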
DRAM Refresh (I)
• DRAM capacitor charge leaks over time
• The memory controller needs to read each row periodically to restore the charge
– Activate + precharge each row every N ms
– Typical N = 64 ms
• Implications on performance?
-- DRAM bank unavailable while refreshed
-- Long pause times: If we refresh all rows in burst, every 64ms the DRAM will be unavailable until refresh ends
• Burst refresh: All rows refreshed immediately after one another
• Distributed refresh: Each row refreshed at a different time, at regular intervals
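The trade-off between the two refresh styles can be put into numbers. This is a rough sketch: the 16K row count is borrowed from the address mapping example, and `row_refresh_ns` (time to activate and precharge one row) is a hypothetical placeholder, not a value from the slides.

```python
def distributed_interval_us(rows, period_ms=64.0):
    """Time between single-row refreshes when spread evenly over the period."""
    return period_ms * 1000.0 / rows

def burst_pause_us(rows, row_refresh_ns):
    """Length of the pause when all rows are refreshed back to back."""
    return rows * row_refresh_ns / 1000.0

ROWS = 16 * 1024  # 16K rows per bank, as in the address mapping example

print(distributed_interval_us(ROWS))              # microseconds between refreshes
print(burst_pause_us(ROWS, row_refresh_ns=50.0))  # microseconds of unavailability
```

Both styles spend the same total time refreshing per 64 ms period. Distributed refresh splits it into many short interruptions, while burst refresh takes it as one long pause.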
DRAM Refresh (II)
DRAM Controller
• Purpose and functions
– Ensure correct operation of DRAM (refresh)
– In chipset
+ More flexibility to plug different DRAM types into the system
+ Less power density in the CPU chip
DRAM Controller (II)
A Modern DRAM Controller
DRAM Scheduling Policies (I)
• FCFS (first come first served)
– Oldest request first
DRAM Scheduling Policies (II)
• A scheduling policy is essentially a prioritization order
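Treating a scheduling policy as a prioritization order maps naturally onto a sort key. The sketch below expresses FCFS that way; the `Request` record, its fields, and the function names are hypothetical and not taken from any real memory controller.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Request:
    arrival: int  # cycle at which the request entered the queue
    bank: int
    row: int
    column: int

def fcfs_priority(req):
    """FCFS: the oldest request (smallest arrival time) goes first."""
    return req.arrival

def pick_next(queue, priority=fcfs_priority):
    """Pick the request that ranks first under the given priority key."""
    return min(queue, key=priority)

queue = [
    Request(arrival=5, bank=0, row=1, column=0),
    Request(arrival=2, bank=1, row=0, column=3),
    Request(arrival=9, bank=0, row=0, column=0),
]
print(pick_next(queue))
```

Swapping in a different key function changes the policy without touching the selection logic.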