Memory
The Memory/Processor Performance Gap
The memory hierarchy
[Figure: levels in the memory hierarchy, Level 1 (closest to the CPU), Level 2, ..., Level n. Distance from the CPU in access time increases with each level; the figure also shows the size of the memory at each level.]
Fundamentals of Memory Hierarchy
Locality
Temporal locality
Data that are needed now are likely to be needed again in the near future
Spatial locality
There is a high probability that other data nearby will be needed soon
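The effect of locality can be illustrated with a minimal sketch of a direct-mapped cache. The cache geometry (8 lines of 16 bytes) and the three address traces are assumptions chosen for illustration only; they are not taken from the slides.

```python
def hit_rate(addresses, num_lines=8, line_size=16):
    """Simulate a direct-mapped cache and return the fraction of hits.

    num_lines and line_size are illustrative assumptions.
    """
    stored_block = [None] * num_lines   # block currently held by each line
    hits = 0
    for addr in addresses:
        block = addr // line_size       # which memory block the byte is in
        index = block % num_lines       # which cache line that block maps to
        if stored_block[index] == block:
            hits += 1
        else:
            stored_block[index] = block  # miss: load the block
    return hits / len(addresses)


# Spatial locality: consecutive bytes share a cache line.
sequential = list(range(256))
# Temporal locality: the same few addresses are reused many times.
repeated = [0, 4, 8, 12] * 64
# No locality: every access touches a new block that maps to the same line.
scattered = [i * 128 for i in range(256)]

for name, trace in [("sequential", sequential),
                    ("repeated", repeated),
                    ("scattered", scattered)]:
    print(f"{name:10s} hit rate = {hit_rate(trace):.4f}")
```

With these assumed parameters the two traces that exhibit locality hit almost every time, while the scattered trace never hits.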
Two Processor/Memory Architectures
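Princeton
Single memory (program and data)
Fewer memory wires
Harvard
Separate program memory and data memory
Simultaneous program and data memory access
[Figure: Harvard (processor with program memory and data memory) next to Princeton (processor with one memory for program and data)]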
Memory Access Process
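[Figure: the CPU issues a virtual address to the TLB. On a TLB hit the physical address is available directly; on a TLB miss, address translation is performed. The physical address (tag) is looked up in the cache. On a cache hit the data are returned to the CPU; on a cache miss they are fetched from main memory.]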
TLB (translation lookaside buffer): translates a virtual
address to a physical address
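The TLB step can be sketched as a small lookup that falls back to a page table on a miss. The page size, the page-table contents, and the names `page_table`, `tlb`, and `translate` are all hypothetical and serve only to make the sketch runnable; a real TLB is a hardware structure with limited capacity and a replacement policy.

```python
PAGE_SIZE = 4096  # assumed page size in bytes

# Hypothetical page table: virtual page number -> physical frame number.
page_table = {0: 5, 1: 9, 2: 3}

# The TLB caches recently used translations (unbounded here for simplicity).
tlb = {}


def translate(virtual_address):
    """Return (physical_address, tlb_hit) for a virtual address."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    if page in tlb:
        frame, hit = tlb[page], True
    else:
        frame, hit = page_table[page], False  # TLB miss: consult page table
        tlb[page] = frame                     # remember the translation
    return frame * PAGE_SIZE + offset, hit
```

Calling `translate(0x1234)` twice in a row misses the first time and hits the second time, returning the same physical address in both cases.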
Memory management units
Handles DRAM refresh, bus interface and arbitration
Takes care of memory sharing among multiple processors
Translates logical memory addresses from the processor into physical memory
addresses
[Figure: CPU -> logical address -> memory management unit -> physical address -> main memory]
Memory Data Organization
Endianness
Big Endian/Little Endian
Memory data alignment
Endianness
The order of bytes (sometimes bits) in memory used to
represent multi-byte data types
Little/Big Endian
Little Endian:
put the least-significant byte first (at the lowest address)
e.g., Intel processors
Big Endian:
put the most-significant byte first (at the lowest address)
e.g., some PowerPCs, Motorola, MIPS, SPARC
Big/Little Endian Example
32-bit data 0xFABC0123 at address 0xFF20
Address        0xFF20  0xFF21  0xFF22  0xFF23
Big Endian     0xFA    0xBC    0x01    0x23
Little Endian  0x23    0x01    0xBC    0xFA
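The table above can be reproduced with Python's built-in `int.to_bytes`. The helper name `byte_layout` is an illustrative assumption; it simply maps each address to the byte stored there.

```python
def byte_layout(value, base_address, byteorder, size=4):
    """Map each address to the byte stored there for the given byte order."""
    data = value.to_bytes(size, byteorder=byteorder)
    return {base_address + i: byte for i, byte in enumerate(data)}


for order in ("big", "little"):
    layout = byte_layout(0xFABC0123, 0xFF20, order)
    row = "  ".join(f"0x{addr:04X}: 0x{byte:02X}"
                    for addr, byte in layout.items())
    print(f"{order:6s} {row}")
```

Reading the bytes back with `int.from_bytes` and the same byte order recovers the original value; using the wrong byte order yields 0x2301BCFA instead.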
Data Alignment
Data alignment
A multi-byte datum needs to be placed at an address
that is a multiple of its size
Examples
0bxxxxxxxxxx  byte (8-bit) aligned
0bxxxxxxxxx0  half word (16-bit) aligned
0bxxxxxxxx00  word (32-bit) aligned
0bxxxxxxx000  double word (64-bit) aligned
Why aligned?
Misalignment causes implementation complications and reduces
performance
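The alignment rule amounts to checking that the low-order address bits are zero. The following is a minimal sketch; the function names `is_aligned` and `align_up` are assumptions, and both require `size` to be a power of two (1, 2, 4, 8 bytes, ...).

```python
def is_aligned(address, size):
    """True if address is a multiple of size (size must be a power of two)."""
    return address & (size - 1) == 0


def align_up(address, size):
    """Round address up to the next multiple of size (a power of two)."""
    return (address + size - 1) & ~(size - 1)


for addr in (0xFF20, 0xFF21, 0xFF22, 0xFF24):
    print(f"0x{addr:04X}: half word {is_aligned(addr, 2)}, "
          f"word {is_aligned(addr, 4)}, double word {is_aligned(addr, 8)}")
```

For example, 0xFF22 is half-word aligned but not word aligned, and `align_up(0xFF21, 4)` gives 0xFF24, the next word-aligned address.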
Memory device: basic concepts
Stores large number of bits
m x n: m words of n bits each
k = log2(m) address input signals, or m = 2^k words
e.g., 4,096 x 8 memory:
32,768 bits
12 address input signals
8 input/output data signals
Memory access
r/w: selects read or write
enable: read or write only when asserted
multiport: multiple accesses to different locations
simultaneously
[Figure: m × n memory, m words of n bits per word. Memory external view: 2^k × n read and write memory with inputs r/w, enable, and A0 … Ak-1, and data lines Qn-1 … Q0]
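The relationship between m, n, and k can be checked with a few lines of Python. The function name `memory_geometry` is an assumption made for illustration; when m is not a power of two, k is rounded up.

```python
def memory_geometry(m, n):
    """Return signal counts and capacity for an m x n memory."""
    k = (m - 1).bit_length()  # smallest k with 2**k >= m
    return {"address_lines": k, "data_lines": n, "total_bits": m * n}


print(memory_geometry(4096, 8))
```

For the 4,096 x 8 example this gives 12 address lines, 8 data lines, and 32,768 bits, matching the figures above.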
Memory Types
ROM: “Read-Only” Memory
Nonvolatile memory
Can be read from but not written to, by a processor in an
embedded system
Traditionally written to, “programmed”, before inserting to
embedded system
Uses
Store software program for general-purpose processor
program instructions can be one or more ROM words
Store constant data needed by system
Implement combinational circuit
[Figure: ROM external view: 2^k × n ROM with inputs enable and A0 … Ak-1 and outputs Qn-1 … Q0]
Example: 8 x 4 ROM
Horizontal lines = words
Vertical lines = data
Lines connected only at circles
Decoder sets word 2’s line to 1 if address input
is 010
Data lines Q3 and Q1 are set to 1 because there
is a “programmed” connection with word 2’s
line
Word 2 is not connected with data lines Q2 and
Q0
Output is 1010
[Figure: 8 × 4 ROM internal view: a 3×8 decoder with inputs enable, A0, A1, A2 drives the word lines (word 0, word 1, word 2, ...); programmable connections join word lines to the data lines Q3 Q2 Q1 Q0, which are wired-OR]
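The decoder and wired-OR behaviour described above can be sketched in a few lines. Only the connections for word 2 come from the example; every other word is left unprogrammed here as a placeholder, and the names `decode`, `connections`, and `read_rom` are hypothetical. Returning all zeros when `enable` is 0 is a simplification.

```python
NUM_WORDS = 8
WIDTH = 4

# word number -> set of data lines with a programmed connection.
# Only word 2 is taken from the example; the rest are placeholders.
connections = {2: {3, 1}}


def decode(address):
    """3x8 decoder: exactly one word line is 1 for a given address."""
    return [1 if word == address else 0 for word in range(NUM_WORDS)]


def read_rom(address, enable=1):
    """Return the output Q3 Q2 Q1 Q0 as a bit string."""
    word_lines = decode(address) if enable else [0] * NUM_WORDS
    bits = []
    for q in range(WIDTH - 1, -1, -1):          # Q3 down to Q0
        # Wired-OR: a data line is 1 if any active word line connects to it.
        bit = any(line and q in connections.get(word, set())
                  for word, line in enumerate(word_lines))
        bits.append("1" if bit else "0")
    return "".join(bits)


print(read_rom(0b010))
```

Reading address 010 activates word 2's line, which drives Q3 and Q1, so the output is 1010.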
Implementing combinational function
Any combinational circuit of n functions of the same k variables can be done with a 2^k x n
ROM
Truth table
Inputs (address)   Outputs    8×2 ROM
a  b  c            y  z
0  0  0            0  0       word 0
0  0  1            0  1       word 1
0  1  0            0  1
0  1  1            1  0
1  0  0            1  0
1  0  1            1  1
1  1  0            1  1
1  1  1            1  1       word 7
[Figure: 8×2 ROM with inputs enable, c, b, a and outputs y, z; each word stores the output row of the truth table]
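The truth table above maps directly onto a lookup table: the inputs a, b, c form the address, and each stored word holds the outputs y and z. This sketch uses a Python list as the 8×2 ROM; the names `ROM` and `lookup` are illustrative.

```python
# One entry per address abc = 000 ... 111; each entry is (y, z).
ROM = [
    (0, 0),  # word 0
    (0, 1),  # word 1
    (0, 1),
    (1, 0),
    (1, 0),
    (1, 1),
    (1, 1),
    (1, 1),  # word 7
]


def lookup(a, b, c):
    """Evaluate both functions by reading the word addressed by a, b, c."""
    address = (a << 2) | (b << 1) | c
    return ROM[address]


for address in range(8):
    a, b, c = (address >> 2) & 1, (address >> 1) & 1, address & 1
    y, z = lookup(a, b, c)
    print(a, b, c, "->", y, z)
```

No logic gates are evaluated at read time; changing the functions only requires reprogramming the stored words.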
Types of ROM
Written during manufacture
Programmable (once)
PROM
Needs special equipment to program
Read “mostly”
Erasable Programmable (EPROM)
Erased by UV
Individual words can be programmed, but UV exposure erases the whole chip
Electrically Erasable (EEPROM)
Takes much longer to write than read
Can program and erase individual words
Flash memory
Large blocks of memory read/write at once, rather than one word at a time
Faster erase
RAM
Random access memory
Typically volatile memory
bits are not held without power supply
Read and written easily by processor during execution
Internal structure more complex than ROM
a word consists of several memory cells, each storing 1 bit
each input and output data line connects to each cell in its
column
rd/wr connected to every cell
when row is enabled by decoder, each cell has logic that stores
input data bit when rd/wr indicates write or outputs stored bit
when rd/wr indicates read
[Figure: RAM external view: 2^k × n read and write memory with inputs r/w, enable, and A0 … Ak-1 and data lines Qn-1 … Q0. Internal view: 4×4 RAM with a 2×4 decoder (inputs enable, A0, A1), data inputs I3 I2 I1 I0, data outputs Q3 Q2 Q1 Q0, and rd/wr routed to every memory cell]
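The read and write behaviour of the 4×4 RAM can be modelled as a grid of one-bit cells. This is a behavioural sketch only; the class name `Ram4x4`, the `access` method, and the use of the strings "read" and "write" for the rd/wr signal are assumptions made for illustration.

```python
class Ram4x4:
    """Behavioural model of a 4-word by 4-bit RAM."""

    def __init__(self):
        # Four rows (words) of four one-bit cells, all initially 0.
        self.cells = [[0] * 4 for _ in range(4)]

    def access(self, address, rd_wr, data_in=None, enable=1):
        """Read or write the row selected by the 2-bit address.

        data_in and the return value are lists ordered [bit3, bit2, bit1, bit0].
        """
        if not enable:
            return None              # no row is selected
        row = self.cells[address]    # the decoder enables exactly one row
        if rd_wr == "write":
            row[:] = list(data_in)   # every cell in the row stores its input
            return None
        return list(row)             # every cell in the row drives its output


ram = Ram4x4()
ram.access(2, "write", [1, 0, 1, 0])
print(ram.access(2, "read"))
```

Unlike the ROM, the stored bits change during execution, and they would be lost on power-down in a real volatile device.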
Types of RAM
SRAM: Static RAM
Memory cell uses flip-flop to store bit
Holds data as long as power supplied
DRAM: Dynamic RAM
Memory cell uses transistor and capacitor to store bit
More compact than SRAM
“Refresh” required due to capacitor leak
Slower to access than SRAM
Device Schematic and Timing Diagram
Block diagrams (pin numbers in parentheses)
HM6264: data<7…0> (11-13, 15-19); addr<15...0> (2, 23, 21, 24, 25, 3-10); /OE (22); /WE (27); /CS1 (20); CS2 (26)
27C256: data<7…0> (11-13, 15-19); addr<15...0> (27, 26, 2, 23, 21, 24, 25, 3-10); /OE (22); /CS (20)
Device characteristics
Device   Access Time (ns)   Standby Pwr. (mW)   Active Pwr. (mW)   Vcc Voltage (V)
HM6264   85-100             .01                 15                 5
27C256   90                 .5                  100                5
[Figure: timing diagrams. Read operation: data, addr, OE, /CS1, CS2. Write operation: data, addr, WE, /CS1, CS2]
Composing memory
Memory size needed often differs from size of readily available
memories
When available memory is larger, simply ignore unneeded high-
order address bits and higher data lines
When available memory is smaller, compose several smaller
memories into one larger memory
Connect side-by-side to increase width of words
Connect top to bottom to increase number of words
added high-order address line selects smaller memory containing
desired word using a decoder
Combine techniques to increase number and width of words
[Figure: increase number of words: two 2^m × n ROMs form a 2^(m+1) × n ROM. Address lines A0 … Am-1 go to both ROMs; Am drives a 1×2 decoder that enables one of them; outputs Qn-1 … Q0]
[Figure: increase width of words: three 2^m × n ROMs side by side form a 2^m × 3n ROM, sharing the enable and address lines, with outputs Q3n-1 … Q0]
[Figure: increase number and width of words: ROMs arranged in rows and columns, sharing address lines A and enable, with combined outputs]
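Both composition techniques can be sketched with small memories modelled as Python lists of words. The function names `stack` and `widen` and the sample contents are hypothetical; the point is only how the address and data lines are combined.

```python
def stack(rom_low, rom_high):
    """Top-to-bottom: two 2^m x n memories form one 2^(m+1) x n memory.

    The added high-order address bit selects which memory is enabled.
    """
    size = len(rom_low)

    def read(address):
        select, local_address = divmod(address, size)  # high bit, low bits
        return (rom_low, rom_high)[select][local_address]

    return read


def widen(roms, n):
    """Side-by-side: same address to every memory, outputs concatenated.

    roms is ordered from most-significant to least-significant word slice;
    n is the word width of each memory in bits.
    """
    def read(address):
        word = 0
        for rom in roms:
            word = (word << n) | rom[address]
        return word

    return read


# Hypothetical contents: 4 x 4 memories stacked into an 8 x 4 memory.
read_tall = stack([0x1, 0x2, 0x3, 0x4], [0xA, 0xB, 0xC, 0xD])
# Hypothetical contents: three 2 x 4 memories widened into a 2 x 12 memory.
read_wide = widen([[0x1, 0x2], [0xA, 0xB], [0xF, 0x0]], n=4)

print(hex(read_tall(5)), hex(read_wide(0)))
```

Combining the two, each row of a grid would be built with `widen` and the rows then selected by a decoder as in `stack`.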
Summary
Memory hierarchy
Memory/processor architecture
Memory access process
Endianness
Data alignment
Memory data organization
Memory devices
Basics
ROM/RAM