Unit 5 Memory
[Figure: a 16x1-bit memory chip]
Memory
Memories come in many shapes, sizes, and types:
RAM - random access memory
ROM - read-only memory
EPROM, FLASH - electrically programmable read-only memory
Source: Intel Seminar (R. Rajkumar)
Memory Types
DRAM: Dynamic Random Access Memory
Very dense (1 transistor per bit) and inexpensive
Requires refresh and often not the fastest access times
SRAM: Static Random Access Memory
Fast and no refresh required
Not so dense and not so cheap
Often used for caches
ROM/Flash: Read-Only Memory
Often used for bootstrapping
Basic Static RAM Cell
6-Transistor SRAM Cell
[Figure: the word line (row select) gates two pass transistors; back-to-back inverters form a flip-flop driving the bit and bit-bar lines]
Read:
1. Select row
2. Cell pulls one bit line low and the other high
3. Sense output on bit and bit-bar
Write:
1. Drive bit lines (e.g., bit = 1, bit-bar = 0)
2. Select row
Simplified SRAM timing diagram
[Timing diagram (read cycle, WE = HIGH): after CS falls, DOUT becomes valid after tACS and is held for tOH after CS rises. The chip exposes address ports, data ports, and r/w logic.]
Dynamic RAM
DRAM: simple transistor/capacitor pairs in high-density form
[Figure: the cell capacitor C is switched onto the bit line by a single pass transistor; a sense amp reads the bit line. Trade-off versus SRAM: higher density, lower speed.]
Refresh at regular intervals
DRAM Organization
A d x w DRAM stores dw total bits, organized as d supercells of size w bits.
Example: a 16 x 8 DRAM chip holds 16 supercells of 8 bits each, arranged in 4 rows x 4 columns. The memory controller (to/from the CPU) supplies a 2-bit row address and a 2-bit column address over the addr lines and transfers 8 bits over the data lines; supercell (2,1) sits at row 2, column 1.
[Figure: reading supercell (2,1): RAS latches row 2 into the internal row buffer, then CAS selects column 1 and the 8-bit supercell (2,1) is returned to the CPU over the data lines]
Memory Organisation
[Figure: the address is split as (row = i, col = j)]
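The d x w organization can be sketched in a few lines. A minimal Python sketch (illustrative, not from the slides) maps a linear supercell index to its (row, col) position in the 16 x 8 example's 4 x 4 array:

```python
# Sketch: splitting a supercell index into (row, col) for the
# 16 x 8 example DRAM: 16 supercells in a 4 x 4 array, 8 bits each.

ROWS, COLS = 4, 4  # 16 supercells total

def supercell_address(index):
    """Map a linear supercell index 0..15 to (row, col)."""
    assert 0 <= index < ROWS * COLS
    return index // COLS, index % COLS

# Supercell (2,1) from the slides corresponds to linear index 9.
print(supercell_address(9))  # -> (2, 1)
```

The memory controller sends the two halves separately (row first under RAS, then column under CAS), which is why the chip needs only 2 address pins here instead of 4.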
Memory Modules
[Figure: a 64 MB memory module consisting of eight 8M x 8 DRAMs (DRAM 0 ... DRAM 7). The memory controller reads the 64-bit doubleword at main memory address A by taking one byte from each chip: bits 63-56, 55-48, 47-40, 39-32, 31-24, 23-16, 15-8, and 7-0.]
DRAM Timing Parameters
tRAC: minimum time from RAS falling to valid data output.
Quoted as the speed of a DRAM when buying
A typical 4 Mbit DRAM has tRAC = 60 ns
tRC: minimum time from the start of one row access to the start of the next.
tRC = 110 ns for a 4 Mbit DRAM with a tRAC of 60 ns
More Timing Parameters
tCAC: minimum time from CAS falling to valid data output.
15 ns for a 4 Mbit DRAM with a tRAC of 60 ns
tPC: minimum time from the start of one column access to the start of the next.
35 ns for a 4 Mbit DRAM with a tRAC of 60 ns
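The quoted figures allow a rough back-of-the-envelope comparison of random row accesses against same-row column accesses. A hedged Python sketch (the 8-bit data-path width is an assumption, not stated on the slide):

```python
# Worked example (assumes an 8-bit-wide data path): peak transfer
# rates implied by the 4 Mbit DRAM timing figures above.

T_RC = 110e-9  # row cycle time: one full row access per 110 ns
T_PC = 35e-9   # page cycle time: one column access per 35 ns

BYTES_PER_ACCESS = 1  # assumed 8-bit data path

random_mbps = BYTES_PER_ACCESS / T_RC / 1e6  # every access opens a row
page_mbps = BYTES_PER_ACCESS / T_PC / 1e6    # accesses stay in one row

print(f"random access: {random_mbps:.1f} MB/s")  # ~9.1 MB/s
print(f"page mode:     {page_mbps:.1f} MB/s")    # ~28.6 MB/s
```

The roughly 3x gap between the two rates is what motivates the fast-page / cas-only access schemes discussed later in this unit.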
Enhanced DRAMs
All enhanced DRAMs are built around the conventional DRAM core.
Synchronous DRAM (SDRAM)
Driven by the rising clock edge instead of asynchronous control signals.
SDRAM is tied to the system clock and is designed to read or write memory in burst mode (after the initial read or write latency) at 1 clock cycle per access (zero wait states)
Double data-rate synchronous DRAM (DDR SDRAM)
Enhancement of SDRAM that uses both clock edges as control signals.
Embedded DRAM
Provides high-density storage: up to 10 times the capacity of SRAM in the same area
Significantly slower than SRAM
Requires a dedicated process for on-chip fabrication
Not fully compatible with standard CMOS technology for logic implementation
Non-volatile Memory
Mask ROM
Used for dedicated functionality
Contents fixed at IC fab time (truly write once!)
EPROM (erasable programmable)
Requires a special IC process (floating-gate technology)
Writing is slower than RAM; an EPROM needs a special programming system to supply special voltages and timing.
Reading can be made fairly fast.
Rewriting is slow: erasure is required first (EPROM: UV light exposure; EEPROM: electrically erasable)
Flash
Floating Gate MOS
The floating gate is surrounded by silicon dioxide, which is an excellent insulator.
By controlling the terminal voltage, it is possible to electrically charge the floating gate.
Source: L. Benini et al. ACM trans., Feb 2003
EEPROM
Erased using a higher-than-normal voltage
Can be erased word-by-word rather than only in its entirety
In-circuit programmable
Reads in tens of nanoseconds; writes in tens of microseconds.
Flash Memory
• Uses a single transistor per bit (EEPROM employs two transistors)
• Provides high-density storage with speed marginally less than that of SRAM
• Write time is significantly higher than that of DRAM
FLASH Memory
Electrically erasable
In-system programmability and erasability (no special system or voltages needed)
On-chip circuitry and voltage generators control erasure and programming (writing)
Erasure happens in variable-sized "sectors" (16K - 64K bytes)
Flash NAND
SRAM/ROM Memory Timing
The address should be stable during the falling edge of output enable.
SRAM is fast; ROM is slow. ROM needs more access time, which slows the system. Use wait states, at the cost of more complex control.
[Timing diagram: mclk; addresses A, B, C on A[31:0]; RAMoe strobes]
ROM Wait Control State Transition Example
ROM access requires 4 clock cycles; RAM access is faster.
[State diagram: from reset, the address on A[31:0] is decoded; RAM accesses take the fast path, while ROM accesses assert wait and step through states ROM1, ROM2, ROM3 before ROMoe completes the access]
Operation
Processor internal operation cycles do not need access to memory.
Memory access is much slower than internal operations, so use wait states for memory accesses; internal operations can run at max speed.
mreq = 1: internal operation
mreq = 0: memory access
[State diagram: from reset, decode steers the cycle to the RAM or ROM path when mreq indicates a memory access]
[Figure: DRAM access. The row address on A[n:0] is presented first and latched into the row decoder by the ras signal; the column address is then presented and latched into the column mux by the cas signal; the addressed cell in the memory array drives data out.]
Making DRAM Access Fast
Accessing data in the same row using a cas-only access is 2-3 times faster, because a cas-only access does not activate the cell matrix.
If the next access is within the same row, a new column address may be presented simply by applying a cas-only access.
This exploits the fact that most processor addresses are sequential.
DRAM Access
If we had a way of knowing that the next address is sequential with respect to the current address (current address + 4), then we could assert only cas and make the DRAM access fast.
Difficulty?
Detecting early in the memory access cycle that the next address is in the same row.
ARM Solution to cas-only
Access
• Sequential addresses are flagged by the seq signal
• The external memory device checks the previous address and row boundaries to issue a cas-only or ras-cas combination
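The check the external memory device performs can be sketched as follows. This is a simplification: the 1 KB row size and the function name are assumptions for illustration.

```python
# Sketch: deciding between a full ras-cas access and a faster
# cas-only access when the processor asserts its seq signal.

ROW_BYTES = 1024  # assumed DRAM row (page) size

def access_kind(prev_addr, addr, seq):
    """Return 'cas-only' when the access is flagged sequential and
    stays within the same DRAM row, else 'ras-cas'."""
    same_row = (prev_addr // ROW_BYTES) == (addr // ROW_BYTES)
    return "cas-only" if seq and same_row else "ras-cas"

print(access_kind(0x1000, 0x1004, seq=True))   # -> cas-only
print(access_kind(0x13FC, 0x1400, seq=True))   # -> ras-cas (row boundary)
print(access_kind(0x1000, 0x2000, seq=False))  # -> ras-cas (new address)
```

Note the middle case: even a sequential (+4) address must fall back to a full ras-cas access when it crosses a row boundary, which is exactly why the memory device, not the processor, makes the final decision.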
Revised State Transition
Diagram
seq = 1: sequential address
seq = 0: non-sequential address
mreq = 1: internal operation
mreq = 0: memory access
[State diagram: from reset, decode steers memory accesses to the RAM or ROM path; when seq is asserted, the RAM path skips the row-address phase]
[Timing diagram: mclk, seq, wait, ras, cas, D[31:0] over an N cycle (new address: full ras-cas access) followed by two S cycles (sequential addresses: cas-only accesses)]
Summary
We have learnt about different types of memory and their characteristics.
We have seen how external memory can be interfaced in an ARM7-based system.
We shall study memory organisation and use in future classes.
Memory Organisation
Memory-Centric View of Embedded System
[Figure: CPU with on-chip L1 cache and scratch pad memory (SPM), connected to off-chip memory]
Memory Organization
The memory system starts with the register file.
A cache (or caches) feeds data and instructions to the pipeline.
Most embedded systems use one level of cache.
The main memory system may be contained partially on-chip and partially off-chip.
Scratch pad memories have been proposed as one form of high-speed on-chip memory.
Off-chip, a variety of technologies may be used, including SDRAM.
Partitioning of Data
[Figure: the CPU issues an address to the cache controller, which accesses the cache; misses go to main memory, 10-20 cycles away; data returns to the CPU]
The cache controller uses different portions of the address issued by the processor during a memory request to select parts of cache memory.
ACK: COMPUTER AS COMPONENT, WAYNE WOLF
Definitions
Working Set: the set of memory locations the CPU refers to at any one time
Cache Hit: when an address requested by the CPU is found in the cache
Cache Miss: when the location is not in the cache
Compulsory Miss (Cold Miss): occurs the first time a location is used
Capacity Miss: caused by a working set that is too large
Conflict Miss: when two locations map to the same location in the cache
Cache Operation
A miss causes the cache controller to copy the data from main memory to the cache.
Data is forwarded to the CPU at the same time: data streaming.
Data occupying the cache block is evicted and replaced by the content of the memory addresses requested by the CPU.
Dirty bit: a status bit that indicates whether the cache content has changed.
Direct-Mapped Cache
In a direct-mapped cache, each memory address is associated with one possible block within the cache, so we need look in only a single location to find the data if it is in the cache.
A block is the unit of transfer between cache and memory.
Direct-Mapped Cache (2)
[Figure: a 4-byte direct-mapped cache (index 0-3) alongside a 16-byte memory (locations 0-F). Block size = 1 byte. Cache location 0 can be occupied by data from memory locations 0, 4, 8, C. In general: a fixed mapping from memory locations to cache locations.]
Direct-mapped cache
[Figure: a cache block holds a valid bit (1), a tag (e.g. 0xabcd), and data bytes. The address splits into tag, index, and offset fields: the index selects the cache block to check, the stored tag is compared (=) against the address tag for a hit, and the offset selects the byte.]
Two regularly used memory locations that map to the same cache location lead to conflict misses.
ACK: COMPUTER AS COMPONENT, WAYNE WOLF
Direct-mapped cache
organization
[Figure: the address indexes the tag RAM and data RAM; a comparator checks the tag and a mux selects the addressed data, producing the hit and data outputs]
Source: Steve Furber: ARM System On-Chip; 2nd Ed, Addison-Wesley, 2000.
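The lookup just described can be modeled in a few lines. A minimal Python sketch (the cache geometry of 4 blocks of 4 bytes is assumed for illustration) of direct-mapped lookup with tag/index/offset splitting:

```python
# Minimal direct-mapped cache model (illustrative, parameters assumed):
# 4 blocks of 4 bytes each. The address splits into tag | index | offset.

NUM_BLOCKS = 4
BLOCK_SIZE = 4  # bytes

# Each cache entry: (valid, tag)
cache = [(False, None)] * NUM_BLOCKS

def access(addr):
    """Return 'hit' or 'miss' and refill the cache line on a miss."""
    block = addr // BLOCK_SIZE       # strip the byte offset
    index = block % NUM_BLOCKS       # which cache line to check
    tag = block // NUM_BLOCKS        # remaining high bits
    valid, stored_tag = cache[index]
    if valid and stored_tag == tag:
        return "hit"
    cache[index] = (True, tag)       # evict and refill
    return "miss"

print(access(0x00))  # miss (cold)
print(access(0x02))  # hit  (same block as 0x00)
print(access(0x40))  # miss, evicts 0x00's block (same index, new tag)
print(access(0x00))  # miss again: a conflict miss
```

The last two accesses demonstrate the conflict miss from the previous slide: 0x00 and 0x40 share an index, so they keep evicting each other even though the cache has spare lines.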
Set-associative Cache
Consists of a number of sets and is characterized by how many sets it uses.
Each set is implemented as a direct-mapped cache, and memory locations map onto blocks as in a direct-mapped cache, so there are n separate blocks available for each memory location.
Each request is broadcast to all sets simultaneously; if any set holds the location, the cache reports a hit.
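The broadcast-to-all-ways lookup can be sketched as follows (the geometry of 2 sets x 2 ways and the LRU replacement policy are assumptions for illustration):

```python
# Illustrative 2-way set-associative cache: each set holds up to two
# tags, searched in parallel; least-recently-used (LRU) replacement.

NUM_SETS = 2
BLOCK_SIZE = 4  # bytes
WAYS = 2

# Each set is an ordered list of tags, most recently used last.
sets = [[] for _ in range(NUM_SETS)]

def access(addr):
    """Return 'hit' or 'miss', updating LRU order and refilling on miss."""
    block = addr // BLOCK_SIZE
    index, tag = block % NUM_SETS, block // NUM_SETS
    ways = sets[index]
    if tag in ways:
        ways.remove(tag)
        ways.append(tag)   # mark most recently used
        return "hit"
    if len(ways) == WAYS:
        ways.pop(0)        # evict the least recently used way
    ways.append(tag)
    return "miss"

# Two blocks with the same index can now coexist:
print(access(0x00))  # miss (cold)
print(access(0x08))  # miss (same set, fills the second way)
print(access(0x00))  # hit: no conflict eviction this time
```

Compare this with a direct-mapped cache, where 0x00 and 0x08 mapping to the same index would force an eviction on every alternation; the extra way absorbs exactly that conflict.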
Set-associative cache
A set of direct-mapped caches:
[Figure: 2-way set-associative cache organization: two direct-mapped ways, each with its own tag RAM and data RAM, compare and mux, combining into the hit and data outputs]
[Figure: a CAM-based organization (CAM tag lookup plus mux producing hit and data) is used in the ARM920T and ARM940T]
Source: Steve Furber: ARM System On-Chip; 2nd Ed, Addison-Wesley, 2000.
Cache in ARM
Von Neumann Architecture
Unified Cache: a single cache for instructions and data
Harvard Architecture
Split Cache: two caches, an Instruction Cache and a Data Cache
A unified instruction and data cache
[Figure: the processor (with registers) sends addresses and exchanges instructions and data with a single cache holding copies of instructions and data, backed by a memory of instructions and data spanning addresses 00..0016 to FF..FF16]
Source: Steve Furber: ARM System On-Chip; 2nd Ed, Addison-Wesley, 2000.
Separate caches
[Figure: the processor (with registers) has a separate instruction cache (copies of instructions) and a data cache (copies of data), each with its own address path to a memory spanning 00..0016 to FF..FF16]
[Figure: a write buffer sits between the CPU/cache and main memory, absorbing writes]
[Figure: the CPU issues logical addresses; the memory management unit translates them into physical addresses for main memory]
Segmentation based Memory
Management
Segmentation is provided by a simple MMU.
A program views its memory as a set of segments: code segment, data segment, stack segment, etc.
Each program has its own set of private segments.
Each access to memory is via a segment selector and an offset within the segment.
It allows a program to have its own private view of memory and to coexist transparently with other programs in the same memory space.
Segment-based Address Generation
[Figure: the segment selector indexes the Segment Descriptor Table (SDT), which supplies base and bound; the physical address is base + offset, and the offset is checked (>?) against the bound]
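The base/bound computation in the figure can be sketched directly; the descriptor-table contents below are made-up illustrative values.

```python
# Sketch of segment-based address generation: the selector indexes a
# descriptor table giving (base, bound); physical address = base +
# offset, with the offset checked against the bound.

# Hypothetical Segment Descriptor Table: selector -> (base, bound)
SDT = {
    0: (0x1000, 0x0800),  # code segment
    1: (0x4000, 0x0400),  # data segment
}

def translate(selector, offset):
    """Translate (selector, offset) to a physical address, faulting
    when the offset exceeds the segment bound."""
    base, bound = SDT[selector]
    if offset >= bound:
        raise MemoryError("segment bound violation")
    return base + offset

print(hex(translate(0, 0x10)))  # -> 0x1010
print(hex(translate(1, 0x20)))  # -> 0x4020
```

The bound check is what gives each program its private view: an offset outside the segment faults instead of touching another program's memory.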
[Figure: the page table maps virtual addresses (pages 0:, 1:, ...) issued by the CPU to physical addresses (pages 0:, 1:, ...); unmapped virtual pages reside on disk]
Servicing a Page Fault
(1) Initiate Block Read: the processor signals the disk controller to read a block of length P starting at disk address X and store it starting at memory address Y.
(2) DMA Transfer: the read occurs under control of the I/O controller using Direct Memory Access (DMA), moving the block from disk over the memory-I/O bus into memory.
(3) Read Done: the I/O controller signals completion by interrupting the processor, and the OS resumes the suspended process.
[Figure: processor (with registers and cache), memory-I/O bus, memory with memory controller, and I/O controller with attached disks]
Managing Multiple
Processes
Each process has its own virtual address
space
operating system controls how virtual pages
are assigned to physical memory
A page table for each process
every program can start at the same
address (virtual address)!
A process should not access pages not
allocated to it
Protection
A page table entry contains access-rights information.
[Figure: Process i's page table: VP 0: Read yes, Write no, physical addr PP 9; VP 1: Read yes, Write yes, PP 4; VP 2: Read no, Write no, not mapped (XXXXXXX). The physical address is formed by concatenating the page base with the page offset.]
Address Translation via Page
Table
[Figure: the virtual address (bits n-1..0) splits into a virtual page number (VPN, bits n-1..p) and a page offset (bits p-1..0). The page table base register locates the page table, and the VPN acts as the table index. Each entry holds a valid bit, access bits, and a physical page number (PPN); if valid = 0, the page is not in memory. The physical address (bits m-1..0) is the PPN concatenated with the unchanged page offset.]
Page Table Operations
Translation
A separate (set of) page table(s) per process.
The VPN forms an index into the page table (it points to a page table entry).
Computing the Physical Address
The Page Table Entry (PTE) provides information about the page.
If (valid bit = 1), the page is in memory: use the physical page number (PPN) to construct the address.
If (valid bit = 0), the page is on disk: a page fault occurs, and the page must be loaded from disk into main memory before continuing.
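The translation steps can be sketched in Python. The 4 KB page size and the page-table contents are illustrative assumptions (the PPN values 9 and 4 echo the Protection slide's example).

```python
# Sketch of page-table translation: the VPN indexes the page table;
# if the entry is valid, the PPN is concatenated with the unchanged
# page offset; otherwise a page fault is raised.

PAGE_BITS = 12  # 4 KB pages (assumed)

# Hypothetical page table: VPN -> (valid, PPN)
page_table = {0: (True, 9), 1: (True, 4), 2: (False, None)}

def translate(vaddr):
    vpn = vaddr >> PAGE_BITS                 # virtual page number
    offset = vaddr & ((1 << PAGE_BITS) - 1)  # unchanged page offset
    valid, ppn = page_table.get(vpn, (False, None))
    if not valid:
        raise LookupError("page fault: load page from disk")
    return (ppn << PAGE_BITS) | offset       # PPN || offset

print(hex(translate(0x0123)))  # VPN 0 -> PPN 9: 0x9123
print(hex(translate(0x1456)))  # VPN 1 -> PPN 4: 0x4456
```

Note that only the page-number bits change; the offset passes through untouched, which is why caches can sometimes be indexed with the offset bits before translation completes.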
Page Table Operations (2)
Checking Protection
The access-rights field indicates the allowable access (e.g., read-only, read-write, execute-only).
Systems typically support multiple protection modes (e.g., kernel vs. user).
A protection violation fault occurs if the process doesn't have the necessary permission.
Integrating VM and
Cache
[Figure: the CPU issues a virtual address (VA); the translation unit produces a physical address (PA) that indexes the cache; a hit returns data, while a miss goes to main memory]
Address Translation
with a TLB
[Figure: the virtual address (virtual page number in bits n-1..p, page offset in bits p-1..0) is presented to the TLB; on a TLB hit, the physical address (page number in bits 31..12, offset in bits 11..0) is produced without consulting the page table]
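A minimal model of the TLB fast path follows; the small fully associative TLB, its sizes, and the page-table contents are assumptions for illustration.

```python
# Sketch: the TLB caches recent VPN -> PPN translations so that most
# accesses skip the page-table walk entirely.

PAGE_BITS = 12
tlb = {}                    # VPN -> PPN (fully associative, unbounded)
page_table = {0: 9, 1: 4}   # hypothetical valid page-table entries

def translate(vaddr):
    """Return (physical address, 'hit' or 'miss') for a TLB lookup."""
    vpn, offset = vaddr >> PAGE_BITS, vaddr & ((1 << PAGE_BITS) - 1)
    if vpn in tlb:                # TLB hit: no table walk needed
        ppn, result = tlb[vpn], "hit"
    else:                         # TLB miss: walk the page table
        ppn = page_table[vpn]     # (page-fault handling omitted)
        tlb[vpn] = ppn            # cache the translation
        result = "miss"
    return (ppn << PAGE_BITS) | offset, result

print(translate(0x0123))  # first access to page 0: TLB miss
print(translate(0x0456))  # same page: TLB hit, no table walk
```

This model also shows why the TLB must be flushed on a context switch (as noted later for multi-tasking): its entries are only valid for the page table that filled them.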
Domains and Memory Access
Domains control basic access to virtual memory by isolating one area of memory from another when sharing a common virtual map.
There are 16 different domains that can be assigned to 1 MB sections of virtual memory.
Caches and Write Buffer
Configure the caches and write buffer for each page in memory: the page table indicates whether a page will be cached and whether the write buffer is enabled for that page.
Use of Virtual Memory System
Example: implementation of a static multi-tasking system running concurrent tasks.
Tasks can have overlapping virtual memory maps but may be located in physical memory at addresses that do not overlap.
Configure domain access and permissions in the page table to protect the system.
Demand paging is not necessarily implemented.
Multi-tasking and MMU
During a context switch, a different page table is activated and the virtual-to-physical mapping changes.
To ensure cache coherency, the caches may need cleaning and flushing, and the TLB also needs flushing.
The MMU can relocate a task without the need to move it.
Multi-tasking and MMU
To reduce context-switch time, a write-through cache policy can be followed in the ARM9: a data cache configured as write-through does not require cleaning.
Demand Paging
Use flash memory as the non-volatile store (it replaces disks in appliances like PDAs).
Copy programs to RAM during system operation: dynamic paging with load on demand.
Use a write-back policy for the pages, because access time to flash is much higher than to RAM.
Demand Paging with NAND Flash
[Figure: an MCU with an MMU demand-pages between SDRAM (holding the working set: OS, APP1, stack, heap) and NAND flash (holding the OS, APP1, APP2, and the file system)]