CHAPTER 4
COMPUTER MEMORY SYSTEM
Topics
4.1) Computer Memory System
4.2) Internal Memory
4.3) External Memory
4.4) Cache Memory System
4.1) Computer Memory System
Generally, a computer memory
system can be divided into 2
categories:
Internal memory
External/Secondary Memory
Memory Hierarchy
Going down the hierarchy, from internal toward external memory:
1) Decrease cost per bit (shorter
access time means greater cost per
bit)
2) Increase capacity (internal
memory is much smaller than
external memory)
3) Increase access time (it is
always faster to access data
from internal memory, and
greater capacity means longer
access time)
4) Decrease frequency of access
by the CPU
4.2) Internal Memory
There are 3 types of internal
memory:
a) Registers (in the CPU)
b) Cache
c) Main Memory (RAM & ROM)
Information stored in internal memory can be accessed
by the CPU at greater speed than information in external
memory.
Internal memory is contained on computer chips and
uses electronic circuits to store information.
Internal memories are volatile, with the exception of
ROM.
Main Memory
Central Storage of the computer.
Used to store instructions and data that are being
executed (run)
They are built based on Semiconductor Integrated
Circuits
Basically, they are divided into RAM and ROM
Unique address for each location
n address bits can address 2^n locations
Access one word at a time
Basic operations are read and write
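The relation between address bits and addressable locations can be checked with a short sketch. The function names below are illustrative and do not come from the slides.

```python
def addressable_locations(address_bits: int) -> int:
    """Number of distinct locations reachable with the given number of address bits."""
    return 2 ** address_bits


def address_bits_needed(locations: int) -> int:
    """Smallest number of address bits that gives every location a unique address."""
    if locations < 1:
        raise ValueError("locations must be at least 1")
    # (locations - 1) is the largest address; its bit length is the width needed.
    return (locations - 1).bit_length()


print(addressable_locations(10))  # 1024
print(address_bits_needed(1024))  # 10
print(address_bits_needed(1025))  # 11
```

Adding one address bit doubles the number of locations, which is why memory sizes are normally powers of 2.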
RAM
Acronym for Random Access Memory
Information in RAM can be read and modified (write)
Used to store programs and user data that is being
executed (run)
When an instruction set (or program) is typed into a
computer, the program is stored in a part of this main
memory called the RAM.
When the computer is switched off, the program will
disappear from the RAM.
A kind of volatile memory, i.e. the content in RAM will
be wiped out if electricity supply is interrupted.
Types of RAM
1. Static RAM (SRAM)
Made from flip-flops: logic circuits that retain the information
stored in them as long as there is enough power to run the device
Disadvantages: expensive, small storage capacity per chip
Advantages: easier to use and faster access time (read and write
cycle)
2. Dynamic RAM (DRAM)
Made from capacitors
Disadvantages: slower access time, harder to use because the capacitors
lose their charge and must be recharged (‘refreshed’) periodically
Advantages: cheaper, bigger storage capacity per chip
Several improvements have been made to DRAM to improve its
performance (access time), using different methods and names, e.g.,
EDRAM, CDRAM, SDRAM, RDRAM, etc.
The following is a block diagram of a RAM unit (chip) that has 2^k
words, where each word is n bits long:
[Figure: block diagram of a 2^k x n RAM chip]
Inputs: n data input lines (from the data bus)
        k address lines (from the address bus)
        read and write control lines (from the control bus)
Storage: 2^k words of n bits each
Outputs: n data output lines (to the data bus)
Note:
• memory is measured in base 2 units
• capacity in internal memory is typically expressed in terms of bytes or words
For example, a 2^24 x 16 RAM contains 2^24 = 16M
words, each 16 bits long.
[Figure: block diagram of a 2^24 x 16 RAM chip, with blanks to fill in]
Inputs: ? data input lines (from the data bus)
        ? address lines (from the address bus)
        read and write control lines (from the control bus)
Storage: 2^24 words of ? bits each
Outputs: ? data output lines (to the data bus)
Total capacity = 2^24 x 16 = 2^24 x 2^4 = 2^28 bits = 2^25 bytes
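The capacity arithmetic for this example can be reproduced with a small sketch. The helper name and dictionary keys are illustrative. It assumes, as in the general 2^k x n diagram, that the number of address lines equals the exponent and the number of data lines equals the word length.

```python
def ram_summary(address_bits: int, word_bits: int) -> dict:
    """Describe a (2^address_bits x word_bits) RAM chip."""
    words = 2 ** address_bits
    total_bits = words * word_bits
    return {
        "address_lines": address_bits,  # one line per address bit
        "data_lines": word_bits,        # one input and one output line per word bit
        "words": words,
        "total_bits": total_bits,
        "total_bytes": total_bits // 8,
    }


summary = ram_summary(24, 16)
print(summary["words"] == 2 ** 24)        # True (16M words)
print(summary["total_bits"] == 2 ** 28)   # True
print(summary["total_bytes"] == 2 ** 25)  # True (32 Mbytes)
```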
Example -- RAM
We can use these cells
to make a 4 x 1 RAM.
Since there are four
words, ADRS is two
lines.
Each word is only one
bit, so DATA and OUT
are one bit each.
Notes: CS (chip select) enables or disables the RAM; WR selects between reading from and writing to the RAM.
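The behaviour of the 4 x 1 RAM can be imitated in software. This is only a sketch under assumptions the slide does not state: WR = 1 is taken to mean write and WR = 0 read, a disabled chip (CS = 0) produces no output (modelled as None), and after a write OUT shows the bit just stored. The class name Ram4x1 is made up for this example.

```python
class Ram4x1:
    """Behavioural model of a RAM with four 1-bit words."""

    def __init__(self):
        self.cells = [0, 0, 0, 0]  # four words, one bit each

    def access(self, cs, wr, adrs, data=0):
        """Return the OUT bit, or None when the chip is not selected."""
        if not 0 <= adrs <= 3:
            raise ValueError("ADRS is two lines, so it must be in 0..3")
        if cs == 0:
            return None                  # chip disabled: no read, no write
        if wr == 1:                      # assumed polarity: 1 = write
            self.cells[adrs] = data & 1  # keep a single bit
        return self.cells[adrs]


ram = Ram4x1()
ram.access(cs=1, wr=1, adrs=2, data=1)  # write 1 into word 2
print(ram.access(cs=1, wr=0, adrs=2))   # 1
print(ram.access(cs=0, wr=0, adrs=2))   # None
```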
Chip Organization
• Example of a DRAM organization
Module Organization
• Example: memory module of 256K words of 8 bits each
Figure annotations:
8 bits
11 address lines, 8 data lines
2 control signals
4 bits: 4 data lines in and out
2^22 words, each 4 bits wide
22 address lines are required, but note that the chip has only 11. The chip
is designed to save pins: the 22-bit address is multiplexed on the
eleven address pins in two halves (2^11 x 2^11).
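The address multiplexing can be sketched as splitting a 22-bit address into two 11-bit halves that share the same eleven pins. The sketch assumes the usual DRAM convention that the upper half selects the row and the lower half the column. The slide only states the 2^11 x 2^11 split, so the row/column naming is an assumption.

```python
ROW_BITS = 11  # upper half of the address (assumed to select the row)
COL_BITS = 11  # lower half of the address (assumed to select the column)


def split_address(address: int) -> tuple:
    """Split a 22-bit address into the (row, column) halves sent over 11 pins."""
    if not 0 <= address < 2 ** (ROW_BITS + COL_BITS):
        raise ValueError("address does not fit in 22 bits")
    row = address >> COL_BITS
    col = address & ((1 << COL_BITS) - 1)
    return row, col


def join_address(row: int, col: int) -> int:
    """Rebuild the full address from its two halves."""
    return (row << COL_BITS) | col


print(split_address(2048))         # (1, 0)
print(split_address(2 ** 22 - 1))  # (2047, 2047)
```

Sending the address in two steps halves the number of address pins, at the cost of an extra step per access.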
ROM
Acronym for Read Only Memory
Information in ROM can only be read (cannot be written or
modified)-- programmed on the ROM chips by the manufacturer
Used to store permanent programs, e.g.:
bootstrap loader, i.e. a program used to start the operation of a
computer when the computer is turned on
Check to see that the cable to the printer is connected
Interpret each key on the Keyboard to the control unit
There are various kinds of ROM, among them PROM,
EPROM and EEPROM. They differ in whether and how their
data can be changed, using specific methods and equipment.
The following is a block diagram of a ROM of size
2^k x n (2^k words, each n bits long):
[Figure: block diagram of a 2^k x n ROM chip]
k address lines (from the address bus)
n data lines (connected to the data bus)
Storage: 2^k words of n bits each
Note:
Why don’t ROM chips have read-write pins?
8 bits: 8 data lines to be read out
2^20 words, each 8 bits wide
4.3) External Memory
Memories that are separated from the main computer
components
Also known as secondary memory
Instances of external memory are hard disks, diskettes,
CD-ROM, USB (universal serial bus) drive etc
External Memory (Cont.)
All external memory is non-volatile: data stored in
internal memory (except ROM) is lost when the computer is
turned off, but data stored in external memory is retained.
Used to store information (programs and data) that has not yet been
processed, or has already been processed, by the computer system.
Consists of secondary storage devices such as disks and
tapes; the information they contain is accessed by
the CPU through the I/O unit (via a bus).
Disk
The 2 most popular types:
1. Magnetic Disk
2. Flash Memory
Magnetic Disk
There are 2 kinds, i.e. hard disk and floppy disk.
A round platter made from metal or plastic and
coated with magnetic material
Disk Data Layout
Disk Data Layout (cont.)
• Consists of tracks and sectors
- Track: 500 – 2000 tracks/surface
- Sector: 10 – 100 sectors/track (fixed or variable)
• Data is transferred to and from the disk in blocks (sector/track)
• The amount of data in every track is the same => data density varies from track to track
• There are 2 types of read/write heads:
1. Fixed head: one head per track
2. Movable head: one head for all tracks
• Disks are inserted into the disk drive.
• There are 2 types of disk:
1. Removable disk: can be inserted into and taken out of the disk drive, such
as the floppy disk.
2. Fixed disk: cannot be taken out of the disk drive, for instance the hard
disk.
Flash Memory
2 types of flash memory : static and removable
A type of nonvolatile memory that can be erased
electronically and rewritten, similar to EEPROM.
Most computers use flash memory to hold their startup
instructions because it allows the computer to update
its contents easily.
Flash memory chips also store data and programs on
many mobile computers and devices such as smart
phones, portable media players, PDAs, printers, digital
cameras, automotive devices, pagers and digital voice
recorders.
Flash Memory (Cont)
Removable flash memory includes memory cards, USB
flash drive, and PC Cards/ExpressCard Module
4.4) Cache Memory System
Hierarchy List
Registers
L1 Cache
L2 Cache
Main memory
Disk cache
Disk
Optical
Tape
Memory Hierarchy - Diagram
So you want fast?
It is possible to build a computer which uses only static
RAM (see later)
This would be very fast
This would need no cache
How can you cache cache?
This would cost a very large amount
Why do we need cache?
Memory is slow compared to the time required to execute
instructions
Memory becomes more expensive as it gets faster
Compromise
Place a small amount of very high speed memory to temporarily hold a
portion of the memory being accessed.
Cache
Small amount of fast memory
Sits between normal main memory and CPU
May be located on CPU chip or module
Cache and Main Memory
Cache/Main Memory Structure
CPU
CACHE: divided into a number of slots (lines)
MM: divided into a number of blocks
(block size = slot size)
Example:
MM = 16 Mbytes
Memory block = 4 bytes
Memory = 4M blocks of 4 bytes each
Cache can hold 64 Kbytes
Cache is 16K lines of 4 bytes each
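The block and line counts in this example follow from simple division, as the sketch below shows. The constant and function names are illustrative.

```python
KB = 2 ** 10
MB = 2 ** 20


def count_units(total_bytes: int, unit_bytes: int) -> int:
    """How many fixed-size units (blocks or lines) fit in a memory."""
    return total_bytes // unit_bytes


block_size = 4                                  # bytes per block = bytes per line
mm_blocks = count_units(16 * MB, block_size)    # main memory blocks
cache_lines = count_units(64 * KB, block_size)  # cache lines

print(mm_blocks == 4 * MB)       # True: 4M blocks
print(cache_lines == 16 * KB)    # True: 16K lines
print(mm_blocks // cache_lines)  # 256 blocks compete for each line on average
```

Since main memory holds 256 times as many blocks as the cache has lines, only a small fraction of memory can be in the cache at any moment.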
Figure: connection of cache and main memory
CPU <-> Cache: word transfer
Cache <-> Main Memory: block transfer
Cache Addressing
Where does cache sit?
Between processor and virtual memory management unit
Between MMU and main memory
Logical cache (virtual cache) stores data using virtual
addresses
Processor accesses cache directly, not through physical cache
Cache access faster, before MMU address translation
Virtual addresses use same address space for different applications
Must flush cache on each context switch
Physical cache stores data using main memory
physical addresses
Cache Memory (cont.)
A small, fast memory that is situated between the CPU and the
MM (conceptually, not physically).
Reduces the gap between the memory cycle time and the
CPU processing time.
The memory cycle time is the time taken to fetch an instruction
or data from the main memory until it is usable.
With the technology available today, the CPU processing time
is far shorter than the memory cycle time.
Hence a memory called cache was introduced,
situated between the CPU and the MM.
Typical Cache Organization
Cache connects to the processor via data, control, and address lines
Data and address lines are attached to buffers, which attach to the
system bus to reach main memory
The technology used to build cache makes its memory
cycle time far shorter than that of main memory, but it is
also far more expensive.
We would like the size of the cache to be small enough so that
the overall average cost per bit is close to that of main memory
alone and large enough so that the overall average access time is
close to that of the cache alone.
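The size trade-off can be explored numerically. The sketch below uses a common two-level textbook model that is not given in the slides: every access costs one cache access, and a miss adds one main memory access. All timings and costs are hypothetical numbers chosen only for illustration.

```python
def average_access_time(hit_ratio, cache_time, memory_time):
    """Assumed model: each access pays cache_time; a miss also pays memory_time."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return cache_time + (1.0 - hit_ratio) * memory_time


def average_cost_per_bit(cache_bits, cache_cost, memory_bits, memory_cost):
    """Total cost of both levels divided by the total number of bits."""
    total_cost = cache_bits * cache_cost + memory_bits * memory_cost
    return total_cost / (cache_bits + memory_bits)


# Hypothetical figures: 10 ns cache, 100 ns main memory.
for h in (0.5, 0.9, 0.99):
    t = average_access_time(h, cache_time=10, memory_time=100)
    print(f"hit ratio {h:.2f}: average access time {t:.1f} ns")

# Hypothetical costs: cache bits cost 100 units each, main memory bits 1 unit.
cost = average_cost_per_bit(64 * 1024 * 8, 100, 16 * 1024 * 1024 * 8, 1)
print(f"average cost per bit: {cost:.3f} units")
```

With a high hit ratio the average access time approaches the cache time, while a cache that is small relative to main memory keeps the average cost per bit close to that of main memory.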
What is the ideal size for the cache?
How does cache work?
How Cache Works?
• When the CPU needs certain data or instruction, it will refer to
the cache memory first.
• If it is in cache, then data access will be fast.
• Otherwise, it will have to be fetched from the MM and the data
block that contains the required data or instruction will be moved
into the Cache Memory with the hope that the next
instruction/data required by the CPU is contained in the block.
Cache operation – overview
CPU requests contents of memory location
Check cache for this data
If present, get from cache (fast)
If not present, read required block from main memory
to cache
Then deliver from cache to CPU
Cache includes tags to identify which block of main
memory is in each cache slot
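The read operation can be sketched as a small simulation. To keep it short, the sketch assumes a cache with unlimited slots, so no block is ever replaced, and uses the block number itself as the tag. The class name SimpleCache and the 32-byte main memory are made up for illustration.

```python
class SimpleCache:
    """Toy cache: check the tags, fetch the whole block on a miss, serve from cache."""

    def __init__(self, main_memory, block_size=4):
        self.main_memory = main_memory
        self.block_size = block_size
        self.slots = {}  # tag (block number) -> copy of that block's contents
        self.hits = 0
        self.misses = 0

    def read(self, address):
        tag, offset = divmod(address, self.block_size)
        if tag in self.slots:
            self.hits += 1    # present: get from cache (fast)
        else:
            self.misses += 1  # not present: read the block into the cache
            start = tag * self.block_size
            self.slots[tag] = self.main_memory[start:start + self.block_size]
        return self.slots[tag][offset]  # always deliver from cache to CPU


main_memory = list(range(32))  # hypothetical memory: each location holds its own address
cache = SimpleCache(main_memory)
values = [cache.read(a) for a in (8, 9, 10, 11, 12)]
print(values)                    # [8, 9, 10, 11, 12]
print(cache.hits, cache.misses)  # 3 2
```

Addresses 8 to 11 share one block, so only the first of them misses. Address 12 starts a new block and misses again.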
Cache Read Operation - Flowchart
Because there are fewer cache lines than main memory
blocks, an algorithm is needed for mapping main memory
blocks into cache lines/slots. Three techniques can be used:
direct, associative and set-associative.
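As an illustration of direct mapping, the sketch below splits a 24-bit address into tag, line and word fields. The field widths are derived from the earlier example (16 Mbytes of main memory, 4-byte blocks, 16K cache lines). The slides do not give this layout, so treat it as an assumption.

```python
ADDRESS_BITS = 24  # 16 Mbytes of main memory = 2^24 bytes
WORD_BITS = 2      # 4 bytes per block = 2^2
LINE_BITS = 14     # 16K cache lines = 2^14
TAG_BITS = ADDRESS_BITS - LINE_BITS - WORD_BITS  # 8 bits remain for the tag


def split_direct_mapped(address: int) -> tuple:
    """Return (tag, line, word) for a direct-mapped cache."""
    if not 0 <= address < 2 ** ADDRESS_BITS:
        raise ValueError("address does not fit in 24 bits")
    word = address & ((1 << WORD_BITS) - 1)
    line = (address >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = address >> (WORD_BITS + LINE_BITS)
    return tag, line, word


print(split_direct_mapped(0x000004))  # (0, 1, 0)
print(split_direct_mapped(0x010004))  # (1, 1, 0)
```

The two addresses map to the same cache line but carry different tags, so in a direct-mapped cache they cannot both be resident at once.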
Once the cache has been filled, when a new block is brought
into the cache, one of the existing blocks must be replaced.
A replacement algorithm is needed. What types of
algorithm are there?
Cache Replacement Algorithm
LRU – least recently used
FIFO – first in first out
LFU – least frequently used
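The three policies can be compared with a small simulation. This is a sketch, not a hardware-accurate model: it assumes a cache in which any block can go into any slot, counts only misses, and breaks LFU ties by evicting the block that entered the cache earliest. The function name and the reference sequence are made up.

```python
from collections import OrderedDict


def count_misses(references, slots, policy):
    """Count misses for a sequence of block numbers under LRU, FIFO or LFU."""
    if policy not in ("LRU", "FIFO", "LFU"):
        raise ValueError("policy must be LRU, FIFO or LFU")
    if slots < 1:
        raise ValueError("slots must be at least 1")
    cache = OrderedDict()  # block -> use count; front of the order is evicted first
    misses = 0
    for block in references:
        if block in cache:
            cache[block] += 1
            if policy == "LRU":
                cache.move_to_end(block)  # a hit makes the block most recent
            continue
        misses += 1
        if len(cache) >= slots:
            if policy == "LFU":
                victim = min(cache, key=cache.get)  # smallest use count
            else:
                victim = next(iter(cache))  # LRU: least recent, FIFO: oldest
            del cache[victim]
        cache[block] = 1
    return misses


refs = [1, 2, 3, 1, 4, 1, 5]
for policy in ("LRU", "FIFO", "LFU"):
    print(policy, count_misses(refs, slots=3, policy=policy))
# LRU 5
# FIFO 6
# LFU 5
```

On this sequence FIFO evicts block 1 even though it is still in use, which costs one extra miss. Other sequences can rank the policies differently.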
Cache Design
Addressing
Size
Mapping Function
Replacement Algorithm
Write Policy
Block Size
Number of Caches
Size does matter
Cost
More cache is expensive
Speed
More cache is faster (up to a point)
Checking cache for data takes time
Comparison of Cache Sizes
Processor        Type                           Year of introduction  L1 cache       L2 cache        L3 cache
IBM 360/85       Mainframe                      1968                  16 to 32 KB    -               -
PDP-11/70        Minicomputer                   1975                  1 KB           -               -
VAX 11/780       Minicomputer                   1978                  16 KB          -               -
IBM 3033         Mainframe                      1978                  64 KB          -               -
IBM 3090         Mainframe                      1985                  128 to 256 KB  -               -
Intel 80486      PC                             1989                  8 KB           -               -
Pentium          PC                             1993                  8 KB/8 KB      256 to 512 KB   -
PowerPC 601      PC                             1993                  32 KB          -               -
PowerPC 620      PC                             1996                  32 KB/32 KB    -               -
PowerPC G4       PC/server                      1999                  32 KB/32 KB    256 KB to 1 MB  2 MB
IBM S/390 G4     Mainframe                      1997                  32 KB          256 KB          2 MB
IBM S/390 G6     Mainframe                      1999                  256 KB         8 MB            -
Pentium 4        PC/server                      2000                  8 KB/8 KB      256 KB          -
IBM SP           High-end server/supercomputer  2000                  64 KB/32 KB    8 MB            -
CRAY MTA         Supercomputer                  2000                  8 KB           2 MB            -
Itanium          PC/server                      2001                  16 KB/16 KB    96 KB           4 MB
SGI Origin 2001  High-end server                2001                  32 KB/32 KB    4 MB            -
Itanium 2        PC/server                      2002                  32 KB          256 KB          6 MB
IBM POWER5       High-end server                2003                  64 KB          1.9 MB          36 MB
CRAY XD-1        Supercomputer                  2004                  64 KB/64 KB    1 MB            -
Hit Ratio (L1 & L2)
For 8 Kbyte and 16 Kbyte L1
Unified v Split Caches
One cache for data and instructions or; two, one for data
and one for instructions
Advantages of unified cache
Higher hit rate
Balances load of instruction and data fetch
Only one cache to design & implement
Advantages of split cache
Eliminates cache contention between instruction fetch/decode
unit and execution unit
Important in pipelining
Pentium 4 Cache
80386 – no on chip cache
80486 – 8k using 16 byte lines and four way set
associative organization
Pentium (all versions) – two on chip L1 caches
Data & instructions
Pentium III – L3 cache added off chip
Pentium 4
L1 caches (8k bytes, 64 byte lines, four way set associative)
L2 cache (Feeding both L1 caches, 256k, 128 byte lines, 8 way set
associative)
L3 cache on chip
Intel Cache Evolution
Problem: External memory slower than the system bus.
Solution: Add external cache using faster memory technology.
Processor on which feature first appears: 386

Problem: Increased processor speed results in external bus becoming a bottleneck for cache access.
Solution: Move external cache on-chip, operating at the same speed as the processor.
Processor on which feature first appears: 486

Problem: Internal cache is rather small, due to limited space on chip.
Solution: Add external L2 cache using faster technology than main memory.
Processor on which feature first appears: 486

Problem: Contention occurs when both the Instruction Prefetcher and the Execution Unit simultaneously require access to the cache. In that case, the Prefetcher is stalled while the Execution Unit’s data access takes place.
Solution: Create separate data and instruction caches.
Processor on which feature first appears: Pentium

Problem: Increased processor speed results in external bus becoming a bottleneck for L2 cache access.
Solution: Create separate back-side bus that runs at higher speed than the main (front-side) external bus. The BSB is dedicated to the L2 cache.
Processor on which feature first appears: Pentium Pro
Solution: Move L2 cache on to the processor chip.
Processor on which feature first appears: Pentium II

Problem: Some applications deal with massive databases and must have rapid access to large amounts of data. The on-chip caches are too small.
Solution: Add external L3 cache.
Processor on which feature first appears: Pentium III
Solution: Move L3 cache on-chip.
Processor on which feature first appears: Pentium 4
Pentium 4 Block Diagram
Pentium 4 Core Processor
Fetch/Decode Unit
Fetches instructions from L2 cache
Decode into micro-ops
Store micro-ops in L1 cache
Out of order execution logic
Schedules micro-ops
Based on data dependence and resources
May speculatively execute
Execution units
Execute micro-ops
Data from L1 cache
Results in registers
Memory subsystem (L2 cache and systems bus)