Lec21 PDF
N. B. Dodge 9/15
Types of Memory
As we have just seen, even in the everyday PC, the use of
sophisticated memory management is common.
This means that there are four kinds of memory in the modern PC
or workstation computer: Registers, cache, DRAM, and the disk
or HDD (or SSM). And this does not count CDs, DVDs, Zip
drives, thumb drives (flash EPROM), or floppy disks!
The challenge to the computer engineer is to mesh these four
storage media and to make their use transparent (that is,
invisible) to the user, who will appear to have massive amounts of
high-speed, cheap memory available to solve any problem.
Before we discuss how to manage this extremely challenging
engineering problem, we will discuss the types of memory that are
used and learn a little about them.
Registers
[Figure: a D flip-flop (D FF), a 32-bit register built from D FFs, and the register block.]
L1 Cache
L1 cache (level 1 cache) is SRAM memory that is very close to
the CPU. For example, it is next to the ALU in most processors.
L1 cache is basically sets of D FFs, but many more than in the
CPU register block.
For example, a typical register block might have 16-32 registers of
4 or 8 bytes each, for a total of 64-256 bytes of storage. The Intel
Core i7 L1 cache, on the other hand, has 64 Kbytes (32K instruction,
32K data), the equivalent of ~500,000 flip-flops.
Access to L1 cache is slower, however, due to the complex
arrangement of data buses that is necessary to access specific
bytes in the L1 memory array. It is typically about 1/3 to 1/2 as fast
as CPU registers in terms of load/store cycle.
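The size comparison above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes, as the slide does, one flip-flop per stored bit; the function names are illustrative only.

```python
# Rough storage arithmetic for a register block versus an L1 cache.
# Assumption (from the slide): each stored bit costs one D flip-flop.
BITS_PER_BYTE = 8


def register_block_bytes(num_registers: int, bytes_per_register: int) -> int:
    """Total storage in a register block, in bytes."""
    return num_registers * bytes_per_register


def flip_flops_needed(num_bytes: int) -> int:
    """Number of one-bit storage elements needed for num_bytes."""
    return num_bytes * BITS_PER_BYTE


smallest_block = register_block_bytes(16, 4)   # 16 registers of 4 bytes
middle_block = register_block_bytes(32, 4)     # 32 registers of 4 bytes
largest_block = register_block_bytes(32, 8)    # 32 registers of 8 bytes

l1_bytes = 64 * 1024                           # 64 Kbytes of L1 cache
l1_flip_flops = flip_flops_needed(l1_bytes)

print("register block sizes (bytes):", smallest_block, middle_block, largest_block)
print("L1 cache storage elements:", l1_flip_flops)
```

The L1 figure comes out slightly above half a million bits, which matches the rough "~500,000 flip-flops" estimate.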
L1 Cache (Continued)
[Figure: L1 cache drawn as many 32-bit registers, each built from D flip-flops (D FFs).]
L2 Cache
The level-2 cache is a bit farther away on the
chip. (L2 is also SRAM.)
L2 cache is much larger, since more real
estate is devoted to memory. The Intel Core
i7 has 1 Mbyte of L2 cache.
Due to even more elaborate bus arrangements
and the fact that L2 cache is not as close to the
CPU, load/store access time is longer than for L1 cache.
L3 Cache
In modern multicore processors, the cores share the L3
cache, which is typically 8-12 Mbyte.
Just as L1 cache is slower than the register block, and L2 is
slower than L1, L3 cache is slower still, though much
faster than DRAM. The reason is that the L3 cache is
even farther away from the CPU cores, though still on the
chip.
To minimize the degradation in memory speed, the
CPU cores are typically clustered around the L3 cache, as
shown in the picture of the Core i7 chip (upcoming).
Parameter        SRAM                                     DRAM
Speed            Very fast                                Fast
Complexity       High; ~16 transistors per storage cell   Low; 1 transistor per cell
Power Used       High                                     Very low
Heat Generated   Excessive                                Virtually none
Cost             High                                     Very low
DRAM Memory
The term DRAM stands for "dynamic random-access memory"
(pronounced "D-ram," not "dram"). This means that the title
above is actually redundant!
DRAM is electronic memory that is capable of very fast access
(load or store), but is not as fast as cache. One exception is
Rambus memory, a special DRAM memory whose
manufacturer has announced cache-speed products (up to 7.2
GHz!). It is very expensive, however.
The simple construction of DRAM makes it ideal in modern,
workstation-based computing, where most users have their own
computer system (PC, Mac, Sun, etc.).
DRAM consists of a simple charge-storage device (stored charge =
1), with a switch to store/test the charge. Only a single
transistor is required for a DRAM bit cell.
DRAM (Continued)
The term "dynamic" in DRAM is due to the fact that the
memory is not truly a flip-flop; it is not static. DRAM
remembers a 1 by storing charge on a capacitor.
Capacitors, however, are not perfect storage elements:
the charge leaks off after a short time. Thus the DRAM
element is "dynamic": its memory lifetime is limited,
and it must have its memory refreshed periodically.
On the next several slides, we explore the way DRAM is
constructed and the odd way that it must be treated to
be sure that it retains its memory.
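The leak-and-refresh idea can be sketched as a small simulation. Everything numeric here is an assumption chosen for illustration: exponential leakage, a 100 ms time constant, a sense threshold at half charge, and a 64 ms refresh period. None of these values comes from the slides.

```python
import math


def charge_after(charge: float, elapsed_ms: float, tau_ms: float) -> float:
    """Charge left on a leaky capacitor, assuming exponential decay."""
    return charge * math.exp(-elapsed_ms / tau_ms)


def stored_one_survives(total_ms: int, refresh_every_ms: int,
                        tau_ms: float = 100.0, threshold: float = 0.5) -> bool:
    """Return True if a stored 1 is still readable after total_ms.

    refresh_every_ms == 0 means the cell is never refreshed.
    """
    charge = 1.0            # a freshly written logic 1
    since_refresh = 0
    for _ in range(total_ms):                 # advance in 1 ms steps
        charge = charge_after(charge, 1.0, tau_ms)
        since_refresh += 1
        if charge <= threshold:
            return False                      # too little charge left to sense a 1
        if refresh_every_ms and since_refresh >= refresh_every_ms:
            charge = 1.0                      # refresh restores full charge
            since_refresh = 0
    return True


print("refreshed every 64 ms:", stored_one_survives(200, 64))
print("never refreshed:      ", stored_one_survives(200, 0))
```

With these assumed values, the refreshed cell keeps its 1 while the unrefreshed cell loses it partway through the run.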
[Figure: DRAM bit cell. A CMOS transistor, gated by the word line, connects the bit line to a capacitor whose other side is grounded.]
[Figure: Writing the cell. +V (= logic 1) on the word line turns on the transistor. With +V (= logic 1) on the bit line, current flows and the capacitor charges; with 0V (= logic 0) on the bit line, the capacitor discharges.]
[Figure: Reading the cell. +V (= logic 1) on the word line turns on the transistor. If the capacitor is charged, current flows to the bit line; if the capacitor has no charge, no current flows and a logic 0 is sensed.]
To read, or "sense," the value of the DRAM cell, the word line once again has a
voltage applied to it, which turns on the transistor. If the capacitor is charged,
current flows OUT of the transistor, and this current is sensed and amplified,
showing that a 1 is present. If the capacitor is discharged, no current flows,
so that the sensing element determines that a logic 0 is present.
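The write and sense steps can be modeled as a toy class. This is a sketch only. It assumes, as the figure labels on these slides indicate, that sensing a 1 drains the capacitor and that the 1 is then written back. The class and method names are hypothetical.

```python
class DramCell:
    """Toy model of a one-transistor, one-capacitor DRAM bit cell."""

    def __init__(self) -> None:
        self.charged = False          # capacitor state: True means logic 1

    def write(self, bit: int) -> None:
        # Word line on: +V on the bit line charges the capacitor,
        # 0 V on the bit line discharges it.
        self.charged = bool(bit)

    def sense(self) -> int:
        # Word line on: current flows out only if the capacitor held charge.
        # Sensing drains the capacitor, so the stored value is lost here.
        bit = 1 if self.charged else 0
        self.charged = False
        return bit

    def read(self) -> int:
        # A complete read: sense the bit, then write a sensed 1 back.
        bit = self.sense()
        if bit:
            self.write(1)
        return bit


cell = DramCell()
cell.write(1)
first = cell.read()                   # sensed, then rewritten
second = cell.read()                  # still readable because of the rewrite

cell.write(1)
raw_first = cell.sense()              # sensed without the rewrite step
raw_second = cell.sense()             # the charge is gone

print(first, second, raw_first, raw_second)
```

Separating `sense` from `read` makes the point that the rewrite is a distinct step: without it, a second sense of the same cell returns 0.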
[Figure: Reading a stored 1. When the word line is activated, current flows to the bit line and the capacitor discharges. The word line is then reactivated and the logic 1 is rewritten by applying +V to the bit line, which recharges the capacitor.]
Exercise 1
1. Rank these memories by speed: L2 cache, DRAM, L1
cache, registers, and hard disk drives.
2. A DRAM memory chip is accessed and a bit read out.
The bit that is read is a 1. What happens now?
3. That same memory bit is then left alone (i.e., not
accessed by its addressing mechanism for either read
or write) for several milliseconds. What happens
next?
[Figure: HDD recording. Labels: rotation, current flow, recording head.]
[Figure: HDD package.]
Disk access time = seek time + rotational time + transfer time + controller delay
Where:
Seek time = time for the positioning arm to move the head from its
present track to the track where the load/store data is located.
Rotational time = time for the requested sector to rotate underneath
the read/write head after the head is positioned over the track.
Transfer time = time for data transfer from disk to main memory.
Controller delay = time to set up transfer in the HDD electronic
interface.
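The four terms defined above are usually summed to estimate the time for one disk access. The sketch below does that for a hypothetical drive. The 9 ms seek, 7200 RPM, 100 Mbyte/s transfer rate, 4 Kbyte block, and 0.2 ms controller delay are all assumed values, and the rotational time is taken as the average case of half a revolution.

```python
def rotational_time_ms(rpm: float) -> float:
    """Average rotational wait: half of one revolution, in milliseconds."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


def transfer_time_ms(num_bytes: int, bytes_per_second: float) -> float:
    """Time to move num_bytes between disk and main memory, in milliseconds."""
    return num_bytes / bytes_per_second * 1000.0


def disk_access_time_ms(seek_ms: float, rpm: float, num_bytes: int,
                        bytes_per_second: float, controller_ms: float) -> float:
    """Sum of seek, rotational, transfer, and controller times."""
    return (seek_ms
            + rotational_time_ms(rpm)
            + transfer_time_ms(num_bytes, bytes_per_second)
            + controller_ms)


# Hypothetical drive parameters, chosen only for illustration.
total_ms = disk_access_time_ms(seek_ms=9.0, rpm=7200, num_bytes=4096,
                               bytes_per_second=100e6, controller_ms=0.2)
print(f"estimated access time: {total_ms:.2f} ms")
```

For these assumed numbers the mechanical terms (seek and rotation) dominate; the transfer itself is a small fraction of a millisecond.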
[Figure: the memory hierarchy, showing registers, L2/3 cache, DRAM, and HDD. Surviving size and access-time labels: 8-64 Kbytes, 200 ps, 160-2000 Gbytes, ~10-20 ms.]
Cache Utilization
Cache designers make use of the principles of temporal and spatial
locality to assure that the most-probably needed instructions and
data are available to the computer in cache (to speed execution).
Special hardware is designed to manage cache content with the
goal of forecasting upcoming instructions and data required by the
processor during program execution and moving them from slower
DRAM into cache.
This hardware has two special goals: (1) examining the currently
executing process and predicting instruction and data needs, and
(2) moving the required information from DRAM to cache in a
timely manner to meet that anticipated need.
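A tiny simulation shows why locality matters. This is a sketch of a simple direct-mapped cache that fetches a whole line on each miss. It is not the predictive hardware described above, and the sizes (64 lines of 64 bytes) are assumptions chosen for illustration.

```python
def hit_rate(addresses, num_lines=64, line_bytes=64):
    """Fraction of accesses served from a direct-mapped cache."""
    tags = [None] * num_lines            # one tag per cache line; None = invalid
    hits = 0
    for addr in addresses:
        block = addr // line_bytes       # which line-sized block of memory
        index = block % num_lines        # the one cache line this block may use
        tag = block // num_lines         # identifies which block is in that line
        if tags[index] == tag:
            hits += 1                    # hit: the data is already in cache
        else:
            tags[index] = tag            # miss: fetch the whole line from DRAM
    return hits / len(addresses)


# Spatial locality: consecutive 4-byte words share a 64-byte line.
sequential = list(range(0, 4096, 4))

# Temporal locality: the same 4 Kbytes are touched a second time.
repeated = sequential * 2

# Poor locality: every access lands 4 Kbytes away from the previous one.
strided = [i * 4096 for i in range(1024)]

print("sequential:", hit_rate(sequential))
print("repeated:  ", hit_rate(repeated))
print("strided:   ", hit_rate(strided))
```

In the sequential case only the first access to each line misses. In the strided case every access maps to the same cache line with a different tag, so nothing is ever reused.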
Cache Diagram
[Figure: cache diagram. Labels: CPU package, CPU, registers, L1 cache, L2 cache, DRAM, HDD.]
Summary
Modern memory management maximizes the speed of
computer processing while keeping system cost
reasonable for the user.
This approach uses a small amount of very fast SRAM
cache memory, which is physically near the CPU,
a substantial amount of DRAM, which is still very fast,
as the main working memory, and HDD or flash
memory for large program storage.
Effective (but complex) hardware and software have
been developed to manage this memory hierarchy and
maximize its effectiveness.
Exercise 2
1. Each small area of cache (say, 1K byte) represents a
much larger area (say, 1 Mbyte) in DRAM. If an
instruction, for example, is supposed to reside in a
given Mbyte of DRAM, the corresponding cache
extent is searched. Assume that, according to the
validity indicator, the correct instruction is NOT in
cache. What now?
2. Give simple definitions of the principles of temporal
and spatial locality.
Memory
In 2008, a completely new kind of circuit element (that
was predicted in 1971), the memristor, was developed.
Memristor circuit elements can retain a state (i.e.,
memory) even when power is off. They could eventually
replace flash EPROM, and perhaps DRAM. Thousand-Gbyte
main memories are possible.
Memristors remember multiple states (not just ones and
zeros). Thus a memristor memory might eventually
"remember" like a human neuron, leading to neural-type
processors in the long term.
Memristors: When?
HP says that it will have 100
TByte memristor drives by
2018.
The claim is that such drives
could be packaged in one
blade box for a total of 24
Petabytes!
There is no specific product
roadmap, but HP's CTO has
talked confidently of HP
popping 100TB Memristor
drives into StoreServ arrays in
five years.
Memory (3)
Another new memory type is Phase
Change Memory (PCM).
Heated to a high temperature, the device
forms an amorphous (disorganized)
structure with low conductivity
(high resistance). This is a 1.
To write a 0, the material is heated to a
lower temperature, creating an organized
crystal structure with high conductivity.
Still experimental. Can be read/written
about 100 million times, which is far too low
(DRAM can do 1-10 quadrillion cycles!).
Memory (4)
Other new memory types:
Magnetic RAM (MRAM): Uses
tunneling resistance that
depends on the relative
magnetization of ferromagnetic
electrodes. Early work.
Resistive RAM (ReRAM):
Varies resistance according to
applied voltage. Nonvolatile,
low power, high density.
Production cost and reliability
are problems.
The CPU
Intel and AMD abandoned the GHz race in CPUs
years ago. In the early 2000s, Intel stated a goal of a
10 GHz CPU by 2010, which clearly didn't happen.
Multiple CPUs became the performance enhancer.
The standard is now 4- and 6-core CPUs. Intel Xeon
server CPUs are 8-core.
With the new Broadwell architecture at the 14 nm
node, Intel is now said to be planning an 18-core CPU,
with 8- and 10-core performance CPUs as well. Speed
is inching up, maybe to 5 GHz in the late teens.
Three-D Technology
As feature size gets
smaller (22 nm to 14 nm),
3-D manufacturing
processes continue to
improve.
Generically referred to as
"FinFET" due to the 3-D
"fin." Intel calls it "tri-gate."
At right, comparison of
transistor sizes in 22 and
14 nm processes.
Multi-Core Advance
Intel has announced a 72-core
CPU, Knights Landing.
Available in 2015, it will have
up to 16GB DRAM, up to
500GB/sec of memory
bandwidth, plus up to 384GB
of DDR4-2400 mainboard
memory. KL will use the 14nm
process. With a promise of 3
teraflops (double precision)
per socket, it will almost
certainly be used to build some
monster x86 supercomputers.
[Figure: Intel Knights Landing, with 72 Pentium cores with 64-bit support.]
Stampede
New supercomputer at the University of Texas.
Uses several thousand Dell Zeus servers, each with
dual 8-core Intel Xeon processors.
Each Zeus server uses several Knights Corner chips, a
precursor to Knights Landing. Has 522,080 cores.
Knights Corner also uses modified Pentium-era cores.
Has 270 Tbytes (yes, that's TERAbytes) of DRAM.
Has 14 Petabytes of storage memory.
Peak performance = 9,600,000,000,000,000 floating
point operations per second (9.6 petaflops).
Developers claim that exaflops are on the way.
* Intel has odd CPU family names, like Apple's OS X updates "Leopard," "Snow Leopard," and "Lion."
Lecture # 21: Memory Management in Modern Computers
Tick-Tock
Intel is said to follow a Tick-Tock design strategy.
The Tick phase is shrinking the minimum feature size.
Tock is introducing a new microarchitecture. Thus:
[Table: alternating Tick and Tock generations. The only surviving entry is Skylake (Mid-2015).]
Windows (Continued)
Windows 10 has been free to early adopters via
download.
This probably will not remain true for long. However,
W10 may be offered at a steep discount to Windows XP
users to get them to ditch their 13-year-old operating
system. Maybe even a similar carrot for Vista users.
Other rumors state that W10 is the last Windows.
An OS that can be more easily updated (a la Apple)
may be next. It will be interesting to see Microsoft OS,
TNG (the next generation) and its features.
Samsung 105-inch 4K TV
3-D Printing