EE6304 Lecture 8 - Memory Hierarchy

This document provides an overview of a lecture on memory hierarchies. It discusses the need for memory hierarchies due to the performance gap between processors and memory. It describes the different levels of the memory hierarchy, including registers, caches, dynamic RAM (DRAM), and hard disks. It explains the trade-offs between memory size, speed, and cost. The document also summarizes different types of RAM and ROM memories and how they are used in computer systems.

EEDG/CE/CS 6304 Computer Architecture

Lecture 8-Memory Hierarchy

Benjamin Carrion Schaefer


Associate Professor
Department of Electrical and Computer Engineering
Computer System Overview
• Fundamentals of Design and Analysis of
Computers (2 lectures)
– History, technological breakthroughs, etc.
– Trends and metrics: performance,
power/energy, cost
• CPU (7 Lectures)
– Instruction Set Architecture
– Arithmetic for Computers (new)
– Instruction Level Parallelism (ILP)
– Dynamic instruction scheduling
– Branch prediction
– Thread-level parallelism
– Modern processors
• Memories (4 Lectures)
– Memory hierarchy
– Caches
– Secondary storage
– Virtual memory
• Buses (1 lecture)
• New computer structures: Heterogeneous
computing (1 lecture)
Learning Objectives
• Upon completion of this lecture, you will be able to:
– Understand the need for memory hierarchies
• On-chip: caches
• Off-chip: DRAM
– Classify memories and differentiate between them
• SRAM
• DRAM
• Non-volatile (Flash, magnetic disk)
– Describe recent trends in memories
• Hybrid memories
• RowHammer effect
• In-memory computing

Ref1: Hennessy & Patterson, 5th Edition, Morgan Kaufmann – Chapter 2
Ref2: IOLTS 2018 keynote – Onur Mutlu
Memories Definition
• Memories are circuits or systems that store
digital information in large quantities
• Can be categorized based on:
– Volatility: volatile (DRAM, SRAM) vs. non-volatile (Flash, EPROM, ROM)
– Access mechanism
– On-chip/off-chip
– Capacity
è The most important metric is cost per bit
Memory Semiconductor Business - 2020

è ~30% of worldwide semiconductor business is due to memory chips

Memory Classifications
Memory array types:
• Random Access Memory
– Read/Write Memory (RAM) (volatile)
• Static RAM (SRAM)
• Dynamic RAM (DRAM)
– Read Only Memory (ROM) (non-volatile)
• Mask ROM
• Programmable ROM (PROM)
• Erasable Programmable ROM (EPROM)
• Electrically Erasable Programmable ROM (EEPROM)
• Flash ROM
• Serial Access Memory
– Shift Registers
• Serial In Parallel Out (SIPO)
• Parallel In Serial Out (PISO)
– Queues
• First In First Out (FIFO)
• Last In First Out (LIFO)
• Content Addressable Memory (CAM)
– Hash tables, caches
Memory Capacity, Memory Organization, Speed

• Memory Chip Capacity: the number of bits that a
memory chip can store (Kbits, Mbits, …)
• Memory Organization: memory chips are organized into
a number of locations within the chip. Each location can
hold 1 bit, 4 bits, 8 bits, or even 16 bits.
– Assuming x address pins and y data pins, the organization of the
memory chip is 2^x x y, and the chip contains 2^x x y bits.
– Note that a different interpretation is used for a DRAM memory
chip
• Speed: the access time (normally in ns)
Example
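As a sketch of the capacity formula above, the calculation can be written out as follows; the specific pin counts used here are illustrative assumptions, not from the slides.

```python
# Capacity of a memory chip from its address/data pin counts,
# following the 2^x-by-y organization described above.
# The pin counts below are illustrative, not from the lecture.

def chip_capacity_bits(address_pins: int, data_pins: int) -> int:
    locations = 2 ** address_pins   # number of addressable locations (2^x)
    return locations * data_pins    # total bits = 2^x * y

# Example: 12 address pins, 8 data pins -> 4096 x 8 organization
bits = chip_capacity_bits(12, 8)
print(bits)          # 32768 bits
print(bits // 1024)  # 32 Kbit chip
```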
Memory Hierarchy
• Memory size vs. speed vs. cost trade-offs

PC memory hierarchy (fastest and most expensive per bit at the top):
• On-chip
– Registers
– Cache L1 (SRAM-CAM)
– Cache L2 (SRAM)
• Off-chip
– Dynamic RAM (DRAM)
– Hard disk (spinning magnetic discs)
Speed and cost per bit ($) decrease from top to bottom; capacity increases.
Why Memory Hierarchy?
• Fast and small memories
– Enable quick access (fast cycle time)
– Enable lots of bandwidth
• Slower larger memories
– Capture larger share of memory
– Still relatively fast
• Slow huge memories
– Hold rarely-needed state
– Need to store programs and data
è All together: provide appearance of large, fast
memory with cheap, slow memory
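The "appearance of large, fast memory" can be quantified with the standard average memory access time (AMAT) formula; the hit/miss numbers below are illustrative assumptions, not from the lecture.

```python
# Average Memory Access Time for a two-level hierarchy:
#   AMAT = hit_time + miss_rate * miss_penalty
# The timing numbers below are illustrative assumptions.

def amat(hit_time_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    return hit_time_ns + miss_rate * miss_penalty_ns

# A 1 ns cache backed by 100 ns DRAM with a 5% miss rate behaves
# like a single large 6 ns memory -- far closer to cache speed
# than to DRAM speed, which is the point of the hierarchy.
print(amat(1.0, 0.05, 100.0))   # 6.0
```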
Performance Gap
• Historical performance gap between processor and
memory
• How to address this?

Example: The rate of growth for DRAM memories has slowed down. If 8 Gbit DRAM was first available in 2015, and 16 Gbit was available in 2019, what is the current yearly growth rate?
Solution: 16/8 = x^(2019-2015) à x = 2^(1/4) ≈ 1.189 à 18.9% yearly growth
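The worked example above can be checked with a short compound-growth calculation:

```python
# Compound annual growth rate of DRAM capacity, as in the
# worked example above (8 Gbit in 2015 -> 16 Gbit in 2019).

def yearly_growth(cap_start: float, cap_end: float, years: int) -> float:
    # Solve cap_end / cap_start = x**years for x; return the growth fraction
    return (cap_end / cap_start) ** (1.0 / years) - 1.0

rate = yearly_growth(8, 16, 2019 - 2015)
print(round(rate * 100, 1))   # 18.9 (percent per year)
```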
On-Chip: Registers
• Very quick access time (quickest of all)
– Access time as low as 1 clock cycle
• BUT
– Limited number of registers
– Need dedicated assembly instructions for each
register
On-Chip: Cache ($) Level 1 (L1) and Level 2 (L2)
• Smaller on-chip memory which stores copies of the
data from frequently used main memory locations
(RAM)
• Split into:
– Instruction cache (stores instructions to be executed) – I$
– Data cache (stores data which might be needed) – D$
• L1 cache: smaller but faster (part of the instruction
pipeline)
• L2 cache: larger but slower
On-Chip: Cache ($) Level 3 (L3)
• Third level of cache
• Normally used in multi-core processors, or between the
CPU and graphics unit as shared on-chip memory
Main Memory System
• Main memory is a critical component of all computing
systems: server, mobile, embedded, desktop, sensor
• The main memory system must scale (in size, technology,
efficiency, cost, and management algorithms) to
maintain performance growth and technology scaling
benefits
Off-Chip: DRAM
• DRAM technology à each bit is held by 1
transistor and a capacitor
– Larger integration than SRAM (on-chip memory),
which needs 6 transistors to hold 1 bit
– Cheaper
– BUT needs to refresh the contents at constant
intervals à slower

SIMM: single in-line memory module
DIMM: dual in-line memory module (has
separate electrical contacts on each side of
the module)
DRAM Types
• Different types of RAM based on data rate
– DDR: 3.2 Gbytes/second
– DDR2: 8.5 Gbytes/second, and can be installed in pairs to
increase throughput
– DDR3: 12.8 Gbytes/second, and can be installed in pairs or
groups of 3
• The type of memory is limited by your motherboard type
• RAM vendors have online memory scanning utilities
(www.crucial.com)
SDRAM vs. DDR RAM
• SDR DRAM (Single Data Rate)
– First-generation SDRAM
– Transfers a single data word per clock cycle (the data word depends
on the design of the memory system, typically 32 or 64 bits)
• DDR DRAM (Double Data Rate)
– Two memory transfers per clock cycle

Upton et al., Learning Computer Architecture with Raspberry Pi. Wiley
DDR2, DDR3 and DDR4
• Memory accesses occur as short bursts from a
starting address
• Access to the memory array is time consuming
• After the first column is read à subsequent columns
from the same row are available for "free" (called
prefetching)
• DDR: forced to take 2 adjacent 32-bit words
• DDR2: forced to take 4 adjacent words (doubles)
• DDR3: forced to take 8 words (doubles again)
DDR Prefetching
• Read adjacent memory locations
• DDR2 prefetching example (take 4 adjacent words):
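The burst behavior described above can be sketched as follows: given a starting word address and a burst length (4 for DDR2), the adjacent word addresses delivered in one burst can be enumerated. The burst-boundary alignment used here is an illustrative assumption.

```python
# Enumerate the word addresses delivered in one DDR-style burst.
# Burst length: 2 for DDR, 4 for DDR2, 8 for DDR3.
# Bursts are assumed aligned to a burst-length boundary (illustrative).

def burst_addresses(start_addr: int, burst_len: int) -> list[int]:
    base = (start_addr // burst_len) * burst_len   # align to burst boundary
    return [base + i for i in range(burst_len)]

# DDR2 example (4 adjacent words): a read of word 10 delivers words 8..11
print(burst_addresses(10, 4))   # [8, 9, 10, 11]
```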
Memory Types - RAM
• Random Access Memories (RAM) store information in FF-
style circuits (SRAM) or charged capacitors (DRAM)
• Approximately equal delays for reading and writing
• Volatile because data is stored in active circuits (data gets
deleted when powered off)
• Most common types of RAM
– SRAM: Static RAM
• Stores values in FFs
• High speed – on-chip memory
– DRAM: Dynamic RAM
• Stores values in capacitors – larger integration than SRAM
• Slower than SRAM – off-chip memory
• Requires refresh circuitry to recharge the capacitors
Static RAM (SRAM)
• Retains data for as long as the power supply is
applied
• No special action (except power) is required
to retain stored data
• The access time of an SRAM is much shorter
than that of DRAM and secondary storage
– Made of flip-flops for storage (each cell uses 4-
6 transistors to hold 1 bit)
• Normally used in CPU-internal memories
(registers, caches, etc.)
6116 SRAM (cont.)

Memory Organization
• Most common organization is the random-access architecture (RAM)
– Any memory location can be accessed in random order at a fixed rate
– Independent of physical location
– For reading or writing
• A cell is accessed (R/W) by selecting its row (wordline) and column (bitline)
• Bit selection is done with a multiplexer circuit that directs the cell output to a
register. A total of 2^n x 2^m cells can be stored.

(Array diagram: n-k address bits drive a row decoder that selects one of the 2^(n-k) wordlines; the array holds 2^(n-k) rows x 2^(m+k) columns of memory cells; bitline conditioning and column circuitry, with a column decoder driven by the remaining k address bits, select the 2^m output bits.)
Memory Organization
• If n = m = 8 (wordline) è the memory has a total of 65,536 cells (2^8 x 2^8)
• The memory uses a 16-bit address to produce a single-bit output (2^n = 65,536
à n = 16)

SRAM cell = 6 transistors

Ref: Hodges, Jackson, Saleh, Analysis and Design of Digital Integrated Circuits
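The row/column decoding in the n = m = 8 example can be sketched as follows; the helper names are my own, not from the slides.

```python
# Split a flat 16-bit address into row and column indices for a
# 256 x 256 (2^8 x 2^8 = 65,536 cell) array, as in the example above.

ROW_BITS = 8
COL_BITS = 8

def decode(address: int) -> tuple[int, int]:
    row = address >> COL_BITS              # upper 8 bits select the wordline
    col = address & ((1 << COL_BITS) - 1)  # lower 8 bits select the bitline
    return row, col

# Address 0xABCD -> row 0xAB (171), column 0xCD (205)
print(decode(0xABCD))   # (171, 205)
```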
SRAM – 6T Cell
• Cell size accounts for most of the array size
– Reduce cell size at the expense of complexity
• 6T SRAM cell
– Used in most commercial chips
– Data stored in cross-coupled inverters
• Value is stored symmetrically: both true and
complement are stored on the cross-coupled transistors
• Write:
– Drive data onto bit, bit'
– Raise wordline (select)
• Read:
– Raise wordline (select)
– Read data on bit, bit'
SRAM - Write
– Drive data onto bit, bit'
– Raise wordline (select)
– Lower wordline (select)

SRAM - Read
– Raise wordline (select)
– Read data on bit, bit'
– Lower wordline
Content-Addressable Memory (CAM)
• Based on SRAM
• Also called associative memory
• Compares the desired information simultaneously against the
entire list of pre-stored entries à extremely fast (typically 1 clock
cycle)
– Matching scheme vs. decoding scheme
• Supply the desired data (tag) à it returns the data address (in some
architectures, it also returns the contents of that storage address)
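A behavioral sketch of the CAM lookup described above, using a plain Python class as a stand-in for the hardware (the single-cycle parallel match itself cannot be modeled this way; the class below is illustrative only).

```python
# Behavioral model of a CAM: given stored data (the "tag"),
# return the address where it is stored -- the inverse of a RAM,
# which maps address -> data. Illustrative sketch only.

class CAM:
    def __init__(self) -> None:
        self.entries: dict[int, int] = {}   # address -> stored word

    def write(self, address: int, data: int) -> None:
        self.entries[address] = data

    def search(self, data: int):
        # Hardware compares all entries in parallel; here we scan.
        for address, stored in self.entries.items():
            if stored == data:
                return address
        return None   # no match

cam = CAM()
cam.write(0, 0xCAFE)
cam.write(1, 0xBEEF)
print(cam.search(0xBEEF))   # 1
print(cam.search(0x1234))   # None
```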
Dynamic Random-Access Memory (DRAM)
• Smaller-area memories reduce the cost per bit
• SRAM requires a 6-transistor cell and 4-5 lines
connecting each cell
• DRAM stores data as charge on a capacitor
• Leakage current removes the stored charge è DRAM
requires periodic refreshing of the charge
• The refresh circuitry makes DRAM:
– Slower than SRAM
– Consume more power
DRAM 1T Cell
• Read/write by turning transistor M1 on
• Data stored as high/low charge in C1
• C1 is made as small as possible, but the trade-off is having to refresh more often
• During a read cycle the data is destroyed (destructive read-out)
due to C1 discharging è it needs to be regenerated
• After the data is read out, the sense amplifier must
immediately write it back
DRAM 1T Cell
• Read
– Sensing circuitry measures the amount of charge that flows
into the capacitor and determines whether it is a zero or a
one
– The capacitor is refreshed by either fully charging it or completely
depleting it of charge, depending on which state it was in
initially
• Write
– The wordline and the bitline are brought to a high voltage à the
transistor is on
– Charge can flow to the capacitor
• If the capacitor initially had no charge (stored zero) à charge flows
into the capacitor
• If the capacitor is initially charged (stored one) à very little charge
flows into the capacitor
Data Retention in Memory
• The retention-time profile of DRAM is:
– Location dependent
– Stored-value-pattern dependent
– Time dependent
DRAM – RAS/CAS
• Address multiplexing is used to reduce the number
of pins à DRAM is used as
main computer memory
(currently ~8 Gbytes)
• Send the first address half (RAS: row address strobe)
• Send the second address half
(CAS: column address strobe)
• DRAM has an internal latch to
store the full address
è Complex interface
è Refresh circuit
Ref. Wikipedia
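The two-phase address multiplexing above can be sketched as follows; the 10-bit row/column widths are an illustrative assumption.

```python
# Multiplex a 20-bit DRAM address over 10 pins in two phases,
# as with RAS/CAS above. The 10/10 split is an illustrative assumption.

ROW_WIDTH = 10   # bits sent with RAS (row address strobe)
COL_WIDTH = 10   # bits sent with CAS (column address strobe)

def multiplex(address: int) -> tuple[int, int]:
    row = address >> COL_WIDTH              # first half, latched on RAS
    col = address & ((1 << COL_WIDTH) - 1)  # second half, latched on CAS
    return row, col

def demultiplex(row: int, col: int) -> int:
    # The DRAM's internal latch reassembles the full address
    return (row << COL_WIDTH) | col

row, col = multiplex(0x54321)   # a 20-bit address
print(row, col)
assert demultiplex(row, col) == 0x54321
```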
DRAM Read Cycle
• Simplified read cycle
Memory Organization – 1 Byte
• Only 1 bit can be read or stored per array à need multiple arrays,
e.g., 8 arrays to store 1 byte
• A memory array organized like this is called a bank
Bank Organization
• A single chip will contain 4, 8, or 16 banks
• log2(#banks) bits of the memory address are allocated to the bank address
(e.g., 4 bits in the 16-bank case)
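The bank-address computation above, as a small sketch; putting the bank bits in the low-order position is an illustrative assumption (real address mappings vary).

```python
# Number of address bits needed to select a bank, and how a flat
# address splits into (bank, offset). Low-order bank bits are an
# illustrative assumption; real controllers use various mappings.
import math

def bank_bits(num_banks: int) -> int:
    return int(math.log2(num_banks))

def split(address: int, num_banks: int) -> tuple[int, int]:
    bits = bank_bits(num_banks)
    bank = address & (num_banks - 1)   # low-order bits pick the bank
    offset = address >> bits           # remaining bits address within the bank
    return bank, offset

print(bank_bits(16))        # 4 bits for 16 banks
print(split(0b10110, 16))   # bank 6, offset 1
```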
DIMM Organization
• 8 banks fitted on the same circuit board: a stick of RAM, memory module, or
DIMM
• A DIMM is connected to the CPU through a memory channel
– All DIMMs connected to the same memory channel are called a rank (typically each
DIMM is on a separate channel: single-rank DIMM)
• A typical PC has four DIMM slots
– If it can access all four DIMMs individually: quad-channel mode (not all PCs are
quad-channel mode è use a utility, e.g., CPU-Z)
DRAM Capacity, Bandwidth, Latency Trends
• From 1999 to 2017 (log-scale improvement chart):
– Capacity improved ~128x
– Bandwidth improved ~20x
– Latency improved only ~1.3x

[Ref: IOLTS 2018 keynote – Mutlu]
Major Trends Affecting Main Memory
• DRAM scaling has already become increasingly difficult
– Increasing cell leakage current, reduced cell reliability, increasing
manufacturing difficulties [Kim+ ISCA 2014], [Liu+ ISCA 2013],
[Mutlu IMW 2013], [Mutlu DATE 2017]
– Difficult to significantly improve capacity and energy
• Emerging memory technologies are promising:
– 3D-stacked DRAM: higher bandwidth, but smaller capacity
– Reduced-latency DRAM (e.g., RL/TL-DRAM, FLY-RAM): lower latency, but higher cost
– Low-power DRAM (e.g., LPDDR3, LPDDR4, Voltron): lower power, but higher latency and higher cost
– Non-Volatile Memory (NVM) (e.g., PCM, STT-RAM, ReRAM, 3D XPoint): larger capacity, but higher latency, higher dynamic power, lower endurance

[Ref: IOLTS 2018 keynote – Mutlu]


Major Trend: Hybrid Main Memory
• Hardware/software manage data allocation
and movement to achieve the best of
multiple technologies
• Example system: a CPU with both a DRAM controller and a PCM controller
– DRAM: fast, durable, but small, leaky, volatile, high-cost
– Phase Change Memory (or technology X): large, non-volatile, low-cost, but slow, wears out, high active energy

[Ref: IOLTS 2018 keynote – Mutlu]


DRAM RowHammer
• Repeatedly reading a row enough times (before
the memory gets refreshed) induces disturbance
errors in adjacent rows in most real DRAM chips

(Diagram: the hammered row's wordline is repeatedly driven high (row opened) and low (row closed); the victim rows adjacent to it accumulate disturbance errors.)
Recent DRAMs are More Vulnerable
• All modules from 2012-2013 become vulnerable

(Chart: disturbance errors vs. module manufacture date ("first appearance").)
Taking over the Computer
• Flipping Bits in Memory Without Accessing Them: An
Experimental Study of DRAM Disturbance Errors
(Kim et al., ISCA 2014)
• Exploiting the DRAM rowhammer bug to gain kernel
privileges (Seaborn, 2015)
RowHammer
• Like breaking into an apartment by repeatedly slamming a
neighbor's door until the vibrations open the door you were
after
Some Potential Solutions
• Make better DRAM chips à cost
• Refresh frequently à power and performance
• Sophisticated ECC à cost, power
Processing in-Memory/Near-Memory
• Most power consumption happens when data is moved from
memory to the CPU, rather than when it is consumed by the CPU

http://www.aimemory.com.tw/index.php/technology/near-memory-computing/
Processing in-Memory
• Current von Neumann architecture spends more time moving
data than processing it
• Accelerators don’t help (enough) if using the same architecture
à Need new types of active memory that store data and can
process it
Summary
• Memory classification
• Need for memory hierarchies
• Memory capacity and organization
• SRAM
• DRAM
• Problems with new DRAM memories
– RowHammer
• In-memory/near-memory computing
