
Computer Organization and Architecture

Chapter 7
Large and Fast: Exploiting Memory
Hierarchy
Yu-Lun Kuo
Computer Sciences and Information Engineering
University of Tunghai, Taiwan

[email protected]
Major Components of a Computer

[Figure: the major components of a computer — a processor (control + datapath), memory, and input/output devices]
Processor-Memory Performance Gap
[Figure: log-scale performance vs. year. Processor performance ("Moore's Law") grows ~55%/year (2X/1.5 yr) while DRAM performance grows only ~7%/year (2X/10 yrs), opening a processor-memory performance gap that grows ~50%/year]
Introduction

• The Principle of Locality
  – Programs access a relatively small portion of the address space at any instant of time
• Two Different Types of Locality
  – Temporal Locality (Locality in Time)
    » If an item is referenced, it will tend to be referenced again soon
      • e.g., loops, subroutines, stacks, counter variables
  – Spatial Locality (Locality in Space)
    » If an item is referenced, items whose addresses are close by tend to be referenced soon
      • e.g., array elements accessed sequentially (see the C sketch below)
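A minimal C sketch of both kinds of locality; the variable names and loop bounds are illustrative, not from the slides:

```c
#include <stdio.h>

int main(void) {
    int a[100];                      /* illustrative array */
    int sum = 0;                     /* temporal locality: reused on every iteration */

    for (int i = 0; i < 100; i++) {  /* the loop code itself is fetched repeatedly */
        a[i] = i;
        sum += a[i];                 /* spatial locality: a[0], a[1], ... are adjacent */
    }
    printf("sum = %d\n", sum);
    return 0;
}
```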
Memory Hierarchy

• Memory Hierarchy
  – A structure that uses multiple levels of memories; as the distance from the CPU increases, both the size of the memories and the access time increase
  – Locality + "smaller hardware is faster" = memory hierarchy
• Levels
  – Each level is smaller, faster, and more expensive per byte than the level below
• Inclusive
  – Data found in an upper level is also found in the levels below
Three Primary Technologies

• Technologies for building memory hierarchies
  – Main memory
    » DRAM (dynamic random access memory)
  – Caches (closer to the processor)
    » SRAM (static random access memory)
  – Secondary memory
    » Magnetic disk
• DRAM vs. SRAM
  – Speed: DRAM < SRAM
  – Cost per byte: DRAM < SRAM
Introduction

• Cache memory
  – Built from SRAM (static RAM)
  – A small amount of fast memory
  – Sits between normal main memory and the CPU
  – May be located on the CPU chip or module
Introduction

• Cache memory

[Figure: block diagram of the cache sitting between the CPU and main memory]
A Typical Memory Hierarchy c.2008

[Figure: the CPU's multiported register file (part of the CPU) feeds split L1 instruction and data primary caches (on-chip SRAM), backed by a large unified L2 secondary cache (on-chip SRAM) and multiple interleaved memory banks (off-chip DRAM)]
A Typical Memory Hierarchy
 By taking advantage of the principle of locality, we can present the user with as much memory as is available in the cheapest technology, at the speed offered by the fastest technology

[Figure: on-chip components — control, datapath, register file, ITLB, DTLB, and split instruction/data caches (SRAM) — backed by a second-level cache (eDRAM), main memory (DRAM), and secondary memory (disk)]

Level:          RegFile   L1 caches   L2 cache   Main memory   Secondary memory
Speed (cycles): ½'s       1's         10's       100's         1,000's
Size (bytes):   100's     K's         10K's      M's           G's to T's
Cost per byte:  highest → lowest
Characteristics of Memory Hierarchy

[Figure: relative size of the memory at each level, and the block sizes transferred between levels]

• Processor ↔ L1$: 4-8 bytes (a word)
• L1$ ↔ L2$: 8-32 bytes (a block)
• L2$ ↔ Main Memory: 1 to 4 blocks
• Main Memory ↔ Secondary Memory: 1,024+ bytes (a disk sector = a page)
• Inclusive: what is in L1$ is a subset of what is in L2$, which is a subset of what is in main memory, which is a subset of what is in secondary memory
• Access time increases with distance from the processor
Memory Hierarchy List

• Registers
• L1 Cache
• L2 Cache
• L3 Cache
• Main memory
• Disk cache
• Disk (RAID)
• Optical (DVD)
• Tape
Why Are Separate Instruction and Data Caches Needed?

• A split (Harvard-style) L1 design lets the processor fetch an instruction and access data in the same cycle, avoiding a structural hazard between instruction fetch and memory access
The Memory Hierarchy: Terminology

• Hit: the data is in some block in the upper level (Blk X)
  – Hit Rate: the fraction of memory accesses found in the upper level
  – Hit Time: the time to access the upper level, which consists of
    RAM access time + the time to determine hit/miss

[Figure: the processor exchanges data with an upper-level memory holding Blk X, which is backed by a lower-level memory holding Blk Y]
The Memory Hierarchy: Terminology

• Miss: the data is not in the upper level, so it needs to be retrieved from a block in the lower level (Blk Y)
  – Miss Rate = 1 - (Hit Rate)
  – Miss Penalty
    » The time to replace a block in the upper level + the time to deliver the block to the processor
  » Hit Time << Miss Penalty
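These terms combine into the standard average memory access time (AMAT) formula; the formula is standard, but the sample numbers are illustrative, not from the slides:

$$\text{AMAT} = \text{Hit Time} + \text{Miss Rate} \times \text{Miss Penalty}$$

For example, a 1-cycle hit time, a 5% miss rate, and a 100-cycle miss penalty give AMAT = 1 + 0.05 × 100 = 6 cycles.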
How is the Hierarchy Managed?

• registers ↔ memory
  – by the compiler (programmer?)
• cache ↔ main memory
  – by the cache controller hardware
• main memory ↔ disks
  – by the operating system (virtual memory)
  – virtual-to-physical address mapping assisted by the hardware (TLB)
  – by the programmer (files)
7.2 The Basics of Caches

• Simple cache
  – Each processor request is one word
  – The block size is one word of data
• Two questions to answer (in hardware):
  – Q1: How do we know if a data item is in the cache?
  – Q2: If it is, how do we find it?
Caches

• Direct Mapped
  – Each word is assigned a cache location based on its address in memory
  – Address mapping:
    (block address) modulo (# of blocks in the cache)
    » First consider block sizes of one word (a short worked example follows)
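A quick worked instance of the mapping (the numbers are mine, chosen for illustration): in a cache with 8 blocks, block address 21 maps to cache block

$$21 \bmod 8 = 5$$

Because 8 is a power of two, the modulo is just the low-order $\log_2 8 = 3$ bits of the block address: $10101_2 \rightarrow 101_2$.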
Direct Mapped (Mapping) Cache

[Figure: the direct-mapped address mapping — each memory block maps to cache block (block address) modulo (# of blocks in the cache), so many memory blocks share one cache location]
Caches

• Tag
  – Contains the address information required to identify whether a word in the cache corresponds to the requested word
• Valid bit
  – Even after executing many instructions, some of the cache entries may still be empty
  – Indicates whether an entry contains a valid address
    » If the valid bit = 0, there cannot be a match for this block
Direct Mapped Cache
• Consider the main memory word reference string 0 1 2 3 4 3 4 15
  – Start with an empty cache: all blocks initially marked as not valid

Ref  Result  Cache contents afterward (index: tag, word)
 0   miss    00: 00, Mem(0)
 1   miss    01: 00, Mem(1)
 2   miss    10: 00, Mem(2)
 3   miss    11: 00, Mem(3)
 4   miss    00: 01, Mem(4)   (replaces Mem(0))
 3   hit     (no change)
 4   hit     (no change)
15   miss    11: 11, Mem(15)  (replaces Mem(3))

 8 requests, 6 misses (a simulation sketch follows)
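A minimal C sketch that replays this trace on a 4-block direct-mapped cache with one-word blocks; the names and structure are my own, not from the slides:

```c
#include <stdio.h>
#include <stdbool.h>

#define NUM_BLOCKS 4   /* 4 one-word blocks, direct mapped */

int main(void) {
    int refs[] = {0, 1, 2, 3, 4, 3, 4, 15};   /* word reference string */
    int n = sizeof refs / sizeof refs[0];
    bool valid[NUM_BLOCKS] = {false};
    int tag[NUM_BLOCKS] = {0};
    int misses = 0;

    for (int i = 0; i < n; i++) {
        int index = refs[i] % NUM_BLOCKS;  /* low-order bits select the block */
        int t     = refs[i] / NUM_BLOCKS;  /* remaining bits form the tag    */
        if (valid[index] && tag[index] == t) {
            printf("%2d: hit\n", refs[i]);
        } else {
            printf("%2d: miss\n", refs[i]);
            valid[index] = true;           /* fetch the block, record its tag */
            tag[index]   = t;
            misses++;
        }
    }
    printf("%d requests, %d misses\n", n, misses);
    return 0;
}
```

Running it reproduces the table above: misses on 0, 1, 2, 3, 4, and 15, hits on the repeated 3 and 4.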
Hits vs. Misses
• Read hits
  – this is what we want!
• Read misses
  – stall the CPU, fetch the block from memory, deliver it to the cache, restart the access
• Write hits
  – update the data in both the cache and memory (write-through)
  – write the data only into the cache, and write it back to memory later (write-back)
• Write misses
  – read the entire block into the cache, then write the word
What happens on a write?

• Writes work somewhat differently
  – Suppose a store instruction
    » writes the data only into the data cache
    » memory would then have a different value
      • the cache and memory are "inconsistent"
  – To keep the main memory and cache consistent
    » always write the data into both the memory and the cache
    » called write-through
What happens on a write?

• Although this design handles writes simply
  – it does not provide very good performance
    » every write causes the data to be written to main memory
    » these writes take a long time
    » e.g., if 10% of the instructions are stores,
      the CPI without cache misses is 1.0, and
      every write spends 100 extra cycles, then
      CPI = 1.0 + 100 × 10% = 11
      greatly reducing performance
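The slide's arithmetic generalizes to any store frequency; the symbols here are mine, not from the slides:

$$\text{CPI}_{\text{effective}} = \text{CPI}_{\text{base}} + f_{\text{store}} \times \text{write stall cycles} = 1.0 + 0.10 \times 100 = 11$$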
Write Buffer for Write Through

• A write buffer is needed between the cache and memory
  – A queue that holds data while the data are waiting to be written to memory
  – Processor:
    » writes the data into the cache and the write buffer
  – Memory controller:
    » writes the contents of the buffer to memory

[Figure: the processor writes into the cache and the write buffer; the write buffer drains into DRAM]
What happens on a write?

• Write-back
  – The new value is written only to the block in the cache
  – The modified block is written to the lower level of the hierarchy when it is replaced
What happens on a write?
• Write-through
  – All writes go to main memory as well as the cache
  – Multiple CPUs can monitor main memory traffic to keep their local caches up to date
  – Generates lots of memory traffic
  – Slows down writes
• Write-back
  – Updates are initially made in the cache only
  – An update ("dirty") bit for the cache slot is set when an update occurs
  – If a block is to be replaced, it is written to main memory only if its update bit is set
  – Other caches can get out of sync (see the sketch below)
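A minimal C sketch of the write-back bookkeeping described above, assuming one-word blocks; the type and function names are my own, not from the slides:

```c
#include <stdbool.h>

typedef struct {
    bool valid;
    bool dirty;   /* the "update bit": set when the cached copy is modified */
    int  tag;
    int  data;
} CacheLine;

/* Write hit: update only the cache and mark the line dirty;
 * main memory now holds a stale value. */
void write_hit(CacheLine *line, int value) {
    line->data  = value;
    line->dirty = true;
}

/* Replacement: write the block back to memory only if it was modified. */
void evict(CacheLine *line, int memory[], int block_addr) {
    if (line->valid && line->dirty)
        memory[block_addr] = line->data;   /* deferred write-back */
    line->valid = false;
    line->dirty = false;
}

int main(void) {
    int memory[16] = {0};
    CacheLine line = { .valid = true, .dirty = false, .tag = 0, .data = 7 };
    write_hit(&line, 42);       /* cache now differs from memory */
    evict(&line, memory, 3);    /* memory[3] becomes 42 */
    return memory[3] == 42 ? 0 : 1;
}
```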
Memory System to Support Caches

• It is difficult to reduce the latency to fetch the first word from memory
  – But we can reduce the miss penalty if we increase the bandwidth from the memory to the cache

[Figure: three memory organizations — (a) a one-word-wide memory, (b) a wide memory with a multiplexor between the cache and the CPU, and (c) four interleaved memory banks sharing one bus]
One-Word-Wide Memory Organization
 Assume
1. A cache block of 4 words
2. 1 memory bus clock cycle to send the address
3. 15 memory bus clock cycles for each DRAM access initiated
4. 1 memory bus clock cycle to return a word of data

– Miss penalty: 1 + 4 × 15 + 4 × 1 = 65 clock cycles
– Bytes transferred per bus clock cycle for a single miss:
  » 4 × 4 / 65 ≈ 0.25
Wide Memory Organization
 Assume the same parameters:
1. A cache block of 4 words
2. 1 memory bus clock cycle to send the address
3. 15 memory bus clock cycles for each DRAM access initiated
4. 1 memory bus clock cycle to return a word of data

– Two-word-wide memory
  » Miss penalty: 1 + 2 × 15 + 2 × 1 = 33 clock cycles
  » Bandwidth: 4 × 4 / 33 ≈ 0.48 bytes/clock
– Four-word-wide memory
  » Miss penalty: 1 + 1 × 15 + 1 × 1 = 17 clock cycles
  » Bandwidth: 4 × 4 / 17 ≈ 0.94 bytes/clock
Interleaved Memory Organization
 Assume the same parameters as above, plus:
5. Each memory bank is one word wide

– Advantage: the banks overlap their access latencies, so only one DRAM latency is paid per block
– Miss penalty: 1 + 1 × 15 + 4 × 1 = 20 clock cycles
– Bandwidth: 4 × 4 / 20 = 0.8 bytes/clock
  » About 3× that of the one-word-wide organization (a general formula follows)
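The three organizations follow one pattern; this summary is mine, stated for an n-word block, a w-word-wide bus, and the slides' costs (1 cycle to send the address, 15 cycles per DRAM access, 1 cycle per transfer):

$$\text{miss penalty}_{\text{wide}} = 1 + \frac{n}{w} \times 15 + \frac{n}{w} \times 1, \qquad \text{miss penalty}_{\text{interleaved}} = 1 + 15 + n \times 1$$

With n = 4: w = 1 gives 65 cycles, w = 2 gives 33, w = 4 gives 17, and interleaving gives 20.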
• Q & A
