CS3350B Computer Architecture Memory Hierarchy: Why?: Marc Moreno Maza
http://www.csd.uwo.ca/~moreno/cs3350_moreno/index.html
Department of Computer Science
University of Western Ontario, Canada
Locality Example:
    sum = 0;
    for (i = 0; i < n; i++)
        sum += a[i];
    return sum;
- Reference to array elements in succession (stride-1 reference pattern): spatial locality
- Reference to sum in each iteration: temporal locality
Question: Can you permute the loops so that the function scans the
3D array a[] with a stride-1 reference pattern (and thus has good
spatial locality)?
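A minimal sketch of a stride-1 answer, assuming the array is a C array stored in row-major order; the function name sumarray3d and the dimensions M, N, P are illustrative and not from the slides. The key point is that the last index must vary in the innermost loop, so that successive accesses touch adjacent memory locations.
    #define M 4
    #define N 5
    #define P 6

    /* Row-major C array: a[i][j][k] and a[i][j][k+1] are adjacent in memory,
       so the innermost loop varies k to obtain a stride-1 reference pattern. */
    int sumarray3d(int a[M][N][P]) {
        int sum = 0;
        for (int i = 0; i < M; i++)
            for (int j = 0; j < N; j++)
                for (int k = 0; k < P; k++)
                    sum += a[i][j][k];
        return sum;
    }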
CPU looks first for data in L1, then in L2, ..., then in main memory.
The TLB (translation lookaside buffer) stores recent translations of virtual addresses to physical addresses; it can be viewed as an address-translation cache.
The Andrew File System (AFS) and Network File System (NFS) are distributed file
system protocols
A proxy server is a server (a computer system or an application) that acts as an
intermediary for requests from clients seeking resources from other servers.
Claim
Being able to look at code and get a qualitative sense of its locality
properties is a key skill for a professional programmer.
Examples of projects driven by data locality (and other features):
BLAS (Basic Linear Algebra Subprograms)
http://www.netlib.org/blas/
SPIRAL, Software/Hardware Generation for DSP Algorithms
http://www.spiral.net/
FFTW, by Matteo Frigo and Steven G. Johnson
http://www.fftw.org/
Cache-Oblivious Algorithms, by Matteo Frigo, Charles E. Leiserson,
Harald Prokop, and Sridhar Ramachandran, 1999
https://en.wikipedia.org/wiki/Cache-oblivious_algorithm
...
Assuming that the cache hit costs are included as part of the normal
CPU execution cycle, we have:
With advancing technology, the CPU has more room on die for bigger
L1 caches and for a second-level cache, normally a unified L2 cache
(i.e., it holds both instructions and data), and in some cases even a
unified L3 cache.
New AMAT Calculation, with an L2 cache:
AMAT = L1 Hit Time + L1 Miss Rate ∗ L1 Miss Penalty,
L1 Miss Penalty = L2 Hit Time + L2 Miss Rate ∗ L2 Miss Penalty
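To see how the two levels combine, here is a small worked example with illustrative numbers (not taken from the slides): suppose the L1 hit time is 1 cycle, the L1 miss rate is 5%, the L2 hit time is 10 cycles, the L2 miss rate is 20%, and the L2 miss penalty (a main-memory access) is 100 cycles. Then
L1 Miss Penalty = 10 + 0.20 ∗ 100 = 30 cycles,
AMAT = 1 + 0.05 ∗ 30 = 2.5 cycles.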