
Homework 5

(EE275: Advanced Computer Architecture)

Due Date: 11/3

Answer 1: (20 points)

The CPU has a 64 KB, 2-way set-associative data cache with 64-byte blocks and a 40-bit
address.

a) (10 pts)
Block size = 64 bytes = 2^6 bytes, therefore Offset = 6 bits
Number of sets = 64 KB / (2 ways x 64 B) = 512 = 2^9, therefore Index = 9 bits
Tag = 40 - 9 - 6 = 25 bits

40-bit address layout: | Tag (25 bits) | Index (9 bits) | Offset (6 bits) |
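As a quick check, here is a minimal Python sketch of the same bit-width arithmetic (the constants are the cache parameters given in the problem; the variable names are only illustrative):

```python
import math

CACHE_SIZE    = 64 * 1024   # 64 KB data cache
ASSOCIATIVITY = 2           # 2-way set associative
BLOCK_SIZE    = 64          # 64-byte blocks
ADDRESS_BITS  = 40          # 40-bit addresses

offset_bits = int(math.log2(BLOCK_SIZE))                  # 6
num_sets    = CACHE_SIZE // (ASSOCIATIVITY * BLOCK_SIZE)  # 512 sets
index_bits  = int(math.log2(num_sets))                    # 9
tag_bits    = ADDRESS_BITS - index_bits - offset_bits     # 25

print(offset_bits, index_bits, tag_bits)                  # 6 9 25
```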

b) (10 pts)
Tag array size
= #ways x #sets x tag bits per entry = 2 x 2^9 x 25 bits = 25,600 bits = 25 Kbits
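And the tag-array storage, continuing with the numbers from part a) (a sketch of the same arithmetic; note the result is in bits):

```python
# Tag array storage = ways x sets x tag bits per entry (values from part a).
ways, num_sets, tag_bits = 2, 512, 25
tag_array_bits = ways * num_sets * tag_bits     # 25,600 bits
print(tag_array_bits, tag_array_bits / 1024)    # 25600 bits = 25.0 Kbits
```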

Answer 2: (20 points)

a) (10 pts)
2-way set-associative cache with 4 sets

LRU method emulation

2 hits with the LRU method

b) (10 pts)
FIFO method emulation
3 hits with the FIFO method

The FIFO method performs better on this reference stream, with 3 cache hits versus 2 for LRU.
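The hit counts above were traced by hand. As a rough cross-check, here is a small Python sketch of the same kind of emulation for a 2-way, 4-set cache; the trace below is a hypothetical block-address sequence chosen only to illustrate that FIFO can beat LRU, and is not the reference stream from the assignment:

```python
from collections import OrderedDict, deque

NUM_SETS, WAYS = 4, 2

def simulate(trace, policy):
    """Count hits for a 2-way, 4-set cache under 'lru' or 'fifo' replacement."""
    sets = [OrderedDict() if policy == "lru" else deque() for _ in range(NUM_SETS)]
    hits = 0
    for block in trace:
        ways = sets[block % NUM_SETS]          # set index = block address mod number of sets
        if policy == "lru":
            if block in ways:
                hits += 1
                ways.move_to_end(block)        # refresh recency on a hit
            else:
                if len(ways) == WAYS:
                    ways.popitem(last=False)   # evict the least recently used block
                ways[block] = True
        else:                                  # FIFO
            if block in ways:
                hits += 1                      # FIFO order is NOT updated on a hit
            else:
                if len(ways) == WAYS:
                    ways.popleft()             # evict the oldest resident block
                ways.append(block)
    return hits

# Hypothetical block-address trace (all blocks map to set 0 here).
trace = [0, 4, 0, 8, 4]
print(simulate(trace, "lru"), simulate(trace, "fifo"))   # 1 2: FIFO keeps block 4 and gets the extra hit
```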

Answer 3: (24 points)

a) (3 pts)
Read hit:
LRU: 4 accesses to the data/tag/miscellaneous arrays => 4*(20+5+1) = 104 power units.
FIFO and Random: 4 accesses to the data/tag arrays => 4*(20+5) = 100 power units.

b) (3 pts)
Read miss:
LRU: 4 accesses to the data/tag/miscellaneous arrays => 4*(20+5+1) = 104 power units.
FIFO: 4 accesses to the data/tag arrays plus one access to the FIFO pointer => 4*(20+5) + 1 = 101 power units.
Random: 4 accesses to the data/tag arrays => 4*(20+5) = 100 power units.

c) (3 pts)
Read hit (split access):
LRU: 4 accesses to the tag/miscellaneous arrays plus one access to the hit way's data array => 4*(5+1) + 20 = 44 power units.
FIFO, Random: 4 accesses to the tag arrays plus one access to the hit way's data array => 4*(5) + 20 = 40 power units.

d) (3 pts)
Read miss, split access (cost of line fill ignored):
LRU: 4 accesses to the tag/miscellaneous arrays => 4*(5+1) = 24 power units.
FIFO, Random: 4 accesses to the tag arrays => 4*(5) = 20 power units.

e) (3 pts)
Read hit, split access with way-prediction hit:
LRU: one access to the tag/miscellaneous arrays plus one access to the data array => (5+1) + 20 = 26 power units.
FIFO, Random: one access to the tag array plus one access to the data array => 5 + 20 = 25 power units.

f) (3 pts)
Read hit, split access with way-prediction miss:
LRU: one access to the tag/miscellaneous arrays, plus 4 accesses to the tag/miscellaneous arrays, plus one access to the data array => (5+1) + 4*(5+1) + 20 = 50 power units.
FIFO, Random: one access to the tag array, plus 4 accesses to the tag arrays, plus one access to the data array => 5 + 4*(5) + 20 = 45 power units.

g) (3 pts)
Read miss, split access with way-prediction miss (cost of line fill ignored):
LRU: one access to the tag/miscellaneous arrays, plus 4 accesses to the tag/miscellaneous arrays => (5+1) + 4*(5+1) = 30 power units.
FIFO: one access to the tag array, plus 4 accesses to the tag arrays, plus one access to the miscellaneous component (FIFO pointer) => 5 + 4*(5) + 1 = 26 power units.
Random: one access to the tag array, plus 4 accesses to the tag arrays => 5 + 4*(5) = 25 power units.

h) (3 pts)
For every access:
P(way hit, cache hit) = 0.95
P(way miss, cache hit) = 0.02
P(way miss, cache miss) = 0.03
LRU = 0.95*26 + 0.02*50 + 0.03*30 = 26.60 power units
FIFO = 0.95*25 + 0.02*45 + 0.03*26 = 25.43 power units
Random = 0.95*25 + 0.02*45 + 0.03*25 = 25.40 power units
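A short sketch that reproduces the steady-state costs from parts e) to g) and the expected power per access, using the component costs given in the problem (data array = 20, tag array = 5, miscellaneous/replacement state = 1 power unit, 4-way cache):

```python
DATA, TAG, MISC = 20, 5, 1   # per-array access costs in power units
WAYS = 4

# Per-access cost for the cases weighted in part h):
# e = read hit with way-prediction hit, f = read hit with way-prediction miss,
# g = read miss with way-prediction miss.
cost = {
    "LRU":    {"e": (TAG + MISC) + DATA,
               "f": (TAG + MISC) + WAYS * (TAG + MISC) + DATA,
               "g": (TAG + MISC) + WAYS * (TAG + MISC)},
    "FIFO":   {"e": TAG + DATA,
               "f": TAG + WAYS * TAG + DATA,
               "g": TAG + WAYS * TAG + MISC},
    "Random": {"e": TAG + DATA,
               "f": TAG + WAYS * TAG + DATA,
               "g": TAG + WAYS * TAG},
}

prob = {"e": 0.95, "f": 0.02, "g": 0.03}   # probabilities given in part h)

for policy, c in cost.items():
    expected = sum(prob[case] * c[case] for case in prob)
    print(f"{policy}: {expected:.2f} power units per access")
# LRU: 26.60, FIFO: 25.43, Random: 25.40
```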
Answer 4: (36 points)
a) (18 pts)
The CPU reads a word at virtual address 124A5DF4, assuming the page translation is in the L2 TLB
but not in the L1 TLB, and the word is in memory but not in any cache:
Virtual address: 124A5DF4 (hex) = 0001 0010 0100 1010 0101 1101 1111 0100 (binary)
L1 cache (binary = hex)
Offset (4 bits): 0100 = 4
Index (10 bits): 01 1101 1111 = 1DF
Tag (18 bits): 0001 0010 0100 1010 01 = 4929
L1 TLB
Virtual page number (20 bits): 0001 0010 0100 1010 0101 = 124A5
Page offset (12 bits): 1101 1111 0100 = DF4
L2 TLB
Virtual page number (20 bits): 0001 0010 0100 1010 0101 = 124A5
Page offset (12 bits): 1101 1111 0100 = DF4

Physical address: 036B0DF4 (hex) = 0000 0011 0110 1011 0000 1101 1111 0100 (binary)
L2 cache
Offset (5 bits): 1 0100 = 14
Index (10 bits): 000 1101 111 = 6F
Tag (17 bits): 0000 0011 0110 1011 0 = 6D6
L3 cache
Offset (5 bits): 1 0100 = 14
Index (13 bits): 11 0000 1101 111 = 186F
Tag (14 bits): 0000 0011 0110 10 = DA

Clock Action
0 CPU→L1 cache: look up 4 bytes at tag 4929, index 1DF, offset 4 (miss)
CPU→L1 TLB: look up virtual page 124A5
3 L1 TLB (miss)
4 L1 TLB→L2 TLB: look up virtual page 124A5
13 L2 TLB (hit)
L2 TLB returns translation to physical page 036B0
Construct physical address 036B0DF4
14 L1 cache→L2 cache: look up 16 bytes at tag 6D6, index 6F, offset 14
18 L2 cache (miss)
19 L2 cache→L3 cache: look up 32 bytes at tag DA, index 186F, offset 14
33 L3 cache (miss)
34 L3 cache→Memory: look up 32 bytes with physical address 036B0DF4
133 Memory returns data for physical addresses 036B0DE0 - 036B0DFF
L3 replaces one block in set at index 186F, tag DA
L3 returns data for physical addresses 036B0DE0 - 036B0DFF
L2 replaces one block in set at index 6F, tag 6D6
L2 returns data for physical addresses 036B0DF0 - 036B0DFF,
virtual address 124A5DF0 - 124A5DFF
L1 replaces one block in set at index 1DF, tag 4929
CPU gets data for virtual address 124A5DF4, physical address 036B0DF4
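For reference, a minimal Python sketch of the field extraction used in the tables above; split_address is an illustrative helper, and the offset/index widths are the ones given for each cache level:

```python
def split_address(addr, offset_bits, index_bits):
    """Break a 32-bit address into (tag, index, offset) fields."""
    offset = addr & ((1 << offset_bits) - 1)
    index  = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag    = addr >> (offset_bits + index_bits)
    return tag, index, offset

VA, PA = 0x124A5DF4, 0x036B0DF4

print([hex(f) for f in split_address(VA, 4, 10)])   # L1 (virtual):  ['0x4929', '0x1df', '0x4']
print([hex(f) for f in split_address(PA, 5, 10)])   # L2 (physical): ['0x6d6', '0x6f', '0x14']
print([hex(f) for f in split_address(PA, 5, 13)])   # L3 (physical): ['0xda', '0x186f', '0x14']
print(hex(VA >> 12), hex(VA & 0xFFF))               # TLB: VPN 0x124a5, page offset 0xdf4
```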

b) (18 pts)
The CPU reads a word at virtual address 124A5DF4 with the same assumptions as in part a), but this
time assuming the L2 cache uses the virtual address for its index and tag instead of the physical address:
Virtual address: 124A5DF4 (hex) = 0001 0010 0100 1010 0101 1101 1111 0100 (binary)
L1 cache (binary = hex)
Offset (4 bits): 0100 = 4
Index (10 bits): 01 1101 1111 = 1DF
Tag (18 bits): 0001 0010 0100 1010 01 = 4929
L2 cache
Offset (5 bits): 1 0100 = 14
Index (10 bits): 101 1101 111 = 2EF
Tag (17 bits): 0001 0010 0100 1010 0 = 2494
L1 TLB
Virtual page number (20 bits): 0001 0010 0100 1010 0101 = 124A5
Page offset (12 bits): 1101 1111 0100 = DF4
L2 TLB
Virtual page number (20 bits): 0001 0010 0100 1010 0101 = 124A5
Page offset (12 bits): 1101 1111 0100 = DF4
Physical address: 036B0DF4 (hex) = 0000 0011 0110 1011 0000 1101 1111 0100 (binary)
L3 cache
Offset (5 bits): 1 0100 = 14
Index (13 bits): 11 0000 1101 111 = 186F
Tag (14 bits): 0000 0011 0110 10 = DA
Clock Action
0 CPU→L1 cache: look up 4 bytes at tag 4929, index 1DF, offset 4 (miss)
CPU→L1 TLB: look up virtual page 124A5
1 L1 cache→L2 cache: look up 16 bytes at tag 2494, index 2EF, offset 14
3 L1 TLB (miss)
4 L1 TLB→L2 TLB: look up virtual page 124A5
5 L2 cache (miss)
13 L2 TLB (hit)
L2 TLB returns translation to physical page 036B0
Construct physical address 036B0DF4
14 L2 cache→L3 cache: look up 32 bytes at tag DA, index 186F, offset 14
28 L3 cache (miss)
29 L3 cache→Memory: look up 32 bytes with physical address 036B0DF4
128 Memory returns data for physical addresses 036B0DE0 - 036B0DFF
L3 replaces one block in set at index 186F, tag DA
L3 returns data for physical addresses 036B0DE0 - 036B0DFF,
virtual address 124A5DE0 - 124A5DFF
L2 replaces one block in set at index 2EF, tag 2494
L2 returns data for virtual address 124A5DF0 - 124A5DFF
L1 replaces one block in set at index 1DF, tag 4929
CPU gets data for virtual address 124A5DF4, physical address 036B0DF4
Converting the L2 cache to use the virtual address saves 5 cycles in total compared to the physically
addressed L2 cache of part a), because the L2 cache lookup can proceed in parallel with the TLB lookups.
It is better to make this change.
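As a back-of-the-envelope check of the 5-cycle saving, the per-step delays below are read off the two clock tables above (they are not specified separately in the problem, so treat them as inferred values):

```python
L1_TLB_MISS = 3    # L1 TLB miss detected at cycle 3
L2_TLB_HIT  = 9    # L2 TLB request at cycle 4, hit at cycle 13
L2_LOOKUP   = 4    # L2 cache miss detected 4 cycles after the request
L3_LOOKUP   = 14   # L3 cache miss detected 14 cycles after the request
MEM_ACCESS  = 99   # memory returns data 99 cycles after the request
REQ         = 1    # one cycle to forward a request to the next level

translation_ready = L1_TLB_MISS + REQ + L2_TLB_HIT                        # cycle 13

# Part a): physically addressed L2, so its lookup must wait for the translation.
a_l2_miss  = translation_ready + REQ + L2_LOOKUP                          # cycle 18
a_mem_done = a_l2_miss + REQ + L3_LOOKUP + REQ + MEM_ACCESS               # cycle 133

# Part b): virtually addressed L2, so its lookup overlaps the TLB accesses.
b_l2_miss  = REQ + L2_LOOKUP                                              # cycle 5
b_mem_done = max(translation_ready, b_l2_miss) + REQ + L3_LOOKUP + REQ + MEM_ACCESS  # cycle 128

print(a_mem_done, b_mem_done, a_mem_done - b_mem_done)                    # 133 128 5
```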
