1.a 1
1.b
1.c
1.d
1.e
1.f
1.g
1.h
1.i
1.j
2.a 5
Marking Scheme: 2+3
2.b 5
3.a 5
Marking Scheme: 1+4
3.b 5
4.a 5
Reducing the miss rate in a cache system is critical for improving the performance of computer
architectures. The miss rate is the fraction of memory accesses that result in cache misses.
Various optimization techniques are used to reduce the miss rate, and these can be broadly
categorized into cache design strategies and programming optimizations.
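For illustration (numbers assumed), the effect of the miss rate can be quantified with the average memory access time: AMAT = hit time + miss rate × miss penalty. With a 1 ns hit time, a 100 ns miss penalty, and a 5% miss rate, AMAT = 1 + 0.05 × 100 = 6 ns; halving the miss rate to 2.5% brings AMAT down to 1 + 0.025 × 100 = 3.5 ns.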
1. Larger Cache Size
A larger cache can hold more data, which reduces capacity misses because it is less likely for a
required data block to be evicted due to lack of space. However, increasing cache size may lead
to higher access times and cost.
Example:
• If a 64 KB cache experiences frequent misses due to limited capacity, upgrading it to 128 KB
can reduce the miss rate.
2. Higher Associativity
Caches can be classified as direct-mapped, set-associative, or fully associative. Higher
associativity reduces conflict misses by allowing several blocks that map to the same set to
reside in the cache at the same time.
Example:
• A 4-way set-associative cache reduces conflict misses compared to a direct-mapped cache
because multiple blocks can coexist in the same set.
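For illustration (sizes assumed): a 32 KB cache with 64-byte lines has 512 lines. Direct-mapped, that gives 512 sets, so any two blocks whose addresses are a multiple of 32 KB apart map to the same line and repeatedly evict each other. Organized as 4-way set-associative, the same 512 lines form 128 sets of 4 lines each, so up to four such conflicting blocks can be resident at the same time.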
3. Victim Cache
A small, fully-associative victim cache stores blocks evicted from the main cache. This reduces
conflict misses by giving recently evicted blocks a second chance.
Example:
• A system using an 8-entry victim cache alongside a direct-mapped cache can catch frequently
re-accessed blocks and reduce misses.
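A minimal C sketch of the lookup order with a victim cache (the structures and sizes here are illustrative, not a real hardware interface):
typedef struct { int valid; unsigned tag; /* data omitted */ } Line;

Line main_cache[1024];   /* direct-mapped main cache */
Line victim_cache[8];    /* small, fully associative victim cache */

int lookup(unsigned tag, unsigned index)
{
    if (main_cache[index].valid && main_cache[index].tag == tag)
        return 1;                                 /* main-cache hit */
    for (int i = 0; i < 8; i++)
        if (victim_cache[i].valid && victim_cache[i].tag == tag) {
            Line evicted = main_cache[index];     /* swap the blocks */
            main_cache[index] = victim_cache[i];  /* promote the victim */
            victim_cache[i] = evicted;            /* keep the old block around */
            return 1;                             /* victim-cache hit */
        }
    return 0;                                     /* miss: fetch from next level */
}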
4. Prefetching
Prefetching fetches data into the cache before it is needed, based on predictable access
patterns. Hardware and software prefetching are commonly used.
Example:
• For a loop that accesses array elements sequentially, a hardware prefetcher can predict the
next memory accesses and bring them into the cache early.
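A software-prefetching sketch, assuming GCC or Clang where the __builtin_prefetch builtin is available; the prefetch distance DIST is illustrative and would be tuned to the memory latency:
#define DIST 16   /* illustrative prefetch distance, in elements */

double sum_array(const double *a, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        if (i + DIST < n)
            __builtin_prefetch(&a[i + DIST], 0, 1);  /* read access, low reuse hint */
        sum += a[i];   /* by the time a[i] is used, it was prefetched DIST iterations ago */
    }
    return sum;
}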
5. Cache Line Size Optimization
Increasing the cache line size can reduce compulsory misses by fetching more adjacent data on
each miss. However, larger lines increase the miss penalty and can waste bandwidth and cache
space when the extra data is not used.
Example:
• A 64-byte cache line may reduce compulsory misses in applications with spatial locality
compared to a 32-byte cache line.
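For illustration (sizes assumed): scanning an array of 4-byte integers sequentially, a 32-byte line brings in 8 elements per miss (about one compulsory miss every 8 accesses), while a 64-byte line brings in 16 elements per miss (about one every 16 accesses), roughly halving compulsory misses for this access pattern.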
6. Multi-level Caches
Using multiple cache levels (L1, L2, L3) helps reduce misses. Data that cannot fit in the L1 cache
can often be found in L2 or L3, reducing main memory accesses.
Example:
• Modern processors often use a small, fast L1 cache for frequently accessed data and larger
L2/L3 caches for less frequently accessed data.
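With illustrative numbers: an L1 hit time of 1 ns, a 5% L1 miss rate, an L2 hit time of 10 ns, a 20% local L2 miss rate, and 100 ns to main memory give AMAT = 1 + 0.05 × (10 + 0.20 × 100) = 2.5 ns, compared to 1 + 0.05 × 100 = 6 ns with no L2.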
7. Compiler Optimizations
Compilers can reorganize code to improve data locality, reducing misses.
Techniques:
• Loop Interchange: Reordering nested loops to improve spatial locality.
• Blocking (Tiling): Breaking large data sets into smaller blocks that fit in the cache (a sketch follows the code below).
Example:
// Without optimization: j is the outer loop, so consecutive inner-loop
// iterations access A[i][j] column by column, jumping a full row ahead each
// time (poor spatial locality for row-major C arrays):
for (j = 0; j < M; j++) {
    for (i = 0; i < N; i++) {
        A[i][j] = B[i][j] + C[i][j];
    }
}
// With loop interchange (better cache utilization): j is now the inner loop,
// so accesses within each row are sequential and reuse the same cache line:
for (i = 0; i < N; i++) {
    for (j = 0; j < M; j++) {
        A[i][j] = B[i][j] + C[i][j];
    }
}
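A minimal sketch of blocking (tiling) in the same style, using matrix multiplication; it assumes C99 variable-length array parameters, and the tile size B_SIZE is illustrative and would be tuned to the actual cache size:
#define B_SIZE 32   /* illustrative tile size */

/* Blocked (tiled) matrix multiplication: each B_SIZE x B_SIZE tile is reused
   while it is still resident in the cache, instead of streaming whole rows
   and columns through it. */
void matmul_blocked(int n, double A[n][n], double B[n][n], double C[n][n])
{
    for (int ii = 0; ii < n; ii += B_SIZE)
        for (int jj = 0; jj < n; jj += B_SIZE)
            for (int kk = 0; kk < n; kk += B_SIZE)
                for (int i = ii; i < ii + B_SIZE && i < n; i++)
                    for (int j = jj; j < jj + B_SIZE && j < n; j++) {
                        double sum = C[i][j];
                        for (int k = kk; k < kk + B_SIZE && k < n; k++)
                            sum += A[i][k] * B[k][j];
                        C[i][j] = sum;
                    }
}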
8. Reducing Cache Pollution
Reducing unnecessary data loads into the cache can lower misses. Techniques like selective
caching or bypassing less useful data help achieve this.
Example:
• Streaming data that is unlikely to be reused can bypass the cache to prevent evicting useful
data.
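A sketch of cache bypassing with non-temporal (streaming) stores, assuming an x86 target with SSE2 and a 16-byte-aligned destination buffer:
#include <stddef.h>
#include <emmintrin.h>   /* SSE2 intrinsics */

/* Fill a large, write-only buffer with streaming stores so the written data
   bypasses the cache instead of evicting lines that are still useful. */
void fill_stream(int *dst, int value, size_t n)
{
    __m128i v = _mm_set1_epi32(value);
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        _mm_stream_si128((__m128i *)&dst[i], v);  /* non-temporal store of 4 ints */
    for (; i < n; i++)
        dst[i] = value;                           /* scalar tail */
    _mm_sfence();                                 /* order the streaming stores */
}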
9. Cache Partitioning
Partitioning the cache among different threads or cores reduces contention and conflict misses
in multi-threaded environments.
Example:
• In a multi-core processor, assigning a portion of the cache to each core avoids frequent cache
evictions caused by other cores.
10. Software-Controlled Caches
Explicit control over cache behavior, such as cache hints in software, can help optimize data
placement and reduce misses.
Example:
• Programming models like CUDA allow developers to explicitly manage shared memory and
cache in GPU programming, reducing miss rates.
By combining these strategies effectively, system designers and programmers can significantly
reduce cache miss rates, leading to improved overall performance.
4.b 5
Marking Scheme: 2+2=1
5.a 5
5.b 5
6.a 5
Marking Scheme: Direct Mapped: 2.5 marks + Set Associative: 2.5 marks
6.b 5
Marking Scheme: WAW: 2.5 + WAR: 2.5 Marks
WAW Resolution: Register renaming through the register status table ensures that only the
most recent instruction to write a register actually updates it, so an earlier instruction
that finishes late cannot overwrite the newer value.
WAR Resolution: Source operands are copied (or tagged) into the reservation station at issue
time, so a later instruction's write cannot destroy a value that an earlier instruction still
needs to read.
The combination of reservation stations, the register status table, and the CDB dynamically
schedules instructions and tracks their dependencies, resolving both hazards without stalling
the pipeline.
1. WAW Hazard
● Tomasulo's approach allows instructions to issue and execute out of order.
● When an instruction issues, the register status entry of its destination register is
overwritten with the tag of that instruction's reservation station. A result broadcast on the
CDB updates a register only if the broadcasting tag still matches, so an earlier instruction
that completes late cannot overwrite the value of a later instruction writing the same register.
Example:
I1: ADD R1, R2, R3 # Writes result to R1
I2: MUL R1, R4, R5 # Also writes to R1
In Tomasulo's approach,
● Both instructions are issued to their respective reservation stations; once MUL (I2) issues,
the register status for R1 points to MUL's station.
● When ADD (I1) completes execution, it broadcasts its result on the CDB, but the register file
ignores it for R1 because the tag no longer matches.
● R1 therefore ends up with the result of MUL (I2) regardless of the order in which the two
instructions finish, which ensures the correctness of the final value in R1.
2. Write After Read (WAR) Hazard
● Tomasulo's algorithm removes WAR hazards by capturing source operands in the reservation
stations at issue time.
● Each source operand is recorded either as a value (if already available) or as the tag of the
station that will produce it, so a later instruction that writes one of those registers cannot
disturb the operand the earlier instruction will use.
Example:
I1: ADD R1, R2, R3 # Reads R2, R3
I2: MUL R2, R4, R5 # Writes to R2
ADD (I1) is issued to a reservation station, which captures the current values (or producer
tags) of R2 and R3.
MUL (I2) can then write R2 whenever it completes; ADD still computes with the copy of R2
captured at issue, so the earlier read is never corrupted. This is managed by the reservation
station holding its own copies of the operands rather than re-reading R2 later.
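The mechanism behind both resolutions can be sketched in C. This is a conceptual model only, with illustrative field names following the usual Vj/Vk/Qj/Qk convention, not an actual hardware description:
/* Minimal reservation-station entry (illustrative field names). */
typedef struct {
    int busy;        /* entry in use */
    int op;          /* operation to perform */
    int Vj, Vk;      /* captured operand values (valid when Qj/Qk == 0) */
    int Qj, Qk;      /* tags of producing stations; 0 means value already captured */
} RS;

int reg_value[32];   /* architectural register file */
int reg_status[32];  /* tag of the station that will write each register; 0 = none */

/* Issue step: capture each source operand as either a value or a producer tag,
   then claim the destination register by overwriting its status with this
   station's tag. Copying operands here removes WAR hazards; overwriting the
   destination tag removes WAW hazards. */
void issue(RS *rs, int my_tag, int op, int src1, int src2, int dst)
{
    rs->busy = 1;
    rs->op = op;
    if (reg_status[src1] == 0) { rs->Vj = reg_value[src1]; rs->Qj = 0; }
    else                       { rs->Qj = reg_status[src1]; }
    if (reg_status[src2] == 0) { rs->Vk = reg_value[src2]; rs->Qk = 0; }
    else                       { rs->Qk = reg_status[src2]; }
    reg_status[dst] = my_tag;  /* a later writer of dst simply replaces this tag */
}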
7.a 5
Marking Scheme: 3 marks for split cache + 2 marks for unified cache.
7.b 5
Marking Scheme: Best of the two answers will be awarded out of 5 marks. If only one option is
attempted, it is also marked out of 5 marks.
(i) Vector processor Architecture:
In computing, a vector processor is a central processing unit (CPU) whose instruction set is
designed to operate efficiently on large one-dimensional arrays of data called vectors. This is
in contrast to scalar processors, whose instructions operate on single data items only, even
when those scalar processors also provide single instruction, multiple data (SIMD) or
SIMD-within-a-register (SWAR) arithmetic units.
For specific kinds of computing applications, vector processing performs very well. Its key
features are as follows:
1. Simultaneous operations: This is achieved through the use of specialized
hardware that can process multiple data elements in parallel.
2. High performance: Vector processing can achieve high performance by
exploiting data parallelism and reducing memory access. This means that
vector processors can perform computations faster than traditional
processors, particularly for tasks that involve repeated operations on large
datasets.
3. Scalability: Vector processors can scale up to handle larger datasets without
sacrificing performance.
4. Limited instruction set: Vector processors have a limited instruction set
that’s optimized for numerical computations.
5. Data alignment: Vector processors require data to be aligned in memory to
achieve optimal performance. This means the data must be stored in
contiguous memory locations so that the processor can access it efficiently.
Vector processing can deliver higher performance than a traditional scalar CPU for such
workloads because each instruction operates on many data elements at once, which is especially
valuable in graphics and other data-intensive use cases. There are two main types of vector
processing: SIMD and MIMD.
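As a small illustration of the data parallelism described above, a sketch assuming an x86 target with SSE intrinsics; unaligned loads are used here, though aligned data (point 5 above) with _mm_load_ps would be more efficient:
#include <xmmintrin.h>   /* SSE intrinsics */

/* Add two float arrays four elements at a time: one vector instruction
   performs four scalar additions. */
void vec_add(float *a, const float *b, const float *c, int n)
{
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 vb = _mm_loadu_ps(&b[i]);
        __m128 vc = _mm_loadu_ps(&c[i]);
        _mm_storeu_ps(&a[i], _mm_add_ps(vb, vc));
    }
    for (; i < n; i++)
        a[i] = b[i] + c[i];   /* scalar tail for leftover elements */
}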
(ii) VLIW processor Architecture:
In multiple issue processors, we increase the width of the pipeline. Several
instructions are fetched and decoded in the front-end of the pipeline. Several
instructions are issued to the functional units in the back-end. If m is the maximum number of
instructions that can be issued in one cycle, the processor is said to be m-issue wide. There
are two main variations of multiple issue processors: superscalar processors and VLIW (Very
Long Instruction Word) processors.
VLIW processors issue a fixed number of operations, formatted either as one large instruction
or as a fixed instruction packet, with the parallelism among the operations indicated explicitly
by the instruction itself. Hence, they are also known as EPIC (Explicitly Parallel Instruction
Computing) processors. Examples include the Intel/HP Itanium processor.
VLIW exploits instruction-level parallelism that is made explicit in the program: the compiler
decides which operations can execute in parallel and resolves any conflicts between them. VLIW
architectures therefore rely on the compiler for performance, which increases compiler
complexity but greatly reduces hardware complexity.
Features:
● The processors have multiple functional units and fetch very long instruction words from the
instruction cache.
● Multiple independent operations are grouped together in a single VLIW instruction and are
issued in the same clock cycle.
● Each operation is assigned an independent functional unit.
● All the functional units share a common register file.
● Instruction words are typically 64-1024 bits long, depending on the number of execution units
and the code length required to control each unit.
● Instruction scheduling and parallel dispatch of the word are done statically by the compiler.
● The compiler checks for dependencies before scheduling parallel execution of the instructions.