HPCA Endsem SPR 2024

The document is an end-semester examination paper for the High-Performance Computer Architecture course at IIT Kharagpur, detailing instructions for two sections of students. It includes various questions related to microprocessor architecture, cache coherence, and RISC-V assembly code, requiring students to provide concise answers and complete tables based on given scenarios. The exam covers theoretical concepts and practical applications in computer architecture, with a total of 100 marks and a duration of 3 hours.


Indian Institute of Technology Kharagpur

Department of Computer Science and Engineering


End-semester Examination, Spring 2023-24
High Performance Computer Architecture (CS60003)
Students: 167 Full Marks: 100 Time: 3 hours

INSTRUCTIONS: This question paper has two parts: PART-I for Section-1 students, and PART-II
for Section-2 students. ONLY ATTEMPT THE APPROPRIATE PART OF THE QUESTION
PAPER. This test is closed-book and closed-notes. Calculators are allowed.

PART-I [for Section-1]


ANSWER ALL QUESTIONS

1. Give brief answers (two-three sentences maximum) to each of the following questions:
(a) Consider a single-core microprocessor with two levels of on-chip cache. Derive the overall Average
Memory Access Time (AMAT) of the cache system as a function of the Hit Time, Miss Penalty, Hit
Rate and Miss Rate of the two levels of the cache hierarchy (see the hint after this question). [3]
(b) Why is a Critical Word First strategy more effective in a cache with relatively large block size? [3]
(c) What policies are adopted by microprocessors that perform out-of-order execution (with or without
speculation) to enable precise exceptions? [3]
(d) Intel and AMD processors have a CISC ISA, but internally convert CISC instructions through a
hardware layer to RISC-type instructions before executing them. Why? [3]
(e) In the latest high-throughput microprocessors, the Reorder Buffer (ROB) has decreased in significance,
and only buffers control information. Why? [3]
(f) Explain the concept of macro-op fusion adopted in modern high-performance microprocessors. [3]
(g) Explain why it is rare to find an issue width greater than four in modern microprocessors. [3]
(h) Give two reasons why increasing microprocessor clock frequency to increase throughput has fallen out of
favour over the last two decades. [3]
(i) Explain why it is particularly challenging to accurately predict cache write latencies (even when there
has been a hit) in a modern multi-core microprocessor. [3]
(j) Distinguish between local node, home node, and remote node in the context of a directory-based cache
coherence protocol. [3]
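
Hint for question 1(a): one standard derivation treats the L1 miss penalty as the AMAT of the L2 cache
itself (a sketch, not the only acceptable form):

    AMAT = HitTime_L1 + MissRate_L1 x (HitTime_L2 + MissRate_L2 x MissPenalty_L2)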
2. Consider the following RISC-V code fragment:

addi x4,x1,#800 ; x4 = upper bound for X


foo: fld F2,0(x1) ; (F2) = X(i)
fmul.d F4,F2,F0 ; (F4) = a*X(i)
fld F6,0(x2) ; (F6) = Y(i)
fadd.d F6,F4,F6 ; (F6) = a*X(i) + Y(i)
fsd F6,0(x2) ; Y(i) = a*X(i) + Y(i)
addi x1,x1,#8 ; increment X index
addi x2,x2,#8 ; increment Y index
sltu x3,x1,x4 ; set x3 to 1 if x1 < x4
bnez x3,foo ; loop if needed, nothing to write to CDB

Consider a processor with the 64-bit RISC-V ISA implementing Tomasulo’s algorithm without speculation,
with the specifications of the Functional Units (FUs) as shown in Fig. 1, executing the above code
snippet. Assume the following:
• Functional units are not pipelined.

Figure 1: Functional Unit specifications in a processor with Tomasulo’s scheme.

• There is no forwarding between functional units; results are communicated over the Common Data Bus
(CDB).
• The execution stage does both the effective address calculation and the memory access for loads and
stores.
• Loads require one clock cycle.
• Issue and write-back of a result each require one clock cycle.
• There are five load buffer slots and five store buffer slots.
• The Branch on Not Equal to Zero (bnez) instruction requires one clock cycle to execute. Since there is
no speculation, a perfect branch predictor allows the first instruction of the next iteration to be issued
in the clock cycle immediately after bnez is issued, but that instruction waits for one clock cycle after
issue in its reservation station while bnez executes.
• bnez does not write to the CDB, and hence does not consume a clock cycle for “Write CDB”.

Complete the following table for the first three iterations of the loop. You may ignore the addi
instruction before the loop starts. The execution of the first two instructions of the first iteration of the loop
has been shown for your convenience in Fig. 2. Note that the Executes/memory entry indicates the start of
execution. [15]

Figure 2: Example table entries for a processor with Tomasulo’s scheme.

3. Suppose we have a 96-core future-generation processor, but on average only 54 cores can be busy. Suppose
that 90% of the time we can use all available cores; 9% of the time we can use 50 cores; and 1% of the time
execution is strictly serial.

(a) How much speedup might we expect from the above microprocessor? (See the hint after this question.) [5]
(b) Now assume that cores of the processor in part (a) can be turned off when not in use. How would the
multi-core speedup compare to that of a 24-core version that can use all of its processors 99% of the
time? [2]
(c) Explain (in brief) the advantage of the MESI protocol over the MSI protocol, in the context of ensuring
cache coherence. [3]
(d) Explain, with RISC-V assembly language code snippet, how a traditional spin lock based technique to
solve the critical section problem should be modified to take advantage of a coherent cache system. You
may assume the existence of an (effectively) atomic register↔memory swap operation denoted by EXCH.
[5]
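
Hint for part (a): the standard Amdahl-style accounting, assuming the three execution modes use 54, 50,
and 1 core(s) respectively, gives

    Speedup = 1 / (0.90/54 + 0.09/50 + 0.01/1) ≈ 35.1

Hint for part (d): a minimal test-and-test-and-set sketch of the kind of answer expected, which spins on
the locally cached copy of the lock and issues EXCH only when the lock appears free (register choices are
illustrative):

lockit: ld   x5,0(x1)   ; read lock from the (coherent) cached copy
        bnez x5,lockit  ; lock is held: spin locally, no bus traffic
        addi x5,x0,#1   ; x5 = 1 (the “locked” value)
        EXCH x5,0(x1)   ; atomically swap x5 with the lock variable
        bnez x5,lockit  ; old value was 1: someone else won; retry
        ...             ; critical section
        sd   x0,0(x1)   ; release the lock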

4. (a) Suppose we have an application running on a 100-core microprocessor, and assume that the application can
use 1, 50, or 100 cores. If we assume that 90% of the time we can use all 100 cores, how much of the
remaining 10% of the execution time must employ 50 cores if we want a speedup of 75? [3]
(b) Consider an 8-core microprocessor where each processor has its private L1 and L2 caches, and snooping
is performed on a shared bus among the L2 caches. Assume the average L2 request, whether for a
coherence miss or any other miss, takes 12 cycles. Assume a clock rate of 2.5 GHz, a CPI of 0.75, and a
load/store frequency of 45%. If our goal is that no more than 50% of the L2 bandwidth is consumed by
coherence traffic, what is the maximum coherence miss rate allowable per processor? [3]
(c) Consider a 64-processor CC-NUMA (Cache Coherent – Non-Uniform Memory Access) computer system.
Each processor has a single-level 128 KB on-chip cache. The sharing among the processors is at the
block-level, and the block size is 64 bytes. The size of the main memory connected to each node is 2 GB.
A directory-based cache coherence protocol is being used. Determine (i) the length of each directory
entry (in bits); (ii) the total space occupied by directories in the entire system. [2 + 2 = 4]
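Hint for part (c): with a full-bit-vector directory there is one entry per memory block of the home node.
Each node has 2 GB / 64 B = 2^31 / 2^6 = 2^25 blocks; assuming a 64-bit presence vector (one bit per
processor) plus 2 state bits (one common choice, adjust to your protocol), each entry is 66 bits, and the
entire system holds 64 × 2^25 × 66 bits ≈ 16.5 GB of directory state.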
5. Consider a microprocessor with two levels of on-chip data cache. The L1 cache is “Virtually Indexed,
Physically Tagged” (VIPT), direct-mapped, and holds 8 KB of data. The L2 cache is direct-mapped and holds
4 MB of data. Both L1 and L2 caches use 64-byte blocks. The page size is 8 KB. The Translation Lookaside
Buffer (TLB) is direct-mapped with 256 entries. Each Virtual Address is 64 bits long, and each Physical
Address is 41 bits long.
(a) Draw a block diagram depicting the microarchitecture of this data cache, clearly depicting the width of
each field in bits. [6]
(b) Determine the total size of the TLB, L1 cache, and L2 cache on the system. Assume each L1 entry is
accompanied by 3 bits of metadata, each L2 entry is accompanied by 4 bits of metadata, and each TLB
entry is accompanied by 5 bits of metadata. [6]
(c) Distinguish between true coherence miss and false coherence miss in the private cache of a multi-core
microprocessor, with a simple example. [4]
(d) Inspired by the success of the “critical-word-first” techniques in reducing the L1 cache miss penalty, the
designers for a microprocessor are considering the use of these techniques for the L2 cache. Assume that
the microprocessor is being planned to have a single 1 MB L2 cache with 64-byte blocks. Further assume
that the L2 cache can be written with 16 bytes every 4 processor cycles. The time to receive the first
16-bytes from the memory controller is 100 cycles, and transfer of each additional 16 bytes from the
main memory requires 16 cycles (excluding the 4 cycles required to write the 16 bytes to the L2 cache).
For each of the following two scenarios, determine how many cycles it would take to service an L2 cache
miss: (i) without using critical-word-first, and (ii) with the critical-word-first technique. [2 + 2 = 4]
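
Hint for part (d): one plausible accounting, assuming each 4-cycle L2 write overlaps the transfer of the
following 16-byte chunk. The four chunks of a 64-byte block arrive at cycles 100, 116, 132 and 148.
(i) Without critical-word-first, the miss completes when the last chunk has been written: 148 + 4 = 152
cycles. (ii) With critical-word-first, the requested chunk is transferred first and the miss can be serviced
as soon as it is written: 100 + 4 = 104 cycles, with the rest of the block filling in behind it.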
6. Suppose IIT-KGP’s ERP software runs on an 8-core RISC-V microprocessor-based computer. The receivables
of IIT-KGP consist of a large number of grants, and also the student fees. In the ERP software, a thread is
spawned whenever a receivable transaction is initiated. A thread corresponding to a receivable transaction
updates a shared variable balance. For updating balance with the amount received, each thread essentially
carries out the following operation:
balance = balance + amount; // You need to synchronize access to balance!
Assume that the variable balance is shared among the threads and its memory location is given by 0(x1).
The received amount is in register x2.
(a) Explain the operations of the Load Reserved (LR) and Store Conditional (SC) instructions of the RISC-V
Instruction Set Architecture. [4]
(b) Write a RISC-V code fragment that can (effectively) atomically perform the above operation, treating
it as a Critical Section Problem. Clearly mark the instructions corresponding to the entry
section, the critical section, and the exit section. [6]
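
Hint for part (b): a minimal LR/SC retry loop of the kind of answer expected, assuming the A-extension
operand order sc.d rd,rs2,(rs1), with rd set to 0 on success and non-zero on failure:

try: lr.d x5,(x1)    ; entry section: load-reserve balance
     add  x5,x5,x2   ; critical section: balance + amount
     sc.d x6,x5,(x1) ; exit section: store-conditionally write back
     bnez x6,try     ; exit section: reservation lost, retry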

——————————————- END OF PART-I —————————————————
