CS222 - COAL - SOLUTION - Final - Spring2023
Read each question completely before answering it. There are 3 questions.
In case of any ambiguity, you may make an assumption. Write it down, but your assumption must not contradict any statement in the question paper.
Write the answer in the space below each question. Answers written with pencil will get zero marks.
Do rough work with pencil and then write final answer with a pen. Use last few pages for rough work.
Read the whole paper first.
Whenever possible, show your work and your thought process. This will make it easier for us to
give you partial credit.
Marks distribution:
Pre-mid 30
Caches 30
Parallelism 20
Total 80
i. By how much must we improve the CPI of FP instructions if we want the program to run
two times faster?
ii. By how much is the execution time of the program improved if the CPI of INT and FP instructions is reduced by 40% and the CPI of L/S and Branch instructions is reduced by 30%?
Answer i:
Total clock cycles = CPIfp × No. FP instr. + CPIint × No. INT instr. + CPIl/s × No. L/S instr. + CPIbranch × No. branch instr.
= 512 × 10^6
To make the program run two times faster we’d have to reduce its time to half, i.e., half the
number of clock cycles by improving the CPI of FP instructions:
CPIimproved fp × No. FP instr. + CPIint × No. INT instr. + CPIl/s × No. L/S instr. + CPIbranch × No. branch instr. = clock cycles/2
CPIimproved fp = (clock cycles/2 − (CPIint × No. INT instr. + CPIl/s × No. L/S instr. + CPIbranch × No. branch instr.)) / No. FP instr.
Solving for CPIimproved fp gives a negative number of clock cycles per floating-point instruction, which is absurd. Therefore, it is impossible to make the program run two times faster by reducing only the CPI of floating-point instructions.
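The per-class instruction counts and CPIs come from the question statement and are not reproduced above. The sketch below uses illustrative figures chosen only because they are consistent with the stated total of 512 × 10^6 cycles; with any figures matching that total, the impossibility argument can be checked numerically:

```python
# Hypothetical per-class figures (NOT the official givens) chosen so the
# total matches the stated 512e6 clock cycles.
counts = {"FP": 50e6, "INT": 110e6, "L/S": 80e6, "Branch": 16e6}
cpi    = {"FP": 1,    "INT": 1,     "L/S": 4,    "Branch": 2}

total = sum(counts[c] * cpi[c] for c in counts)   # 512e6 cycles
target = total / 2                                # 2x faster => half the cycles
non_fp = total - counts["FP"] * cpi["FP"]         # cycles spent outside FP

# CPI the FP instructions would need for the program to hit the target:
cpi_fp_needed = (target - non_fp) / counts["FP"]
print(total, cpi_fp_needed)  # the required CPI comes out negative => impossible
```

Because the non-FP cycles alone already exceed half the total, the required FP CPI is negative no matter what.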
Answer ii:
Original clock cycles = CPIfp × No. FP instr. + CPIint × No. INT instr. + CPIl/s × No. L/S instr. + CPIbranch × No. branch instr.
= 512 × 10^6
New clock cycles = CPIfp-new × No. FP instr. + CPIint-new × No. INT instr. + CPIl/s-new × No. L/S instr. + CPIbranch-new × No. branch instr.
= 342 × 10^6
New time TCPU = new clock cycles / clock rate = (342 × 10^6) / (2 × 10^9) = 0.171 s
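The final step can be checked directly from the cycle count and the 2 GHz clock rate stated above:

```python
new_cycles = 342e6            # new clock cycles after the CPI improvements
clock_rate = 2e9              # 2 GHz clock, as given
t_cpu = new_cycles / clock_rate
print(t_cpu)                  # 0.171 s
```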
Part b:
Assuming the RISC-V calling convention, what does the following code do? The function f2 is called with two arguments:
f2(int arr[], int N); // arr is an array of 4-byte integers
// N is the size of the array; always a positive even integer
f2:
srli a1, a1, 1   // divide the array size by 2 to get half of it
addi t0, x0, 0   // initialize i to zero
loop:
bge t0, a1, ret  // if loop finished, return
lw t2, 0(a0)     // t2 = arr[i]
slli t1, a1, 2   // t1 = half*4; byte offset for arr[i+half]
add t3, a0, t1   // t3 = &arr[i+half]
lw t1, 0(t3)     // t1 = arr[i+half]
sw t1, 0(a0)     // arr[i] = t1
sw t2, 0(t3)     // arr[i+half] = t2
addi t0, t0, 1   // i++
addi a0, a0, 4   // update &arr[i]
beq x0, x0, loop // loop back
ret:
jalr x0, 0(x1)   // return
Answer: The code swaps the first half of the array with the second half: for i = 0 … N/2−1 it exchanges arr[i] with arr[i+N/2].
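A Python model of the loop above (a sketch of what the assembly computes, assuming N is even as stated; the name f2 mirrors the assembly label):

```python
def f2(arr):
    """Mirror of the assembly: swap the first half of arr with the second half."""
    half = len(arr) >> 1      # srli a1, a1, 1
    for i in range(half):     # t0 counts i while i < half
        # the two lw/sw pairs exchange arr[i] and arr[i+half]
        arr[i], arr[i + half] = arr[i + half], arr[i]
    return arr

print(f2([1, 2, 3, 4, 5, 6]))  # [4, 5, 6, 1, 2, 3]
```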
Part c:
Translate the following C code to RISC-V assembly code. Use a minimum number of instructions.
Assume that the values of a, b, i, and j are in registers x5, x6, x7, and x29, respectively. Also,
assume that register x10 holds the base address of the array D, which is an array of unsigned
short integers.
Comment each line of your code to tell what you are doing.
Answer:
addi x7, x0, 0 // Init i = 0
LOOPI:
bge x7, x5, ENDI // While i < a
addi x30, x10, 0 // x30 = &D
addi x29, x0, 0 // Init j = 0
LOOPJ: bge x29, x6, ENDJ // While j < b
slli x28, x7, 1 // i*2 for short ints offset
add x28, x30, x28
lhu x28, 0(x28) // x28 = D[i]
add x31, x28, x29 // x31 = D[i]+j
slli x28, x29, 3 // (j*4)*2 for short ints offset
add x28, x30, x28
sh x31, 0(x28) // D[4*j] = x31
addi x29, x29, 1 // j++
jal x0, LOOPJ
ENDJ: addi x7, x7, 1 // i++;
jal x0, LOOPI
ENDI:
// next insn
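Reading the comments back into C-like form, the assembly implements, in effect, `for (i = 0; i < a; i++) for (j = 0; j < b; j++) D[4*j] = D[i] + j;` over unsigned shorts. A Python model under that reading (a sketch; the function name `run` is hypothetical):

```python
def run(D, a, b):
    """Model of the assembly; D holds unsigned 16-bit values."""
    for i in range(a):                      # outer loop over i
        for j in range(b):                  # inner loop over j
            # lhu reloads D[i] each iteration, so earlier stores are visible;
            # sh keeps only the low 16 bits, hence the mask.
            D[4 * j] = (D[i] + j) & 0xFFFF
    return D

print(run([10, 20, 30, 40, 50, 60, 70, 80], 2, 2))
```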
Question 2:
Part a (5.1):
The following code is written in C, where elements within the same row are stored contiguously.
Assume each word is a 64-bit integer.
A[I][J]: A[I][0], A[I][1], A[I][2], … are accessed successively in the inner loop. They are stored contiguously in memory.
Locality is affected by both the reference order and data layout. The same computation can also
be written below in Matlab, which differs from C in that it stores matrix elements within the
same column contiguously in memory.
for I=1:8
for J=1:8000
A(I,J)=B(I,0)+A(J,I);
end
end
A(J, I): A(1, I), A(2, I), A(3, I), … are accessed one after the other in the inner loop. They are contiguous in RAM since MATLAB stores 2D arrays column-wise.
B(I, 0): B(1, 0), B(2, 0), B(3, 0), … are accessed once per outer-loop iteration. They are also contiguous in RAM since MATLAB stores 2D arrays column-wise.
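The locality argument can be made concrete by computing the linear address (in units of elements) that each reference touches under each layout; the helper names below are hypothetical:

```python
def rowmajor(i, j, ncols):   # C layout: elements of a row are contiguous
    return i * ncols + j

def colmajor(i, j, nrows):   # MATLAB layout: elements of a column are contiguous
    return j * nrows + i

# Successive inner-loop accesses A[I][0], A[I][1], ... in C:
c_stride = rowmajor(0, 1, 8000) - rowmajor(0, 0, 8000)  # stride 1: good locality
# Successive inner-loop accesses A(1,I), A(2,I), ... in MATLAB:
m_stride = colmajor(1, 0, 8) - colmajor(0, 0, 8)        # stride 1: good locality
# For contrast, A[J][I] traversed in C order jumps a whole row each step:
bad = rowmajor(1, 0, 8000) - rowmajor(0, 0, 8000)       # stride 8000: poor locality
print(c_stride, m_stride, bad)
```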
Part b (5.2.3): Given the following addresses in binary (ignore the underscores) by a processor,
in the given order, which of the following cache configurations will result in the maximum hit
rate? Assume all caches are direct mapped and the total size of each cache is 32 bytes (8
words). For each case, assume initially cache is empty and all valid bits are zero. For each
option, indicate how many bits of the address will be used for the offset, index, and tag. For each of them, explain in a line or two how you reached this conclusion.
Block size = 1 word = 4 bytes → 2 least significant bits for byte offset within a block
Num blocks = total_words / block_size = 8/1 = 8 → next 3 bits for block index (2^3 = 8)
Tag = rest of the bits
Block size = 2 words = 8 bytes → 3 least significant bits for byte offset within a block
Num blocks = total_words / block_size = 8/2 = 4 → next 2 bits for block index (2^2 = 4)
Tag = rest of the bits
Block size = 4 words = 16 bytes → 4 least significant bits for byte offset within a block
Num blocks = total_words / block_size = 8/4 = 2 → next 1 bit for block index (2^1 = 2)
Tag = rest of the bits
Among these, option 2 has the highest hit rate, with 2 hits out of 12 accesses (100 × 2/12 = 16.67%); therefore it is the best configuration for this series of memory accesses.
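The offset/index/tag split for any of these configurations can be sketched with a small helper (the address trace itself is given in the question and not reproduced here; `split` is a hypothetical name):

```python
def split(addr, block_bytes, cache_bytes=32):
    """Split an address for a direct-mapped cache into (tag, index, offset)."""
    offset_bits = block_bytes.bit_length() - 1       # log2(block size in bytes)
    num_blocks = cache_bytes // block_bytes
    index_bits = num_blocks.bit_length() - 1         # log2(number of blocks)
    offset = addr & (block_bytes - 1)                # low offset_bits bits
    index = (addr >> offset_bits) & (num_blocks - 1) # next index_bits bits
    tag = addr >> (offset_bits + index_bits)         # everything above
    return tag, index, offset

print(split(0x34, block_bytes=8))  # (1, 2, 4): 3 offset bits, 2 index bits
```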
Part c:
The average memory access time (AMAT) for a microprocessor with 1 level of cache is 2.4 clock cycles
- If data is present and valid in the cache, it can be found in 1 clock cycle
- If data is not found in the cache, 80 clock cycles are needed to get it from off-chip memory
i. What is the miss rate for this L1 cache?
We must first determine the miss rate of the L1 cache to use in the revised AMAT formula:
AMAT = Hit Time + Miss Rate x Miss Penalty
2.4 = 1 + Miss Rate × 80
Miss Rate = 1.4/80 = 1.75%
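Numerically, rearranging the AMAT formula:

```python
hit_time = 1        # cycles on an L1 hit, as given
penalty = 80        # cycles to fetch from off-chip memory, as given
amat = 2.4          # given AMAT
miss_rate = (amat - hit_time) / penalty
print(miss_rate)    # 0.0175, i.e. 1.75%
```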
Designers are trying to improve the average memory access time to obtain a 65% improvement in
average memory access time, and are considering adding a 2nd level of cache on-chip.
- This second level of cache could be accessed in 6 clock cycles
- The addition of this cache does not affect the first level cache’s access patterns or hit times
- Off-chip accesses would still require 80 additional cycles.
ii. What should be the hit rate for this L2 cache in order to achieve the desired speedup?
The terminology in this question is ambiguous: "65% improvement" can mean a 65% speedup, reducing the time by 65%, or reducing the time to 65% of the original. All three interpretations are worked out below.
Answer 1:
Next, we can calculate the target AMAT … i.e. AMATwith L2:
Speedup = Time (old) / Time (new)
1.65 = 2.4 / Time (new)
Time (new) = 1.4545 clock cycles
We can then again use the AMAT formula to solve for the highest acceptable L2 miss rate:
1.4545 = 1 + 0.0175 × (6 + Miss RateL2 × 80)
Solving for Miss RateL2 gives a highest possible miss rate of ~24.96%.
Thus, as the hit rate is (1 − Miss Rate), the L2 hit rate must be at least ~75%.
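Checking the speedup interpretation numerically (a sketch):

```python
amat_old = 2.4
mr_l1 = 0.0175                       # L1 miss rate from part i
target = amat_old / 1.65             # AMAT needed for a 1.65x speedup
# target = 1 + mr_l1 * (6 + mr_l2 * 80)  =>  solve for mr_l2
mr_l2 = ((target - 1) / mr_l1 - 6) / 80
hit_l2 = 1 - mr_l2
print(mr_l2, hit_l2)                 # ~0.2497 and ~0.7503
```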
Answer 2:
New time = 65% of old time = 0.65 × 2.4 = 1.56
1.56 = 1 + (0.0175 × 6) + (Miss RateL2 × 80) ⇒ Miss RateL2 ≈ 0.57%, if the L2 miss rate is a percentage of total accesses
1.56 = 1 + (0.0175 × 6) + ((0.0175 × Miss RateL2) × 80) ⇒ Miss RateL2 = 32.5%, if the L2 miss rate is a percentage of L2 accesses
Answer 3:
New time = (1 − 65%) of old time = 0.35 × 2.4 = 0.84
0.84 = 1 + (0.0175 × 6) + (Miss RateL2 × 80) ⇒ Miss RateL2 ≈ −0.33%, if the L2 miss rate is a percentage of total accesses
0.84 = 1 + (0.0175 × 6) + ((0.0175 × Miss RateL2) × 80) ⇒ Miss RateL2 ≈ −18.9%, if the L2 miss rate is a percentage of L2 accesses
Both interpretations require a negative miss rate, so reducing the AMAT to 35% of the original is impossible with this L2 configuration.
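The remaining two interpretations can be checked the same way (a sketch; this solves the version where the L2 miss rate is a fraction of L2 accesses, and a negative result means the target AMAT is unattainable):

```python
mr_l1 = 0.0175   # L1 miss rate from part i

def mr_l2(target_amat):
    """L2 miss rate (fraction of L2 accesses) needed to reach target_amat."""
    # target = 1 + mr_l1 * (6 + mr_l2 * 80)
    return ((target_amat - 1) / mr_l1 - 6) / 80

print(mr_l2(0.65 * 2.4))   # time reduced TO 65%: ~0.325 (32.5%)
print(mr_l2(0.35 * 2.4))   # time reduced BY 65%: negative => impossible
```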
Question 3:
Part a: Your future self is taking a Parallel Processing course in your 7th semester. You are asked to write a very compute-intensive program. You can make 90% of the program parallel, with the remaining 10% being sequential.
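The sub-parts of this question are not reproduced above, but with a 90% parallel fraction they revolve around Amdahl's law; a sketch of the relevant computation (the function name is hypothetical):

```python
def speedup(p, n):
    """Amdahl's law: overall speedup when fraction p runs in parallel on n processors."""
    return 1 / ((1 - p) + p / n)

print(speedup(0.9, 10))    # ~5.26x with 10 processors
print(1 / (1 - 0.9))       # ~10x: the ceiling as n grows without bound
```

The 10% sequential portion caps the achievable speedup at 10x no matter how many processors are added.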
part b:
Briefly explain (a few lines each) the workings of SISD, SIMD, MIMD, and SPMD processors.
SISD (Single Instruction, Single Data): a single instruction stream operates on a single data stream; each instruction manipulates one data item, e.g.
add x1, x2, x3
SIMD (Single Instruction, Multiple Data): a single instruction operates on many data items at once, e.g.
vadd.vv v1, v2, v3
will add 16 pairs of integers simultaneously, assuming v1, v2, and v3 are 512 bits wide and the integers are 32 bits.
MIMD (Multiple Instruction, Multiple Data): multiple processors each execute their own instruction stream on their own data, as in a multicore chip where each core runs independently.
SPMD (Single Program, Multiple Data): the common way of programming MIMD machines; all processors run copies of the same program on different portions of the data, with conditionals letting different processors take different paths.