COA - Advanced Sheet 2023

Uploaded by

as5626531

GATE Computer Science & IT

ADVANCED
COMPUTER
ORGANIZATION AND
ARCHITECTURE

Practice Questions
Booklet
ANALYSIS OF CSO IN GATE PAPER

Years Marks
2015 5
2016 8
2017 12
2018 10
2019 9
2020 10
2021 Set -1 8
2021 Set -2 9

CSO GATE SYLLABUS

• Number System: Octal, Hexadecimal and Decimal Representation,
Complements, 1's Complement, 2's Complement, Fixed-Point Representation,
Floating-Point Representation, Binary Codes etc.
• Cache Memory: Associative Mapping, Direct Mapping, Set-Associative
Mapping and Writing into Cache etc.
• Instruction Sets: Addressing Modes and Formats: Three-Address Instructions,
Two-Address Instructions, One-Address Instructions and Zero-Address
Instructions, Data Transfer and Manipulation Instructions etc.
• Assembly Language: Rules of the Language, Translation to Binary etc.
• Instruction Pipelining: Ideal Instruction Pipeline, Data Dependency, Handling
of Branch Instructions etc.
• External Memory: Disk Structure and Disk Data Transfer.
• Input-Output Organization: Input-Output Interface, Memory-Mapped I/O,
Modes of Transfer, Programmed I/O, Interrupt-Initiated I/O, Priority Interrupt,
Direct Memory Access (DMA), DMA Controller and DMA Transfer etc.
• ALU, Data-Path and Control Unit.

CSO GATE REFERENCE BOOKS

• Computer Organization by Carl Hamacher
• Computer System Architecture by M. Morris Mano
• Computer Organization and Architecture by William Stallings
• Computer Organization and Design by David A. Patterson and John L. Hennessy

UNITS AND ABBREVIATIONS

• File size (how big a file is on your computer) is usually measured in
"kilobytes", "megabytes", and "gigabytes". In this binary (storage, not data
transfer) usage, 'K' (uppercase) represents a multiplier of 1,024,
'B' (uppercase) represents bytes and 'b' (lowercase) represents bits. Other
abbreviations use this same base of 1,024:
  – 1 KB (one kilobyte) = 1,024 bytes (approximately 1 thousand bytes)
  – 1 MB (one megabyte) = 1,024 KB (approximately 1 million bytes)
  – 1 GB (one gigabyte) = 1,024 MB (approximately 1 billion bytes)
  – 1 TB (one terabyte) = 1,024 GB (approximately 1 trillion bytes)
• Data transfer rates, on the other hand, are expressed in bits. In this
usage, 'k' (lowercase) represents a multiplier of 1,000. The bit-rate
abbreviations are as follows:
  – 1 kbps = 1,000 bits per second
  – 1 Mbps = 1,000,000 bits per second
  – 1 Gbps = 1,000,000,000 bits per second
Where:
kbps (kilobits/sec) means thousands of bits per second (where "thousand" = 10^3)
Mbps (megabits/sec) means millions of bits per second (where "million" = 10^6)
Gbps (gigabits/sec) means billions of bits per second (where "billion" = 10^9)
Tbps (terabits/sec) means trillions of bits per second (where "trillion" = 10^12)
Pbps (petabits/sec) means quadrillions of bits per second (where "quadrillion" = 10^15)
• Units of time
  – 1 second = 10^3 milliseconds (ms)
  – 1 second = 10^6 microseconds (µs)
  – 1 second = 10^9 nanoseconds (ns)
  – 1 second = 10^12 picoseconds (ps)
  – 1 second = 10^15 femtoseconds (fs)
  – 1 millisecond (ms) = 10^(-3) seconds
  – 1 microsecond (µs) = 10^(-6) seconds
  – 1 nanosecond (ns) = 10^(-9) seconds
  – 1 picosecond (ps) = 10^(-12) seconds
  – 1 femtosecond (fs) = 10^(-15) seconds
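
The two conventions above meet whenever a file size (binary multiples of bytes) is divided by a link rate (decimal multiples of bits per second). The following is a minimal C sketch of that arithmetic. The macro names and the helper `transfer_seconds` are hypothetical, introduced only for illustration, and the calculation ignores protocol overhead and latency.

```c
#include <stdint.h>

/* Binary size multipliers, as listed above: 1 KB = 1,024 bytes, 1 MB = 1,024 KB. */
#define KB_BYTES 1024ULL
#define MB_BYTES (1024ULL * KB_BYTES)

/* Decimal rate multipliers, as listed above: 1 kbps = 1,000 bits per second. */
#define KBPS 1000.0
#define MBPS 1000000.0

/* Hypothetical helper: seconds needed to move size_bytes over a link that
   carries rate_bps bits per second (no overhead, no latency). */
double transfer_seconds(uint64_t size_bytes, double rate_bps)
{
    double bits = (double)size_bytes * 8.0;  /* 1 byte = 8 bits */
    return bits / rate_bps;
}
```

Under these assumptions, `transfer_seconds(1 * MB_BYTES, 1 * MBPS)` is about 8.39 seconds rather than 8, because the "M" in MB is 1,024 × 1,024 while the "M" in Mbps is 1,000,000.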
COMPUTER ORGANIZATION AND ARCHITECTURE
Number System
Q1. For the number -74, which of the following options gives the correct 8-bit binary
values for 1's complement, 2's complement and signed magnitude respectively?
(a) 10110110, 10110101, 11001010
(b) 10110101, 10110110, 11001010
(c) 10110101, 10110100, 11001010
(d) 11001010, 10110110, 10110101
Q2. Match the following expressions with their results.
    Expression                      Result
    1. (1A1B1)_16 - (BDE8)_16       a. (AE15)_16
    2. (15DA2)_16 - (884F)_16       b. (D553)_16
    3. (16D14)_16 - (BEFF)_16       c. (E3C9)_16
                                    d. (8918)_16
(a) 1-b, 2-d, 3-a (b) 1-c, 2-b, 3-d
(c) 1-c, 2-b, 3-a (d) 1-b, 2-c, 3-d

Q3. [MSQ]
The following binary numbers are 4-bit 2's complement binary numbers. Which
of the following operations generate overflow?
(a) 0011 + 1100
(b) 0111 + 1111
(c) 1110 + 1000
(d) 0110 + 0010
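
For overflow questions of this kind, one way to check an answer is to interpret both operands as signed 4-bit values and test whether the true sum falls outside the range -8 to 7. The sketch below is illustrative only; the function names `to_signed4` and `add4_overflows` are assumptions, not part of the booklet.

```c
#include <stdbool.h>
#include <stdint.h>

/* Interpret the low 4 bits of x as a 4-bit 2's complement value (-8..7). */
int to_signed4(uint8_t x)
{
    x &= 0x0F;
    return (x & 0x08) ? (int)x - 16 : (int)x;
}

/* True when the exact sum of a and b does not fit in 4-bit 2's complement. */
bool add4_overflows(uint8_t a, uint8_t b)
{
    int sum = to_signed4(a) + to_signed4(b);
    return sum < -8 || sum > 7;
}
```

For example, 0101 + 0100 (5 + 4 = 9) overflows, while 0101 + 1011 (5 + (-5) = 0) does not. Equivalently, overflow can only happen when both operands have the same sign and the 4-bit result has the opposite sign.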

Q4. For the relation (2x)_9 = (3y)_7, the possible values of x and y are
(a) 4, 2 (b) 5, 2 (c) 8, 7 (d) 9, 6

Q5. Consider the equation (123)_5 = (xy)_8 with x and y as unknowns. The number of
possible solutions is ___________
(a) 3 (b) 4 (c) 5 (d) 6

ADVANCED COMPUTER ORGANIZATION & ARCHITECTURE- 2023 1


Q6. Consider the following equation in a 6-bit binary number system: X = A + B.
A is given as (001010)_2 in the 1's complement number system, and B is given as
(111010)_2 in the signed-magnitude number system. What would X be in the 2's
complement number system?
(a) (110000)_2 (b) (010000)_2 (c) (101011)_2 (d) (110101)_2

Q7. The number of bytes required to represent the decimal number 220 in packed
BCD (Binary Coded Decimal) form is __________.
Q8. A signed integer has been stored in a byte using the 2's complement format. We
wish to store the same integer in a 16-bit word. We should copy the original byte
to the less significant byte of the word and fill the more significant byte with
(a) 0
(b) 1
(c) equal to the MSB of the original byte
(d) Complement of the MSB of the original byte.

Q9. Let (32)_b = P1 ⨯ P2, where P1 and P2 are primes. What is/are the possible value(s) of base b?
(a) 8 (b) 9 (c) 11 (d) 13

Q10. Consider the following representation of a number in IEEE 754 single-precision
floating point format with a bias of 127.
S: 1 E: 10000010 F: 1101100...0
Here S, E and F denote the sign, exponent, and fraction components of the
floating-point representation. The decimal value corresponding to the above
representation (rounded to 2 decimal places) is ____________.
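
The S, E and F fields of a normalized single-precision number combine as (-1)^S ⨯ 1.F ⨯ 2^(E - 127). A minimal C sketch of that rule follows. The function names are hypothetical, and the sketch deliberately covers normalized values only (1 <= E <= 254); zero, denormalized numbers, infinities and NaNs are not handled.

```c
#include <stdint.h>

/* 2^n as a double, computed by repeated multiplication or division. */
static double pow2(int n)
{
    double r = 1.0;
    while (n > 0) { r *= 2.0; n--; }
    while (n < 0) { r /= 2.0; n++; }
    return r;
}

/* Value of a normalized IEEE 754 single-precision number from its fields:
   s = sign bit, e = 8-bit biased exponent, f = 23-bit fraction.
   Assumes 1 <= e <= 254; special encodings are out of scope here. */
double decode_normalized(unsigned s, unsigned e, uint32_t f)
{
    double significand = 1.0 + (double)(f & 0x7FFFFF) / 8388608.0; /* 2^23 */
    double value = significand * pow2((int)e - 127);
    return s ? -value : value;
}
```

As an illustration with values different from the question, S = 0, E = 10000001 (129) and F = 0100...0 give 1.25 ⨯ 2^2 = 5.0.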
Q11. The format of the single-precision floating point representation of a real number
as per the IEEE 754 standard is as follows:
1-bit Sign | 8-bit Exponent | 23-bit Mantissa
A float type variable X is assigned the decimal value of -2^(-147). The representation
of X in hexadecimal notation is
(a) 0x80000001 (b) 0x80000002
(c) 0x80000003 (d) 0x80000004

Q12. Consider three float type variables X, Y, and Z that store numbers in IEEE-754
single precision floating point format. Assume that X, Y and Z contain the values
(in hexadecimal notation) 0xC1500000, 0xC0800000 and 0x40500000
respectively. Then which of the following relations is/are TRUE between X, Y and
Z?
(a) X = Z – Y (b) Z = X * Y (c) Z = X/Y (d) X = Z*Y
Q13. Suppose we have a 7-bit computer that uses IEEE floating-point arithmetic
where a floating-point number has 1 sign bit, 3 exponent bits, and 3 fraction
bits. Recall that denormalized numbers will have an exponent of 000, and the
bias for a 3-bit exponent is 2^(3-1) - 1 = 3. Consider three float type variables a, b,
and c that store numbers in above mentioned floating-point format. Assume that
a, b and c contain the values (1110111)2, (0110111)2 and (0000001)2
respectively. Which of the following relations is/are FALSE between a, b and c?
(a) a ⨯ (b + c) = a ⨯ b + a ⨯ c (b) a ⨯ (b ⨯ c) = (c ⨯b) ⨯a
(c) a + (b + c) = (a + b) + c (d) None of the above
Q14. Consider three registers R1, R2, and R3 that store numbers in IEEE-754 single
precision floating point format. Assume that R1 and R2 contain the values (in
hexadecimal notation) 0xC1280000 and 0x417C0000, respectively.
If R3 = R1/R2, what is the most accurate value stored in R3?
(a) 0x3EAAAAAB (b) 0xBEAAAAAB
(c) 0x3F2AAAAB (d) 0xBF2AAAAB

Q15. Consider a new number system based on powers of 4, so every number is
represented as a sum of powers of 4. A base-4 system has 4 symbols
(0...3). E.g., (22)_4 is equal to 2 ⨯ 4^1 + 2 ⨯ 4^0 = 8 + 2 = (10)_10.
What is the largest number that this system can represent in 6 digits (assume
unsigned)?
(a) Σ_{i=0}^{6} 4 ⨯ 3^i (b) Σ_{i=0}^{6} 3 ⨯ 4^i (c) Σ_{i=0}^{5} 3 ⨯ 4^i (d) Σ_{i=0}^{5} 4 ⨯ 3^i
Q16. The IEEE single precision floating point standard allows us to represent fewer
than 2^32 different numbers. Of these numbers, how many are strictly between 2^(-5)
and 2^(-4)?
(a) 2^23 + 1 (b) 2^23 - 1 (c) 2^23 - 2 (d) 2^22

Data for the next three questions, consider a new 8-digit floating-point "minifloat"
format with base 3 as S EEE MMMM (1 sign trit, 3 exponent trits, 4 mantissa trits).
All other properties of IEEE 754 apply (bias, denormalized numbers, Infinity, NaNs,
etc.), which includes normalized numbers having an implicit leading 1 and
denormalized numbers having an implicit leading 0. The sign digit only takes values
of 0 and 1. So the representation of normalized and denormalized numbers is:
Normalized: (-1)^sign ⨯ 3^(exponent + bias) ⨯ 1.mantissa
Denormalized: (-1)^sign ⨯ 3^(exponent + bias + 1) ⨯ 0.mantissa
Assume that the biasing value in this format is (-10)_10.
Q17. The decimal equivalent of the minifloat 0 111 0201 is_______(Rounded off to two
decimal places)

Data for the next four questions, consider the following two changes in the IEEE 754
single precision floating point format:
• Option 1: Adding a bit to the significand and removing a bit from the exponent
• Option 2: Adding a bit to the exponent and removing a bit from the significand
For each of the following questions select whether option 1, option 2, neither, or both
will accomplish the presented task. Assume that the bias also shifts to be 2^(exp_bits - 1) - 1.
Q18. Represent pi (π) more accurately than IEEE 754 single precision floating
point.
(a) Option 1 (b) Option 2 (c) Neither (d) Both

Q19. Represent smaller positive numbers than IEEE 754 single precision floating
point.
(a) Option 1 (b) Option 2 (c) Neither (d) Both

Q20. Represent more numbers in the range [1, 2) than IEEE 754 single precision
floating point.
(a) Option 1 (b) Option 2 (c) Neither (d) Both

Q21. Represent more numbers than IEEE 754 single precision floating point.
(a) Option 1 (b) Option 2 (c) Neither (d) Both
Data for the next three questions, assume that IEEE decided to add a new 12-bit
representation, with its main characteristics consistent with the other IEEE standards.
The format is similar to IEEE 754: sign | exponent | mantissa. In this 12-bit
representation, the value 155/256 is represented exactly as 001000110110. Assume
that the bias used in the format is 2^(exp_bits - 1) - 1.
Q22. How many bits are used as exponent in this representation?
(a) 2 (b) 3 (c) 4 (d) 5
Q23. How many bits are used as mantissa in this representation?
(a) 5 (b) 6 (c) 7 (d) 8
Q24. In this 12-bit representation, what is the smallest positive number that can be
represented?
(a) 2^(-1) (b) 2^(-2) (c) 2^(-3) (d) 2^(-4)
Q25. The range of representable normalized numbers in the IEEE floating-point
binary fractional representation in a 32-bit word with a 1-bit sign, an 8-bit excess-128
biased exponent, and a 23-bit mantissa is
(a) 2^(-180) to (1 - 2^(-23)) × 2^127 (b) (1 - 2^(-23)) × 2^(-127) to 2^128
(c) (1 - 2^(-23)) × 2^(-127) to 2^33 (d) 2^(-128) to (2 - 2^23) × 2^127

Q26. To find the 10's complement of an N-digit decimal number Y, we compute 10^N - Y.
Which ONE of the following statements is TRUE?
(a) In an N-digit 10's complement number system, the leftmost digit has a weight
of 10^N
(b) In an N-digit 10's complement number system, the leftmost digit has a weight
of 10^(N-1)
(c) In an N-digit 10's complement number system, the leftmost digit has a weight
of -10^N
(d) In an N-digit 10's complement number system, the leftmost digit has a weight
of -10^(N-1)
Q27. Circle all statements below that are TRUE on a 32-bit architecture:
(a) It is possible to lose precision when converting from an int to a float.
(b) It is possible to lose precision when converting from a float to an int.
(c) It is possible to lose precision when converting from an int into a double.
(d) It is possible to lose precision when converting from a double into an int.

Cache Memory
Data for the next two questions, consider a cache optimization known as “sub-blocking”
(also called “sectored caches”):
• The number of sets and ways is unchanged.
• Each cache block is broken up into smaller “sub-blocks”.
– Each sub-block has its own valid bit.
– On a cache miss, only the cache sub-blocks accessed by the user's program are
loaded in.
∗ Other sub-blocks remain in the “invalid” state until they are also loaded in.
• Make sure you understand that “sets” are not “sub-blocks”!
Suppose that we have an 8 KB, two-way set associative cache with 4-word (16-byte)
cache lines. The replacement policy is LRU. Each cache block is broken up into four
smaller sub blocks. Evaluate the following two loops:
Loop A:
sum = 0;
for (int i = 0; i < 128; i++)
    for (int j = 0; j < 32; j++)
        sum += buf[i * 32 + j];

Loop B:
sum = 0;
for (int j = 0; j < 32; j++)
    for (int i = 0; i < 128; i++)
        sum += buf[i * 32 + j];
Assume that size of each array element is 4 bytes.
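
Before counting misses, it can help to derive the cache geometry from the stated capacity, associativity and line size. The sketch below is a hedged illustration of that derivation; the struct and function names are hypothetical, and it assumes every size is a power of two.

```c
#include <stdint.h>

/* Hypothetical helper type: geometry of a set-associative cache. */
struct cache_geometry {
    uint32_t num_sets;     /* capacity / (ways * line size) */
    unsigned offset_bits;  /* log2(line size in bytes)      */
    unsigned index_bits;   /* log2(number of sets)          */
};

/* log2 of x, assuming x is a nonzero power of two. */
static unsigned log2_pow2(uint32_t x)
{
    unsigned n = 0;
    while (x > 1) { x >>= 1; n++; }
    return n;
}

/* Derive the geometry from capacity, associativity and line size (all in bytes). */
struct cache_geometry cache_geometry_of(uint32_t capacity_bytes,
                                        uint32_t ways,
                                        uint32_t line_bytes)
{
    struct cache_geometry g;
    g.num_sets = capacity_bytes / (ways * line_bytes);
    g.offset_bits = log2_pow2(line_bytes);
    g.index_bits = log2_pow2(g.num_sets);
    return g;
}

/* Set index selected by a byte address: drop the offset bits, keep the index bits. */
uint32_t set_index(struct cache_geometry g, uint32_t addr)
{
    return (addr >> g.offset_bits) & (g.num_sets - 1);
}
```

For the 8 KB, two-way cache with 16-byte lines described above, `cache_geometry_of(8192, 2, 16)` gives 256 sets, 4 offset bits and 8 index bits. Sub-blocking does not change these numbers, since the number of sets and ways is unchanged.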
Q28. What is the number of misses for Loop A and for Loop B with the sectored
cache?
(a) 1024, 1024 (b) 1024, 4096
(c) 4096, 1024 (d) 4096, 4096

Q29. What is the number of misses for Loop A and for Loop B if the cache is not
sectored (i.e. no sub-blocks)?
(a) 1024, 1024 (b) 1024, 4096
(c) 4096, 1024 (d) 4096, 4096

Q30. Consider a system with the inner-level (L1) and outer-level (L2) cache. Suppose
that we removed an outer-level cache to free up area on our chip. With this new
area, we doubled the size of our L1 cache. Suppose that this optimization
worsened the L1 hit time from 1 cycle to two cycles, and increased the miss
penalty from 50 cycles to 100 cycles. Before this optimization, the L1 miss rate
was 10%. What does the new miss rate (in %) have to be for our new
optimization to improve the average L1 cache access time? ____________
Q31. Consider the following two functions:
void versionA(int *A, int *B, int n) {
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < 8; ++j)
            A[i] = A[i] * B[j];
}

void versionB(int *A, int *B, int n) {
    for (int j = 0; j < 8; ++j)
        for (int i = 0; i < n; ++i)
            A[i] = A[i] * B[j];
}
Assuming n is very large and the compiler does not reorder or omit memory
accesses to the arrays A or B (or keep values from A or B in registers), which is
likely to experience fewer cache misses? Assume that both arrays A and B are stored in
row-major order in memory.
(a) versionA
(b) versionB
(c) They will have about the same number of cache misses.
(d) Cannot say anything
Data for the next four questions, Let A be a 1024×1024 matrix of 32-bit int elements
stored in row-major order, aligned to the beginning of a cache line.
for (int i = 1; i < 16; i++) {
int x = A[0][i-1];
int y = A[i][i];
A[i][i] = x + y;
}
Assume that memory accesses are executed in the order shown in the program – i.e.,
the compiler does not reorder load and store instructions. Variables x, y, and i are held
in registers.

Q32. Consider a 4 KB direct-mapped L1 data cache with 16-byte cache lines. Assume
that the cache is initially empty. The number of conflict misses that occur during the
execution of the above loop is _____________.

Q33. What is the average memory access time (AMAT) in cycles if the hit time of the
direct-mapped cache is 1 cycle and the L1 miss penalty to DRAM is 100 cycles?
____________ (Rounded off to two decimal places)

Q34. Now suppose we double the capacity by switching to an 8 KB two-way
set-associative L1 data cache with LRU eviction and a write-allocate policy. Assume
that the cache is initially empty. The cache line size remains 16 bytes. The
number of conflict misses that occur during the execution of the above loop
is _____________.
Q35. What is the AMAT in cycles if the hit time of the two-way set-associative cache is
1 cycle and the L1 miss penalty to DRAM is 100 cycles? ________ (Rounded off
to two decimal places)
Q36. Consider a direct-mapped cache that has 4 lines, each containing one 4-byte
word. Assume that the cache has a hit access time of 1 ns and a miss penalty of
50 ns (i.e., the full access time is 51 ns when there is a miss: the miss penalty +
the usual access time). Consider the following program fragment:
112:       ADD R1, 0, 0     // R1 ⟵ 0 + 0
116:       ADD R2, 0, 100   // R2 ⟵ 0 + 100
120: LOOP: ADD R3, R3, R2   // R3 ⟵ R3 + R2
124:       ADD R1, R1, 1    // R1 ⟵ R1 + 1
128:       BNE R1, R2, LOOP // if R1 ≠ R2 goto LOOP
What is the average memory access time (in ns) for the above program
fragment, assuming that none of the instructions are in the cache to begin with?
___________

Data for the next three questions, a direct-mapped cache consists of eight blocks. A
byte-addressable main memory contains 4K (2^12) blocks of eight bytes each. Access time
for the cache is 22 ns and the time required to fill a cache slot from main memory is
300 ns (this time will allow us to determine the block is missing and bring it into cache).
Assume a request is always started in parallel to both cache and to main memory (so if
it is not found in cache, we do not have to add this cache search time to the memory
access). If a block is missing from cache, the entire block is brought into the cache and
the access is restarted. Initially, the cache is empty.
Q37. Let the addresses of two consecutive bytes in main memory be (2022)_16 and
(2023)_16. What are the tag and cache line address (in hex) for main memory
address (2022)_16 respectively?
(a) 0x080, 0x4 (b) 0x202, 0x2 (c) 0x020, 0x2 (d) 0x040, 0x4

Q38. What is the hit ratio (in %) of a program that loops 4 times over locations 0
to (67)_10 in memory? _______________

Q39. What is the effective access time (in ns) of this program? _________ (Rounded off
to two decimal places)

Data for the next two questions, Solar radiation can randomly flip bits in the
computer system. Therefore, a cache on a space-faring vehicle, which is exposed to
solar radiation, utilizes error-correcting codes (ECC) for each of its cache blocks to
detect if bits have been flipped. These ECC bits add to the overhead of the cache, in
addition to the usual overhead bits such as valid bits and tags, etc. On a memory
access, the cache operates as normal, but in addition to checking hit/miss it will also
check if the content in the cache block has been corrupted. This is done by checking
the ECC bits. How exactly ECC bits are used to detect corruption is irrelevant to this
problem. If the ECC bits associated with a block indicate that the data in the block is
corrupted, that cache access is regarded as a cache miss. For the sake of the problem,
assume that the memory is incorruptible. The physical memory is byte addressable,
and is 64 KB in size. The cache is 1KB 2-way associative with the LRU replacement
policy. To implement the LRU policy, 1 bit per set is used to identify the least recently used
block. Each cache block can store 4B of data and contains 6 extra bits for the
error-correcting codes, 1 valid bit and tag bits.
Q40. What is the total size of memory needed at the cache controller to store
metadata (tags) for the cache?
(a) 4864 bits (b) 6144 bits (c) 6656 bits (d) 3712 bits

Q41. For this cache to function properly, which of the following write policies
should the cache use?
(a) write-back policy with no write-allocate policy
(b) write-back policy with write-allocate policy
(c) write-through policy with no write-allocate policy
(d) write-through policy with write-allocate policy
Data for next two questions, consider the array defined using the following C code:
int array[5] = {0x123456, 0x234567, 0xDDDD8, 0xEEEE8, 0xFFFF8};
Suppose the array above is defined on a little-endian system where ints are four bytes
where the first byte of the array is stored at memory address 0x70000.
Q42. What is the value of the sixth byte of array (i.e., the byte at address 0x70005)?
(a) 0x23 (b) 0x45 (c) 0x00 (d) 0x67
Q43. What is the result of (*(array + 2)) & 0x37?
(a) 0x00 (b) 0x20 (c) 0x10 (d) 0x30
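
On a little-endian system the least significant byte of each int sits at the lowest address, and `*(array + k)` is the same element as `array[k]`. The following C sketch makes the byte order explicit with shifts, so it behaves the same on any host. The function name `store_le32` and the sample value are assumptions chosen for illustration and are not taken from the questions.

```c
#include <stdint.h>

/* Write a 32-bit value into out[0..3] in little-endian order:
   the least significant byte goes to the lowest index (lowest address). */
void store_le32(uint32_t value, uint8_t out[4])
{
    for (int i = 0; i < 4; i++) {
        out[i] = (uint8_t)(value >> (8 * i));
    }
}
```

For instance, storing 0x0A0B0C0D this way yields the bytes 0x0D, 0x0C, 0x0B, 0x0A at increasing addresses. A value narrower than four bytes is padded with 0x00 bytes at the higher addresses.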

Q44. [MSQ]
Suppose one increased the size of cache blocks but kept the total data size and
associativity of a cache the same. Which of the following are likely results?
(a) increasing the number of conflict misses
(b) decreasing the number of conflict misses
(c) decreasing the number of compulsory misses
(d) increasing the number of compulsory misses
Data for next four questions, consider a system with a cache memory. You don't have
any information related to the cache configuration (cache size, block size etc.) but you have
three patterns that access various bytes in the system.

Assume that the cache is initially empty at the beginning of the first sequence, but not
at the beginning of the second and third sequences. The sequences are executed back-
to-back, i.e., no other accesses take place between the three sequences.
Q45. What is the size of a cache block?
(a) 8 bytes (b) 16 bytes (c) 32 bytes (d) 64 bytes

Q46. What is the associativity of the cache?
(a) 2 way (b) 4 way (c) 8 way (d) 16 way
Q47. What is the size of cache?
(a) 4KB (b) 8KB (c) 16KB (d) 32 KB

Q48. Which of the following cache replacement policies is used?
(a) FIFO (b) LRU (c) MRU (d) LFU
Q49. Consider the following C code executed on a processor with an 8-way
set-associative cache with 16-byte cache lines and LRU replacement.
int A[32][256];
int sum = 0;
for (int i = 0; i < 32; i++) {
    for (int j = 0; j < 256; j++) {
        sum += A[i][j];
    }
}
Assume that the size of each array element is 4 bytes. What is the minimum
number of sets that this cache needs such that this code will only produce
compulsory misses? ___________
Data for the next four questions, the Average Memory Access Time (AMAT) equation
has three components: hit time, miss rate, and miss penalty. For each of the following
cache optimizations, indicate which component of the L1 AMAT equation may be
improved.
Q50. Using a second-level cache
(a) hit time (b) miss rate
(c) miss penalty (d) none of the above
Q51. Using larger blocks
(a) hit time (b) miss rate
(c) miss penalty (d) none of the above
Q52. Using a smaller first-level cache
(a) hit time (b) miss rate
(c) miss penalty (d) none of the above
Q53. Using a larger first-level cache
(a) hit time (b) miss rate
(c) miss penalty (d) none of the above
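
The AMAT equation referred to in Q50 to Q53 is AMAT = hit time + miss rate ⨯ miss penalty. A minimal C sketch follows; the function name `amat` is an assumption, and the numbers in the usage note are illustrative rather than taken from any question.

```c
/* Average memory access time for a single cache level:
   AMAT = hit time + miss rate * miss penalty.
   miss_rate is a fraction in [0, 1]; both times use the same unit
   (for example cycles or ns), and the result is in that unit. */
double amat(double hit_time, double miss_rate, double miss_penalty)
{
    return hit_time + miss_rate * miss_penalty;
}
```

With a 1-cycle hit time, a 5% miss rate and a 100-cycle miss penalty, `amat(1.0, 0.05, 100.0)` is 6 cycles. Each optimization in the questions can be read as an attempt to shrink one of the three arguments, possibly at the cost of another.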

Q54. Which of the following statements is/are TRUE?
(a) An 8 MB 2-way set-associative cache with a Most Recently Used (MRU)
replacement policy will always have better or equal cache performance when
compared to a 4 MB direct-mapped cache.
(b) A direct mapped cache can sometimes have a higher hit rate than a fully
associative cache with an LRU replacement policy (on the same reference
pattern)
(c) Under demand paging, if we think of memory as a cache for disk, conflict
misses are not possible.
(d) Increasing the size of a cache always decreases the number of total cache
misses, all things held constant (replacement policy, workload, associativity).
Q55. You are asked to design a virtually indexed, physically tagged cache. A page
is 4096 bytes. The cache must have 64 lines of 256 bytes each. What
associativity must the cache have in order for there to be no aliasing?
(a) 1-Way (b) 2-way (c) 4-way (d) 8-way
Q56. A computer uses a 32-bit virtual address, a 32-bit physical address, and a three-level
paged page table organization. The processor used in the computer has a 1
MB 8-way set associative virtually indexed physically tagged cache. The cache
block size is 64 bytes and the page size is 8KB. What is the minimum number
of page colors needed to guarantee that no two synonyms map to different sets
in the processor cache of this computer?
(a) 4 (b) 8 (c) 16 (d) 32
Q57. TLB entries have valid bits and dirty bits. Data caches have them also. Which
of the following is/are true?
(a) The valid bit means the same in both: if valid = 0, it must miss in both TLBs
and Caches.
(b) The valid bit has different meanings. For caches, it means this entry is valid
if the address requested matches the tag. For TLBs, it determines whether
there is a page fault (valid=0) or not (valid=1).
(c) The dirty bit means the same in both: the data in this block in the TLB or
Cache has been changed.
(d) The dirty bit has different meanings. For caches, it means the data block has
been changed. For TLBs, it means that the page corresponding to this TLB
entry has been changed.

Q58. Consider a cache that randomly reads the valid bit of any cache entry as 0
with probability p. In other words, if a cache entry is valid, the cache will
assume this entry is invalid with probability p. On the other hand, if the cache
entry is invalid, the cache will correctly assume it is invalid. The memory access
time is 40 ns, cache access time is 10ns, and probability of finding an entry in
cache (before reading valid bit) is 0.9. If effective access time (after reading the
valid bit) is 41 ns, what is the value of probability p?
(a) 0.25 (b) 0.5 (c) 0.75 (d) 0.33

Q59. Consider a system with 32-bit addresses, 4-byte words, and 4KB pages.
Demand paging can be thought of as using main memory as a cache for disk.
The properties of this cache are
(a) Fully associative, write-through and block size of 4KB
(b) Fully associative, write-back and block size of 4KB
(c) Set associative, write-through and block size of 1KB
(d) Set associative, write-back and block size of 1KB

Data for the next two questions, assume an 8KB cache with 32B blocks, on a machine
that uses 32-bit virtual and physical addresses. Suppose the cache is a 2-way set
associative write-back cache that uses an LRU replacement policy. Consider the
following sequence of reads and writes:

Assume that caching happens at the block level, i.e., the unit of transfer between cache
and main memory is one block. Also, assume that a write leads to caching the data, the
same as a read. Initially, assume the cache is empty.
Q60. How many misses does the above access pattern exhibit?__________

Q61. How many writes are taking place between cache and memory?_________

Q62. Consider the following three types of data cache options:

Assume Load instructions have a CPI of 1.2 if they hit in the data cache and if
they miss, they have a CPI of (1.2 + Miss Penalty). All other instructions have a
CPI of 1.2. Furthermore, assume 20% of the instructions are loads.
Which cache should you use if CPI is the only criterion?
(a) Direct-Mapped
(b) 2-way set associative
(c) 4-way set associative
(d) all give same performance
Q63. A pipelined processor with a 100MHz clock is connected to a 50MHz bus
through a 4Kbyte instruction and 8Kbyte data cache. Their block sizes are 16
bytes and 32 bytes, respectively. Do not assume any extra cycles for dirty block
copy-back for a cache fill since a write buffer will handle it later. Main memory
access time is 200ns for the first 32-bit data and the subsequent 32-bit data
takes 50ns. Assume the instruction cache and data cache respond in the same
cycle on a hit. Assume the miss ratio is 6% for the instruction cache and 4% for
the data cache. Also, assume that 25% of instructions reference the data cache. What is the
system's MIPS? __________ (Rounded off to two decimal places)
Q64. [MSQ]
Which of the following statements is/are TRUE?
(a) The local miss rate of one level of a cache is always greater than the global
miss rate of that cache.
(b) Any cache miss that occurs when the cache is full is a capacity miss.
(c) The only way to remove capacity miss is to increase the cache capacity.
(d) The only way to reduce compulsory misses is to increase the cache block
size.

Q65. [MSQ]
Which of the following statements is/are TRUE?
(a) For the same cache size and block size, a 4-way set associative cache will
have more index bits than a direct-mapped cache.
(b) The hit rate of a combined cache is usually worse than the two split caches
which have the same size in sum with the combined cache.
(c) The index of a cache block, together with the tag contents of that block,
uniquely specifies the memory address of the word contained in the cache
block.
(d) A direct mapped cache can sometimes have a higher hit rate than a fully
associative cache with an LRU replacement policy (on the same reference
pattern).
Q66. Suppose your system consists of:
• An L1 cache that has a hit time of 5 cycles and has a local miss rate of 20%.
• An L2 cache that has a hit time of 20 cycles and has a local miss rate of
15%.
• An L3 cache that has a hit time of 200 cycles and has a local miss rate of
5%.
• Main memory hits in 1000 cycles.
What is the global miss rate of L3 (in percent)?_________

Q67. Consider the following direct mapped cache:

To read a word from the cache, the input address is set by the processor. Then
the index portion of the address is decoded to access the proper row in the tag
memory array and in the data memory array. The selected tag is compared to
the tag portion of the input address to determine if the access is a hit or not. At
the same time, the corresponding cache block is read and the proper line is
selected through a MUX. Assume a 256-KB cache with 8-word (32-byte) cache
lines. The address is 32 bits and byte-addressed, so the two least significant
bits of the address are ignored since a cache access is word-aligned. The data
output is also 32 bits (1 word), and the MUX selects one word out of the eight
words in a cache line. Using the delay equations given in Table:

What is the access time (in ps) of the cache (the delay of the critical path)?
Assume that a 2-input gate (AND, OR) delay is 50 ps. _____________
Q68. A memory system uses three caches. The caches have hit rates and times given.
Cache      Hit rate   Hit time
I-cache    0.90       1 ns
D-cache    0.80       5 ns
L2         0.85       20 ns
Main memory has an access time of 50 ns. Consider a program running on this
system that has 30% load and store instructions. For an instruction access, first
the I-cache is checked. If there is a miss, the L2 cache is checked. If there is a
miss in the L2 cache, the main memory is checked. Similarly, for a data access,
first the D-cache is checked. If there is a miss, the L2 cache is checked. If there
is a miss in the L2 cache, the main memory is checked. What is the effective
memory access time (in ns) of the system? _____________

Q69. Consider a system with a CPI of 1.2 cycles, assuming memory accesses always
result in cache hits. The processor runs at 2 GHz. Consider a program running on
this system that has 30% load and store instructions. The processor has an I-cache
with a miss rate of 2% and a D-cache with a miss rate of 5%. The hit time is 1 clock
cycle. The miss penalty for both caches is 50 ns. What is the effective
memory access time (in ns) of the system? _____________
Q70. A processor has an instruction execution time of 2 cycles and provides an
on-chip cache with an access time of 1 cycle. Each instruction requires, on
average, 1 memory access for the instruction and 2 accesses for data. The
processor can overlap instruction execution with access to the internal (L1)
cache. The hit ratio of the L1 instruction cache is 92% and the hit ratio of the
L1 data cache is 87%. A unified external (L2) cache requires 6 cycles to access
and has a global hit ratio of 98%. The main memory requires 25 cycles to access.
What is the average MIPS rate of the processor if it is clocked at 400 MHz? ______________
Data for the next two questions: suppose that in 1000 memory references there are
40 misses in the first-level cache and 20 misses in the second-level cache. Assume the
miss penalty from the L2 cache to memory is 200 clock cycles, the hit time of the L2
cache is 10 clock cycles, the hit time of L1 is 1 clock cycle, and there are 1.5 memory
references per instruction.
Q71. What is the average memory access time in cycles? ________

Q72. What is the average number of stall cycles per instruction? _________
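Worked check for Q71 and Q72 (a minimal sketch, not an official answer key). It assumes the 20 L2 misses are a subset of the 40 L1 misses, so the local L2 miss rate is 20/40, and that the L1 hit time is paid on every access. The variable names are illustrative only.

```python
# Data from the question stem (per 1000 memory references).
references = 1000
l1_misses = 40
l2_misses = 20
l1_hit_time = 1          # cycles
l2_hit_time = 10         # cycles
l2_miss_penalty = 200    # cycles
refs_per_instruction = 1.5

l1_miss_rate = l1_misses / references          # 0.04
l2_local_miss_rate = l2_misses / l1_misses     # 0.5 (assumed: measured on L1 misses)

# AMAT = L1 hit time + L1 miss rate * (L2 hit time + local L2 miss rate * memory penalty)
amat = l1_hit_time + l1_miss_rate * (l2_hit_time + l2_local_miss_rate * l2_miss_penalty)

# Stall cycles per instruction = (AMAT - L1 hit time) * references per instruction
stalls_per_instruction = (amat - l1_hit_time) * refs_per_instruction

print(f"AMAT = {amat:.1f} cycles, stalls = {stalls_per_instruction:.1f} cycles/instruction")
```

Under these assumptions the sketch gives 5.4 cycles for Q71 and 6.6 stall cycles per instruction for Q72.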

Q73. Assume a k-way set-associative cache, where k > 1, and a direct-mapped cache
both have the same address size, the same data capacity, and the same number of
index bits. Which of the following statements is/are TRUE based on the given
information?
(a) The direct-mapped cache has more tag bits
(b) The direct-mapped cache has fewer tag bits
(c) The direct-mapped cache has more block offset bits
(d) The direct-mapped cache has fewer block offset bits
Q74. Consider a computer system without a cache, with a memory access time of 300 ns
and an average CPU operation time of 50 ns. The system spends 40% of its time
on CPU operations and the rest on memory accesses. This system will be
upgraded with an L1 cache with an average access time of 30 ns and a hit ratio
of 85%. The main memory will also be upgraded at the same time, and the
resulting system will be 4 times faster than the system prior to the
upgrade. What is the highest possible memory access time (in ns) of the
upgraded computer system that fulfils the speed increment
criterion? ______________ (Rounded to the second decimal place).

ISA
Data for the next two questions: consider a machine with 6 registers and 64 addresses
and two types of 16-bit instructions. Type-A instructions have 3 register operands.
Type-B instructions have 1 address and 2 registers. Both instruction types exist and
the encoding space is completely utilized.
Q75. What is the maximum total number of instructions possible in this
machine?
(a) 119 (b) 120 (c) 121 (d) 122
Q76. What is the minimum total number of instructions possible in this
machine?
(a) 23 (b) 22 (c) 21 (d) 20
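Worked check for Q75 and Q76 (a hedged sketch using the expanding-opcode idea). It assumes each register number takes 3 bits (6 registers) and each address takes 6 bits (64 addresses), so type-B has a 4-bit opcode and every unused type-B opcode frees 2^3 type-A opcodes. The function name is hypothetical.

```python
def total_instructions(type_b_count):
    """Total instruction count when the 16-bit encoding space is fully used."""
    reg_bits = 3        # 6 registers need 3 bits (assumption: minimal encoding)
    addr_bits = 6       # 64 addresses need 6 bits
    word_bits = 16
    b_opcode_bits = word_bits - (addr_bits + 2 * reg_bits)   # 4 bits
    a_opcode_bits = word_bits - 3 * reg_bits                 # 7 bits
    free_b_opcodes = 2 ** b_opcode_bits - type_b_count
    # Each unused type-B opcode is a prefix for 2^(7-4) type-A opcodes.
    type_a_count = free_b_opcodes * 2 ** (a_opcode_bits - b_opcode_bits)
    return type_a_count + type_b_count

# Both types must exist, so the type-B count ranges from 1 to 15.
candidates = [total_instructions(b) for b in range(1, 16)]
max_total = max(candidates)
min_total = min(candidates)
print(max_total, min_total)
```

With these assumptions the total is 128 - 7b for b type-B instructions, giving 121 at b = 1 and 23 at b = 15.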

Data for the next two questions: a machine has 12-bit instructions with two different
formats: one-address instructions and two-address instructions. Each address is 4 bits
long and every instruction consists of only an opcode and the address(es). Assume that
the instruction encoding space is completely utilized and both kinds of instructions
exist.
Q77. What is the maximum number of one-address instructions? ___________

Q78. What is the minimum number of one-address instructions? ___________

Data for the next two questions: consider a digital computer that has a memory unit
with 32 bits per word. The instruction set consists of 110 different operations. All
instructions have an operation code part (opcode) and two address fields: one for a
memory address and one for a register address. This particular system includes eight
general-purpose, user-addressable registers. Registers may be loaded directly from
memory, and memory may be updated directly from the registers. Direct
memory-to-memory data movement operations are not supported. Each instruction is
stored in one word of memory.
Q79. What is the maximum allowable size for memory?
(a) 2^3 bytes (b) 2^7 bytes (c) 2^22 bytes (d) 2^24 bytes
Q80. What is the largest unsigned binary number that can be accommodated in one
word of memory?
(a) 2^3 – 1 (b) 2^7 – 1 (c) 2^22 – 1 (d) 2^32 – 1

Q81. What will be the size of an instruction in bits (a variable-size instruction
format is used and the instruction size is a multiple of 8) if an address takes 12
bits and a register number takes 3 bits? The machine has these 4 classes of instructions:
Class I: 16 instructions, each with 3 addresses and 1 register
Class J: 255 instructions, each with 2 addresses and 1 register
Class K: 50 instructions, each with 1 address and 1 register
Class L: 50 instructions, with no addresses or registers.
(a) 48 (b) 44 (c) 40 (d) 56
Q82. Consider the following code:
Loop1: bne R3, R4, Loop2 // if R3 ≠ R4 then go to Loop2
 add R0, R1, R2 // R1 + R2 → R0
 jmp Loop3 // go to Loop3
Loop2: sub R0, R1, R2 // R1 – R2 → R0
Loop3: Halt // Stop
Use the following mapping of variables to registers: f, g, h, i, and j are mapped
to registers R0, R1, R2, R3, and R4 respectively. What is the equivalent C code?
(a) if (i == j)
        f = g + h;
    f = g - h;
(b) if (i == j)
        f = g - h;
    f = g + h;
(c) if (i != j)
        f = g + h;
    else
        f = g - h;
(d) if (i != j)
        f = g - h;
    else
        f = g + h;

Q83. Suppose we execute the following MIPS instructions

1. Loadi T0, 2 // 2 → T0
2. Loadi T1, 5 // 5 → T1
3. SLT T2, T1, T0 //Set T2 = 1 if T1 < T0, otherwise 0
4. BEQ T2, R0, SKIP // if T2 = R0 then skip
5. ADDi T1, T2, 3 // T1 = T2 + 3
6. SKIP:
7. Loadi V0, 42 // V0 = 42
Assume that R0 is initialized to 0. What is the value of register T1 after completion
of the code?
(a) 0 (b) 2 (c) 5 (d) 42

Q84. Consider the following code fragment:
ADDI R1, R0, #1 // R1 ← R0 +1 and R0 = 0
ADD R2, R0, R0 // R2 ← R0 + R0
ADDI R3, R0, #128 // R3 ← R0 + 128
Loop: MUL R1, R1, #2 // R1 ← R1 * 2
ADDI R2, R2, #1 // R2 ← R2 + 1
 BNE R2, R3, Loop // if R2 ≠ R3 then go to Loop, else Exit
Exit:
How many instructions are executed dynamically to complete the above code?
_____
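Worked check for Q84 (a minimal sketch that mirrors the listing one instruction at a time). It assumes the three setup instructions run once and that BNE is counted on every pass, including the final pass that falls through to Exit.

```python
r0 = 0            # R0 is stated to hold 0
executed = 0      # dynamic instruction counter

r1 = r0 + 1       # ADDI R1, R0, #1
r2 = r0 + r0      # ADD  R2, R0, R0
r3 = r0 + 128     # ADDI R3, R0, #128
executed += 3

while True:
    r1 = r1 * 2   # MUL  R1, R1, #2
    r2 = r2 + 1   # ADDI R2, R2, #1
    executed += 3 # MUL, ADDI and the BNE itself
    if r2 == r3:  # BNE falls through once R2 equals R3
        break

print(executed)   # 3 + 3 * 128 = 387
```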
Q85. Consider the following code:
Loadi R0, 9 // load immediate
Loadi R4, 0 // load immediate
Loadi R3, 3 // load immediate
Loop:
Beq R3, R4, Exit // branch if R3== R4
Addi R0, R0, 4 //R0 ← R0 + 4
Addi R3, R3, -1 //R3 ← R3 + -1
Jump Loop
Exit:
Store R0, F
The value of memory location F at the end of the program will be? __________
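Worked check for Q85 (a hedged sketch). The loop is transcribed directly from the listing. The dictionary standing in for memory is purely illustrative.

```python
r0, r4, r3 = 9, 0, 3     # the three Loadi instructions

while r3 != r4:          # Beq R3, R4, Exit leaves the loop when R3 == R4
    r0 += 4              # Addi R0, R0, 4
    r3 -= 1              # Addi R3, R3, -1
                         # Jump Loop

memory = {"F": r0}       # Store R0, F
print(memory["F"])       # 9 + 3 * 4 = 21
```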

Q86. Consider the following assembly code:


LW R4 # 400 // Load immediate
L1: LW R1, 0 (R4) // Load first operand
LW R2, 400 (R4) // Load second operand
ADDI R3, R1, R2 // Add operands
SW R3, 0 (R4) // Store result
SUB R4, R4, #4 // Calculate address of next element
BNEZ R4, L1 // Loop if (R4)! = 0
Assume Mem[X] = X, i.e. Mem[800] = 800, Mem[400] = 400 and so on. How
many data memory accesses are performed by the above code?
(a) 300 (b) 303 (c) 502 (d) 601

Q87. Consider the following code
LD R1, 3 //R1 ← 3
LD R2, 2 //R2 ← 2
LD R3, –1 //R3 ← –1
ADD R2, R2, R3 //R2 ← R2 + R3
Repeat: SUB R1, R1, R2 //R1 ← R1 – R2
BN Skip //Branch to “Skip” if result is Negative
SUB R2, R1, R2 //R2 ← R1 – R2
SUB R2, R2, R3 //R2 ← R2 – R3
BP Repeat //Branch to “Repeat” if result is Positive
Skip: ADD R3, R2, R3 //R3 ← R2 + R3
What is the content of register R3 after the execution of above code? ________
Q88. Consider the following program segment. Here R1, R2 and R3 are the general-
purpose registers.
MOV R1, [1000] // R1 ← M[1000]
MOV R2, [R3] // R2 ← M[R3]
LOOP: ADD R2, R1 // R2 ← R1 + R2
MOV (R3), R2 // M[R3] ← R2
ADD R3, 4 // R3 ← R3 + 4
SUB R1, 4 // R1 ← R1 - 4
BNZ LOOP // Branch on not zero
HALT // Stop
Assume that memory location 1000 contains 100 and that register R3 holds the base
address of the array, which is 501. All numbers are in
decimal.
Memory Location 501 502 503 … 599 600
Contain 101 102 103 … 199 200
The content of memory location 600 after the execution of the above code
is__________
Data for the next two questions: consider a hypothetical branch-if-equal instruction
that is 32 bits long:
- 6 bits are used to encode the opcode
- 6 bits are used to encode one register number
- 6 bits are used to encode another register number
- 14 bits are used to encode an offset that will be added to the program counter (PC) if
the branch ends up being taken, and a new instruction address is required.
(The number is not in 2's complement form, and all 14 bits can encode a constant.)
Thus, the instruction syntax might be: BEQ R12, R11, X
- If R12 == R11, the PC will be set to PC + X instead of PC + 4.
Given this instruction:

Q89. The maximum possible address of the instruction "Sub R1, R2, R3" in the
hypothetical system is__________

Q90. Assuming that the PC has already been incremented by 4 when the comparison
for the BEQ instruction at address 5004 is made, how many instructions away
from the BEQ instruction could we reach?

Q91. Consider a byte-addressable machine that requires two types of instructions:
• The first type of instruction (Type A) has the following general format:
Opcode Ra, Rb, Imm
Instructions of this type operate as follows. We perform an operation between
the values in registers Ra and Rb, and store the result to the memory address
specified by the immediate value Imm. The immediate value is treated as an
absolute (as opposed to relative) memory address. The binary encoding for
this type of instruction is the following. The most significant bit is always 0,
indicating that this is a Type A instruction.

• The second type of instruction (Type B) has the following general format:
Opcode Ra, Rb, Rc
Instructions of this type operate as follows. We perform an operation between
the values in registers Ra and Rb, and store the result to the memory address
specified by the value in register Rc. The binary encoding for this type of
instruction is the following. The most significant bit is always 1, indicating
that this is a Type B instruction.

The total memory capacity is 2^20 bytes. Each register in this machine holds 4
bytes of data. Each instruction type supports 8 different arithmetic and logic
operations (the same 8 for each type). Also assume that the length of Type A
instructions is 4 bytes. Assume that we have a program with 1000 instructions:
20% of them are of Type A, and 80% are of Type B. How much space in memory
(in bytes) is occupied by this program?

Q92. Consider a processor with an instruction length of 12 bits and with
32 general-purpose registers, so the size of each address field is 5 bits. The
computer has three types of instructions with the following counts:
• 3 two-address instructions
• 30 one-address instructions
• X zero-address instructions
What is the maximum possible value of X if all three types of instructions
must exist in the computer? ___________
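Worked check for Q92 (a hedged sketch of the expanding-opcode count). It assumes unused two-address opcodes become prefixes for one-address instructions, and unused one-address encodings become prefixes for zero-address instructions. The variable names are illustrative.

```python
word_bits, addr_bits = 12, 5
two_addr_used, one_addr_used = 3, 30

two_addr_opcodes = 2 ** (word_bits - 2 * addr_bits)     # 2-bit opcode: 4 patterns
free_prefixes = two_addr_opcodes - two_addr_used        # 1 pattern left over
one_addr_opcodes = free_prefixes * 2 ** addr_bits       # 32 one-address encodings
free_one_addr = one_addr_opcodes - one_addr_used        # 2 encodings left over
max_zero_addr = free_one_addr * 2 ** addr_bits          # each frees 32 zero-address codes

print(max_zero_addr)                                    # 64
```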

Q93. Assume the following values are in memory, and register R1 is the index register
and is storing 200.

Suppose the instruction “Load 500” loads the value x into the accumulator under
immediate addressing, the value y under direct addressing, and the value z under
indirect addressing. Then the value of x + y + z is ___________
Q94. A digital computer has a memory unit with 32 bits per word. The instruction set
consists of 110 different operations. All instructions have an operation code part
(opcode) and two address fields: one for a memory address and one for a register
address. This particular system includes eight general-purpose, user-
addressable registers. Registers may be loaded directly from memory, and
memory may be updated directly from the registers. Direct memory-to-memory
data movement operations are not supported. Each instruction is stored in one
word of memory. If the memory is word addressable, then what is the maximum
allowable size for memory (in MB)?__________
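Worked check for Q94 (a minimal sketch). It assumes the opcode and register fields use the minimum number of bits and that every remaining bit of the 32-bit word is used for the memory address. The names are illustrative.

```python
word_bits = 32
operations = 110
registers = 8

opcode_bits = (operations - 1).bit_length()      # 7 bits cover 110 opcodes
register_bits = (registers - 1).bit_length()     # 3 bits cover 8 registers
address_bits = word_bits - opcode_bits - register_bits

# Word-addressable memory: each address names one 4-byte word.
words = 2 ** address_bits
size_mb = words * (word_bits // 8) / 2 ** 20

print(address_bits, size_mb)                     # 22 address bits, 16.0 MB
```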
Data for the next two questions: a certain machine has 22-bit instructions and 6-bit
addresses. There are three types of instructions: type-A, type-B and type-C, as shown
below:

Assume that there is at least one instruction for each type, and the encoding space is
completely utilized.
Q95. How many type-C instructions can the machine support together
with the maximum number of type-B instructions? ________

Q96. What will be the maximum number of type-B instructions? _________

Instruction Pipeline
Q97. The overall speedup of a system that spends 40% of its time in calculations, with
a processor upgrade that provides 100% greater throughput, is _________ %.
Q98. Suppose you have written a program and determined that 80% of the time is
spent in one segment of code. You examine the code and determine that you
can decrease the running time in that segment of code by half (i.e., a speedup of
2). How many times faster will your program as a whole run with the new code?
_________
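Worked check for Q97 and Q98 (a hedged sketch of Amdahl's law). It assumes "100% greater throughput" in Q97 means the calculation part runs twice as fast. The function name is hypothetical.

```python
def amdahl_speedup(fraction_enhanced, local_speedup):
    """Overall speedup when only a fraction of the run time is accelerated."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / local_speedup)

q97 = amdahl_speedup(0.40, 2.0)   # 40% of the time, assumed 2x faster
q98 = amdahl_speedup(0.80, 2.0)   # 80% of the time, 2x faster

print(f"Q97: {100 * (q97 - 1):.0f}% faster, Q98: {q98:.2f}x")
```

Under that reading, Q97 works out to an overall speedup of 1.25 (25%) and Q98 to about 1.67.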
Q99. In an enhancement of a design of a CPU, the speed of a floating-point unit has
been increased by 20% and the speed of a fixed-point unit has been increased
by 10%. What is the overall speedup achieved if the ratio of the number of
floating-point operations to the number of fixed-point operations is 2:3 and the
floating-point operation used to take twice the time taken by the fixed-point
operation in the original design?
(a) 1.155 (b) 1.185 (c) 1.255 (d) 1.285
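Worked check for Q99 (a minimal sketch). It assumes "speed increased by 20%" means the operation time is divided by 1.2 (and by 1.1 for the 10% case), and it uses the 2:3 operation ratio with a fixed-point operation taken as one time unit.

```python
fp_ops, fixed_ops = 2, 3            # operation counts in the ratio 2:3
fp_time, fixed_time = 2.0, 1.0      # a floating-point op takes twice as long

old_time = fp_ops * fp_time + fixed_ops * fixed_time                  # 7 units
new_time = fp_ops * fp_time / 1.2 + fixed_ops * fixed_time / 1.1      # about 6.06 units
speedup = old_time / new_time

print(round(speedup, 3))            # 1.155
```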
Q100. A benchmark program runs for 200 seconds. We want to improve the execution
time of the benchmark by a factor of 2.5. We enhance the floating-point unit to
make the floating-point instructions run 4 times faster. How much of the initial
execution time would floating-point instructions have to account for to show an
overall speedup of 2.5 on this benchmark?
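Worked check for Q100 (a hedged sketch that solves Amdahl's law for the enhanced fraction f, from 1 / ((1 - f) + f / k) = S). The variable names are illustrative.

```python
total_time = 200.0      # seconds, original run time
target_speedup = 2.5    # S, required overall speedup
fp_speedup = 4.0        # k, speedup of floating-point instructions

# Rearranging 1 / ((1 - f) + f / k) = S gives f = (1 - 1/S) / (1 - 1/k).
fraction = (1 - 1 / target_speedup) / (1 - 1 / fp_speedup)
fp_seconds = fraction * total_time

print(f"{fraction:.2f} of the run time, i.e. {fp_seconds:.0f} s")
```

This gives a floating-point share of 0.8, or 160 of the original 200 seconds.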

Data for the next two questions, assume that to spell check a large file, 820,000,000
instructions are needed. The instructions in the program are broken down into 4
different classes, and each class requires N clock cycles to execute. Specific
information is given in the table below.

Q101. If the total execution time for this program is found to be 1.57 seconds, what is
the clock rate (in GHz) of the computer on which it was run? _________(Rounded
off to two decimal places)
Q102. Assume that as part of the 820,000,000-instruction spell check, 25% of all load
instructions are immediately followed by an ALU / R-type instruction that uses
the data that was just loaded. To speed up this program, we are contemplating
adding a new type of instruction:
- an ALU instruction where one of the source operands is a value from memory.
- This new instruction will replace the previous 2-instruction sequence.
- It will take 7 clock cycles.
The speedup over the original design is _________. Assume that the clock rate
does not change. (Rounded off to two decimal places)

Q103. For this question, assume that the processor you are working with executes
different instruction types in integer numbers of short clock cycles (i.e., a
multi-cycle approach). The machine's clock rate is 1 GHz and 50,000,000 instructions
are executed. The profile of a common workload is shown below:

You are considering 3 potential improvements to the performance of
this workload.
1. You can spend time improving the compiler. This will reduce the total
instruction count from 50,000,000 to 40,000,000. You can assume that the
above percentages will still hold.
2. You can improve the clock rate from 1 GHz to 1.25 GHz. However, each
instruction will take 1 CC longer to execute.
3. You can keep the instruction count and the clock rate the same, but reduce
the number of CCs per floating-point instruction from 10 to 5.
Which option is best?
(a) Option 1 (b) Option 2
(c) Option 3 (d) all give the same performance
Data for the next two questions, consider the following piece of code:
ADD R1, R0, #100
L1: ADD R1, R1, #-1
BEQZ R1, END //Branch 1
ADD R12, R0, #2
L2: ADD R12, R12, #-1
BNEZ R12, L2 //Branch 2
J L1
END: HALT
Assume R0 stores 0. Branch 1 is executed 100 times and branch 2 is executed a total
of 198 times.
Q104. Consider the following 1-bit branch predictor:

For each branch, how many correct predictions will occur if we use the 1-bit
predictor, initialized to T (taken), separately for each branch?
(a) Branch 1: 98 correct predictions, Branch 2: 97 correct predictions.
(b) Branch 1: 98 correct predictions, Branch 2: 1 correct prediction.
(c) Branch 1: 99 correct predictions, Branch 2: 1 correct prediction.
(d) Branch 1: 97 correct predictions, Branch 2: 98 correct predictions.
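Worked check for Q104 (a hedged simulation). The predictor's state diagram is not reproduced in this text, so the sketch assumes the usual 1-bit behaviour: predict whatever the branch did last time. The outcome lists follow from the code in the stem. The function name is hypothetical.

```python
def count_correct(outcomes, initial_prediction=True):
    """Count correct guesses of a 1-bit (last-outcome) predictor."""
    prediction = initial_prediction
    correct = 0
    for taken in outcomes:
        if prediction == taken:
            correct += 1
        prediction = taken   # a 1-bit predictor simply remembers the last outcome
    return correct

# Branch 1 (BEQZ R1, END): not taken 99 times, taken on the 100th execution.
branch1 = [False] * 99 + [True]
# Branch 2 (BNEZ R12, L2): taken, then not taken, in each of the 99 inner loops.
branch2 = [True, False] * 99

branch1_correct = count_correct(branch1)
branch2_correct = count_correct(branch2)
print(branch1_correct, branch2_correct)
```

With a separate predictor per branch, the alternating pattern of Branch 2 defeats a 1-bit predictor after its first guess, while Branch 1 is mispredicted only on its first and last executions.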
Q105. Consider the following 2-bit branch predictor:

For each branch, how many correct predictions will occur if a single 2-bit
predictor, initialized to 11 (predict taken), is shared by both branches?
(a) Branch 1: 98 correct predictions, Branch 2: 97 correct predictions.
(b) Branch 1: 98 correct predictions, Branch 2: 1 correct prediction.
(c) Branch 1: 99 correct predictions, Branch 2: 1 correct prediction.
(d) Branch 1: 97 correct predictions, Branch 2: 98 correct predictions.
Q106. Consider the following code running on a system with a static branch predictor.
Or R1, 0, 0 // immediate addressing
Loadi R2,1000 // Load immediate
A: Beq R1,R2,L1 // jump if R1=R2
Addi R1,R1,1 // R1 ← R1+1
Jump A
L1: Halt
Under which prediction will the program perform better?
(a) Always not taken
(b) Always taken
(c) Same performance under any predictions
(d) Data insufficient to evaluate performance
Data for the next two questions: consider the classical 5-stage pipeline processor with
the following specifications:
• A static branch predictor is used which always predicts branch not taken in the
Fetch stage
• Unconditional branches are redirected from the Decode stage
• Conditional branches are resolved in the Execute stage
• Full operand forwarding is implemented.
Consider the following assembly code:

Q107. How many cycles are required to complete two iterations of the above loop
on the given processor? __________

Q108. What is the CPI of the loop as the number of iterations approaches infinity?
__________
Q109. Consider the stages of the four-stage pipelined processor:
• Fetch and Decode (FD)
• Execute 1 (EX1)
• Execute 2 (EX2)
• Memory and Writeback (MWB)
Suppose the following code is executed on above processor:
MOV 8(R1), R2 // M[8 + R1] → R2
ADD R2, R1 // R1 + R2 → R1
SUB R1, R2 // R1 + R2 → R2
MOV R2, 8(R1) // R2 → M[8 + R1]
The split execute stage requires the operands for an addition or subtraction to
be available near the beginning of the Execute 1 stage, and only has results
available near the end of the Execute 2 stage. Assume this processor
implements full forwarding. Operands are forwarded from EX2 to EX1 and from
MWB to EX1. When this processor is executing the EX1 stage of the SUB
instruction, which stage of the ADD instruction is it executing?
(a) Fetch and decode (b) Execute 1
(c) Execute 2 (d) Memory and writeback
Q110. Consider the classic five-stage pipeline (IF, ID, EX, MEM, WB) with the following
modifications:
• The MEM stage takes 4 cycles but is pipelined.
• Branches are resolved in the decode stage, and have one branch delay slot.
• All instructions take one cycle in the EX stage. Address calculation for
memory instructions occurs in the EX stage.
• Assume all forwarding paths as needed.
Suppose the following code is executed on above processor:
1. Loop: LD R3, 0(R1)
2. LD F2, 0(R3)
3. ADD F1, F1, F2
4. SUB R2, R2, #1
5. BNEZ R2, Loop
6. ADD R1, R1, #4
How many cycles are required to execute above code assuming no branching?
_____________
Q111. High quality branch predictors are useful because: (Select all that apply)
(a) Predicting branches correctly reduces the number of things that a processor
needs to fetch
(b) Predicting branches correctly results in faster programs
(c) Predicting branches correctly reduces the number of instructions stored in
main memory
(d) High quality branch predictors increase the amount of space available for
computation on a chip.

Q112. Which of the following are more likely characteristics of RISC ISAs than of CISC
ISAs?
(a) RISC ISAs result in shorter assembly programs
(b) RISC ISAs have more registers
(c) Allowing one of the operands in a subtract instruction to be a memory
location
(d) RISC ISAs have variable-length instructions

Data for the next two questions: consider the following code:

LOOP: R1 ← R6+R4 // INST A
 R2 ← R3+R1 // INST B
 R3 ← R3+R1 // INST C
 R4 ← R4 – 4 // INST D
 R3 ← R3+R2 // INST E
 IF(R4 != 0) GOTO LOOP // INST F
EXIT: HALT
Assume that R4 is initialized to 400.
Q113. Assume that branches are predicted not-taken and there is a 3-cycle penalty
for a misprediction. Other than on mispredicted branches, our processor always
finishes 1 instruction per cycle. If the branch is taken a very large number of
times, what would be the CPI of the processor on this code? ________
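Worked check for Q113 (a minimal sketch). It assumes each taken branch costs exactly the 3-cycle penalty on top of one cycle per instruction, and it ignores the final HALT. The exact figure for R4 = 400 is shown next to the steady-state value.

```python
instructions_per_iteration = 6    # INST A to INST F
misprediction_penalty = 3         # cycles lost on each taken (mispredicted) branch

# Steady state: every iteration ends in a taken branch.
steady_state_cpi = (instructions_per_iteration + misprediction_penalty) / instructions_per_iteration

# Exact figure for R4 = 400: 100 iterations, and the last branch falls through.
iterations = 400 // 4
instructions = iterations * instructions_per_iteration
cycles = instructions + (iterations - 1) * misprediction_penalty
exact_cpi = cycles / instructions

print(steady_state_cpi, exact_cpi)   # 1.5 and 1.495
```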

Q114. Suppose we replace the branch with one that has three branch
delay slots. Which of the following is the correct ordering of the instructions A-F
(adding no-ops where needed) to get the best performance?
(a) A B D F C E Noop (b) A B C F D E Noop
(c) A B C D A F E Noop Noop (d) A D F B C E

Q115. Consider the classical 5-stage pipeline (IF, ID, EX, MA, WB) processor with the
following specifications:
• A static branch predictor is used which always predicts branch taken in the
Fetch stage
• Unconditional branches are redirected from the Decode stage
• Conditional branches are resolved in the Execute stage
• Full operand forwarding is implemented.
Suppose the following code is executed on the above processor:

Assume that 0($2) = 4. How many cycles are required to execute the above code,
assuming that branches are predicted to be taken? _____________
Q116. Consider the following code:

The number of RAW, WAR and WAW dependencies, respectively, is
(a) 6, 1, 1 (b) 5, 0, 0 (c) 6, 0, 0 (d) 7, 1, 0

Q117. Consider the following MIPS assembly language code:


I1: ORI $s0, $0, 5
I2: ADDI $s1, $0, 10
I3: ADD $s1, $s0, $s1
I4: LW $s0, -4($s1)
I5: ADD $s0, $s0, $s0
I6: SW $s0, -4($s1)
The number of TRUE(FLOW), ANTI and OUTPUT dependencies respectively are
(a) 6, 1, 3 (b) 5, 1, 1 (c) 6, 0, 2 (d) 6, 1, 0
Q118. Consider the classical 5-stage pipeline processor and the following pairs of
instructions:
1. LD F2, 28(R5) and MULTD F0, F2, F4;
2. DIVD F6, F0, F10 and BEQZ F6, LOOP;
3. MULTD F0, F2, F4 and SD F8, 50(R5);
Assume that the second instruction of each pair can depend on the first. Also,
assume that the processor uses a unified single-ported cache for data and
instruction accesses. What kind of hazard can occur between the instructions of
each pair?
(a) 1 – Structure hazard, 2 – Control hazard, 3 – Structure hazard
(b) 1 – Data hazard, 2 – Control hazard, 3 – Structure hazard
(c) 1 – Data hazard, 2 – Data hazard, 3 – Structure hazard
(d) 1 – Structure hazard, 2 – Data hazard, 3 – Structure hazard

Q119. Consider the following in-order 5-stage pipeline:
• Instruction Fetch (IF).
• Instruction Decode (ID).
• Operand Fetch (OF).
• Perform Operation (PO).
• Memory Access & Write Back (MWB).
Assume the following assembly instructions are executed on the above pipeline:
Instruction Meaning
1. SUB R2, R3, R1 // R2 ← R3 + R1
2. LW R5, 0(R2) // R5 ← Mem[0+R2]
3. ADDi R4, R5, 1 // R4 ← R5 + 1
4. ADD R5, R3, R1 // R5 ← R3 + R1
5. SW R5, 0(R2) // Mem[0+R2] ← R5
Assume that operands are forwarded to the OF stage in the normal pipeline and
to the PO stage in the pipeline with operand forwarding hardware. By how much
(expressed as a percentage) does full operand forwarding improve
performance on this code fragment? ___________ (Rounded off to two decimal
places)
Q120. The instruction pipeline of a processor has the following 5 stages:
• Instruction Fetch (IF),
• Instruction Decode (ID),
• Operand Fetch (OF),
• Perform Operation (PO)
• Memory Access & Write Back (MAWB).
Consider a sequence of 100 instructions. All the stages take 1 clock cycle each
for every instruction except that in the PO stage, 40 instructions take additional
3 clock cycles each, 35 instructions take additional 2 clock cycles each, and the
remaining 25 instructions take additional 1 clock cycle each. Assume that there
are no hazards of any kind. The number of clock cycles required for completion
of execution of the sequence of instruction is_________
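Worked check for Q120 (a hedged sketch). It assumes an in-order pipeline in which every additional PO cycle stalls the whole pipeline by one cycle, so the extra cycles simply add to the ideal count of k + n - 1. The names are illustrative.

```python
stages = 5
# Additional PO-stage cycles -> number of instructions needing them.
extra_po_cycles = {3: 40, 2: 35, 1: 25}

instructions = sum(extra_po_cycles.values())                   # 100
ideal_cycles = stages + instructions - 1                       # 104 with no stalls
stall_cycles = sum(extra * count for extra, count in extra_po_cycles.items())
total_cycles = ideal_cycles + stall_cycles

print(total_cycles)                                            # 104 + 215 = 319
```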
Q121. A non-pipelined single-cycle processor operating at 20 MHz is converted into a
synchronous pipelined processor with five stages requiring 10 nsec, 10 nsec, 10
nsec, 10 nsec and 10 nsec, respectively. The delay of the latches is 2 nsec.
In a given program, assume that 30% are memory instructions, 60% are ALU
instructions and the rest are branch instructions. 5% of the memory
instructions cause stalls of 50 clock cycles each due to cache misses, 50% of
the branch instructions cause stalls of 2 cycles each and 30% of the ALU
instructions cause stalls of 3 cycles each. For this program, the speedup
achieved by the pipelined processor over the non-pipelined processor (rounded off
to 2 decimal places) is __________ .
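Worked check for Q121 (a minimal sketch). It assumes the non-pipelined processor completes one instruction per 50 ns cycle, that the pipelined clock period is the slowest stage plus the latch delay, and that the ideal pipelined CPI is 1 before stalls.

```python
non_pipelined_ns = 1e3 / 20          # 20 MHz -> 50 ns per instruction
cycle_ns = 10 + 2                    # slowest stage plus latch delay

stall_cpi = (
    0.30 * 0.05 * 50     # memory instructions that miss in the cache
    + 0.10 * 0.50 * 2    # branch instructions (the remaining 10%)
    + 0.60 * 0.30 * 3    # ALU instructions
)
pipelined_ns = (1 + stall_cpi) * cycle_ns    # average time per instruction
speedup = non_pipelined_ns / pipelined_ns

print(round(speedup, 2))             # 1.74
```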
Data for the next three questions: consider the following specifications:
• The processor contains 5 stages: IF, ID, EX, M and W;
• Each stage requires one clock cycle;
• All memory references hit in the cache;
Suppose the following code is executed on the above processor:
1. LOADi R4 #400 // 400 → R4
2. LOOP: LOAD R1, 0(R4) // M[0+R4] → R1
3. LOAD R2, 400(R4) // M[400+R4] → R2
4. ADDI R3, R1, R2 // R1 + R2 → R3
5. STORE R3, 0(R4) // R3 → M[0+R4]
6. SUB R4, R4, #4 // R4 – 4 → R4
7. BNEZ R4, LOOP // if R4 ≠ 0 then jump to loop
Q122. How many clock cycles will execution of the above code take on the
non-pipelined processor? _________
Q123. How many clock cycles will execution of the above code take on the classical
pipeline without forwarding or bypassing, when the result of the branch instruction
(new PC content) is available after the WB stage? ___________
Q124. What is the speedup (in percent) of a simple pipeline with normal forwarding
and bypassing, in which the result of the branch instruction (new PC content) is
available after completion of the ID stage, compared to the pipeline without
forwarding or bypassing, in which the result of the branch instruction (new PC
content) is available after the WB stage? ___________

Q125. Consider the following specification:
• The pipeline contains 5 stages: IF, ID, RO (Read Operand), EX and WB (Write
Back);
• Each stage except EX requires one clock cycle;
• The system contains 4 FUs for FP operations: FP-load / store, FP-addition /
subtraction, FP-multiplication and FP-division:
o The EX stage for Load / Store operations takes 1 clock cycle (EX);
o The EX stage for ADDD or SUBD operations takes 1 clock cycle (A or S);
o The EX stage for the MULTD operation takes 3 clock cycles (M1, M2, M3);
o The EX stage for the DIVD operation takes 4 clock cycles (D1, D2, D3, D4);
• All memory references hit in the cache;
• The pipeline has forwarding for all FUs, except FP-Load / Store, where the operand
is ready after the WB stage and is forwarded to the RO stage;
Suppose the following instructions are executed in sequence on the above
processor:
1. LOAD F6 20(R5) // M[20+R5] → F6
2. LOAD F2, 28(R5) // M[28+R5] → F2
3. MULT F0, F2, F4 // F2 * F4 → F0
4. SUB F8, F6, F3 // F6 - F3 → F8
5. DIV F10, F0, F6 // F0 / F6 → F10
6. ADD F6, F8, F2 // F2 + F8 → F6
7. STORE F8, 50(R5) // F8 → M[50+R5]
The number of cycles required to complete the execution of the above code is
_________
Q126. Computer designers are considering two implementations (A and B) and two
compilers (I and II) for an ISA. So, in total, there are four possible options
available to the designers. Instructions are divided into three categories: 1,
2, and 3.
• In implementation A, instruction execution times (in cycles) are CPI1(A) = 2,
CPI2(A) = 2, and CPI3(A) = 3.
• In implementation B, instruction execution times (in cycles) are CPI1(B) = 3,
CPI2(B) = 1, and CPI3(B) = 3.
• The number of executed instructions (by category) of a test program compiled
using compiler I are IC1(I) = 1500, IC2(I) = 1500, and IC3(I) = 5000.
• The number of executed instructions (by category) of a test program compiled
using compiler II are IC1(II) = 900, IC2(II) = 2500, and IC3(II) = 5000.
Which of the following options gives the best performance in terms of execution
time on a system with a 1 MHz clock?
(a) Implementation A and Compiler I
(b) Implementation A and Compiler II
(c) Implementation B and Compiler I
(d) Implementation B and Compiler II
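Worked check for Q126 (a minimal sketch). Execution time is proportional to the sum of CPI times instruction count over the three categories, since the clock is the same 1 MHz in every case. The dictionary layout is illustrative.

```python
cpi = {"A": (2, 2, 3), "B": (3, 1, 3)}                            # cycles per category
counts = {"I": (1500, 1500, 5000), "II": (900, 2500, 5000)}       # instructions per category

cycles = {
    (impl, compiler): sum(c * n for c, n in zip(cpi[impl], counts[compiler]))
    for impl in cpi
    for compiler in counts
}
best = min(cycles, key=cycles.get)

for option, total in cycles.items():
    # At 1 MHz one cycle lasts 1 microsecond, so 1000 cycles take 1 ms.
    print(option, total, "cycles =", total / 1000, "ms")
print("best:", best)
```

The totals come out as 21000, 21800, 21000 and 20200 cycles, so implementation B with compiler II is the fastest combination under this model.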
Q127. Consider an instruction pipeline with eight stages (IF, ID, EX1, EX2, MEM1,
MEM2, MEM3 and WB). Pipeline registers are required between each pair of stages
and at the end of the last stage. The delay of every register is 10 PS and the
delays of the stages are given in the figure:
IF ID EX1 EX2 MEM1 MEM2 MEM3 WB
100 120 150 190 200 240 100
70 S
PS PS PS PS PS PS PS
Assume that operands are forwarded from EX2 and MEM3 to EX1 in the
pipeline with operand forwarding, and from WB to ID in the pipeline without
operand forwarding. Also assume that no forwarding path other than those given
is available in the pipeline, and that the write to the register file
occurs in the first half of the clock cycle and the read from the register file
occurs in the second half. Suppose the following instructions are executed in
sequence on the above processor:
1. LOAD F6 20(R5) // M[20+R5] → F6
2. LOAD F2, 28(R5) // M[28+R5] → F2
3. MULT F0, F2, F4 // F2 * F4 → F0
4. SUB F8, F6, F3 // F6 - F3 → F8
5. STORE F8, 50(R5) // F8 → M[50+R5]
The speedup of the pipeline with operand forwarding over the pipeline without
operand forwarding is ___________

Q128. Consider the classical 5-stage pipeline (IF, ID, EX, MA, WB). By examining a few
representative programs, we find the following relationships for dependent
instructions:

Assume we only stall due to data dependencies. The average CPI of the
pipelined processor is __________

Q129. Consider a program with following instruction mix
Instruction Condition Frequency
ALU 40%
Branch/Jump Taken 15%
Branch/Jump Not Taken 15%
Store 15%
Load No Dependency 10%
Load Data Dependency 5%
Assume that the given program runs on a 5-stage pipelined processor where a branch
not taken incurs no penalty but a branch taken incurs a three-cycle stall, and
that data forwarding resolves all data hazards, except when the instruction
following a load word depends on the data loaded, in which case a one-cycle
stall occurs. What is the average CPI for the CPU, assuming all instructions
take one cycle in each stage? _________

Q130. Consider the following code to be executed on a classical 5-stage pipeline
processor:
I1 LD R1, 0(R2) ; load R1 from address 0+R2
I2 ADDi R1, R1, #1 ; R1 = R1 + 1
I3 SD R1, 0(R2) ; store R1 at address 0+R2
I4 ADDi R2, R2, #4 ; R2 = R2 + 4
I5 SUB R4, R3, R2 ; R4 = R3 - R2
I6 BNEZ R4, Loop ; branch to Loop if R4 != 0
Assuming the branch has one delay slot, which of the following instructions can
be a candidate for the delay slot?
(a) I1 (b) I2 (c) I3 (d) NOP
Q131. Suppose 25% of all instructions in a program are conditional branches (of
which 65% are taken), and 10% are unconditional jumps. Consider a four-deep
pipeline (IF, ID, EX, WB), with each stage being executed in one cycle. Thus,
each instruction takes 4 cycles to be executed, in the 1st of which the
instruction is fetched. The branch is resolved at the end of the 2nd cycle for
unconditional jumps, and at the end of the 3rd cycle for conditional branches.
Assuming that only the 1st pipeline stage can always be done independently of
whether the branch is taken, and ignoring all other pipeline stalls, how much
faster would the machine be without any branches in the workload? Assume
that without any branches, the machine completes on the average, one
instruction every 1.5 clock cycles. __________

Q132. Consider an application in which 10% of the instructions executed are jumps
and calls (unconditional branches) and 20% of the instructions executed are
conditional branches. Moreover, 60% of the conditional branches are taken.
This machine has four-stage pipeline (IF, ID, EX, WB) where the branch is
resolved at the end of the second stage for an unconditional branch, and at the
end of the third cycle for a conditional branch. Assume that conditional
branches are always predicted as not taken, and that there is no penalty when
this prediction is correct. How much faster would this machine be without any
branch hazard? Ignore all other pipeline stalls. _________
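
One way to set up the Q132 calculation is sketched below. It rests on two assumptions that the question does not state explicitly: the ideal (hazard-free) CPI is 1, and a branch resolved at the end of stage k wastes k - 1 fetch slots (1 cycle for an unconditional branch, 2 cycles for a taken conditional branch). The function name is illustrative.

```python
def cpi_with_branch_stalls(p_jump, p_cond, p_taken,
                           jump_penalty, taken_penalty, base_cpi=1.0):
    """Base CPI plus stalls from jumps and from taken (mispredicted) branches."""
    return base_cpi + p_jump * jump_penalty + p_cond * p_taken * taken_penalty


# Q132 numbers; the penalties of 1 and 2 cycles are assumptions (see lead-in).
cpi = cpi_with_branch_stalls(p_jump=0.10, p_cond=0.20, p_taken=0.60,
                             jump_penalty=1, taken_penalty=2)

# With an assumed ideal CPI of 1, the speedup without branch hazards is cpi / 1.
speedup_without_hazards = cpi / 1.0
print(round(speedup_without_hazards, 2))
```

Not-taken conditional branches add nothing here because the question says a correct not-taken prediction has no penalty.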

Q133. Consider a pipelined processor with a cycle time of 10 ns and an average CPI of
1.6, including all stalls due to branches and other causes. 10% of the instructions
are branches, and the branch prediction scheme is 90% accurate. Every branch
misprediction results in a 2-cycle delay. Now consider a change to the processor
that reduces the cycle time to 9 ns by increasing the depth of the pipeline. In
this new design the cost of a misprediction increases to 7 clock cycles, but
everything else stays the same. What is the average CPI on the new
processor? _____________
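
The reasoning for Q133 is to remove the old misprediction stalls from the given CPI and add back the new ones. A minimal sketch, assuming that the stated CPI of 1.6 already includes the 2-cycle misprediction stalls; `new_cpi` is an illustrative name.

```python
def new_cpi(old_cpi, branch_frac, accuracy, old_penalty, new_penalty):
    """Swap the old misprediction penalty for the new one inside the CPI."""
    mispredicts_per_instr = branch_frac * (1.0 - accuracy)
    cpi_without_mispredicts = old_cpi - mispredicts_per_instr * old_penalty
    return cpi_without_mispredicts + mispredicts_per_instr * new_penalty


cpi_new = new_cpi(old_cpi=1.6, branch_frac=0.10, accuracy=0.90,
                  old_penalty=2, new_penalty=7)
print(round(cpi_new, 2))

# Time per instruction, for comparing the two designs (cycle times from Q133).
print(round(1.6 * 10, 2), "ns vs", round(cpi_new * 9, 2), "ns")
```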

Q134. Assume that a 5-stage pipelined processor with forwarding is running a program
with the following instruction mix:
• 23% of instructions are loads (50% of the time, the next instruction uses the
loaded value)
• 13% of instructions are stores
• 19% of instructions are conditional branches
• 2% of instructions are unconditional branches
• the remaining instructions are ALU instructions
There is a penalty of 1 cycle if an instruction immediately needs a loaded value,
each unconditional branch results in a stall of 1 cycle, 75% of conditional
branches are predicted correctly, and the branch misprediction penalty is 2 cycles.
What is the average CPI of the program on the pipeline if the base CPI is 1? _________
(rounded off to three decimal places)
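
The three stall sources in Q134 can be tabulated as below. This is a sketch using only the figures from the question; the dictionary keys and variable names are illustrative.

```python
BASE_CPI = 1.0

# Each entry: (fraction of instructions that stall, penalty in cycles).
STALL_SOURCES = {
    "load_use": (0.23 * 0.50, 1),               # load followed by a dependent use
    "unconditional_branch": (0.02, 1),
    "mispredicted_branch": (0.19 * (1 - 0.75), 2),
}

stall_cpi = {name: frac * penalty
             for name, (frac, penalty) in STALL_SOURCES.items()}
average_cpi_q134 = BASE_CPI + sum(stall_cpi.values())

for name, value in stall_cpi.items():
    print(f"{name}: {value:.3f}")
print(f"average CPI: {average_cpi_q134:.3f}")
```

Stores and ALU instructions do not appear because the question assigns them no stall.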

Q135. A non-pipelined processor has a cycle time of 1 nanosecond. On this processor
an ALU instruction takes 4 cycles to execute and 40% of instructions are ALU
instructions. Memory instructions take 5 cycles to execute and 40% of
instructions are memory instructions. Branch instructions take 4 cycles and
20% of instructions are branch instructions. Pipelining the machine adds 20%
to the cycle time. What is the speedup of an ideal 5-stage pipelined version of
the machine over the non-pipelined one? (Assume there are no pipeline
stalls.) ____________ (Rounded off to two decimal places)
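
A minimal sketch of the Q135 comparison, assuming that the ideal pipelined machine completes one instruction per (lengthened) cycle; all names are illustrative.

```python
CYCLE_NS = 1.0                      # non-pipelined cycle time
PIPELINE_OVERHEAD = 0.20            # pipelining adds 20% to the cycle time

# (fraction of instructions, cycles per instruction) on the non-pipelined CPU.
MIX_Q135 = [(0.40, 4), (0.40, 5), (0.20, 4)]

avg_cycles = sum(frac * cycles for frac, cycles in MIX_Q135)
time_nonpipelined_ns = avg_cycles * CYCLE_NS

# Assumption: ideal pipeline, CPI = 1, no stalls.
time_pipelined_ns = 1.0 * CYCLE_NS * (1.0 + PIPELINE_OVERHEAD)

speedup_q135 = time_nonpipelined_ns / time_pipelined_ns
print(f"{speedup_q135:.2f}")
```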
Q136. Assume that the processor is pipelined and that load instructions are 5% of the
instruction count and store instructions are also 5%. However, the program
spends 30% of its execution time executing load instructions and 15% of its
execution time executing store instructions. The designers discovered that the
Data cache is producing many cache misses causing the CPU to stall. They
decided to improve the design of the data cache and improve the execution time
of the load instructions by a factor of 3x (3 times faster) and the store
instructions by a factor of 2x. The overall speedup of the program due to the
improvements done to the data cache is ___________
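
Q136 is an application of Amdahl's law with two enhanced fractions of execution time. A minimal sketch follows; `amdahl_speedup` is an illustrative helper, and the instruction-count percentages from the question are deliberately unused because the law works on fractions of time.

```python
def amdahl_speedup(enhanced_parts):
    """enhanced_parts: list of (fraction_of_execution_time, speedup_factor)."""
    enhanced_fraction = sum(frac for frac, _ in enhanced_parts)
    new_time = (1.0 - enhanced_fraction) + sum(
        frac / factor for frac, factor in enhanced_parts
    )
    return 1.0 / new_time


# Loads: 30% of time, 3x faster. Stores: 15% of time, 2x faster.
speedup_q136 = amdahl_speedup([(0.30, 3.0), (0.15, 2.0)])
print(f"{speedup_q136:.2f}")
```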
Q137. A processor executes two instructions in the time it takes to fetch and decode
one instruction. If the processor takes 1 nanosecond to fetch and decode an
instruction and if the speed of the execution unit is doubled, then how many
microseconds will it take to fetch, decode, and execute an instruction on this
processor?
(a) 1.25 × 10^-9 (b) 1.5 × 10^-3 (c) 1.25 × 10^-3 (d) 2 × 10^-9
Q138. Which of the following statement(s) is/are TRUE?
(a) A pipelined datapath must have separate instruction and data memories
because the format of instructions is different from the format of data.
(b) Allowing ALU instructions to write back their result in the 4th stage rather
than the 5th stage, improves the performance of a classical 5-stage pipeline.
(c) In the classical 5-stage pipeline, some but not all RAW data hazards can be
eliminated by forwarding.
(d) Name dependences such as Write-After-Read and Write-After-Write do not
cause any hazard in the classical 5-stage pipeline and do not require special
handling.

Data for the next two questions: consider a processor with the following delays:
Instruction memory access time = 500 ps
Data memory access time = 500 ps
Instruction Decode and Register read = 300 ps
Register write = 200 ps
ALU delay = 300 ps
Ignore the other delays in the multiplexers, wires, etc. Assume the following
instruction mix: 35% ALU, 20% load, 10% store, 25% branch, and 10% jump
Q139. What is the speedup factor of the multi-cycle over the single-cycle processor?
____________________
Q140. What is the speedup factor of the 5-stage pipelined over the single-cycle
processor?________________________
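
The sketch below shows one common way to set up Q139 and Q140. The per-class cycle counts for the multi-cycle design (ALU 4, load 5, store 4, branch 3, jump 2) are an assumption typical of textbook multi-cycle datapaths and are not given in the question; the sketch also assumes that the single-cycle clock must cover the longest (load) path, that the multi-cycle and pipelined clocks equal the slowest stage, and that the pipeline has no stalls.

```python
# Component delays in picoseconds, from the shared data.
DELAY_PS = {"IF": 500, "ID": 300, "ALU": 300, "MEM": 500, "WB": 200}
MIX_Q139 = {"alu": 0.35, "load": 0.20, "store": 0.10, "branch": 0.25, "jump": 0.10}

# Assumed cycles per instruction class on the multi-cycle design.
CYCLES = {"alu": 4, "load": 5, "store": 4, "branch": 3, "jump": 2}

single_cycle_clock_ps = sum(DELAY_PS.values())   # load uses every component
stage_clock_ps = max(DELAY_PS.values())          # slowest stage sets the clock

multi_cycle_cpi = sum(MIX_Q139[k] * CYCLES[k] for k in MIX_Q139)
speedup_multi_cycle = single_cycle_clock_ps / (multi_cycle_cpi * stage_clock_ps)
speedup_pipelined = single_cycle_clock_ps / (1.0 * stage_clock_ps)

print(f"multi-cycle CPI: {multi_cycle_cpi:.2f}")
print(f"multi-cycle speedup: {speedup_multi_cycle:.2f}")
print(f"pipelined speedup: {speedup_pipelined:.2f}")
```

Under these assumptions the multi-cycle design comes out slightly slower than the single-cycle one, because its unbalanced stages are all clocked at the slowest stage's delay.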
Q141. Assume a typical program has the following instruction type breakdown:
• 35% loads
• 10% stores
• 50% adds
• 3% multiplies
• 2% divides
Assume the current-generation processor has the following instruction
latencies:
• loads: 4 cycles
• stores: 4 cycles
• adds: 2 cycles
• multiplies: 16 cycles
• divides: 50 cycles
If for the next-generation design you could pick one type of instruction to make
twice as fast (half the latency), which instruction type would you pick?
(a) Load
(b) Store
(c) Multiplies
(d) Divides
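
For Q141, the instruction type worth halving is the one that contributes the most cycles per average instruction. A minimal sketch, assuming total time is the frequency-weighted sum of latencies (no overlap between instructions); names are illustrative.

```python
FREQ = {"loads": 0.35, "stores": 0.10, "adds": 0.50,
        "multiplies": 0.03, "divides": 0.02}
LATENCY = {"loads": 4, "stores": 4, "adds": 2,
           "multiplies": 16, "divides": 50}


def cycles_saved_by_halving(kind):
    """Average cycles per instruction saved if this type's latency is halved."""
    return FREQ[kind] * LATENCY[kind] / 2


for kind in FREQ:
    print(f"{kind}: saves {cycles_saved_by_halving(kind):.2f} cycles/instruction")

best_choice = max(FREQ, key=cycles_saved_by_halving)
print("best choice:", best_choice)
```

Divides have by far the longest latency, but their low frequency keeps their contribution below that of loads.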

IO
Data for the next two questions, suppose that we have a disk with the following
parameters:
• 750GB in size
• 12000 RPM, Data transfer rate of 40 Mbytes/s (40 × 10^6 bytes/sec)
• Average seek time of 8ms
• Disk controller with 2ms controller initiation time
• A block size of 4Kbytes (4096 bytes)
Q142. What is the average time (in ms) to read a random block from the disk? _____
(Rounded off to two decimal places)
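
The access-time breakdown for Q142 can be sketched as below, assuming the usual model: seek time + average rotational latency (half a revolution) + controller time + transfer time. The function name and parameters are illustrative.

```python
def random_block_read_ms(rpm, seek_ms, controller_ms, block_bytes, bytes_per_sec):
    """Average time in ms to read one block from a random disk location."""
    rotation_ms = 0.5 * 60_000.0 / rpm            # half a revolution on average
    transfer_ms = block_bytes / bytes_per_sec * 1000.0
    return seek_ms + rotation_ms + controller_ms + transfer_ms


t_q142 = random_block_read_ms(rpm=12000, seek_ms=8.0, controller_ms=2.0,
                              block_bytes=4096, bytes_per_sec=40e6)
print(f"{t_q142:.2f} ms")
```

The same function applies to other single-request disk questions in this section by changing the parameters.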

Q143. Given the same parameters from above, assume that the operating system has
exploited locality by grouping related blocks together in the filesystem. As a
result, the typical access pattern is not as random. It typically retrieves 10
blocks sequentially at a time and spends only 1 ms for each seek. What is the
average time (in ms) to read a single block now? __________(Rounded off to two
decimal places)
Q144. Suppose that audio files are laid out in 64K (65536 bytes) chunks on the disk
(i.e. 64K in successive sectors on a track). Assume that the disk controller
automatically DMAs the data to kernel memory in a fashion that is overlapped
with reading it from the disk (so that you do not have to worry about DMA for
this operation). Compute the overhead for reading such a 64K chunk from a
random place on the disk. Assume the disk parameters given below:
• 750GB in size
• 10000 RPM, Data transfer rate of 50 Mbytes/s (50 × 10^6 bytes/sec)
• Average seek time of 4ms
• Disk controller with 1ms controller initiation time
• A block size of 4Kbytes (4096 bytes)
What is the total time (in ms) to read 64K chunk from a random place on the
disk into memory?

Data for the next three questions: suppose that we build our disk subsystem to
handle a high rate of I/O by coupling many disks together. Properties of this system
are as follows:
• Has a total of 24 disks, each of which is 2TB in size
• Uses disks that rotate at 12,000 RPM, have a data transfer rate of 81.92 MBytes/s
(for each disk), and have a 4.5 ms average seek time, 4KB sector size
• Has a small computer systems interface (SCSI) with a 600 μs controller command
time. Assume that a group of consecutive sectors can be fetched with a single
request.
• Has a file system that groups sectors into 32KB blocks
Each disk can handle only one request at a time, but each disk in the system can be
handling a different request. The data is not striped (all I/O for each request has to go
to one disk).
Q145. What is the average service time to retrieve a single disk block from a random
location on a single disk, assuming no queuing time (i.e. the unloaded request
time)?______________

Q146. Assume that the OS is not particularly clever about disk scheduling and passes
requests to the disk in the same order that it receives them from the application
(FIFO). If the application requests are randomly distributed over a single disk,
what is the bandwidth (Mbytes/sec) that can be achieved?__________(Rounded
off to one decimal place)

Q147. Suppose that the application has requests outstanding for all disks (but they
are still randomly distributed, handed FIFO to disks), what is the maximum
number of I/Os per second (IOPS) for the whole disk subsystem (an “I/O” here
is a block request)? _____________
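
Q145 to Q147 build on one another: the unloaded service time gives the per-disk bandwidth, and the per-disk request rate scales with the number of independent disks. The sketch below assumes that 32KB means 32 × 1024 bytes, that 81.92 MBytes/s means 81.92 × 10^6 bytes/s, that average rotational latency is half a revolution, and that the bandwidth answer is in units of 10^6 bytes/s. All names are illustrative.

```python
NUM_DISKS = 24
RPM = 12_000
SEEK_MS = 4.5
CONTROLLER_MS = 0.6                 # 600 microseconds
BLOCK_BYTES = 32 * 1024             # assumed binary KB
BYTES_PER_SEC = 81.92e6             # assumed decimal MB

rotation_ms = 0.5 * 60_000.0 / RPM
transfer_ms = BLOCK_BYTES / BYTES_PER_SEC * 1000.0

# Q145: unloaded service time for one block on one disk.
service_ms = SEEK_MS + rotation_ms + CONTROLLER_MS + transfer_ms

# Q146: FIFO random requests, one block per service time.
bandwidth_mb_per_s = BLOCK_BYTES / (service_ms / 1000.0) / 1e6

# Q147: every disk is kept busy with its own independent request stream.
iops_total = NUM_DISKS * 1000.0 / service_ms

print(f"service time: {service_ms:.1f} ms")
print(f"bandwidth: {bandwidth_mb_per_s:.1f} MB/s")
print(f"IOPS: {iops_total:.0f}")
```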
Data for the next two questions: the average seek time and average rotational delay
in a disk system are 10 ms and 5 ms, respectively. The rate of data transfer to or from
the disk is 200 Mbps (1M = 10^6) and all disk accesses are for 4 Kbytes (1 Kbyte = 10^3
bytes) of data. The disk DMA controller, the processor and the main memory are all
attached to a single bus. The bus data width is 32 bits and a bus transfer to or from
the main memory takes 10 nanoseconds.

Q148. What is the maximum number of disk units that can be simultaneously
transferring data to or from the main memory? ______________
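
For Q148, the limit is set by how many disk data streams fit into the bus bandwidth. A minimal sketch, assuming that "Mbps" means 10^6 bits per second and that the bus can be fully dedicated to disk transfers; names are illustrative.

```python
BUS_WIDTH_BITS = 32
BUS_TRANSFER_NS = 10
DISK_RATE_BITS_PER_SEC = 200 * 10**6      # assumed: Mbps = 10^6 bits/s

# Integer arithmetic avoids floating-point rounding in the final division.
bus_bits_per_sec = BUS_WIDTH_BITS * 10**9 // BUS_TRANSFER_NS
max_disks = bus_bits_per_sec // DISK_RATE_BITS_PER_SEC

print(bus_bits_per_sec, "bits/s on the bus")
print(max_disks, "disks transferring simultaneously")
```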

Q149. What percentage of main memory cycles are stolen by a disk unit, on average
over a long period of time during which a sequence of independent 8K-byte
transfers takes place? ________(Rounded off to three decimal places)

Q150. Consider a processor with a data bus capable of supporting data transfers of up
to 500 MB/sec, and a device that interrupts the CPU every time an 8-bit character is
ready so that the CPU can read that character. The device generates an interrupt
every 10 μsec and the CPU takes 1 μsec to handle the interrupt. What is the
maximum number of bytes/second that can be transferred from this device?
___________
Q151. Consider a device connected to a DMA controller capable of sending or receiving
a 32-bit word every 100 nsec. A response takes equally long. How fast does the
bus (in MB/Sec) have to be to avoid being a bottleneck? __________
Q152. Consider a computer that can read or write a memory word in 10 nsec. Also suppose
that when an interrupt occurs, all 32 CPU registers plus the Program Counter
and PSW are pushed onto the stack. What is the maximum number of
interrupts per second this machine can process (in units of 10^6)? __________

Q153. Consider a hard disk that transfers data in 4-word chunks of 2 bytes each and
can transfer at 4MB per sec. The data is transferred using interrupt-driven I/O,
and the overhead of each transfer, including the interrupt, is 500 clock cycles.
What percentage of a 500-MHz processor is consumed if the hard drive is
transferring data only 10% of the time? _____
Q154. Consider a hard disk that transfers data in 4-word chunks of 2 bytes each and
can transfer at 4MB per sec. The data is transferred using DMA. Assume that the
initial DMA setup takes 1000 cycles and the overhead of the interrupt at completion
is 500 cycles. If the average transfer is 8KB, what fraction (in percent) of the
500-MHz processor is consumed if the disk is active 100% of the time (ignore
processor/DMA controller bus contention)? ____________ (Rounded off to two
decimal places)
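
The DMA overhead in Q154 is the CPU cycles spent per transfer divided by the cycles available during one transfer. The sketch below assumes decimal units (8KB = 8 × 10^3 bytes, 4MB/s = 4 × 10^6 bytes/s), so that one transfer lasts 2 ms; with binary units the result differs slightly. Names are illustrative.

```python
CPU_HZ = 500e6
SETUP_CYCLES = 1000
COMPLETION_CYCLES = 500
TRANSFER_BYTES = 8e3          # assumed decimal KB
DISK_BYTES_PER_SEC = 4e6      # assumed decimal MB

transfer_seconds = TRANSFER_BYTES / DISK_BYTES_PER_SEC
cycles_available = transfer_seconds * CPU_HZ
cycles_used = SETUP_CYCLES + COMPLETION_CYCLES

cpu_percent_q154 = 100.0 * cycles_used / cycles_available
print(f"{cpu_percent_q154:.2f}%")
```

The chunk size does not enter the DMA calculation, since the processor is involved only at the start and end of each 8KB transfer.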

Q155. Consider a disk with three tracks. The time to seek between two adjacent tracks
is S. It takes 2S to seek across two tracks (e.g., from the outer to the inner
track). The rotational delay is R and the transfer time is negligible. Given a FCFS
scheduler, what is the worst-case time for three read requests to different
blocks?
(a) 6S + 3R (b) 6S + R (c) 2S + 2R (d) 2S + R

Q156. Consider an 8-MHz input clock processor, with a 32-bit external data bus.
Assume that this processor has a bus cycle whose minimum duration equals
four input clock cycles. What is the maximum data transfer rate (in Mbps)
across the bus that this processor can sustain? ___________

Q157. Suppose that a 2 GHz processor needs to read 1000 bytes of data from a
particular I/O device. The I/O device supplies 1 byte of data every 0.01 ms. The
code to process the data and store it in memory takes 2000 cycles. If the
processor polls once every 1000 cycles and one polling operation takes
100 cycles, how many CPU cycles does the entire operation consume?
__________

Q158. A hard disk with a transfer rate of 5 Mbytes/second is constantly transferring
data to memory using DMA. The processor runs at 1 GHz, and takes 500 and
1000 clock cycles to initiate and complete a DMA transfer, respectively. If the size
of the transfer is 1 KB, what is the maximum percentage of time that the CPU is
blocked during the DMA operation? ____________

Q159. Match each I/O function on the left with the corresponding I/O technique on the right.
      1. CPU polling I/O for doing service               i)   DMA I/O
      2. Block data access                               ii)  Programmed I/O
      3. Individual I/O instructions                     iii) Memory-mapped I/O
      4. Same format for I/O and memory instructions     iv)  I/O-mapped I/O
      5. I/O requires CPU to do service immediately      v)   Interrupt-driven I/O
(a) 1-ii, 2-i, 3-iv, 4-iii, 5-v (b) 1-ii, 2-iii, 3-i, 4-iv, 5-v
(c) 1-ii, 2-i, 3-iii, 4-iv, 5-v (d) 1-iv, 2-i, 3-v, 4-iii, 5-ii

Q160. An interrupt service circuit uses a dynamic algorithm to adjust the priorities of
two devices x, y that can interrupt a computer. Each time it examines the
interrupt requests, the circuit services the devices as follows:
1. If no device has interrupted, nothing is done,
2. If only one device has interrupted, that device is serviced, and the priorities
are left unchanged,
3. If both devices have interrupted, the device with the higher priority is
serviced and the priorities of the two devices are switched.
Assume that x is assigned the high priority (priority 1) and y is assigned the low
priority (priority 2) when the circuit first begins operation. Which of the
following is the correct sequence of services rendered for the sequence of
requests x, xy, xy, x, y, xy, x, xy, x?
(a) x, x, y, x, y, x, x, y, x (b) x, x, y, x, x, y, x, y, x
(c) x, x, y, x, x, y, x, x, y (d) x, y, x, x, y, y, x, y, x
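
The three rules of Q160 can be simulated directly. This is a minimal sketch; `service_order` is an illustrative name, and each request is written as a string containing the devices that have interrupted at that examination.

```python
def service_order(requests, high="x", low="y"):
    """Apply the dynamic-priority rules and return the devices serviced."""
    served = []
    for request in requests:
        pending = set(request)
        if not pending:
            continue                      # rule 1: no interrupt, nothing is done
        if len(pending) == 1:
            served.append(next(iter(pending)))   # rule 2: priorities unchanged
        else:
            served.append(high)           # rule 3: serve the high-priority device
            high, low = low, high         # and switch the two priorities
    return served


REQUESTS = ["x", "xy", "xy", "x", "y", "xy", "x", "xy", "x"]
print(", ".join(service_order(REQUESTS)))
```

Priorities change only when both devices interrupt together, so the single-device requests in the sequence leave the current ordering intact.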
Q161. Match each machine instruction in List-I with the corresponding micro-operations
of the execution phase of that instruction in List-II:
      List-I (Machine instruction)    List-II (Micro-operations)
      1. Load R1, x                   I.   MAR ← IR(address of operand)
                                           MDR ← R1
                                           Memory(MAR) ← MDR
      2. Store x, R1                  II.  MAR ← IR(address of operand)
                                           MDR ← Memory(MAR)
                                           R1 ← R1 + MDR
      3. ADD R1, x                    III. MAR ← IR(address of operand)
                                           MDR ← Memory(MAR)
                                           R1 ← MDR
Codes:
      1    2    3
(a)   II   III  I
(b)   III  I    II
(c)   I    II   III
(d)   III  II   I
