COA - Advanced Sheet 2023
ADVANCED COMPUTER ORGANIZATION AND ARCHITECTURE
Practice Questions Booklet
ANALYSIS OF CSO IN GATE PAPER
Years Marks
2015 5
2016 8
2017 12
2018 10
2019 9
2020 10
2021 Set -1 8
2021 Set -2 9
File size (how big a file is on your computer) is usually measured in
units of "kilobytes", "megabytes", and "gigabytes". In this computing (binary, but
not data-transfer) usage, 'K' (uppercase) represents a multiplier of 1,024,
'B' (uppercase) represents bytes, and 'b' (lowercase) represents bits. Other
abbreviations use this same base of 1,024:
1 KB (one kilobyte) = 1,024 bytes (approximately 1 thousand bytes)
Q3. [MSQ]
The following binary numbers are 4-bit 2's complement binary numbers. Which
of the following operations generate overflow?
(a) 0011 + 1100
(b) 0111 + 1111
(c) 1110 + 1000
(d) 0110 + 0010
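A minimal sketch of how the overflow condition in Q3 can be checked mechanically: each operand is decoded as a signed value and the true sum is compared against the representable range of the operand width. The helper names `to_signed` and `add_overflows` are illustrative and not part of the booklet.

```python
def to_signed(bits: str) -> int:
    """Interpret a bit string as a two's complement integer."""
    value = int(bits, 2)
    if bits[0] == "1":          # MSB set means the value is negative
        value -= 1 << len(bits)
    return value


def add_overflows(a: str, b: str) -> bool:
    """True if a + b cannot be represented in the operands' width."""
    width = len(a)              # assumes both operands have the same width
    total = to_signed(a) + to_signed(b)
    lowest, highest = -(1 << (width - 1)), (1 << (width - 1)) - 1
    return not (lowest <= total <= highest)


options = {"a": ("0011", "1100"), "b": ("0111", "1111"),
           "c": ("1110", "1000"), "d": ("0110", "0010")}
overflowing = [label for label, (x, y) in options.items() if add_overflows(x, y)]
print(overflowing)
```

Equivalently, overflow occurs exactly when both operands have the same sign and the sum's sign differs from it.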
Q4. For the relation (2x)_9 = (3y)_7, the possible values of x and y are
(a) 4, 2 (b) 5, 2 (c) 8, 7 (d) 9, 6
Q5. Consider the equation (123)_5 = (xY)_8 with x and y as unknown. The number of
possible solutions is ___________
(a) 3 (b) 4 (c) 5 (d) 6
Q7. The number of bytes required to represent the decimal number 220 in packed
BCD (Binary Coded Decimal) form is __________.
Q8. A signed integer has been stored in a byte using the 2's complement format. We
wish to store the same integer in a 16-bit word. We should copy the original byte
to the less significant byte of the word and fill the more significant byte with
(a) 0
(b) 1
(c) equal to the MSB of the original byte
(d) Complement of the MSB of the original byte.
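The widening described in Q8 can be tried out on concrete bit patterns. The sketch below fills the upper byte with copies of the original MSB (sign extension) and checks that the signed value is preserved; `sign_extend_8_to_16` and `as_signed` are illustrative helper names.

```python
def sign_extend_8_to_16(byte: int) -> int:
    """Return the 16-bit pattern that holds the same signed value as `byte`."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit pattern")
    msb = (byte >> 7) & 1
    high = 0xFF if msb else 0x00    # replicate the MSB across the upper byte
    return (high << 8) | byte


def as_signed(pattern: int, width: int) -> int:
    """Decode a `width`-bit two's complement pattern."""
    if pattern & (1 << (width - 1)):
        return pattern - (1 << width)
    return pattern


for original in (0x05, 0xFB, 0x80, 0x7F):
    widened = sign_extend_8_to_16(original)
    print(f"{original:#04x} -> {widened:#06x}",
          as_signed(original, 8), as_signed(widened, 16))
```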
Q9. Let (32)_b = P1 × P2, where P1 and P2 are primes. What is/are the possible value(s) of the base b?
(a) 8 (b) 9 (c) 11 (d) 13
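A brute-force sketch for Q9, assuming the numeral 32 is written in the unknown base b, so that its value is 3b + 2. Each candidate base is tested by factoring that value with trial division; `prime_factors` is an illustrative helper.

```python
def prime_factors(n: int) -> list[int]:
    """Return the prime factors of n with multiplicity (trial division)."""
    factors, divisor = [], 2
    while divisor * divisor <= n:
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 1
    if n > 1:
        factors.append(n)
    return factors


candidates = (8, 9, 11, 13)
# A product of exactly two primes has exactly two prime factors.
valid_bases = [b for b in candidates if len(prime_factors(3 * b + 2)) == 2]
for b in candidates:
    print(b, 3 * b + 2, prime_factors(3 * b + 2))
```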
Q15. Consider a new base system for numbers based on 4s. So, every number is going
to be represented as a sum of powers of 4. A base 4 system uses the symbols
0...3, e.g., (22)_4 is equal to 2 × 4^1 + 2 × 4^0 = 8 + 2 = (10)_10.
What is the largest number that this system can represent in 6 digits (assume
unsigned)?
(a) sum_{i=0}^{6} 4 × 3^i (b) sum_{i=0}^{6} 3 × 4^i (c) sum_{i=0}^{5} 3 × 4^i (d) sum_{i=0}^{5} 4 × 3^i
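As a quick numeric check of the idea in Q15, the largest n-digit unsigned value in base 4 has the digit 3 in every position. The sketch below evaluates that sum for six digits and compares it with the closed form 4^6 - 1; variable names are illustrative.

```python
DIGITS = 6
BASE = 4

# Largest value: the highest digit (BASE - 1) in each of the six positions.
largest = sum((BASE - 1) * BASE**i for i in range(DIGITS))

print(largest)                          # value of (333333)_4 in decimal
print(int("3" * DIGITS, BASE))          # same value parsed from the digit string
print(BASE**DIGITS - 1)                 # closed form
```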
Q16. The IEEE single precision floating point standard allows us to represent less
than 2^32 different numbers. Of these numbers, how many are strictly between 2^-5
and 2^-4?
(a) 2^23 + 1 (b) 2^23 - 1 (c) 2^23 - 2 (d) 2^22
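The count asked for in Q16 can be cross-checked by looking at raw bit patterns. For positive finite single-precision values, the 32-bit encodings are ordered the same way as the numbers themselves, so the number of representable values strictly between two of them is the difference of their encodings minus one. This sketch uses Python's standard `struct` module; `float_bits` is an illustrative helper.

```python
import struct


def float_bits(x: float) -> int:
    """Return the IEEE 754 single-precision encoding of x as an unsigned int."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


low = float_bits(2.0 ** -5)
high = float_bits(2.0 ** -4)

# Patterns strictly between the two endpoints (endpoints excluded).
strictly_between = high - low - 1
print(hex(low), hex(high), strictly_between)
```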
Data for the next four questions, consider the following two changes in the IEEE 754
single precision floating point format:
Option 1: Adding a bit to the significand and removing a bit from the exponent
Option 2: Adding a bit to the exponent and removing a bit from the significand
For each of the following questions select whether option 1, option 2, neither, or both
will accomplish the presented task. Assume that the bias also shifts to be 2^(exp_bits - 1) - 1.
Q18. Represent pi (π) more accurately than IEEE 754 single precision floating
point.
(a) Option 1 (b) Option 2 (c) Neither (d) Both
Q19. Represent smaller positive numbers than IEEE 754 single precision floating
point.
(a) Option 1 (b) Option 2 (c) Neither (d) Both
Q20. Represent more numbers in the range [1, 2) than IEEE 754 single precision
floating point.
(a) Option 1 (b) Option 2 (c) Neither (d) Both
Q21. Represent more numbers than IEEE 754 single precision floating point.
(a) Option 1 (b) Option 2 (c) Neither (d) Both
Data for the next three question, assume that IEEE decided to add a new 12-bit
Q29. What is the number of misses for Loop A and for Loop B if the cache is not
sectored (i.e. no sub-blocks)?
(a) 1024, 1024 (b) 1024, 4096
(c) 4096, 1024 (d) 4096, 4096
Q33. What is the average memory access time (AMAT) in cycles if the hit time of the
direct-mapped cache is 1 cycle and the L1 miss penalty to DRAM is 100 cycles?
____________ (Rounded off to two decimal places)
Q38. What is the hit ratio (in %) of the program that loops 4 times from locations 0
to (67)_10 in memory? _______________
Q39. What is the effective access time (in ns) of this program? _________ (Rounded off
to two decimal places)
Data for the next two questions, Solar radiation can randomly flip bits in the
computer system. Therefore, a cache on a space-faring vehicle, which is exposed to
solar radiation, utilizes error-correcting codes (ECC) for each of its cache blocks to
detect if bits have been flipped. These ECC bits add to the overhead of the cache, in
addition to the usual overhead bits such as valid bits and tags, etc. On a memory
access, the cache operates as normal, but in addition to checking hit/miss it will also
check if the content in the cache block has been corrupted. This is done by checking
the ECC bits. How exactly ECC bits are used to detect corruption is irrelevant to this
problem. If the ECC bits associated with a block indicate that the data in the block is
corrupted, that cache access is regarded as a cache miss. For the sake of the problem,
assume that the memory is incorruptible. The physical memory is byte addressable,
and is 64 KB in size. The cache is 1 KB, 2-way set associative, with an LRU replacement
policy. To implement the LRU policy, 1 bit per set is used to identify the least recently used
block. Each cache block can store 4 B of data and contains 6 extra bits for the error-correcting
codes, 1 valid bit, and tag bits.
Q40. What is the total size of memory needed at the cache controller to store
metadata (tags) for the cache?
(a) 4864 bits (b) 6144 bits (c) 6656 bits (d) 3712 bits
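One way to tally the overhead bits for Q40 is sketched below. It assumes that "metadata" covers everything stored per block besides the data itself (tag, valid bit, ECC bits) plus the one LRU bit per set; if the question intends a narrower definition, the total changes accordingly. All variable names are illustrative.

```python
MEMORY_BYTES = 64 * 1024      # byte-addressable physical memory
CACHE_DATA_BYTES = 1024       # 1 KB of data storage
BLOCK_BYTES = 4
WAYS = 2
ECC_BITS = 6
VALID_BITS = 1
LRU_BITS_PER_SET = 1

address_bits = MEMORY_BYTES.bit_length() - 1        # 16
blocks = CACHE_DATA_BYTES // BLOCK_BYTES            # 256
sets = blocks // WAYS                               # 128
offset_bits = BLOCK_BYTES.bit_length() - 1          # 2
index_bits = sets.bit_length() - 1                  # 7
tag_bits = address_bits - index_bits - offset_bits  # 7

per_block_bits = tag_bits + VALID_BITS + ECC_BITS   # assumed per-block metadata
total_bits = blocks * per_block_bits + sets * LRU_BITS_PER_SET
print(tag_bits, per_block_bits, total_bits)
```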
Q44. [MSQ]
Suppose one increased the size of cache blocks but kept the total data size and
associativity of a cache the same. Which of the following are likely results?
(a) increasing the number of conflict misses
(b) decreasing the number of conflict misses
(c) decreasing the number of compulsory misses
(d) increasing the number of compulsory misses
Data for the next four questions, consider a system with a cache memory. You don't have
any information related to cache configuration (cache size, block size etc.) but you have
three patterns that access various bytes in the system
Assume that the cache is initially empty at the beginning of the first sequence, but not
at the beginning of the second and third sequences. The sequences are executed back-
to-back, i.e., no other accesses take place between the three sequences.
Q45. What is the size of a cache block?
(a) 8 bytes (b) 16 bytes (c) 32 bytes (d) 64 bytes
Q58. Consider the cache which randomly reads the valid bit of any cache entry as 0
Q59. Consider a system with 32-bit addresses, 4-byte words, and 4KB pages.
Demand paging can be thought of as using main memory as a cache for disk.
The properties of this cache are
(a) Fully associative, write-through and block size of 4KB
(b) Fully associative, write-back and block size of 4KB
(c) Set associative, write-through and block size of 1KB
(d) Set associative, write-back and block size of 1KB
Data for the next two questions, assume an 8KB cache with 32B blocks, on a machine
that uses 32-bit virtual and physical addresses. Suppose the cache is a 2-way set
associative write-back cache that uses an LRU replacement policy. Consider the
following sequence of reads and writes:
Assume that caching happens at the block level, i.e., the unit of transfer between cache
and main memory is one block. Also, assume that a write leads to caching the data, the
same as a read. Initially, assume the cache is empty.
Q60. How many misses does the above access pattern exhibit?__________
Q61. How many writes are taking place between cache and memory?_________
Q65. [MSQ]
To read a word from the cache, the input address is set by the processor. Then
What is the access time (in ps) of the cache (the delay of the critical path)?
Assume that a 2-input gate (AND, OR) delay is 50 ps. _____________
Q68. A memory system uses three caches. The caches have the hit rates and hit times given below.
Cache      Hit rate   Hit time
I-cache    0.90       1 ns
D-cache    0.80       5 ns
L2         0.85       20 ns
Main memory has an access time of 50 ns. Consider a program running on this
system has 30% Load and Store instructions. For an instruction access, first
the I-cache is checked. If there is a miss, the L2 cache is checked. If there is a
miss in the L2 cache, the main memory is checked. Similarly, for a data access,
first the D-cache is checked. If there is a miss, the L2 cache is checked. If there
is a miss in the L2 cache, the main memory is checked. What is the effective
memory access time (in ns) of the system? _____________
Q72. What are the average stall cycles per instruction in cycles? _________
Q73. Assume a k-set-associative cache where k > 1 and a direct-mapped cache both
have the same address size, same data capacity, and same number of index
bits. Which of the following statement is/are TRUE based on the given
information?
(a) The direct-mapped cache has more tag bits
(b) The direct-mapped cache has fewer tag bits
(c) The direct-mapped cache has more block offset bits
(d) The direct-mapped cache has fewer block offset bits
Q74. Consider a computer system without a cache, a memory access time of 300ns
and an average CPU operation time of 50 ns. The system spends 40% of its time
computing CPU operations and the rest for memory access. This system will be
upgraded with a L1 cache with an average access time of 30 ns and a hit ratio
of 85%. The main memory will also be upgraded at the same time and the
resulting system will be 4 times faster compared to the system prior to the
upgrade. What is the highest possible memory access time (in ns) of the
upgraded computer system, that fulfils the speed increment
criteria?______________(Rounded to the second decimal place).
Data for the next two questions, a machine has 12-bit instructions with two different
formats: one-address instructions and two-address instructions. Each address is 4 bits
long and every instruction consists of only an opcode and the address(es). Assume that
the instruction encoding space is completely utilized and both kinds of instructions
exist.
Q77. What is the maximum number of one-address instructions? ___________
Q78. What is the minimum number of one-address instructions? ___________
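A sketch of the expanding-opcode reasoning behind Q77 and Q78. It assumes the usual scheme in which every 4-bit opcode pattern not used by a two-address instruction serves as a prefix for one-address instructions, and each such prefix is extended by the 4 bits freed from the second address field. `one_address_count` is an illustrative helper.

```python
INSTRUCTION_BITS = 12
ADDRESS_BITS = 4

two_addr_opcode_bits = INSTRUCTION_BITS - 2 * ADDRESS_BITS   # 4
one_addr_opcode_bits = INSTRUCTION_BITS - ADDRESS_BITS       # 8
extra_bits = one_addr_opcode_bits - two_addr_opcode_bits     # 4


def one_address_count(two_address_count: int) -> int:
    """One-address instructions available when the space is fully used."""
    total_prefixes = 2 ** two_addr_opcode_bits
    # Both kinds must exist, so at least one prefix goes to each format.
    if not 1 <= two_address_count < total_prefixes:
        raise ValueError("both instruction kinds must exist")
    free_prefixes = total_prefixes - two_address_count
    return free_prefixes * 2 ** extra_bits


maximum = one_address_count(1)     # fewest two-address instructions
minimum = one_address_count(15)    # most two-address instructions
print(maximum, minimum)
```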
Data for the next two questions, consider a digital computer that has a memory unit with
32 bits per word. The instruction set consists of 110 different operations. All
instructions have an operation code part (opcode) and two address fields: one for a
memory address and one for a register address. This particular system includes eight
general-purpose, user-addressable registers. Registers may be loaded directly from
memory, and memory may be updated directly from the registers. Direct memory-to-
memory data movement operations are not supported. Each instruction is stored in one
word of memory.
Q79. What is the maximum allowable size for memory?
(a) 2^3 bytes (b) 2^7 bytes (c) 2^22 bytes (d) 2^24 bytes
Q80. What is the largest unsigned binary number that can be accommodated in one
word of memory?
(a) 2^3 - 1 (b) 2^7 - 1 (c) 2^22 - 1 (d) 2^32 - 1
(c) if (i != j)
        f = g + h;
    else
        f = g - h;
(d) if (i != j)
        f = g - h;
    else
        f = g + h;
1. Loadi T0, 2 // 2 → T0
2. Loadi T1, 5 // 5 → T1
3. SLT T2, T1, T0 //Set T2 = 1 if T1 < T0, otherwise 0
4. BEQ T2, R0, SKIP // if T2 = R0 then skip
5. ADDi T1, T2, 3 // T1 = T2 + 3
6. SKIP:
7. Loadi V0, 42 // V0 = 42
Assume that R0 initialized as 0. What is the value of register T1 after completion
of the code?
(a) 0 (b) 2 (c) 5 (d) 42
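The seven-line listing above can be traced by hand or with a tiny register model. The sketch below mirrors each instruction in Python, following the semantics given in the listing's own comments (SLT sets its destination to 1 when the first source is less than the second; BEQ skips to the label when the two registers are equal). The dictionary `regs` is an illustrative stand-in for the register file.

```python
regs = {"R0": 0, "T0": 0, "T1": 0, "T2": 0, "V0": 0}

regs["T0"] = 2                                      # Loadi T0, 2
regs["T1"] = 5                                      # Loadi T1, 5
regs["T2"] = 1 if regs["T1"] < regs["T0"] else 0    # SLT T2, T1, T0
branch_taken = regs["T2"] == regs["R0"]             # BEQ T2, R0, SKIP
if not branch_taken:
    regs["T1"] = regs["T2"] + 3                     # ADDi T1, T2, 3
regs["V0"] = 42                                     # SKIP: Loadi V0, 42

print(branch_taken, regs)
```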
Q90. Assuming that the PC has already been incremented by 4 when the comparison
for the BEQ instruction at address 5004 is made, how many instructions away
from the BEQ instruction could we reach?
Q91. Consider a byte-addressable machine that requires two types of instructions.
The first type of instructions (Type A) has the following general format:
Opcode Ra, Rb, Imm
Instructions of this type operate as follows. We perform an operation between
the values in registers Ra and Rb, and store the result to the memory address
specified by the immediate value Imm. The immediate value is treated as an
absolute (as opposed to relative) memory address. The binary encoding for
this type of instruction is the following. The most significant bit is always 0,
indicating that this is a Type A instruction.
The second type of instructions (Type B) has the following general format:
Opcode Ra, Rb, Rc
Instructions of this type operate as follows. We perform an operation between
the values in registers Ra and Rb, and store the result to the memory address
specified by value in register Rc. The binary encoding for this type of
instruction is the following. The most significant bit is always 1, indicating
that this is a Type B instruction.
The total memory capacity is 2^20 bytes. Each register in this machine holds 4
bytes of data. For each instruction type, support 8 different arithmetic and logic
operations (same 8 for each type). Also assume that the length of Type A
instructions is 4 bytes. Assume that we have a program with 1000 instructions.
20% of them are of Type A, and 80% of Type B. How much space in the memory
(in bytes) is occupied by this program?
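A sketch of one way to size the program in Q91. The encoding diagrams are not reproduced in this text, so the field layout here is an assumption: a Type A instruction packs 1 type bit, a 3-bit opcode (8 operations), two register fields, and a full memory address into its 32 bits with no unused bits, and a Type B instruction reuses the same register-field width and is padded up to a whole number of bytes.

```python
MEMORY_BYTES = 2 ** 20
TYPE_BITS = 1
OPCODE_BITS = (8 - 1).bit_length()              # 3 bits for 8 operations
TYPE_A_BITS = 4 * 8                             # Type A is 4 bytes long

address_bits = MEMORY_BYTES.bit_length() - 1    # 20-bit absolute address
# Assumption: the remaining Type A bits are split evenly over Ra and Rb.
register_bits = (TYPE_A_BITS - TYPE_BITS - OPCODE_BITS - address_bits) // 2

type_b_bits = TYPE_BITS + OPCODE_BITS + 3 * register_bits
type_b_bytes = -(-type_b_bits // 8)             # round up to whole bytes

instructions = 1000
type_a_count = instructions * 20 // 100
type_b_count = instructions - type_a_count
program_bytes = type_a_count * 4 + type_b_count * type_b_bytes
print(register_bits, type_b_bytes, program_bytes)
```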
Q93. Assume the following values are in memory, and register R1 is the index register
and is storing 200.
Suppose the value loaded into the accumulator by the instruction “Load 500” is x
if the addressing mode is immediate, y if the addressing mode is direct,
and z if the addressing mode is indirect.
Then the value of x + y + z is ___________
Q94. A digital computer has a memory unit with 32 bits per word. The instruction set
consists of 110 different operations. All instructions have an operation code part
(opcode) and two address fields: one for a memory address and one for a register
address. This particular system includes eight general-purpose, user-
addressable registers. Registers may be loaded directly from memory, and
memory may be updated directly from the registers. Direct memory-to-memory
data movement operations are not supported. Each instruction is stored in one
word of memory. If the memory is word addressable, then what is the maximum
allowable size for memory (in MB)?__________
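The field-width bookkeeping for Q94 can be sketched as follows. It assumes the opcode and register fields take the minimum number of bits, the rest of the 32-bit instruction word is the memory address, and 1 MB means 2^20 bytes. Variable names are illustrative.

```python
WORD_BITS = 32
OPERATIONS = 110
REGISTERS = 8

opcode_bits = (OPERATIONS - 1).bit_length()      # smallest n with 2**n >= 110
register_bits = (REGISTERS - 1).bit_length()     # smallest n with 2**n >= 8
address_bits = WORD_BITS - opcode_bits - register_bits

addressable_words = 2 ** address_bits            # word-addressable memory
memory_bytes = addressable_words * (WORD_BITS // 8)
memory_mb = memory_bytes / 2 ** 20               # assumes 1 MB = 2**20 bytes
print(opcode_bits, register_bits, address_bits, memory_mb)
```

With byte addressing instead, the same 22 address bits would reach only 2^22 bytes.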
Data for the next two questions, a certain machine has 22-bit instructions and 6-bit
Assume that there is at least one instruction for each type, and the encoding space is
completely utilized.
Q95. What is the number of type-C instructions the machine can support simultaneously
with the maximum number of type-B instructions? ________
Instruction Pipeline
Q97. The overall speedup of a system that spends 40% of its time in calculations, with
a processor upgrade that provides 100% greater throughput, is _________ %.
Q98. Suppose you have written a program and determined that 80% of the time is
spent in one segment of code. You examine the code and determine that you
can decrease the running time in that segment of code by half (i.e., a speedup of
2). How many times faster will your program as a whole run with the new code?
_________
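Q97 and Q98 are both applications of Amdahl's law: if a fraction f of the run time is sped up by a factor s, the overall speedup is 1 / ((1 - f) + f / s). The sketch below evaluates it for both questions, reading "100% greater throughput" in Q97 as a factor of 2 (an interpretation, not stated in the booklet). `amdahl_speedup` is an illustrative name.

```python
def amdahl_speedup(fraction: float, factor: float) -> float:
    """Overall speedup when `fraction` of the time is sped up by `factor`."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


q97 = amdahl_speedup(0.40, 2.0)   # 40% of time in calculations, 2x processor
q98 = amdahl_speedup(0.80, 2.0)   # 80% of time in one segment, 2x faster

print(f"Q97: {q97:.2f}x overall, i.e. {(q97 - 1) * 100:.0f}% faster")
print(f"Q98: {q98:.2f}x overall")
```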
Q99. In an enhancement of a design of a CPU, the speed of a floating-point unit has
been increased by 20% and the speed of a fixed-point unit has been increased
by 10%. What is the overall speedup achieved if the ratio of the number of
floating-point operations to the number of fixed-point operations is 2:3 and the
floating-point operation used to take twice the time taken by the fixed-point
operation in the original design?
(a) 1.155 (b) 1.185 (c) 1.255 (d) 1.285
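For Q99, a weighted-time sketch: take the fixed-point operation time as one unit, weight each operation class by its count and per-operation time, and compare total time before and after. It assumes "speed increased by 20%" means the time per operation is divided by 1.2 (and likewise 1.1 for fixed point).

```python
fp_ops, fixed_ops = 2, 3            # ratio of operation counts, 2:3
fp_time, fixed_time = 2.0, 1.0      # floating point used to take twice as long

old_time = fp_ops * fp_time + fixed_ops * fixed_time
# Assumption: an x% speed increase divides the per-operation time by (1 + x/100).
new_time = fp_ops * fp_time / 1.20 + fixed_ops * fixed_time / 1.10

speedup = old_time / new_time
print(f"{speedup:.3f}")
```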
Q100. A benchmark program runs for 200 seconds. We want to improve the execution
time of the benchmark by a factor of 2.5. We enhance the floating-point unit to
Data for the next two questions, assume that to spell check a large file, 820,000,000
instructions are needed. The instructions in the program are broken down into 4
different classes, and each class requires N clock cycles to execute. Specific
information is given in the table below.
Q101. If the total execution time for this program is found to be 1.57 seconds, what is
the clock rate (in GHz) of the computer on which it was run? _________(Rounded
off to two decimal places)
Q102. Assume that as part of the 820,000,000-instruction spell check, 25% of all load
instructions are immediately followed by an ALU / R-type instruction that uses
the data that was just loaded. To speed up this program, we are contemplating
adding a new type of instruction:
- An ALU instruction where one of the source operands is a value from memory.
- This new instruction will replace the previous 2-instruction sequence.
- It will take 7 clock cycles.
The speedup over the original design is _________. Assume that the clock rate
does not change. (Rounded off to two decimal places)
Q103.
For this question, assume that the processor you are working with executes
For each branch, how many correct predictions will occur if we use the 2-bit
predictor initialized to 11 (Predict taken) common for both branches.
(a) Branch 1: 98 correct predictions, Branch 2: 97 correct predictions.
(b) Branch 1: 98 correct predictions, Branch 2: 1 correct prediction.
(c) Branch 1: 99 correct predictions, Branch 2: 1 correct prediction.
(d) Branch 1: 97 correct predictions, Branch 2: 98 correct predictions.
Q106. Consider the following code running on a system with static branch predictor.
Or R1, 0, 0 // immediate addressing
Loadi R2,1000 // Load immediate
A: Beq R1,R2,L1 // jump if R1=R2
Addi R1,R1,1 // R1 ← R1+1
Jump A
L1: Halt
Under which prediction will the program perform better?
(a) Always not taken
(b) Always taken
(c) Same performance under any predictions
(d) Data insufficient to evaluate performance
Data for next two questions, consider the classical 5-stage pipeline processor with
following specifications
A static branch predictor is used which always predicts branch not taken in the
Fetch stage
Unconditional branches are redirected from the Decode stage
Conditional branches are resolved in the Execute stage
Full operand forwarding is implemented.
Consider the following assembly code:
Q107. How many cycles are required to complete the two iterations of the above loop
on given processor? __________
Q108. What is the CPI of the loop as the number of iterations approaches infinity?
__________
Q109. Consider the stages of the four-stage pipelined processor:
• Fetch and Decode (FD)
• Execute 1 (EX1)
• Execute 2 (EX2)
• Memory and Writeback (MWB)
Suppose the following code is executed on above processor:
MOV 8(R1), R2 // M[8 + R1] → R2
ADD R2, R1 // R1 + R2 → R1
SUB R1, R2 // R1 + R2 → R2
MOV R2, 8(R1) // R2 → M[8 + R1]
The split execute stage requires the operands for an addition or subtraction to
be available near the beginning of the Execute 1 stage, and only has results
available near the end of the Execute 2 stage. Assume this processor
implements full forwarding. Operands are forwarded from EX2 to EX1 and from
MWB to EX1. When this processor is executing the EX1 stage of the SUB
instruction, which stage of the ADD instruction is it executing?
(a) Fetch and decode (b) Execute 1
(c) Execute 2 (d) Memory and writeback
Q110. Consider the classic five stage pipeline (IF, ID, EX, MEM, WB) with the following
modifications:
Q112. Which of the following are more likely characteristics of RISC ISAs than CISC
ISAs?
(a) RISC ISAs result in shorter assembly programs
(b) RISC ISAs have more registers
(c) Allowing one of the operands in a subtract instruction to be a memory
location
(d) RISC ISAs have variable-length instructions
Q114. Say we considered replacing the branch with a branch that had three branch
delay slots. Which of the following is the correct ordering of the instructions A-F
(perhaps with added noops) to get the best performance?
(a) A B D F C E Noop (b) A B C F D E Noop
(c) A B C D A F E Noop Noop (d) A D F B C E
Q115. Consider the classical 5-stage pipeline (IF, ID, EX, MA, WB) processor with
following specifications
A static branch predictor is used which always predicts branch taken in the
Fetch stage
Unconditional branches are redirected from the Decode stage
Conditional branches are resolved in the Execute stage
Full operand forwarding is implemented.
Suppose the following code is executed on above processor:
Assume that 0($2) = 4. How many cycles are required to execute the above code,
assuming that branches are predicted to be taken? _____________
Q116.
Consider the following code:
Q128. Consider the classical 5-stage pipeline (IF, ID, EX, MA, WB). By examining a few
representative programs, we find the following relationships for dependent
instructions:
Assuming we only stall due to data dependencies, the average CPI of the
pipelined processor is __________
Q133. Consider a pipelined processor with a cycle time of 10 ns and an average CPI of
1.6, including all stalls due to branches and other causes. 10% of the instructions are
branches and the branch prediction scheme is 90% accurate. Every branch
misprediction results in a 2-cycle delay. Now consider a change to the given processor
that reduces the cycle time to 9 ns by increasing the depth of the pipeline. In
this new design the cost of a misprediction will increase to 7 clock cycles but
everything else will stay the same. What is the average CPI on the new
processor? _____________
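A sketch of the CPI adjustment in Q133, assuming the quoted CPI of 1.6 already includes the branch misprediction stalls of the original design. The old branch stall contribution is removed and the new one added; only the misprediction penalty changes.

```python
old_cpi = 1.6
branch_fraction = 0.10
mispredict_rate = 1.0 - 0.90
old_penalty, new_penalty = 2, 7     # cycles per misprediction

old_branch_stalls = branch_fraction * mispredict_rate * old_penalty
new_branch_stalls = branch_fraction * mispredict_rate * new_penalty

# Assumption: every other stall source is unchanged by the deeper pipeline.
new_cpi = old_cpi - old_branch_stalls + new_branch_stalls
print(f"{new_cpi:.2f}")
```

Comparing time per instruction (CPI × cycle time) for the two designs then shows whether the shorter cycle pays for the higher penalty.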
Q134. Assume that a 5-stage pipeline processor with forwarding running a program
with following instruction mix:
23% of instructions are loads (50% of the time, the next instruction uses the
loaded value)
13% of instructions are stores
19% of instructions are conditional branches
2% of instructions are unconditional branches
remaining are ALU instructions
There is a penalty of 1 cycle if an instruction immediately needs a loaded value,
each unconditional branch results in a stall of 1 cycle, 75% of conditional
branches are predicted correctly, and the branch misprediction penalty is 2 cycles.
What is the average CPI of the program on the pipeline if the base CPI is 1? _________
(rounded off to three decimal places)
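The average CPI in Q134 is the base CPI plus the expected stall cycles contributed by each instruction class. A minimal sketch using the figures from the question (stores and ALU instructions are taken to cause no stalls, since the question lists none for them):

```python
base_cpi = 1.0

load_stalls = 0.23 * 0.50 * 1            # loads whose value is used immediately
jump_stalls = 0.02 * 1                   # unconditional branches
branch_stalls = 0.19 * (1 - 0.75) * 2    # mispredicted conditional branches

average_cpi = base_cpi + load_stalls + jump_stalls + branch_stalls
print(f"{average_cpi:.3f}")
```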
Q143. Given the same parameters from above, assume that the operating system has
exploited locality by grouping related blocks together in the filesystem. As a
result, the typical access pattern is not as random. It typically retrieves 10
blocks sequentially at a time and spends only 1 ms for each seek. What is the
average time (in ms) to read a single block now? __________(Rounded off to two
decimal places)
Q144. Suppose that audio files are laid out in 64K (65536 bytes) chunks on the disk
(i.e. 64K in successive sectors on a track). Assume that the disk controller
automatically DMAs the data to kernel memory in a fashion that is overlapped
with reading it from the disk (so that you do not have to worry about DMA for
this operation). Compute the overhead for reading such a 64K chunk from a
random place on the disk. Assume the disk parameters given below:
• 750GB in size
• 10000 RPM, Data transfer rate of 50 Mbytes/s (50 × 10^6 bytes/sec)
• Average seek time of 4ms
• Disk controller with 1ms controller initiation time
• A block size of 4Kbytes (4096 bytes)
What is the total time (in ms) to read 64K chunk from a random place on the
disk into memory?
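For Q144, a sketch of the usual disk access time model: controller time plus average seek plus average rotational latency (half a revolution) plus transfer time. It assumes these components do not overlap and that the whole 64K chunk is transferred contiguously at the stated rate.

```python
controller_ms = 1.0
seek_ms = 4.0
rpm = 10_000
transfer_rate = 50e6                 # bytes per second
chunk_bytes = 65_536

rotation_ms = 60_000 / rpm                          # one full revolution
rotational_latency_ms = rotation_ms / 2             # average: half a turn
transfer_ms = chunk_bytes / transfer_rate * 1000

total_ms = controller_ms + seek_ms + rotational_latency_ms + transfer_ms
print(f"{total_ms:.2f} ms")
```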
Data for the next three questions, suppose that we build our disk subsystem to
Q146. Assume that the OS is not particularly clever about disk scheduling and passes
requests to the disk in the same order that it receives them from the application
(FIFO). If the application requests are randomly distributed over a single disk,
what is the bandwidth (Mbytes/sec) that can be achieved?__________(Rounded
off to one decimal place)
Q147. Suppose that the application has requests outstanding for all disks (but they
are still randomly distributed, handed FIFO to disks), what is the maximum
number of I/Os per second (IOPS) for the whole disk subsystem (an “I/O” here
is a block request)? _____________
Data for the next two questions, the average seek time and average rotational delay
in a disk system are 10 ms and 5 ms, respectively. The rate of data transfer to or from
the disk is 200 Mbps (1M = 10^6) and all disk accesses are for 4 Kbytes (1 Kbyte = 10^3
bytes) of data. The disk DMA controller, the processor and the main memory are all
attached to a single bus. The bus data width is 32 bits and a bus transfer to or from
the main memory takes 10 nanoseconds.
Q149. What percentage of main memory cycles are stolen by a disk unit, on average
over a long period of time during which a sequence of independent 8K-byte
transfers takes place? ________(Rounded off to three decimal places)
Q150. Consider a processor with data bus capable of supporting data transfers of up
to 500 MB/sec, and a device interrupts the CPU every time an 8-bit character is
ready so that the CPU can read that character. The device generates an interrupt
every 10 sec and the CPU takes 1 sec to handle each interrupt. What is the
maximum number of bytes/second that we can transfer from this device?
___________
Q151. Consider a device connected to a DMA controller capable of sending or receiving
a 32-bit word every 100 nsec. A response takes equally long. How fast does the
bus (in MB/Sec) have to be to avoid being a bottleneck? __________
Q152. Consider a computer that can read or write a memory word in 10 nsec. Also suppose
that when an interrupt occurs, all 32 CPU registers plus the Program Counter
and PSW are pushed onto the stack. What is the maximum number of
interrupts per second this machine can process (in units of 10^6)? __________
Q153. Consider a hard disk that transfers data in 4-word chunks of 2 bytes per word and
can transfer at 4 MB per sec. The data is transferred using interrupt-driven I/O
and the overhead of each transfer, including the interrupt, is 500 clock cycles. What percent of
a 500-MHz processor is consumed if the hard drive is only transferring data 10% of
the time? _____
Q154. Consider a hard disk that transfers data in 4-word chunks of 2 bytes per word and
can transfer at 4 MB per sec. The data is transferred using DMA. Assume that the
initial DMA setup takes 1000 cycles and the overhead of the interrupt at completion is
500 cycles. If the average transfer is 8KB, what fraction (in percent) of the 500-MHz
processor is consumed if the disk is active 100% of the time (ignore
processor/DMA controller bus contention)? ____________ (Rounded off to two
decimal places)
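Q153 and Q154 both compare the processor cycles spent on I/O overhead with the cycles available per second. The sketch below assumes 1 MB = 10^6 bytes for the disk rate; the 8KB transfer size is passed in explicitly because reading it as 8000 or 8192 bytes changes the unrounded result slightly. Function names are illustrative.

```python
CLOCK_HZ = 500e6
DISK_RATE = 4e6              # bytes per second, assuming 1 MB = 10**6 bytes
CHUNK_BYTES = 4 * 2          # 4 words of 2 bytes each


def interrupt_driven_percent(busy_fraction: float, overhead_cycles: int = 500) -> float:
    """Percent of the CPU used when each chunk raises one interrupt."""
    interrupts_per_second = DISK_RATE / CHUNK_BYTES
    cycles_per_second = interrupts_per_second * overhead_cycles
    return 100.0 * busy_fraction * cycles_per_second / CLOCK_HZ


def dma_percent(transfer_bytes: int, setup_cycles: int = 1000,
                completion_cycles: int = 500) -> float:
    """Percent of the CPU used when the disk streams DMA transfers back to back."""
    seconds_per_transfer = transfer_bytes / DISK_RATE
    cycles_per_transfer = setup_cycles + completion_cycles
    return 100.0 * cycles_per_transfer / (seconds_per_transfer * CLOCK_HZ)


print(f"Q153: {interrupt_driven_percent(0.10):.2f}%")
print(f"Q154: {dma_percent(8 * 1024):.2f}% (8192-byte transfers)")
print(f"Q154: {dma_percent(8000):.2f}% (8000-byte transfers)")
```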
Q156. Consider a processor with an 8-MHz input clock and a 32-bit external data bus.
Assume that this processor has a bus cycle whose minimum duration equals
four input clock cycles. What is the maximum data transfer rate (in Mbps)
across the bus that this processor can sustain? ___________
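A short sketch of the bus bandwidth arithmetic in Q156, assuming one full 32-bit transfer per bus cycle and 1 Mbps = 10^6 bits per second.

```python
clock_hz = 8_000_000
clocks_per_bus_cycle = 4
bus_width_bits = 32

bus_cycles_per_second = clock_hz / clocks_per_bus_cycle
# Assumption: every bus cycle moves one full 32-bit word.
rate_mbps = bus_cycles_per_second * bus_width_bits / 1e6
rate_mbytes_per_second = rate_mbps / 8
print(rate_mbps, rate_mbytes_per_second)
```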
Q157. Suppose that a 2 GHz processor needs to read 1000 bytes of data from a
particular I/O device. The I/O device supplies 1 byte of data every 0.01 ms. The
code to process the data and store it in memory takes 2000 cycles. If the
processor polls every 1000 cycles and one polling operation takes
100 cycles, how many CPU cycles does the entire operation take?
__________