Mock Paper Computer Architecture Answers
FACULTY OF ENGINEERING
COMPUTER ARCHITECTURE
TIME ALLOWED:
2 hours
~( ( x >> 2 ) ^ 0xF4 )
x = −10(10)
  ↦ 11110110
Next, since x is signed (implying an arithmetic rather than logical right-shift), we can
evaluate the expression as follows:
~( ( 11110110 >> 2 ) ^ 11110100 )
= ~( 11111101 ^ 11110100 )
= ~( 00001001 )
= 11110110
↦ −10(10)
Q2. An m-output, 1-bit demultiplexer connects a 1-bit input x to one of m separate 1-bit
outputs (say ri for 0 ≤ i < m). The output is selected using an l-bit control signal c (or,
equivalently, c is a collection of l separate 1-bit control signals). If m = 5, what value
of l is required?
A. 0
B. 1
C. 2
D. 3
E. 4
[1 mark]
Solution: Given an l-bit control signal c, the demultiplexer can select between at most
2^l outputs: we treat c as an unsigned, l-bit integer which will clearly range in value
between 0 and 2^l − 1. In general, we want an l st. 2^l ≥ m so each output can be
specified; typically m is a power-of-two, since this matches the maximum number of
outputs that can be specified. However, in this case we have m = 5.
Since 2^2 = 4 < 5 and 2^3 = 8 > 5 we know a 2-bit control signal is not enough (it
cannot select r4 since 0 ≤ c < 4), but a 3-bit control signal is (although it could cope
with up to m = 8 and, since 0 ≤ c < 8, select r5 , r6 and r7 if they existed). In summary
then, l = 3 is the correct answer.
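The rule above (the smallest l st. 2^l ≥ m) can be sketched in Python as follows. The function name control_bits is an assumption made for illustration only; it is not part of the question.

```python
def control_bits(m: int) -> int:
    """Return the smallest l such that 2**l >= m.

    Hypothetical helper: l is the width of the control signal needed to
    select between m demultiplexer outputs.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    # The largest output index is m - 1; the number of bits needed to
    # write it as an unsigned integer is exactly the l we want.
    return (m - 1).bit_length()


for m in (2, 4, 5, 8):
    print(f"m = {m}: l = {control_bits(m)}")
```

For m = 5 this gives l = 3, matching the argument above; m = 4 needs only l = 2 and m = 8 still fits in l = 3.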
Q3. Consider a DRAM-based memory device with a capacity of 65536 addressable bytes. Of
the following options
A. 8 address pins, 65536 cells
B. 16 address pins, 65536 cells
C. 8 address pins, 524288 cells
D. 16 address pins, 524288 cells
E. None of the above
which offers the most likely description of said device?
[1 mark]
Solution: There are 2^16 addressable bytes, meaning a 16-bit address needs to be
supplied. However, in contrast to an SRAM memory, a DRAM memory will normally use a
2-step (or more, potentially) approach: half the address is supplied by each of the steps
(under control of row and column address strobe signals), which requires only half the
number of address pins.
The memory stores bytes, i.e., 8-bit elements, so we expect there to be 8 duplicated
arrays each consisting of 65536 cells. Overall, there will be 8 · 65536 = 524288 cells. So,
in summary, an answer of 8 address pins and 524288 cells is correct; the alternative
of 16 address pins and 524288 cells is not wrong per se, but certainly less likely in
practice.
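The arithmetic in this solution can be checked with a short Python sketch. The function dram_layout and its parameter names are assumptions for illustration, and the sketch assumes the 2-step (row then column) addressing described above.

```python
def dram_layout(capacity_bytes: int, bits_per_element: int = 8):
    """Return (address_bits, address_pins, cells) for a DRAM device.

    Hypothetical helper: assumes the address is supplied in two steps
    (row, then column) over a shared set of address pins.
    """
    # Bits needed to address every element, e.g. 16 for 65536 bytes.
    address_bits = (capacity_bytes - 1).bit_length()
    # Row and column halves share pins, so only half are needed
    # (rounded up if the address width is odd).
    address_pins = (address_bits + 1) // 2
    # One cell per stored bit.
    cells = capacity_bytes * bits_per_element
    return address_bits, address_pins, cells


print(dram_layout(65536))
```

For a capacity of 65536 bytes this yields a 16-bit address, 8 address pins and 524288 cells.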
Q4. Which of the following options would you not expect (or at least it would be uncommon)
to see in the specification of an ISA:
Solution: The option that stands out as unlikely to be included in the ISA is execution
latency. This is essentially a performance metric, measuring how long (e.g., in number
of cycles) an instruction takes to execute. As a result, it is a property of the
micro-architecture, not the ISA. Put another way, since the ISA is an interface that allows
flexibility wrt. the micro-architectural implementation, it is unlikely the former would
include such a measure: doing so would constrain the latter, reducing said flexibility.
Q5. Which of the following statements best describes the semantics of a relative branch
instruction?
A. The program counter is reset to zero, so is independent from the current
program counter value
B. The branch target is at an offset from, and so is dependent on the current
program counter value
C. The branch target is relatively far from the current program counter value
D. The branch target is greater than the current program counter value
E. The program counter is only updated if a condition is true
[1 mark]
Solution: Some of the statements describe other ways to classify a branch; for example,
the last statement describes a conditional branch. Some of the statements are nonsense;
for example, the term “relatively far” is subjective so has no clear meaning in this
context.
The semantics of a relative branch instruction can be written as
PC ← PC + x,
st. the branch target (i.e., the new program counter) is an offset from the current program
counter value. The question does not specify whether the offset x is an immediate but,
either way, the branch target is clearly dependent on the current program counter value.
Q6. Consider the equivalence
(y ∧ ¬x) ∨ (x ∧ ¬y ) ≡ (x ∨ y ) ∧ ¬(x ∧ y ),
the left-hand side of which can be manipulated into the right-hand side by applying the
following sequence of Boolean axioms:
The final axiom is missing, i.e., replaced with X: which of the following options for X
yields a valid derivation?
A. Absorption
B. Idempotency
C. Implication
D. Null
E. de Morgan
[2 marks]
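Independently of which axioms are applied, the equivalence itself can be checked exhaustively, since there are only four assignments to x and y. The following Python sketch does so; the names lhs and rhs are ours, introduced only for illustration.

```python
from itertools import product


def lhs(x: bool, y: bool) -> bool:
    # (y AND NOT x) OR (x AND NOT y)
    return (y and not x) or (x and not y)


def rhs(x: bool, y: bool) -> bool:
    # (x OR y) AND NOT (x AND y)
    return (x or y) and not (x and y)


# Compare both sides on every assignment to (x, y).
equivalent = all(
    lhs(x, y) == rhs(x, y) for x, y in product([False, True], repeat=2)
)
print(equivalent)
```

Both sides evaluate to the XOR of x and y, so the check succeeds; it confirms the equivalence but says nothing about which sequence of axioms derives it.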
Question Q7 and Question Q8 both relate to Figure 1, which describes the implementation
of two components denoted C0 and C1 . Each component Ci produces one output ri given
two inputs x and y , and has been implemented using MOSFET transistors.
Q7. The truth table below includes 5 possibilities for outputs r0 and r1 (stemming from
instances of C0 and C1 ), given x and y . Recall that Vss and Vdd are used to represent 0
       A.      B.      C.      D.      E.
x y  r0 r1   r0 r1   r0 r1   r0 r1   r0 r1
0 0   1  0    0  0    1  0    Z  0    1  Z
0 1   1  1    0  0    0  0    Z  Z    Z  Z
1 0   1  1    0  0    0  0    Z  Z    Z  Z
1 1   0  1    1  0    0  0    1  Z    Z  0
[2 marks]
Solution: Note that if a given ri is not connected to either Vdd or Vss , it is deemed to
have the high impedance value Z. This suggests the correct truth table is
x y r0 r1
0 0 1 Z
0 1 Z Z
1 0 Z Z
1 1 Z 0
The reason is that C0 is st. r0 connects to Vdd via two (pull-up) P-type MOSFETs;
since these MOSFETs only connect source to drain if the gate is Vss , we can say that
r0 = 1 if x = y = 0 and r0 = Z (i.e., disconnected) otherwise. Conversely, C1 is st.
r1 connects to Vss via two (pull-down) N-type MOSFETs; since these MOSFETs only
connect source to drain if the gate is Vdd , we can say that r1 = 0 if x = y = 1 and
r1 = Z (i.e., disconnected) otherwise.
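The behaviour just described can be captured as a small Python model. This is a sketch only: representing the high impedance value by the string "Z", and the function names c0 and c1, are our assumptions.

```python
Z = "Z"  # assumed representation of the high impedance value


def c0(x: int, y: int):
    """C0: r0 is pulled up to Vdd (1) only if both P-type MOSFETs conduct,
    i.e., both gates are 0; otherwise r0 is disconnected."""
    return 1 if (x == 0 and y == 0) else Z


def c1(x: int, y: int):
    """C1: r1 is pulled down to Vss (0) only if both N-type MOSFETs conduct,
    i.e., both gates are 1; otherwise r1 is disconnected."""
    return 0 if (x == 1 and y == 1) else Z


table = [(x, y, c0(x, y), c1(x, y)) for x in (0, 1) for y in (0, 1)]
for row in table:
    print(*row)
```

The rows printed reproduce the truth table given in the solution.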
Q8. The vendor of these components claims they can be used to implement any Boolean
function; their reasoning is based on the fact that a NAND gate can be implemented
using instances of C0 and C1 . Assuming you want to minimise the number of C0 and C1
instances, how many of each are required to implement such a NAND gate?
      A.      B.      C.      D.      E.
    C0 C1   C0 C1   C0 C1   C0 C1   C0 C1
     1  1    5  3    3  5    3  3    5  5
[2 marks]
Solution: Note that the option using 1 instance of C0 and 1 instance of C1 sort of
makes sense: one can implement a NAND gate using 2 P-type and 2 N-type MOSFETs,
matching those that exist within instances of C0 and C1 . However, the question explicitly
says we need to use instances of C0 and C1 : we cannot, for example, “merge” their
internal implementation to make this option viable. So, as a first step, we implement a
NOT gate as follows:
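[Diagram: input x drives one instance of C0 (output t0) and one instance of C1 (output t1); t0 and t1 are connected together to form the output r.]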
This is useful because we can reuse it when implementing a NAND gate, but also
because it explains the design approach involved: the idea is basically that the output
is driven by one instance of C0 or C1 at a time, with all the others producing the high
impedance value (which is “overridden” by the driving value). The behaviour can be
described as follows:
x t0 t1 r
0 1 Z 1
1 Z 0 0
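Using the same design approach, we can now implement a NAND gate as follows:
[Diagram: inputs x and y drive three instances of C0 (outputs t0, t1 and t2) and one instance of C1 (output t3), with NOT gates as above supplying inverted inputs where needed; t0 to t3 are connected together to form the output r.]
The behaviour
x y t0 t1 t2 t3 r
0 0 1  Z  Z  Z  1
0 1 Z  1  Z  Z  1
1 0 Z  Z  1  Z  1
1 1 Z  Z  Z  0  0
matches the truth table for NAND: remembering to count the components within each
NOT gate, we therefore use 5 instances of C0 and 3 instances of C1 .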
As an aside, note that one can implement a NOR gate by swapping the component
types in the NAND implementation: we therefore implement the required behaviour
using 3 instances of C0 and 5 instances of C1 .
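The NOT and NAND constructions can be simulated in Python to confirm the instance counts. This is a sketch under stated assumptions: the value "Z", the helper resolve (which models outputs wired together), and the exact choice of which C0 inputs are inverted are all inferred from the truth tables above rather than taken from the diagrams.

```python
Z = "Z"  # assumed representation of the high impedance value


def c0(x, y):
    # Drives 1 only when x = y = 0, else disconnected.
    return 1 if (x == 0 and y == 0) else Z


def c1(x, y):
    # Drives 0 only when x = y = 1, else disconnected.
    return 0 if (x == 1 and y == 1) else Z


def resolve(*drivers):
    """Model outputs wired together: at least one must drive, and all
    driving values must agree (a hypothetical helper)."""
    driven = {d for d in drivers if d != Z}
    if len(driven) != 1:
        raise ValueError(f"undriven or conflicting wire: {drivers}")
    return driven.pop()


def not_gate(x):
    # 1 instance of C0 and 1 instance of C1, both inputs tied to x.
    return resolve(c0(x, x), c1(x, x))


def nand_gate(x, y):
    nx, ny = not_gate(x), not_gate(y)  # 2 x C0, 2 x C1
    t0 = c0(x, y)    # drives 1 when x = 0, y = 0
    t1 = c0(x, ny)   # drives 1 when x = 0, y = 1
    t2 = c0(nx, y)   # drives 1 when x = 1, y = 0
    t3 = c1(x, y)    # drives 0 when x = 1, y = 1
    return resolve(t0, t1, t2, t3)  # total: 5 x C0, 3 x C1


for x in (0, 1):
    for y in (0, 1):
        print(x, y, nand_gate(x, y))
```

In every row exactly one of t0 to t3 drives the output, which is the design approach the solution describes.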
Q9. Consider the specification of an ISA, which includes a) a fixed-length, 32-bit instruction
encoding, and b) a byte addressable memory, with a 32-bit address space; instructions
are required to be aligned in memory. Imagine the ISA is implemented by some micro-
architecture, in which the program counter is a register comprised of n D-type latches:
what is the minimum n possible?
A. 0
B. 14
C. 16
D. 30
E. 32
[2 marks]
Question Q10 to Question Q13 all relate to Figure 2 and Figure 3, which describe an FSM
implementation and an associated waveform. When read left-to-right, the waveform captures
how values of Φ1 and Φ2 (a 2-phase clock), and rst (a reset signal) change over time; the
other input s maintains the value A6(16) throughout. Note that the waveform is annotated
with some instants and periods in time (e.g., ρ, and each ti ).
Q10. What is the value of r at time t0 ?
A. 0
B. 1
C. undefined
[1 mark]
Solution: Before t0 , we can see a pulse on rst at the same time as Φ2 = 1; this
acts as a reset, storing s (as a result of the multiplexers) into the top register. Then,
at t0 we find that Φ1 = 1: during this period, the design stores into the bottom register
the value provided by the top register (which, at that point, is fixed since Φ2 = 0). As such,
at t0 we expect the bottom register to store s and hence r to be the MSB of s, i.e.,
r = s7 = 1.
Solution: At t1 the design has performed one cycle relative to t0 : the value stored in
the bottom register at t0 is updated by the middle of the design, then stored in the top
register, and finally stored back in the bottom register (ready for the next cycle). The
middle of the design is fairly simple. Ignoring the less-significant end since this does not
impact r (yet), it basically just shifts the bits toward the more-significant end. At t1 ,
we therefore expect the bottom register to be st. r = s6 = 0.
Solution: This design is a Linear Feedback Shift Register (LFSR); such a design might
be used to support a variety of use-cases, with a common example being the generation
of (pseudo-)random bits. As the name suggests, an LFSR is essentially an n-bit shift
register. After initialising (or seeding) the register state with s, successive updates are
performed; each such update a) shifts-out an output bit (wlog. the MSB), which forms
r, and b) shifts-in a feedback bit, computed as the XOR of the bits of x indexed by a
set T of taps. With n = 8 and T = {3, 4, 5, 7}, this means
r  = x7
x′ = (x ≪ 1) ∥ (⊕ i∈T xi )
   = (x6 ∥ x5 ∥ · · · ∥ x0 ) ∥ (x3 ⊕ x4 ⊕ x5 ⊕ x7 )
As such, we can use a table to trace the state and output as it is updated:
i   x        x′       r
             A6(16)      seed x with s
0   A6(16)   4C(16)   1  generate 0-th output bit
1   4C(16)   99(16)   0  generate 1-st output bit
2   99(16)   33(16)   1  generate 2-nd output bit
3   33(16)   66(16)   0  generate 3-rd output bit
4   66(16)   CD(16)   0  generate 4-th output bit
5   CD(16)   9A(16)   1  generate 5-th output bit
6   9A(16)   35(16)   1  generate 6-th output bit
7   35(16)   6A(16)   0  generate 7-th output bit
8   6A(16)   D4(16)   0  generate 8-th output bit
... ...      ...      ...
Using this table, we can infer that at time t2 (where the 8-th output bit is generated,
which is the first bit which is computed from x vs. matching s), r = 0.
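The trace can be reproduced with a short Python sketch of the update. The function names lfsr_step and lfsr_trace are ours; the taps (bits 3, 4, 5 and 7) and the 8-bit width are taken from the update equation in this solution.

```python
def lfsr_step(x: int, taps=(3, 4, 5, 7), n: int = 8):
    """Perform one update of an n-bit LFSR; return (next state, output bit)."""
    r = (x >> (n - 1)) & 1            # output bit is the MSB
    feedback = 0
    for i in taps:                    # XOR together the tapped bits
        feedback ^= (x >> i) & 1
    # Shift left by one, discard the old MSB, shift-in the feedback bit.
    x_next = ((x << 1) & ((1 << n) - 1)) | feedback
    return x_next, r


def lfsr_trace(seed: int, steps: int):
    """Return a list of (i, x, x_next, r) rows, as in the table."""
    rows, x = [], seed
    for i in range(steps):
        x_next, r = lfsr_step(x)
        rows.append((i, x, x_next, r))
        x = x_next
    return rows


for i, x, x_next, r in lfsr_trace(0xA6, 9):
    print(f"{i}  {x:02X}  {x_next:02X}  {r}")
```

Seeding with A6(16) and performing 9 updates gives the states and output bits listed in the table, including r = 0 for the 8-th output bit.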
relating to components used within Figure 2. The waveform is annotated with ρ, which
illustrates the clock period. If a 2-input NAND gate imposes a gate delay of Tnand =
10ns, which value most closely reflects the maximum possible clock frequency?
A. 1.0MHz
B. 1.2GHz
C. 3.8MHz
D. 5.9MHz
E. 6.6MHz
[3 marks]
Solution: Within the clock period (i.e., within the “time limit” which ρ dictates), two
steps must be completed; those steps are completed when Φ1 = 1 and Φ2 = 1 re-
spectively, and can be described as 1) the top register must be updated with a value
computed by the middle of the design (i.e., the combinatorial logic) from the value in
the bottom register, then 2) the bottom register must be updated with the value in
the top register. So if Tlatch and Tlogic are the critical paths associated with a D-type
latch and said combinatorial logic respectively, then we can write
ρ ≥ (Tlogic + Tlatch ) + (Tlatch )
Adding more detail, we could then reflect the critical path of components constituting
the combinatorial logic: writing
Tlogic = Txor + Txor + Tmux
then reflects the fact that the critical path includes two XOR gates and one multiplexer.
Overall then, we have
ρ ≥ (Txor + Txor + Tmux + Tlatch ) + (Tlatch )
≥ 2 · Tlatch + 2 · Txor + Tmux
Since we have the design of each component, we can, as a next step, be more concrete
about each term above: inspecting the NAND based designs, we can deduce
Tlatch = 4 · Tnand = 40ns
Txor = 3 · Tnand = 30ns
Tmux = 3 · Tnand = 30ns
and thus
ρ ≥ 2 · Tlatch + 2 · Txor + Tmux
≥ 2 · 40ns + 2 · 30ns + 30ns
≥ 80ns + 60ns + 30ns
≥ 170ns
Tlatch arguably represents the more tricky case, noting that the cross-coupled right-
hand side means the path is through 4 NAND gates. Finally, the maximum clock
frequency is the reciprocal of this critical path, so we find that
fmax = 1/ρ
     = 1/170ns
     ≈ 5.9MHz
is correct.
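The same calculation can be written as a few lines of Python, which makes the unit conversion explicit. The variable names are ours; the gate counts (4, 3 and 3 NAND delays) are those deduced in the solution.

```python
T_NAND_NS = 10                      # delay of one 2-input NAND gate, in ns

T_LATCH_NS = 4 * T_NAND_NS          # D-type latch: path through 4 NAND gates
T_XOR_NS = 3 * T_NAND_NS            # XOR gate: path through 3 NAND gates
T_MUX_NS = 3 * T_NAND_NS            # multiplexer: path through 3 NAND gates

# Combinatorial logic: two XOR gates and one multiplexer on the critical path.
t_logic_ns = 2 * T_XOR_NS + T_MUX_NS

# Step 1 (logic, then latch) plus step 2 (latch only).
rho_ns = (t_logic_ns + T_LATCH_NS) + T_LATCH_NS

# 1 / (1 ns) = 1 GHz = 1000 MHz, so f in MHz is 1000 / (period in ns).
f_max_mhz = 1000 / rho_ns

print(f"rho >= {rho_ns} ns, f_max ~ {f_max_mhz:.1f} MHz")
```

This gives a minimum clock period of 170ns and hence a maximum frequency of roughly 5.9MHz.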
C0 = (l = 0, v0 = 0, v1 = 2, v2 = 1, v3 = 0)
and a subsequent trace of execution, decide which of the following options best describes
the purpose of this program.
A. Compare the values in R1 and R2 , setting R3 to reflect the result
B. Add the values in R1 and R2 , setting R3 to reflect the result
C. Swap the values in R1 and R2
D. Copy the value in R1 into R2 , retaining the value in R1
E. Copy the value in R1 into R2 , clearing the value in R1
[3 marks]
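Solution: Producing a solution to this question requires two steps. First, we need to
decode the machine code program: using Figure 7, we find that
L0 if R2 = 0 then goto L3 else goto L1
L1 R2 ← R2 − 1 then goto L2
L2 if R0 = 0 then goto L0 else goto L3
L3 if R1 = 0 then goto L7 else goto L4
L4 R1 ← R1 − 1 then goto L5
L5 R2 ← R2 + 1 then goto L6
L6 if R0 = 0 then goto L3 else goto L7
L7 halt
Second, we produce a trace of execution for the program: starting with the initial
configuration C0 , we find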
C0 = (0, 0, 2, 1, 0)
L0 if R2 = 0 then goto L3 else goto L1
C1 = (1, 0, 2, 1, 0)
L1 R2 ← R2 − 1 then goto L2
C2 = (2, 0, 2, 0, 0)
L2 if R0 = 0 then goto L0 else goto L3
C3 = (0, 0, 2, 0, 0)
L0 if R2 = 0 then goto L3 else goto L1
C4 = (3, 0, 2, 0, 0)
L3 if R1 = 0 then goto L7 else goto L4
C5 = (4, 0, 2, 0, 0)
L4 R1 ← R1 − 1 then goto L5
C6 = (5, 0, 1, 0, 0)
L5 R2 ← R2 + 1 then goto L6
C7 = (6, 0, 1, 1, 0)
L6 if R0 = 0 then goto L3 else goto L7
C8 = (3, 0, 1, 1, 0)
L3 if R1 = 0 then goto L7 else goto L4
C9 = (4, 0, 1, 1, 0)
L4 R1 ← R1 − 1 then goto L5
C10 = (5, 0, 0, 1, 0)
L5 R2 ← R2 + 1 then goto L6
C11 = (6, 0, 0, 2, 0)
L6 if R0 = 0 then goto L3 else goto L7
C12 = (3, 0, 0, 2, 0)
L3 if R1 = 0 then goto L7 else goto L4
C13 = (7, 0, 0, 2, 0)
L7 halt
where the final configuration halts execution. As a result, stating that the program will
“copy the value in R1 into R2 , clearing the value in R1 ” is the best match.
Note that the program itself is in two parts: L0 to L2 clear (or zero) R2 , and L3 to L6
move R1 into R2 . Also note that it depends on having R0 = 0, allowing the construction
of unconditional branches in L2 and L6 .
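The trace can be reproduced mechanically with a small interpreter. The Python sketch below is an illustration only: the tuple encoding of instructions ("jz", "inc", "dec", "halt") is our assumption, and the program is transcribed from the instructions that appear in the trace rather than decoded from the machine code.

```python
# Each entry maps a label to an instruction (hypothetical encoding):
#   ("jz", r, a, b)  if Rr = 0 then goto La else goto Lb
#   ("dec", r, a)    Rr <- Rr - 1 then goto La
#   ("inc", r, a)    Rr <- Rr + 1 then goto La
#   ("halt",)        stop
PROGRAM = {
    0: ("jz", 2, 3, 1),
    1: ("dec", 2, 2),
    2: ("jz", 0, 0, 3),
    3: ("jz", 1, 7, 4),
    4: ("dec", 1, 5),
    5: ("inc", 2, 6),
    6: ("jz", 0, 3, 7),
    7: ("halt",),
}


def run(registers, label=0, max_steps=10_000):
    """Execute PROGRAM; return the list of configurations (l, v0, v1, ...)."""
    regs = list(registers)
    trace = [(label, *regs)]
    for _ in range(max_steps):
        op = PROGRAM[label]
        if op[0] == "halt":
            return trace
        if op[0] == "jz":
            _, r, if_zero, otherwise = op
            label = if_zero if regs[r] == 0 else otherwise
        else:
            kind, r, target = op
            regs[r] += 1 if kind == "inc" else -1
            label = target
        trace.append((label, *regs))
    raise RuntimeError("step limit exceeded")


for i, config in enumerate(run([0, 2, 1, 0])):
    print(f"C{i} = {config}")
```

Starting from C0 = (0, 0, 2, 1, 0), the interpreter produces the 14 configurations C0 to C13 listed above, ending with R1 = 0 and R2 = 2.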
SUB r1, r2    r1 ← r1 − r2; set the carry flag (CF) if the result is negative
SBB r1, r2    r1 ← r1 − (r2 + CF)
AND r1, r2    r1 ← r1 & r2
ADD r1, r2    r1 ← r1 + r2
which are inspired by the x86 ISA, and operate on 4-bit registers. If we set r1 = 3, r2
= 9, and r3 = 0, then execute the following program
sub r1, r2
sbb r3, r3
and r3, r1
add r2, r3
what values do the three registers r1, r2, and r3 have afterwards?
A. r1 = 3, r2 = 9, r3 = 1
B. r1 = 9, r2 = 3, r3 = 1
C. r1 = -6, r2 = 3, r3 = -6
D. r1 = 6, r2 = 9, r3 = 6
E. r1 = 9, r2 = 3, r3 = 0
[2 marks]
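One way to work through such a program is to simulate it step by step. The Python sketch below does so under stated assumptions: registers wrap modulo 16 (being 4-bit), only SUB updates the carry flag (the instruction descriptions do not say whether the others do), and the helper names run_program and to_signed are ours.

```python
MASK = 0xF  # 4-bit registers


def to_signed(v: int) -> int:
    """Interpret a 4-bit value as a two's complement integer."""
    return v - 16 if v & 0x8 else v


def run_program(r1: int, r2: int, r3: int):
    """Simulate: sub r1, r2; sbb r3, r3; and r3, r1; add r2, r3."""
    # sub r1, r2: set CF if the result is negative.
    diff = r1 - r2
    cf = 1 if diff < 0 else 0
    r1 = diff & MASK
    # sbb r3, r3: r3 <- r3 - (r3 + CF), i.e., 0 or all-ones.
    r3 = (r3 - (r3 + cf)) & MASK
    # and r3, r1: keep r1 only if CF was set.
    r3 &= r1
    # add r2, r3
    r2 = (r2 + r3) & MASK
    return r1, r2, r3


result = run_program(3, 9, 0)
print(result, tuple(to_signed(v) for v in result))
```

Under these assumptions, the simulation ends with the bit patterns 1010, 0011 and 1010 in r1, r2 and r3, i.e., -6, 3 and -6 when read as signed 4-bit values.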
D. Subroutine parameters
E. Local variables
[1 mark]
What is the average number of clock cycles per instruction (CPI) for this program?
A. 22.59
B. 0.17
C. 4.25
D. 3.84
E. 384.00
[2 marks]
Solution: The CPI is computed as the weighted average of the instruction cycles where
the weights are the frequencies.
CPI = (5 × 32 + 3 × 15 + 8 × 18 + 1 × 35) / 100 = 3.84
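The weighted average can be checked with a few lines of Python. The list name mix is ours, and reading each product in the formula as (cycles × frequency in percent) is an assumption based on the frequencies summing to 100.

```python
# (cycles per instruction, frequency in percent) for each instruction class,
# read off the terms of the formula above.
mix = [(5, 32), (3, 15), (8, 18), (1, 35)]

total_frequency = sum(freq for _, freq in mix)
cpi = sum(cycles * freq for cycles, freq in mix) / total_frequency

print(total_frequency, cpi)
```

The frequencies sum to 100 and the weighted sum is 384, giving a CPI of 3.84.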
Q18. In a given computer system, accesses to main memory by the processor are supported
by a 16-way set-associative cache. Memory addresses are 16 bits, and each addressable
element has a word size of 1 byte. The cache has a capacity of 32 KiB (32,768 bytes),
cache blocks are of size 64 bytes, and cache sets are numbered starting at 0 (which
contains the lowest memory addresses).
Consider the memory address 0110100100110101, where here the highest (or most-
significant) bits are on the left-hand side. Which set is this address stored in, and what
tag is stored in the tag store?
A. Set 12, tag 01101001001
Question Q19 and Question Q20 both refer to the two’s complement addition of the
8-bit numbers 00101111 (47 in decimal) and 01010001 (81 in decimal).
Q19. Compute the two’s complement addition of the two 8-bit numbers.
A. 10000000 (128 in decimal)
B. 10000000 (-128 in decimal)
C. 01111110 (126 in decimal)
D. 01111100 (124 in decimal)
[2 marks]
Solution:
B. No
[1 mark]
Solution: The carry-out bit (0) and the most-significant bit of the answer (1) differ,
therefore overflow has occurred.
An alternative explanation is that, in two’s complement, the addition of two positive
numbers has yielded a negative number.
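The addition and the overflow check can be reproduced in Python. In this sketch the function name add_twos_complement is ours, and overflow is detected using the second explanation above (both operands have the same sign, but the result has a different sign).

```python
def add_twos_complement(a: int, b: int, n: int = 8):
    """Add two n-bit patterns; return (result, carry_out, overflow, signed)."""
    mask = (1 << n) - 1
    total = (a & mask) + (b & mask)
    result = total & mask             # n-bit result pattern
    carry_out = total >> n            # carry out of the MSB

    def sign(v: int) -> int:
        return (v >> (n - 1)) & 1

    # Overflow: operands share a sign that the result does not have.
    overflow = sign(a) == sign(b) and sign(result) != sign(a)
    signed = result - (1 << n) if sign(result) else result
    return result, carry_out, overflow, signed


result, carry_out, overflow, signed = add_twos_complement(0b00101111, 0b01010001)
print(f"{result:08b}", carry_out, overflow, signed)
```

For 47 + 81 the 8-bit result is 10000000 with a carry-out of 0; interpreted as two’s complement this is -128, so overflow has occurred.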
Q22. The Hex 8 ISA has a word length of 8 bits. Each instruction is formed of a 4-bit
opcode and a 4-bit operand, which limits the number of instructions to 16.
Some of the instructions in the Hex 8 ISA do not require operands. Which of the following
schemes would allow the ISA to be expanded to contain 31 instructions, while retaining
4-bit opcodes which can be decoded in a single cycle?
A. Include an instruction to specify the operand is to be interpreted as an opcode
of an instruction which doesn’t require an operand. E.g. EXE ADD
B. Send an interrupt to indicate the instruction opcode is larger than 16. E.g. A
4-bit opcode refers to two instructions depending on an interrupt signal
C. Use a mechanism similar to the PFIX instruction to construct the opcode from
two instructions
D. Write a micro-program to implement the additional instructions
[3 marks]
Solution: Of the solutions listed, the only way to get 31 instructions is to use one
instruction of the original 16 to designate that the operand is the opcode. This yields
15 + 16 = 31 instructions.
The other methods yield 32 and are impractical, require two cycles, or are incorrect
(respectively).
Q25. What type of data dependency hazard occurs in the following instruction stream?
r1 ← r2 + r3
r4 ← r3 + r4
r5 ← r3 + r1
Solution: The register r1 is read on line 3 after being written on line 1, therefore the
answer is Read after Write (RAW).
Line 2 is not a hazard because although r4 is both read and written, this happens within
the same instruction and therefore does not impose a data dependence.
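The dependence can also be found mechanically. The Python sketch below is a deliberately simple illustration: the stream encoding and the function raw_hazards are our assumptions, and the check ignores intervening writes to the same register, which does not matter for this stream.

```python
# Each instruction is (destination, (source, source)), in program order.
stream = [
    ("r1", ("r2", "r3")),  # line 1: r1 <- r2 + r3
    ("r4", ("r3", "r4")),  # line 2: r4 <- r3 + r4
    ("r5", ("r3", "r1")),  # line 3: r5 <- r3 + r1
]


def raw_hazards(instructions):
    """Return (writer line, reader line, register) for each case where a
    later instruction reads a register written by an earlier one."""
    hazards = []
    for j, (_, sources) in enumerate(instructions):
        for i in range(j):
            dest = instructions[i][0]
            if dest in sources:
                hazards.append((i + 1, j + 1, dest))
    return hazards


print(raw_hazards(stream))
```

Only one dependence is reported: line 3 reads r1, which line 1 writes. Line 2 does not appear, since it reads and writes r4 within a single instruction.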
Q26. Figure 8 and Figure 9 show two different circuits for 3-bit multiplication, labelled A
and B; both are constructed using half adders (HA) and full adders (FA). Which circuit
was generated using the Wallace Tree procedure, and why?
A. Multiplication circuit A (i.e., Figure 8) is a Wallace Tree because it follows the
Wallace Tree procedure
B. Multiplication circuit A (i.e., Figure 8) is a Wallace Tree because it has two
layers
C. Multiplication circuit B (i.e., Figure 9) is a Wallace Tree because it follows the
Wallace Tree procedure
D. Multiplication circuit B (i.e., Figure 9) is a Wallace Tree because it has two
layers
E. Multiplication circuit B (i.e., Figure 9) is a Wallace Tree because it passes
through the lowest partial sum unmodified
[2 marks]
Solution: Figure 8 applies the Wallace Tree procedure of adding a layer using half adders
to sum 2 inputs and full adders to sum 3 (or more) inputs. For 3-bit multiplication, the
procedure is only applied once, rendering a final layer which is a simple 6-bit ripple-carry
adder.
Figure 9 instead follows the shift-and-add approach to multiplication.
The question asks for the Wallace Tree approach, which is Figure 8.
Q27. During the assembly of an assembly program, labels are resolved. Consider the following
Hex 8 assembly program:
Line 1 - label:
Line 2 - ldac 0
Line 3 - ldbc -5
Line 4 - loop: br loop
Line 5 - br next
Line 6 - next: ldbc loop
During label resolution, what constant offsets are the operands on Line 4 (loop) and
Line 6 (loop) replaced with?
A. Line 4 becomes br -1 and line 6 becomes ldbc -1
B. Line 4 becomes br 4 and line 6 becomes ldbc 4
C. Line 4 becomes br 0 and line 6 becomes ldbc 4
D. Line 4 becomes br 0 and line 6 becomes ldbc 0
E. Line 4 becomes br -1 and line 6 becomes ldbc 4
[2 marks]
Line 1 - NOOP
Line 2 - ldac 0
Line 3 - ldbc -5
Line 4 - br -1
Line 5 - br 0
Line 6 - ldbc 4
Additional figures and tables
Figure 1: (a) C0 (using P-type MOSFETs). (b) C1 (using N-type MOSFETs).
Figure 2: An FSM implementation, which has 4 inputs (1-bit Φ1 , Φ2 and rst on the left-hand
side; 8-bit s spread within the design) and 1 output (1-bit r on the right-hand side).
[Figure 3: a waveform showing Φ2 , Φ1 and rst over time, annotated with t0 , t1 , ρ and t2 .]
Figure 4: A NAND-based implementation of a D-type latch.
Figure 6: A NAND-based implementation of a 2-input, 1-bit multiplexer.
Li : if Raddr = 0 then goto Ltarget else goto Li+1   ↦   010 addr target   (bits 8 to 0)
[Figures 8 and 9: multiplication circuits A and B, built from half adders (HA) and full adders (FA) combining the partial products xi yj for 0 ≤ i, j ≤ 2.]
END OF PAPER