Introduction To Computer Architecture: Exam: 1 CMOS Logic (2.5 Points)
R. Pacalet
2022-12-01
The text in black is the original one. The text in red is examples of the expected
correct answers. Only this text was expected, possibly in shorter form, nothing
more. The text in blue is extra comments about the expected correct answers.
Warning: the course changes frequently (content, vocabulary, examples...); some
questions and answer proposals can thus be partly or completely out of scope.
Warning: some questions can be answered in many different ways; the proposed
answers are just examples and they are not exhaustive.
You can use any document but communicating devices are strictly forbidden.
Please number the different pages of your paper and indicate on each page your
first and last names. You can write your answers in French or in English, as you
wish. Precede your answers with the question’s number. If some information
or hypotheses are missing to answer a question, add them. If you consider a
question as absurd and thus decide to not answer, explain why. If you do not
have time to answer a question but know how to, briefly explain your ideas.
Note: copying verbatim the slides of the lectures or any other provided material
is not considered as a valid answer. Advice: quickly go through the document
and answer the easy parts first.
[Figure: baz gate, inputs A and B, output X]
2. Write the boolean equation of the X output of baz using the NOT, AND and
OR operators and parentheses. Do not assume any precedence between the
boolean operators; use parentheses to make your equation unambiguous.
3. Imagine a graphical symbol for baz and draw it.
1. The truth table is:
A B X
0 0 1
0 1 1
1 0 0
1 1 1
2. X = (NOT A) OR B.
3. Using the style seen in class we could represent the baz gate as shown in
Figure 2.
[Figure 2: symbol of the baz gate, inputs A and B, output X]
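The equation of answer 2 can be checked against the truth table of answer 1 by enumerating the four input combinations. The sketch below is only an illustration; the names `baz`, `baz_table` and `EXPECTED` are hypothetical helpers, not part of the exam.

```python
from itertools import product

def baz(a: int, b: int) -> int:
    """X = (NOT A) OR B, on single bits."""
    return (1 - a) | b

# Truth table copied from answer 1, as (A, B) -> X.
EXPECTED = {(0, 0): 1, (0, 1): 1, (1, 0): 0, (1, 1): 1}

def baz_table() -> dict:
    """Evaluate the equation for every input combination."""
    return {(a, b): baz(a, b) for a, b in product((0, 1), repeat=2)}
```

Comparing `baz_table()` with `EXPECTED` confirms that the equation and the table agree on all four rows.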
3 CMOS logic (2.5 points)
The qux logic gate has 3 inputs A, B and C, one output X and the following truth
table:
A B C X
0 0 0 1
0 0 1 0
0 1 0 1
0 1 1 0
1 0 0 1
1 0 1 0
1 1 0 0
1 1 1 0
[Figures: qux gate, inputs A, B and C, output X]
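One way to read the qux truth table is X = NOT((A AND B) OR C). The sketch below checks this reading and models a possible static CMOS structure for it. The two networks are an assumption made for illustration (the original schematic is not reproduced here): a pull-down network with A in series with B, both in parallel with C, and the dual pull-up network.

```python
# Truth table of qux copied from the text, as (A, B, C) -> X.
QUX_TABLE = {
    (0, 0, 0): 1, (0, 0, 1): 0, (0, 1, 0): 1, (0, 1, 1): 0,
    (1, 0, 0): 1, (1, 0, 1): 0, (1, 1, 0): 0, (1, 1, 1): 0,
}

def pull_down_conducts(a: int, b: int, c: int) -> bool:
    """Assumed NMOS network: (A in series with B) in parallel with C."""
    return bool((a and b) or c)

def pull_up_conducts(a: int, b: int, c: int) -> bool:
    """Assumed PMOS network: (A in parallel with B) in series with C."""
    return bool(((not a) or (not b)) and (not c))

def qux(a: int, b: int, c: int) -> int:
    """Output is 1 when the pull-up conducts, 0 when the pull-down does."""
    up = pull_up_conducts(a, b, c)
    down = pull_down_conducts(a, b, c)
    assert up != down  # exactly one network conducts for every input
    return 1 if up else 0
```

With these assumed networks, `qux` reproduces every row of `QUX_TABLE`, and the two networks are never on at the same time.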
f ALU operation
0 addition (S ← A + B)
1 bitwise XOR (S ← A XOR B)
At the heart of the ALU there will be 32 identical ael elements connected
together as shown in Figure 5. The i-th ael receives one bit of the first ALU
operand (a_i), one bit of the second ALU operand (b_i), one carry input (c_i)
from the previous ael (or 0 for the first ael), plus the control input f that
selects the ALU operation. It outputs a one-bit result (s_i) and a carry output
(c_{i+1}). Inside ael the inputs and outputs are named f, a, b, x, s and y, as
shown on the leftmost ael instance in Figure 5.
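[Figure 5: chain of 32 ael elements. The i-th ael receives a_i, b_i, the carry
c_i and the shared control input f, and produces s_i and c_{i+1}. The carry
input of the first ael is 0 and the carry output of the last one is c_{32}.
The ports of the leftmost ael are labelled f, a, b, x, y and s.]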
[Figure: schematic symbols: connected lines, not connected lines, 2-inputs
NAND, 2-inputs NOR, 2-inputs XNOR, foo, bar, constant 0, constant 1, wire
renaming]
f x a b y s
0 0 0 0 0 0
0 0 0 1 0 1
0 0 1 0 0 1
0 0 1 1 1 0
0 1 0 0 0 1
0 1 0 1 1 0
0 1 1 0 1 0
0 1 1 1 1 1
1 0 0 0 - 0
1 0 0 1 - 1
1 0 1 0 - 1
1 0 1 1 - 0
1 1 0 0 - 0
1 1 0 1 - 1
1 1 1 0 - 1
1 1 1 1 - 0
The y output is always that of the addition operation, even for the bitwise
XOR operation: in that case y is a don't-care, and producing anything else
would cost extra hardware. It is computed as seen in class for the carry
output of a full adder: y = ((a XOR b) AND x) OR (a AND b).
The s output is computed as s = (a XOR b) XOR (x AND (NOT f)).
When f = 0 (addition), this simplifies to s = a XOR b XOR x, which is
indeed the equation of the sum bit of a full adder. When f = 1 (bitwise
XOR), this simplifies to s = a XOR b XOR 0 = a XOR b.
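The two equations can be exercised in software by modelling one ael and chaining 32 of them as described for the ALU. This is a minimal sketch; the function names `ael` and `alu` and the list `carries` are illustrative choices, not taken from the exam.

```python
def ael(f: int, x: int, a: int, b: int) -> tuple:
    """One ael element: returns (y, s) computed from the two equations."""
    y = ((a ^ b) & x) | (a & b)       # carry output of a full adder
    s = (a ^ b) ^ (x & (1 - f))       # carry input is masked when f = 1
    return y, s

def alu(f: int, a: int, b: int) -> tuple:
    """Chain 32 ael elements; returns (S, carries) with carries[i] = c_i."""
    carries = [0]                     # c_0 = 0 for the first ael
    result = 0
    for i in range(32):
        y, s = ael(f, carries[i], (a >> i) & 1, (b >> i) & 1)
        result |= s << i
        carries.append(y)             # y of element i is c_{i+1}
    return result, carries
```

For instance, `alu(0, a, b)[0]` gives the 32-bit sum of `a` and `b` (modulo 2^32) and `alu(1, a, b)[0]` gives their bitwise XOR, while `carries` holds c_0 to c_{32}.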
[Figure: schematic with inputs a, b and x and output y]
3. As seen in class the equation of the vs flag is vs = c_{31} XOR c_{32}. The
schematic is thus as shown in Figure 8.
[Figure 8: vs computed from c_{31} and c_{32}]
4. The unsigned overflow flag vu is the output carry c_{32} itself. The
schematic is thus as shown in Figure 9, where the diamond symbol represents
the renaming.
[Figure 9: c_{32} renamed as vu]
5. The sign flag sgn is the same as the most significant bit of the 33-bit
output we would have if we were sign-extending the two inputs from 32 to 33
bits (a_{32} = a_{31} and b_{32} = b_{31}). This is thus
sgn = a_{32} XOR b_{32} XOR c_{32} = a_{31} XOR b_{31} XOR c_{32}. The
schematic is thus as shown in Figure 10.
[Figure 10: sgn computed from a_{31}, b_{31} and c_{32}]
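The three flag equations of answers 3 to 5 can be tried on a few operand pairs with a bit-level model of the 32-bit addition. The sketch assumes the flags are evaluated for the addition operation (f = 0); the helper names `add_with_carries` and `flags` are hypothetical.

```python
def add_with_carries(a: int, b: int) -> tuple:
    """32-bit ripple-carry addition; returns (sum, c) with c[i] = c_i."""
    c = [0]                                        # c_0 = 0
    total = 0
    for i in range(32):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ c[i]) << i             # sum bit of a full adder
        c.append(((ai ^ bi) & c[i]) | (ai & bi))   # carry output c_{i+1}
    return total, c

def flags(a: int, b: int) -> tuple:
    """Returns (vs, vu, sgn) for the addition of two 32-bit patterns."""
    _, c = add_with_carries(a, b)
    a31, b31 = (a >> 31) & 1, (b >> 31) & 1
    vs = c[31] ^ c[32]          # signed overflow
    vu = c[32]                  # unsigned overflow
    sgn = a31 ^ b31 ^ c[32]     # sign of the exact (33-bit) signed result
    return vs, vu, sgn
```

As an example, `flags(0x7FFFFFFF, 1)` returns `(1, 0, 0)`: the signed addition overflows, the unsigned one does not, and the exact result 2^31 is positive.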
foo:
    add  t0, a0, a1    # t0 <- a0+a1 (s <- a+b)
    sub  t1, a0, a1    # t1 <- a0-a1 (d <- a-b)
    xor  a0, t0, t1    # a0 <- t0 XOR t1 (a0 <- s XOR d)
    jalr zero, 0(ra)   # return at ra, discard PC+4
4. 10 (−2^9 = −512 ≤ −265 ≤ 511 = 2^9 − 1 but −265 < −256 = −2^8)
5. 11 (−1024 = −2^{10})
6. 9 (−2^8 = −256 ≤ 128 ≤ 255 = 2^8 − 1 but 2^7 − 1 = 127 < 128)
7. 67_{10} = 01000011_2 (67 = 64 + 2 + 1 = 2^6 + 2^1 + 2^0)
8. −102_{10} = 11100110_2 (102 = 64 + 32 + 4 + 2 = 2^6 + 2^5 + 2^2 + 2^1)
9. −96_{10} = 11100000_2 (96 = 64 + 32 = 2^6 + 2^5)
10. 125_{10} = 01111101_2 (125 = 64 + 32 + 16 + 8 + 4 + 1 = 2^6 + 2^5 + 2^4 + 2^3 + 2^2 + 2^0)
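These answers can be reproduced with two small helpers. The question statements are not shown here, so the sketch rests on two assumptions: answers 4 to 6 give the minimum number of bits of a two's complement representation, and answers 7 to 10 use an 8-bit sign-and-magnitude encoding, which is what the listed bit patterns match. Both function names are hypothetical.

```python
def min_bits_twos_complement(n: int) -> int:
    """Smallest width w such that -2^(w-1) <= n <= 2^(w-1) - 1."""
    w = 1
    while not (-(1 << (w - 1)) <= n <= (1 << (w - 1)) - 1):
        w += 1
    return w

def sign_magnitude(n: int, width: int = 8) -> str:
    """Sign bit (1 if negative) followed by |n| on width - 1 bits."""
    assert abs(n) < (1 << (width - 1)), "magnitude does not fit"
    return ("1" if n < 0 else "0") + format(abs(n), f"0{width - 1}b")
```

For example, `min_bits_twos_complement(-265)` is 10 and `sign_magnitude(-102)` is `"11100110"`.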
bar:
    addi sp, sp, -32
    sw   s0, 28(sp)
    addi s0, sp, 32
    sw   ra, -4(s0)
    lw   t0, 0(a0)
    lw   t1, 0(a1)
    andi t2, a2, 1
    beq  t2, zero, label
    add  t2, zero, t0
    add  t0, zero, t1
    add  t1, zero, t2
label:
    sw   t0, 0(a0)
    sw   t1, 0(a1)
    lw   ra, -4(s0)
    lw   s0, 28(sp)
    addi sp, sp, 32
    jalr zero, 0(ra)
1. Explain what the input arguments and output results are and what the
bar function does.
2. Assuming the input arguments are random and independent, what is the
average number of clock cycles taken by the bar function?
3. Do you think this code is correct? If not, explain what is wrong with it
and write a new code with the errors fixed.
4. Could the code be optimized for speed? If yes, propose a faster version and
calculate the new average number of clock cycles per bar execution.
1. bar takes 3 input parameters in registers a0, a1 and a2. a0 and a1 are
the addresses in memory of two 32-bit words. a2 is an integer value. bar
loads the two 32-bit words. It then swaps them if a2 is odd. Finally it
stores the two 32-bit words back in memory. There is no output result
(or, equivalently, the output results are the unmodified input parameters).
2. If a2 is even bar takes 14 clock cycles. If a2 is odd bar takes 17 clock
cycles. On average bar takes (14 + 17)/2 = 15.5 clock cycles.
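The figures of answer 2 match a count of executed instructions at one clock cycle each: 8 instructions up to and including the branch, 3 for the register swap (skipped when a2 is even) and 6 from the label to the return. The sketch below makes that count explicit; the one-cycle-per-instruction model is an assumption inferred from the numbers, and the names are illustrative.

```python
from fractions import Fraction

BEFORE_BRANCH = 8   # from the first addi up to and including beq
SWAP = 3            # the three register moves, executed only if a2 is odd
AFTER_LABEL = 6     # from the first sw after the label to jalr

def cycles(a2_is_odd: bool, cpi: int = 1) -> int:
    """Cycle count of one call, assuming cpi cycles per instruction."""
    executed = BEFORE_BRANCH + (SWAP if a2_is_odd else 0) + AFTER_LABEL
    return cpi * executed

def average_cycles() -> Fraction:
    """Even and odd values of a2 are taken as equally likely."""
    return Fraction(cycles(False) + cycles(True), 2)
```

Under these assumptions `cycles(False)` is 14, `cycles(True)` is 17 and `float(average_cycles())` is 15.5.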
3. The code is not correct because registers s0 and ra are saved at the same
position in the stack frame (if s0 = sp + 32, 28(sp) = -4(s0)). Storing
and restoring ra at address -8(s0) instead of -4(s0) suffices to fix the
bug:
bar:
    addi sp, sp, -32
    sw   s0, 28(sp)
    addi s0, sp, 32
    sw   ra, -8(s0)          # ra now saved just below s0 (was -4(s0))
    lw   t0, 0(a0)
    lw   t1, 0(a1)
    andi t2, a2, 1
    beq  t2, zero, label
    add  t2, zero, t0
    add  t0, zero, t1
    add  t1, zero, t2
label:
    sw   t0, 0(a0)
    sw   t1, 0(a1)
    lw   ra, -8(s0)          # restore ra from its own slot
    lw   s0, 28(sp)
    addi sp, sp, 32
    jalr zero, 0(ra)
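The overlap described in answer 3 is plain address arithmetic, which the following sketch reproduces. The function `frame_slots` and the example stack pointer value are hypothetical.

```python
def frame_slots(sp_on_entry: int, ra_offset: int) -> dict:
    """Addresses used to save s0 and ra in the 32-byte stack frame of bar."""
    sp = sp_on_entry - 32       # addi sp, sp, -32
    s0_slot = sp + 28           # sw s0, 28(sp)
    s0 = sp + 32                # addi s0, sp, 32
    ra_slot = s0 + ra_offset    # sw ra, ra_offset(s0)
    return {"s0": s0_slot, "ra": ra_slot}
```

With `ra_offset = -4` both registers land on the same address, so the saved s0 is overwritten by ra; with `ra_offset = -8` the two slots are 4 bytes apart.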
4. The code could be optimized by not using the stack, by testing a2 first,
and with a smarter swap of the two 32-bit words:
bar:
    andi t0, a2, 1
    beq  t0, zero, label
    lw   t0, 0(a0)
    lw   t1, 0(a1)
    sw   t0, 0(a1)
    sw   t1, 0(a0)
label:
    jalr zero, 0(ra)
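A quick way to gain confidence in the optimized version is to compare its effect on memory with that of the original one, and to redo the cycle count. The sketch below models memory as a Python dictionary. The new average relies on the same assumption of one clock cycle per instruction that is consistent with answer 2: 3 instructions when a2 is even, 7 when a2 is odd. All names are illustrative.

```python
def bar_original(mem: dict, a0: int, a1: int, a2: int) -> None:
    """Load both words, swap them in registers if a2 is odd, store back."""
    t0, t1 = mem[a0], mem[a1]
    if a2 & 1:
        t0, t1 = t1, t0
    mem[a0] = t0
    mem[a1] = t1

def bar_optimized(mem: dict, a0: int, a1: int, a2: int) -> None:
    """Touch memory only if a2 is odd, storing each word at the other address."""
    if a2 & 1:
        t0, t1 = mem[a0], mem[a1]
        mem[a1] = t0
        mem[a0] = t1

def average_cycles_optimized(cpi: int = 1) -> float:
    """3 instructions if a2 is even, 7 if odd, both cases equally likely."""
    return cpi * (3 + 7) / 2
```

Both functions leave memory in the same state for any a2, and under the stated assumption the average drops from 15.5 to 5 clock cycles per call.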