Optional Hardware Solution
Optional Hardware Solution
Optional Hardware Solution
1 Verilog
Please answer the following four questions about Verilog.
(a) Does the following code result in a sequential circuit or a combinational circuit? Explain why.
1 module concat ( input clk , input data_in1 , input data_in2 ,
2 output reg [1:0] data_out );
3 always @ ( posedge clk , data_in1 )
4 if ( data_in1 )
5 data_out = { data_in1 , data_in2 };
6 else if ( data_in2 )
7 data_out = { data_in2 , data_in1 };
8 endmodule
Explanation.
This code results in a sequential circuit because data_in2 is not in the sensitivity list, and thus a latch is
inferred for data_out.
1/25
(b) The input clk is a clock signal. What is the hexadecimal value of the output c right after the third
positive edge of clk if initially c = 8’hE3 and a = 4’d8 and b = 4’o2 during the entire time?
1 module mod1 ( input clk , input [3:0] a , input [3:0] b , output reg [7:0] c );
2 always @ ( posedge clk )
3 begin
4 c <= {c , &a , | b };
5 c [0] <= ^ c [7:6];
6 end
7 endmodule
Explanation.
Cycle 1: c <= {c, &a, |b} → c <= {1110_0011, 0, 1} → c <= {1000_1101}
c[0] <= ˆc[7:6] → c[0] <= ˆ{11} → c[0] <= 0
At the first positive edge of clk, c = 80 b1000_1100
Cycle 2: c <= {c, &a, |b} → c <= {1000_1100, 0, 1} → c <= {0011_0001}
c[0] <= ˆc[7:6] → c[0] <= ˆ{10} → c[0] <= 1
At the second positive edge of clk, c = 80 b0011_0001
Cycle 3: c <= {c, &a, |b} → c <= {0011_0001, 0, 1} → c <= {1100_0101}
c[0] <= ˆc[7:6] → c[0] <= ˆ{00} → c[0] <= 0
At the third positive edge of clk, c = 80 b1100_0100 → c = 80 hC4
Note that since the assignments to c are non-blocking, c[7 : 6] in line 5 is not affected by the assignment
to c in line 4 in the same cycle.
(c) Is the following code syntactically correct? If not, please explain the mistake(s) and how to fix it/them.
1 module 1 nn3r ( input [3:0] d , input op , output [1:0] s );
2 assign s = op ? ( d [1:0] - d [3:2]) :
3 ( d [3:2] + d [1:0]);
4 endmodule
5
6 module top ( input wire [6:0] instr , input wire op , output reg z );
7
8 reg [1:0] r1 , r2 ;
9
14 endmodule
2/25
Answer and concise explanation:
The code is not syntactically correct.
Explanation.
• Module names cannot start with a number → ’1nn3r’ is not a legal module name.
• The output signal ’z’ has to be declared as a ’wire’ but not ’reg’.
• ’r1’ and ’r2’ has to be declared as ’wire’s.
• The module ’1nn3r’ does not have ports named ’instr’ and ’z’. Those need to be changed to ’d’ and
’s’, respectively.
3/25
(d) Does the following code correctly implement a counter that counts from 1 to 11 by increments of 2 (e.g.,
1, 3, 5, 7, 9, 11, 1, 3 ...)? If so, say "Correct". If not, correct the code with minimal modification.
1 module odd_counter ( clk , count );
2 input wire clk ;
3 output reg [2:0] count ;
4 reg [2:0] count_next ;
5
6 always@ *
7 begin
8 count_next = count ;
9 if ( count != 11)
10 count_next = count_next + 2;
11 else
12 count_next <= 1;
13 end
14
Explanation.
The correct implementation:
always@* begin
count_next = count;
if (count != 11)
count_next = count_next + 2;
else
count_next = 1;
end
always@(posedge clk)
count <= count_next;
endmodule
4/25
(e) Does the following code correctly instantiate a 4-bit adder? If so, say "Correct". If not, correct the code
with minimal modification.
1 module adder ( input a , input b , input c , output sum , output carry );
2 assign sum = a ^ b ^ c ;
3 assign carry = ( a & b ) | ( b & c ) | ( c & a );
4 endmodule
5
7 module adder_4bits ( input [3:0] a , input [3:0] b , output [3:0] sum , output carry );
8 wire [2:0] s ;
9
5/25
2 Finite State Machine
You are given the following FSM with two one-bit input signals (TA and TB ) and one two-bit output signal
(O). You need to implement this FSM, but you are unsure about how you should encode the states. Answer
the following questions to get a better sense of the FSM and how the three different types of state encoding
we dicussed in the lecture (i.e., one-hot, binary, output) will affect the implementation.
TA
A __ B
O: 10 TB O: 11
TB __ __
TA TB
C D
O: 01 O: 00
TB
(a) There is one critical component of an FSM that is missing in this diagram. Please write what is missing
in the answer box below.
The reset line or indication for initial state.
6/25
(c) List one major advantage of each type of state encoding below.
One-hot encoding
reduces next-state logic
Output encoding
reduces the output logic
(d) Fully describe the FSM with equations given that the states are encoded with one-hot encoding. Assign
state encodings such that numerical values of states increase monotonically for states A through D while
using the minimum possible number of bits to represent the states with one-hot encoding. Indicate the
values you assign to each state and simplify all equations:
State assignments: A: 0001, B: 0010, C: 0100, D: 1000
NS[3] = TB * TS[3] + TS[2]
NS[2] = TB * TS[0] + T A * TS[1]
NS[1] = T B * (TS[0] + TS[3])
NS[0] = TS[1] * TA
O[1] = TS[0] + TS[1]
O[0] = TS[1] + TS[2]
7/25
(e) Fully describe the FSM with equations given that the states are encoded with binary encoding. Assign
state encodings such that numerical values of states increase monotonically for states A through D while
using the minimum possible number of bits to represent the states with binary encoding. Indicate the
values you assign to each state and simplify all equations:
State assignments: A: 00, B: 01, C: 10, D: 11
NS[1] = T S[1] * (T S[0] * TB + TS[0] T A) + TS[1] * (T S[0] + TS[0] * TB)
NS[0] = T S[1] * T S[0] * T B + TS[1]
O[1] = T S[1]
O[0] = TS[1] XOR TS[0]
(f) Fully describe the FSM with equations given that the states are encoded with output encoding. Use
the minimum possible number of bits to represent the states with output encoding. Indicate the values
you assign to each state and simplify all equations:
State assignments: A: 10, B: 11, C: 01, D: 00
NS[1] = TS[1] * T S[0] * T B + TS[1] * TS[0] * TA + T S[1] * T S[0] * T B
= T S[0] * T B + TS[1] * TS[0] * TA
NS[0] = TS[1] * T S[0] + TS[1] * TS[0] * T A + T S[1] * T S[0] * T B
= TS[1] * (T S[0] + TS[0] * T A) + T S[1] * T S[0] * T B
O[1] = TS[1]
O[0] = TS[0]
8/25
(g) Assume the following conditions:
• We can only implement our FSM with 2-input AND gates, 2-input OR gates, and D flip-flops.
• 2-input AND gates and 2-input OR gates occupy the same area.
• D flip-flops occupy 3x the area of 2-input AND gates.
Which state-encoding do you choose to implement in order to minimize the total area of this FSM?
one-hot: 10 logic gates, 4 FFs
binary: 16 logic gates, 2 FFs
output: 10 logic gates, 2 FFs
9/25
3 Big versus Little Endian Addressing
Consider the 32-bit hexadecimal number 0xcafe2b3a.
1. What is the binary representation of this number in little endian format? Please clearly mark the bytes
and number them from low (0) to high (3).
3a 2b fe ca
0 1 2 3
2. What is the binary representation of this number in big endian format? Please clearly mark the bytes
and number them from low (0) to high (3).
ca fe 2b 3a
0 1 2 3
10/25
4 The MIPS ISA
4.1 Warmup: Computing a Fibonacci Number
The Fibonacci number Fn is recursively defined as
F (n) = F (n − 1) + F (n − 2),
where F (1) = 1 and F (2) = 1. So, F (3) = F (2) + F (1) = 1 + 1 = 2, and so on. Write the MIPS assembly
for the fib(n) function, which computes the Fibonacci number F (n):
int fib(int n)
{
int a = 0;
int b = 1;
int c = a + b;
while (n > 1) {
c = a + b;
a = b;
b = c;
n--;
}
return c;
}
Remember to follow MIPS calling convention and its register usage (just for your reference, you may not
need to use all of these registers):
• The argument n is passed in register $4.
• The result (i.e., c) should be returned in $2.
• $8 to $15 are caller-saved temporary registers.
• $16 to $23 are callee-saved temporary registers.
• $29 is the stack pointer register.
• $31 stores the return address.
Note: A summary of the MIPS ISA is provided at the end of this handout.
11/25
NOTE: More than one correct solution exists, this is just one potential solution.
fib:
addi $sp, $sp, -16 // allocate stack space
sw $16, 0($sp) // save r16
add $16, $4, $0 // r16 for arg n
sw $17, 4($sp) // save r17
add $17, $0, $0 // r17 for a, init to 0
sw $18, 8($sp) // save r18
addi $18, $0, 1 // r18 for b, init to 1
sw $31, 12($sp) // save return address
add $2, $17, $18 // c = a + b
branch:
slti $3, $16, 2 // use r3 as temp
bne $3, $0, done
add $2, $17, $18 // c = a + b
add $17, $18, $0 // a = b
add $18, $2, $0 // b = c
addi $16, $16, -1 // n = n - 1
j branch
done:
lw $31, 12($sp) // restore r31
lw $18, 8($sp) // restore r18
lw $17, 4($sp) // restore r17
lw $16, 0($sp) // restore r16
addi $sp, $sp, 16 // restore stack pointer
jr $31 // return to caller
12/25
4.2 MIPS Assembly for REP MOVSB
MIPS is a simple ISA. Complex ISAs—such as Intel’s x86—often use one instruction to perform the function
of many instructions in a simple ISA. Here you will implement the MIPS equivalent for a single Intel x86
instruction, REP MOVSB, which is specified as follows.
The REP MOVSB instruction uses three fixed x86 registers: ECX (count), ESI (source), and EDI
(destination). The “repeat” (REP) prefix on the instruction indicates that it will repeat ECX times. Each
iteration, it moves one byte from memory at address ESI to memory at address EDI, and then increments
both pointers by one. Thus, the instruction copies ECX bytes from address ESI to address EDI.
(a) Write the corresponding assembly code in MIPS ISA that accomplishes the same function as this in-
struction. You can use any general purpose register. Indicate which MIPS registers you have chosen to
correspond to the x86 registers used by REP MOVSB. Try to minimize code size as much as possible.
Assume: $1 = ECX, $2 = ESI, $3 = EDI
(b) What is the size of the MIPS assembly code you wrote in (a), in bytes? How does it compare to REP
MOVSB in x86 (note: REP MOVSB occupies 2 bytes)?
The size of the MIPS assembly code is 4 bytes × 7 = 28 bytes, as compared to 2 bytes for x86 REP
MOVSB.
(c) Assume the contents of the x86 register file are as follows before the execution of the REP MOVSB:
EAX: 0xccccaaaa
EBP: 0x00002222
ECX: 0xFEE1DEAD
EDX: 0xfeed4444
ESI: 0xdecaffff
EDI: 0xdeaddeed
EBP: 0xe0000000
ESP: 0xe0000000
Now, consider the MIPS assembly code you wrote in (a). How many total instructions will be executed
by your code to accomplish the same fuction as the single REP MOVSB in x86 accomplishes for the
given register state?
The count (value in ECX) is 0xfee1dead = 4276215469. Therefore, the loop body is executed 4276215469
times. As there are 6 instructions in the loop body, total instructions executed = 6 * 4276215469 + 1 =
25657292814 + 1 (beq instruction outside of the loop) = 25657292815.
13/25
(d) Assume the contents of the x86 register file are as follows before the execution of the REP MOVSB:
EAX: 0xccccaaaa
EBP: 0x00002222
ECX: 0x00000000
EDX: 0xfeed4444
ESI: 0xdecaffff
EDI: 0xdeaddeed
EBP: 0xe0000000
ESP: 0xe0000000
Now, answer the same question in (c) for the above register values.
The count (value in ECX) is 0x00000000 = 0. Therefore, the loop body is executed 0 times. Total
instructions executed = 1 (beq instruction outside of the loop).
14/25
5 Data Flow Programs
Draw the data flow graph for the fib(n) function from Question 4.1. You may use the following data flow
nodes in your graph:
• + (addition)
• > (left operand is greater than right operand)
• Copy (copy the value on the input to both outputs)
• BR (branch, with the semantics discussed in class, label the True and False outputs)
You can use constant inputs (e.g., 1) that feed into the nodes. Clearly label all the nodes, program
inputs, and program outputs. Try to the use fewest number of data flow nodes possible.
15/25
16/25
6 Microarchitecture vs. ISA
a) Briefly explain the difference between the microarchitecture level and the ISA level in the transformation
hierarchy. What information does the compiler need to know about the microarchitecture of the machine
in order to compile a given program correctly?
The ISA level is the interface a machine exposes to the software. The microarchitecture is the actual
underlying implementation of the machine. Therefore, the microarchitecture and changes to the microar-
chitecture are transparent to the compiler/programmer (except in terms of performance), while changes to
the ISA affect the compiler/programmer. The compiler does not need to know about the microarchitecture
of the machine in order to compile the program correctly
b) Classify the following attributes of a machine as either a property of its microarchitecture or ISA:
Microarchitecture? ISA? Attribute
X The machine does not have a subtract instruction
X The ALU of the machine does not have a subtract unit
X The machine does not have condition codes
X A 5-bit immediate can be specified in an ADD instruction
X It takes n cycles to execute an ADD instruction
X There are 8 general purpose registers
X A 2-to-1 mux feeds one of the inputs to ALU
X The register file has one input port and two output ports
17/25
7 Performance Metrics
• If a given program runs on a processor with a higher frequency, does it imply that the processor always
executes more instructions per second (compared to a processor with a lower frequency)? (Use less than
10 words.)
No, the lower frequency processor might have much higher IPC (instructions per cycle).
More detail: A processor with a lower frequency might be able to execute multiple instructions per cycle
while a processor with a higher frequency might only execute one instruction per cycle.
• If a processor executes more of a given program’s instructions per second, does it imply that the processor
always finishes the program faster (compared to a processor that executes fewer instructions per second)?
(Use less than 10 words.)
No, because the former processor may execute many more instructions.
More detail: The total number of instructions required to execute the full program could be different on
different processors.
18/25
8 Performance Evaluation
Your job is to evaluate the potential performance of two processors, each implementing a different ISA. The
evaluation is based on its performance on a particular benchmark. On the processor implementing ISA A,
the best compiled code for this benchmark performs at the rate of 10 IPC. That processor has a 500 MHz
clock. On the processor implementing ISA B, the best compiled code for this benchmark performs at the
rate of 2 IPC. That processor has a 600 MHz clock.
• What is the performance in Millions of Instructions per Second (MIPS) of the processor implementing
ISA A?
ISA A: 10 instructions
cycle
cycle
∗ 500, 000, 000 second = 5000 MIPS
19/25
9 Single-Cycle Processor Datapath
In this problem, you will modify the single-cycle datapath we built up in Lecture 11 to support the JAL
instruction. The datapath that we will start with is provided below. Your job is to implement the necessary
data and control signals to support the JAL instruction, which we define to have the following semantics:
JAL : R31 ← PC + 4
PC ← PC31...28 || Immediate || 02
Add to the datapath on the next page the necessary data and control signals to implement the JAL instruction.
Draw and label all components and wires very clearly (give control signals meaningful names; if selecting a
subset of bits from many, specify exactly which bits are selected; and so on).
20/25
10 REP MOVSB
Let’s say you are the lead architect of the next flagship processor at Advanced Number Devices (AND). You
have decided that you want to use the LC-3b ISA for your next product, but your customers want a smaller
semantic gap and marketing is on your case about it. So, you have decided to implement your favorite x86
instruction, REP MOVSB, in LC-3b.
Specifically, you want to implement the following definition for REP MOVSB (in LC-3b parlance): REP-
MOVSB SR1, SR2, DR which is encoded in LC-3b machine code as:
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1010 DR SR1 0 0 0 SR2
REPMOVSB uses three registers: SR1 (count), SR2 (source), and DR (destination). It moves a byte
from memory at address SR2 to memory at address DR, and then increments both pointers by one. This is
repeated SR1 times. Thus, the instruction copies SR1 bytes from address SR2 to address DR. Assume that
the value in SR1 is greater than or equal to zero.
1. Complete the state diagram shown below, using the notation of the LC-3b state diagram. Describe
inside each bubble what happens in each state and assign each state an appropriate state number. Add
additional states not present in the original LC-3b design as you see fit.
21/25
HW 3 Solutions: Microprogramming Wrap-up and Pipelining
10, 46
SR1 <− SR1 − 1 To 18
[REP] REP = 0
REP = 1
50
51
40
R=0 MDR <− M[MAR[15:1]’0]
R=1
42
MAR <− DR
43
DR <− DR + 1
44
R=0 M[MAR[15:1]’0]<−MDR
R=1
To 46
1/8
22/25
2. Add to the LC-3b datapath any additional structures and any additional control signals needed to
implement REPMOVSB. Clearly label your additional control signals with descriptive names. Describe
what value each control signal would take to control the datapath in a particular way.
Modifications to the data path
REP COND[2]
Bus[15]
J[5]
REGFILE ......
SR2 SR2 6
OUT OUT
REP
16 16 6
Microsequencer Modifications
SR2
MUX
+1 −1 IR[11:9] IR[11:9]
INCDEC INC/DEC 111
DR IR[8:6] SR1
2 MUX IR[8:6]
IR[2:0] IR[2:0]
ALU 2 2
16 DRMUX SR1MUX
Regfile Modifications DRMUX Modifications SR1MUX Modifications
• DRMUX/2:
IR[11:9] ;destination IR[11:9]
R7 ;destination R7
IR[8:6] ;destination IR[8:6]
IR[2:0] ;destination IR[2:0]
• SR1MUX/2:
IR[11:9] ;source IR[11:9]
IR[8:6] ;source IR[8:6]
IR[2:0] ;source IR[2:0]
• COND/3:
23/25
COND0 : Unconditional
COND1 : Memory Ready
COND2 : Branch
3. Describe any changes you need to make to the LC-3b microsequencer. Add any additional logic and
control signals you need. Clearly describe the purpose and function of each signal and the values it
would take to control the microsequencer in a particular way.
Additional control signals
• INCDEC/2: PASSSR2, +1, -1
• DRMUX/2:
– IR[11:9] ;destination IR[11:9]
– R7 ;destination R7
– IR[8:6] ;destination IR[8:6]
– IR[2:0] ;destination IR[2:0]
• SR1MUX/2:
– IR[11:9] ;source IR[11:9]
– IR[8:6] ;source IR[8:6]
– IR[2:0] ;source IR[2:0]
• COND/3:
– COND0: Unconditional
– COND1: Memory Ready
– COND2: Branch
– COND3: Addressing Mode
– COND4: Repeat
24/25
MIPS Instruction Summary
Opcode Example Assembly Semantics
add add $1, $2, $3 $1 = $2 + $3
sub sub $1, $2, $3 $1 = $2 - $3
add immediate addi $1, $2, 100 $1 = $2 + 100
add unsigned addu $1, $2, $3 $1 = $2 + $3
subtract unsigned subu $1, $2, $3 $1 = $2 - $3
add immediate unsigned addiu $1, $2, 100 $1 = $2 + 100
multiply mult $2, $3 hi, lo = $2 * $3
multiply unsigned multu $2, $3 hi, lo = $2 * $3
divide div $2, $3 lo = $2/$3, hi = $2 mod $3
divide unsigned divu $2, $3 lo = $2/$3, hi = $2 mod $3
move from hi mfhi $1 $1 = hi
move from low mflo $1 $1 = lo
and and $1, $2, $3 $1 = $2 & $3
or or $1, $2, $3 $1 = $2 | $3
and immediate andi $1, $2, 100 $1 = $2 & 100
or immediate ori $1, $2, 100 $1 = $2 | 100
shift left logical sll $1, $2, 10 $1 = $2 « 10
shift right logical srl $1, $2, 10 $1 = $2 » 10
load word lw $1, 100($2) $1 = memory[$2 + 100]
store word sw $1, 100($2) memory[$2 + 100] = $1
load upper immediate lui $1, 100 $1 = 100 « 16
branch on equal beq $1, $2, label if ($1 == $2) goto label
branch on not equal bne $1, $2, label if ($1 != $2) goto label
set on less than slt $1, $2, $3 if ($2 < $3) $1 = 1 else $1 = 0
set on less than immediate slti $1, $2, 100 if ($2 < 100) $1 = 1 else $1 = 0
set on less than unsigned sltu $1, $2, $3 if ($2 < $3) $1 = 1 else $1 = 0
set on less than immediate sltui $1, $2, 100 if ($2 < 100) $1 = 1 else $1 = 0
jump j label goto label
jump register jr $31 goto $31
jump and link jal label $31 = PC + 4; goto label
25/25