Processor Report
The Stackintosh
Group 4F
Table of Contents
Introduction
Executive Summary
Instruction Format
RTL
Supported Instructions
Control Signals
Datapath Diagram
Xilinx Implementation
Testing Plan
Performance
Conclusion
Appendices
Introduction
Our processor, the Stackintosh, will perform all operations using a stack. Arguments are pushed
onto the stack before operations are called. These elements are then popped and replaced by the
value resulting from the instruction. The tp register keeps track of the top of this stack, and this
value is automatically adjusted to account for items being pushed and popped. The mtp
instruction can be used to explicitly move this pointer, which is useful for procedure calls. All
operations, unless otherwise specified, will add, remove, and/or manipulate items on the stack.
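The push/pop behavior described above can be sketched in a few lines of Java. This is only an illustrative model, not the hardware: the class name StackSketch, the 64-word memory size, and the starting value of tp are assumptions made for the example. It assumes the stack grows upward in steps of 2 bytes, with tp always holding the address of the current top element.

```java
// Hypothetical software model of the stack discipline (not the actual hardware).
class StackSketch {
    // Stand-in for stack memory; one array slot per 16-bit word (size is an assumption).
    private final int[] mem = new int[64];
    // Byte address of the current top of the stack (starting value is an assumption).
    private int tp = 0;

    void push(int value) {
        tp += 2;                      // tp moves up by one 16-bit word
        mem[tp / 2] = value & 0xFFFF; // store the value, truncated to 16 bits
    }

    int pop() {
        int value = mem[tp / 2];
        tp -= 2;                      // tp moves back down
        return value;
    }

    // Two-operand operation: both arguments are popped and replaced by the result.
    void add() {
        int top = pop();
        int below = pop();
        push(below + top);
    }

    int top() { return mem[tp / 2]; }

    int tp() { return tp; }

    public static void main(String[] args) {
        StackSketch s = new StackSketch();
        s.push(3);
        s.push(4);
        s.add();
        System.out.println("top = " + s.top() + ", tp = " + s.tp());
    }
}
```

After the two pushes and the add, only the sum remains on the stack, so tp ends one word above its starting value.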
Executive Summary
This report introduces a new general-purpose processor with a minuscule instruction set, centered
around a stack-based architecture. Methods of analysis include a look into the types and purposes
of the registers used, the instruction formats, and the RTL and descriptions of all available
instructions. Once the specifications of the instruction set are established, the report delves
deeper into how to perform common operations such as looping, pushing a value onto the stack,
and conditional statements using this instruction set. Next, it analyzes the hardware
implementation of a datapath that supports these instructions. This includes descriptions of the
different components and of the control signals that direct the flow of data, as well as diagrams
depicting a high-level overview of the datapath and its corresponding finite state diagram. The
report then describes how to implement this datapath in the Xilinx ISE and presents the detailed
testing plan used to validate the processor. Finally, our assembly implementation of the greatest
common divisor (GCD) using Euclid's Algorithm is shown, followed by performance metrics
analyzing the efficiency of the processor.
For a more detailed look at the design and implementation of our processor, such as the Xilinx
schematic, sample test benches, and the Java implementation of our assembler (with examples of
assembly-to-machine code conversions), refer to the Appendices section of this report.
This report finds that the Stackintosh, if it were to be commercially sold, would be very affordable to
the general public since it only has one dedicated register (tp). However, the primary drawback is
the fact that executing instructions in a stack-based architecture requires many memory accesses,
which reduces the overall performance of the processor.
Recommendations discussed include:
- Getting the result of relative prime to display on the LCD
- Merging identical control states together
- Reducing the number of control signals
Some limitations of this report include:
- Although our processor successfully calculates relative prime using Euclid's Algorithm, it has
been tested on few other procedures beyond the simpler tasks needed to calculate relative prime.
- All performance data was collected using a simulated version of our processor in Xilinx.
This data may not hold for an actual, pure hardware version of the processor.
Registers Used
tp: Holds the address of the top of the stack.
B, C, D, E, F, G: Keep track of values in the datapath between cycles.
Display Reg.: Holds whatever value (in this case, the solution to relative prime) will be displayed
on the LCD screen of the FPGA board.
Instruction Format
M-type
In an M-type instruction, the 11-bit immediate refers to an address in memory.
Opcode (5 bits) | Address (11 bits)
I-type
In an I-type instruction, the 11-bit immediate refers to an immediate value.
Opcode (5 bits) | Immediate (11 bits)
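As a rough illustration of the two formats, the following Java sketch packs a 5-bit opcode and an 11-bit field into a 16-bit word and recovers the field with sign or zero extension. The class and method names are hypothetical and are not taken from our assembler. The opcode values used in main (10011 for pmv) come from the Supported Instructions table.

```java
// Hypothetical helper for the 5-bit opcode + 11-bit field layout.
class EncodingSketch {
    // Packs an opcode and an immediate/address into one 16-bit instruction word.
    static int encode(int opcode, int imm) {
        if (opcode < 0 || opcode > 31) {
            throw new IllegalArgumentException("opcode must fit in 5 bits");
        }
        if (imm < -1024 || imm > 2047) {
            throw new IllegalArgumentException("immediate must fit in 11 bits");
        }
        return (opcode << 11) | (imm & 0x7FF);
    }

    // Upper 5 bits of the instruction word.
    static int opcode(int word) {
        return (word >>> 11) & 0x1F;
    }

    // ZE: lower 11 bits treated as an unsigned value.
    static int zeroExtend(int word) {
        return word & 0x7FF;
    }

    // SE: lower 11 bits treated as a two's complement value.
    static int signExtend(int word) {
        int imm = word & 0x7FF;
        return (imm & 0x400) != 0 ? imm - 0x800 : imm;
    }

    // 16-character binary string, padded with leading zeros.
    static String toBinary(int word) {
        String bits = Integer.toBinaryString(word & 0xFFFF);
        return String.format("%16s", bits).replace(' ', '0');
    }

    public static void main(String[] args) {
        int word = encode(0b10011, -4); // pmv -4
        System.out.println(toBinary(word));
        System.out.println("opcode = " + opcode(word) + ", SE = " + signExtend(word)
                + ", ZE = " + zeroExtend(word));
    }
}
```

The same 11 bits yield -4 under sign extension and 2044 under zero extension, which is why each instruction's RTL states which extender it uses.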
RTL
Instruction: RTL steps
Fetch: IR = Mem[PC]; PC = PC + 1
M-type operation: B = tp - 2; G = tp - 2; C = Mem[tp]; D = Mem[B]; F = C op D; Mem[B] = F
load: C = Mem[tp]; C = Mem[C]; Mem[tp] = C
beq/bne: E = PC + SE(IR[10:0]); B = tp - 2; C = Mem[tp]; D = Mem[B]; G = tp - 4; C - D; tp = G; if (beq: (zero == 1) || bne: (zero != 1)) PC = E
pmv <imm>: B = tp + SE(IR[10:0]); D = Mem[B]; B = tp + 2; G = tp + 2; F = D + 0; tp = G; Mem[B] = F
jal: F = PC; PC = PC[15:11] || ZE(IR[10:0])<<1; B = tp + 2; G = tp + 2; Mem[B] = F; tp = G
jump: PC = PC[15:11] || ZE(IR[10:0])
pushv <addr>: B = tp + 2; G = tp + 2; C = Mem[ZE(IR[10:0])]; Mem[B] = C; tp = G
pop <addr>: C = Mem[tp]; G = tp - 2; Mem[ZE(IR[7:0])] = C; tp = G
dup: C = Mem[tp]; B = tp + 2; G = tp + 2; Mem[B] = C; tp = G
store: B = tp - 2; C = Mem[tp]; D = Mem[B]; G = tp - 4; F = D + 0; Mem[C] = F
pui: B = tp + 2; G = tp + 2; F = ZE(IR[10:0])<<11; Mem[B] = F; tp = G
mtp: G = tp + SE(IR[7:0]); tp = G
I-type operation: C = Mem[tp]; F = C op (SE(Imm) or ZE(Imm)); Mem[tp] = F
pushi: B = tp + 2; F = SE(Imm); Mem[B] = F
sll/sra/srl: C = Mem[tp]; F = C << ZE(IR[10:0]); Mem[tp] = F
jumpto: C = Mem[tp]; G = tp - 2; PC = C; tp = G
pushIO: G = tp + 2; tp = G; Mem[tp] = I/O; LEDout = I/O
display: LCDout = Mem[tp]
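The two ways the RTL forms a new PC value can be sketched in Java as follows. This is a minimal illustration under stated assumptions: the method names are hypothetical, the branch form follows E = PC + SE(IR[10:0]) with the PC already incremented by the fetch step, and the jump form follows PC = PC[15:11] || ZE(IR[10:0]).

```java
// Hypothetical helpers mirroring the branch and jump target RTL.
class TargetSketch {
    // beq/bne: PC-relative target, using the sign-extended 11-bit field.
    static int branchTarget(int pcAfterFetch, int ir) {
        int imm = ir & 0x7FF;
        int offset = (imm & 0x400) != 0 ? imm - 0x800 : imm; // SE(IR[10:0])
        return (pcAfterFetch + offset) & 0xFFFF;
    }

    // jump: top 5 bits of the PC concatenated with the zero-extended 11-bit field.
    static int jumpTarget(int pc, int ir) {
        return (pc & 0xF800) | (ir & 0x7FF);
    }

    public static void main(String[] args) {
        // beq with offset -3 (opcode 01100), fetched so that PC is now 0x0010
        System.out.printf("branch target = 0x%04X%n", branchTarget(0x0010, 0x67FD));
        // jump to 0x005 (opcode 01110) while PC is 0x1234
        System.out.printf("jump target = 0x%04X%n", jumpTarget(0x1234, 0x7005));
    }
}
```

A branch can therefore reach roughly 1024 instructions in either direction, while a jump can reach any address that shares the current PC's top 5 bits.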
Supported Instructions
Instruction (Opcode): Description
Arithmetic and Logical Instructions
add (00000): Adds the first two elements at the top of the stack, decrements tp to tp - 2, and stores sum in new tp
addi <imm> (00001)
and (00010): ANDs the first two elements at the top of the stack, decrements tp to tp - 2, and stores result in new tp
andi <imm> (00011)
or (00100): ORs the first two elements at the top of the stack, decrements tp to tp - 2, and stores result in new tp
ori <imm> (00101): ORs immediate with element at the top of the stack, stores result in tp (replacing the previous value)
sll <imm> (00110): Performs a logical shift left on the element at the top of the stack by the distance indicated by immediate, replaces top element with shifted element
slt (01011): Performs set less than on the top two elements of the stack, removing them and pushing a 1 onto the top of the stack if the element in the higher position on the stack is less than the element in the lower position on the stack. Otherwise, a 0 is pushed onto the stack
sra <imm> (00111)
srl <imm> (01000): Performs a logical shift right on the element at the top of the stack by the distance indicated by immediate, replaces top element with shifted element
sub (01001)
Branch Instructions
beq <addr> (01100)
bne <addr> (01101)
Jump Instructions
jump <addr> (01110)
jal <addr> (01111): Pushes the return address to the top of the stack and moves the PC to the specified address
jumpto (11000)
Stack Instructions
dup (10110): Makes a copy of the value at the top of the stack and pushes it back onto the stack
load (10000): Assumes that the value at the top of the stack is a valid address in memory. This address is replaced with the value stored at this address
mtp <imm> (10111)
peek
pmv <imm> (10011)
pop <addr> (10100): Stores the element at the top of the stack to the specified memory address, decrements tp to tp - 2.
pui <imm> (01010): Pushes a 16-bit value onto the stack with the upper 5 bits consisting of the last 5 bits of the 11-bit immediate. The lower 11 bits will consist of zeroes.
pushi <imm> (10001)
pushv <addr> (10010)
store (10101)
Input/Output
pushIO (11001)
display (11010): Writes whatever is at the top of the stack into the LCDout register
Conditional statements:
pushv 0x0FF
comparison.
pushv 0x0FF
comparison.
beq <newAddr>
of the stack.
Iteration:
The following code demonstrates a procedure to (inefficiently) multiply a value by ten. Let
AddressA = 0x0002, AddressB = 0x0004, and AddressC = 0x0006
mtp 10
pushi 0
pushi 69
pushi 10
pop 4
pop 2
pop 6
timesten:
pushv 2
pushv 6
add
pop 6
pushv 4
addi -1
dup
pop 4
pushi 1
bne timesten
fib:
pmv -2
pushi 0
bne 1
jumpto
pmv -2
pushi 1
bne 1
jumpto
pmv -2
addi -1
jal fib
pmv -4
addi -2
jal fib
add
mtp -6
pmv 6
mtp 2
jumpto
Characteristics
Description
Instruction
Register (IR)
tp Register
16-bit storage
16-bit input A
Sign Extender
(SE)
Zero Extender
(ZE)
Barrel Shifter
Instruction
Memory
Stack Memory
Registers B, C, D,
E, F, G, Display
Reg.
ALU
Control
16-bit input B
3-bit ALUOp input
16-bit output
1-bit overflow detector
1-bit zero detector
16 bit immediate output
Control Signals
Control Signal: Description
PCWrite: Allows the target address to be written to the PC
PCSrc[1:0]: Indicates the source of the value being written to the PC
PCWriteCond: Allows the target address to be written to the PC if the condition is met (that is, if the output of the ALU is zero)
BranchType: Indicates whether the branch instruction is a bne or a beq
ALUsrcA[1:0]: Sets the source for the first ALU input (a value retrieved from memory, the PC address, tp, etc.)
ALUsrcB[2:0]: Sets the source for the second ALU input (a value retrieved from memory, an immediate value, etc.)
ALUOp[2:0]: Determines the operation that the ALU must perform (and, or, add, sub, slt)
MemLoad1[1:0]: Sets the source for the first address input to the stack memory
MemLoad2: Sets the source for the data input to the stack memory
MemWriteTp: Enables values to be written to the address in the first memory input (tp)
MemWriteTpMod: Enables values to be written to the address in the second memory input (modified tp, usually tp - 2)
RegWrite: Enables the input to tp to be written to the register
MemValSrc
ShiftSrc[1:0]: Determines the source input to the shifter (memory output or a zero-extended immediate)
ShiftVal[1:0]: Determines the value that the number must be shifted by (either an immediate value or a constant)
WriteB
WriteE
WriteF
IRWrite
LCDWrite
Datapath Diagram
Xilinx Implementation
*Assumes that each component has already been created and is functional.
The datapath begins with the program counter (PC), a register that holds the current instruction
address. The PC output goes to the instruction memory, to the 4-input ALUInputA mux, and to a
concatenator that takes its top 5 bits (used for jump instructions). The instruction memory takes
the PC and outputs a 16-bit instruction into an instruction register, which splits the instruction
into 2 outputs. The top 5 bits represent the opcode and go to the control unit. The last 11 bits go
to a sign extender and a zero extender, both of which feed the ALUInputB mux. The zero extender
output also feeds two separate muxes that lead into the barrel shifter. The barrel shifter output
connects to the ALUInputB mux, to the mux that selects the value to write to memory, and to the
address concatenator that the PC also feeds.
The tp register output goes to the ALUInputA mux and to the memory block's first address
input. ALUInputA and ALUInputB (which, in addition to the inputs already stated, has constant
inputs of 2, -2, and -4) feed the ALU.
The ALU output branches off into many wires. This output can be written to registers B, E, and G,
or it may be written to memory. Memory accesses two separate values based on two separate
address inputs. The first (top) output goes to the temporary register C, which feeds back into the
ALUInputA mux and the top mux that connects to the barrel shifter. The second (bottom) output
connects to the temporary register D and the ALUInputB mux.
Testing Plan
Initial testing phase - COMPLETE
- Ran through our datapath with various instructions and made sure there weren't any timing
conflicts with component usage, and that every value ended up as it should be (PC ends up as
PC+1, stack memory has the new values stored, etc.)
Component testing - COMPLETE
- Design the individual components necessary in our design, then write test benches for each
one of them individually before combining them. Extra time was devoted to testing our 2-address
access memory, which will be our first major independent design.
Instruction testing - COMPLETE
- After individual components are verified working, we'll start connecting individual components
for instruction testing. We'll start creating larger subsystems to test, those subsystems being:
1. PC + ALU to ensure PC+1 functions
2. PC + instr memory + instr register to make sure instruction codes output correctly
3. Barrel shifter and associated muxes/zero extender to make sure that it will shift correctly
and all the control signals work as intended.
4. Step 2 + tp reg + control + barrel shifter to make sure that each part of the instruction code
is split correctly. Might remove the reg file, but precautionary testing just in case.
5. Step 4 + ALU and its associated input muxes, with immediates. Test basic calculations as
well as ensure tp-2 will store itself back in the reg file successfully. Also test jumping to
make sure address concatenation and PC write control signals work.
6. Step 5 + Memory, completing the datapath. Begin testing more advanced instructions that
require memory accesses, beginning with basic stack modifications (adding and subtracting
values from the stack), working up to more advanced instructions like branching and pmv.
Real world testing - COMPLETE
- After each instruction is verified working independently, we'll start writing code for our
processor to see if our instructions will actually work in sequence, to make sure there aren't any
issues with stack management in between instructions.
Control Signal Testing
In order to test our control module, we'll create a test bench that feeds various control codes into
the control module input and monitors what is output to each control signal. The control codes we'll
feed in are add (000001), sub (010001), load (001101), beq (001010), and jump (001100). The
first two are two separate R-type instructions that will test that ALUop is controlled properly,
and the rest are the other 3 unique control cases. The actual control module will be a state
machine that should output a certain sequence of control signals each clock cycle (refer to our
state diagram for specifics), and we will ensure that the waveforms match the expected ones. We'll
also feed nonsense control codes into the control module to make sure that it doesn't output any
faulty information.
#pushes 1 to top
#compare the gcd and 1, go to return if ==
#add 1 to m
#jump to beginning of loop
#move tp to the stack position before n
#push m onto the stack where n was
#move the tp to the return address
#return from this procedure call
#Assume that the stack now contains the arguments a and b, followed
#by the return address
gcd:
pmv -4
pushi 0
bne loop
returnB:
mtp -6 #move tp below a
pmv 4 #push b on to overwrite a
pmv 4 #push the return address on and overwrite b
jumpto #return from this procedure call
loop:
pmv -2
pushi 0
beq returnA
pmv -4
pmv -4
slt
pushi 1
bne else
pmv -2
pmv -6
sub #subtract a - b
mtp -8 #move tp below a
pmv 8 #push a - b to tp + 2 to overwrite a
mtp 4 #move tp back to the return address
jump loop #jump to loop start
else:
pmv -4
pmv -4
sub
mtp -6
pmv 6
mtp 2
jump loop
returnA:
mtp -4
pmv 4
jumpto
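For readers who find the stack assembly hard to follow, the control flow of the gcd listing above, together with the relPrime loop described in the comments, corresponds roughly to the following Java sketch. It is a high-level restatement under assumptions, not output from our tools: the class name is hypothetical, and relPrime is assumed to start its search at m = 2.

```java
// Hypothetical high-level equivalent of the gcd and relPrime assembly.
class RelPrimeSketch {
    // Subtraction-based form of Euclid's algorithm, matching the branches in the listing.
    static int gcd(int a, int b) {
        if (a == 0) {
            return b;          // returnB
        }
        while (b != 0) {       // loop: exits to returnA when b == 0
            if (b < a) {       // slt on the top two stack elements
                a = a - b;     // overwrite a with a - b
            } else {
                b = b - a;     // else: overwrite b with b - a
            }
        }
        return a;              // returnA
    }

    // Finds the smallest m >= 2 (assumed starting point) with gcd(n, m) == 1.
    static int relPrime(int n) {
        int m = 2;
        while (gcd(n, m) != 1) {
            m = m + 1;         // add 1 to m and jump back to the start of the loop
        }
        return m;
    }

    public static void main(String[] args) {
        System.out.println("gcd(48, 18) = " + gcd(48, 18));
        System.out.println("relPrime(0x13B0) = " + relPrime(0x13B0));
    }
}
```

Because each step only subtracts, the loop can run many times when one argument is much larger than the other, which contributes to the high instruction counts reported in the Performance section.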
Performance
Euclid's algorithm for determining gcd takes 32 lines of assembly code, which is 64 bytes. The
complete algorithm for finding a relative prime, which includes the gcd function, takes 44 lines of
code, which is 88 bytes. The relPrime function requires 4 bytes of storage space in memory, plus
an additional 10 bytes for its call to the gcd function. Therefore there must be a total of 14 bytes
free in the stack memory to successfully run this algorithm.
When relPrime is called with the value 0x13B0 as the argument, a total of 152968 instructions are
executed. The values in the following table are based solely on running the relPrime algorithm with
this value.
Operation            Number of Cycles   Executions in relPrime   Frequency
M-type operation     5                  20374                    0.133
I-type operation     4                  9                        5.88*10^-5
load                 4                  0                        0
store                5                  0                        0
beq/bne              5                  20404                    0.133
jump                 3                  10196                    0.067
jumpto               4                  11                       7.19*10^-5
jal                  4                  10                       6.54*10^-5
pushv/pui/pop        3                  0                        0
pushi                4                  20405                    0.133
pmv                  5                  61173                    0.400
dup                  3                  0                        0
mtp                  3                  20386                    0.133
sll/sra/srl          4                  0                        0
I/O instruction      2                  0                        0
Our processor takes a total of 683241 cycles to execute this function under this condition.
Therefore, our average CPI is 4.467.
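The totals above can be reproduced from the per-operation cycle counts and execution counts in the table. The following Java sketch is only a cross-check of that arithmetic. The class and method names are hypothetical, and the two arrays simply restate the table rows in order.

```java
// Hypothetical cross-check of the instruction count, cycle count, and CPI.
class CpiSketch {
    // Cycles per operation, in the same row order as the table.
    static final int[] CYCLES = {5, 4, 4, 5, 5, 3, 4, 4, 3, 4, 5, 3, 3, 4, 2};
    // Executions of each operation when relPrime runs on 0x13B0.
    static final long[] EXECUTIONS = {
        20374, 9, 0, 0, 20404, 10196, 11, 10, 0, 20405, 61173, 0, 20386, 0, 0
    };

    static long totalInstructions() {
        long sum = 0;
        for (long n : EXECUTIONS) {
            sum += n;
        }
        return sum;
    }

    static long totalCycles() {
        long sum = 0;
        for (int i = 0; i < CYCLES.length; i++) {
            sum += CYCLES[i] * EXECUTIONS[i];
        }
        return sum;
    }

    // Average cycles per instruction = total cycles / total instructions.
    static double cpi() {
        return (double) totalCycles() / totalInstructions();
    }

    public static void main(String[] args) {
        System.out.printf("instructions = %d, cycles = %d, CPI = %.3f%n",
                totalInstructions(), totalCycles(), cpi());
    }
}
```

The pmv row dominates the total, since it is both the most frequently executed operation and one of the 5-cycle operations.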
According to the Synthesis Report generated by Xilinx, our cycle time is 67.524ns. Therefore, the
execution time for the algorithm with this input is 46.15 milliseconds. This report also included the
following information relevant to our design.
Conclusion
At the end of the design and implementation process of this project, our group has been successful in creating a
stack-based processor that can perform relative prime calculations using Euclid's Algorithm. However, with a CPI
of 4.467, a cycle time of 67.52 ns, and with an algorithm execution time of 46.15 ms, our performance data
demonstrates why such stack machines are rarely extensively used in industry anymore. The primary
performance disadvantage to the Stackintosh is the fact that it requires significantly more memory references,
despite the fact that our design only requires one dedicated register (tp). Despite these challenges, if the
Stackintosh were to be commercially sold, it would be much more affordable than many of the processors that
are currently on the market, since this processor has only the one dedicated register.
Several improvements can be made to our processor to either improve performance or enhance human
interaction. Namely, these improvements are getting the result of the relative prime calculations to display on
the LCD display of the FPGA boards, merging identical control states together, and reducing the number of
control signals.
Overall, we faced several challenges including minor issues with timing that were not accounted for. Most of
these were fixed with control signals to allow writing to specific registers. Also, creating our own double
memory to simulate a single memory with two outputs raised several issues, as it was difficult to ensure that
inputting a value into the double memory module inputs the same value in each of the single memories that
comprised it. In addition to this, all of the FPGA boards in the CS lab behaved differently with our I/O
components (and one of them even got very hot, and started melting the components), which leads our team to
think that most, if not all, of the FPGA boards are faulty.
In conclusion, despite the relatively low performance speed, we feel that the Stackintosh is a highly capable
processor that can support many different types of instructions.
Appendices
Appendix A: Datapath Implementation in Xilinx
#10;
a = 32767;
b = 0;
#10;
a = 1;
b = 32767;
#10;
a = 16;
b = 42;
#10;
a = -1;
b = -1;
#10;
a = -5;
b = 16;
#10;
a = -32768;
b = -32768;
#10;
a = -32768;
b = 32767;
#100;
op = 1;
a = 0;
b = 0;
#10;
a = 32767;
b = 32767;
#10;
a = 0;
b = 32767;
#10;
a = 32767;
b = 0;
#10;
a = 1;
b = 32767;
#10;
a = 16;
b = 42;
#10;
a = -1;
b = -1;
#10;
a = -5;
b = 16;
#10;
a = -32768;
b = -32768;
#10;
a = -32768;
b = 32767;
#100;
op = 2;
a = 0;
b = 0;
#10;
a = 32767;
b = 32767;
#10;
a = 0;
b = 32767;
#10;
a = 32767;
b = 0;
#10;
a = 1;
b = 32767;
#10;
a = 16;
b = 42;
#10;
a = -1;
b = -1;
#10;
a = -5;
b = 16;
#10;
a = -32768;
b = -32768;
#10;
a = -32768;
b = 32767;
#100;
op = 7;
a = 0;
b = 0;
#10;
a = 32767;
b = 32767;
#10;
a = 0;
b = 32767;
#10;
a = 32767;
b = 0;
#10;
a = 1;
b = 32767;
#10;
a = 16;
b = 42;
#10;
a = -1;
b = -1;
#10;
a = -5;
b = 16;
#10;
a = -32768;
b = -32768;
#10;
a = -32768;
b = 32767;
#100;
op = 00;
a = 0;
b = 0;
#10;
a = 32767;
b = 32767;
#10;
a = 0;
b = 32767;
#10;
a = 32767;
b = 0;
#10;
a = 1;
b = 32767;
#10;
a = 16;
b = 42;
#10;
a = -1;
b = -1;
#10;
a = -5;
b = 16;
#10;
a = -32768;
b = -32768;
#10;
a = -32768;
b = 32767;
end
endmodule
Barrel Shifter
`timescale 1ns / 1ps
////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 16:13:29 11/10/2014
// Design Name: BarrelShifter
// Module Name: C:/Users/robinsat/Documents/Courses/CSSE232/UpdatedImplementation/Datapath/barreltb.v
// Project Name: Datapath
// Target Device:
// Tool versions:
// Description:
//
// Verilog Test Fixture created by ISE for module: BarrelShifter
//
// Dependencies:
//
// Revision:
Control
`timescale 1ns / 1ps
////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 20:26:11 11/18/2014
// Design Name: control
// Module Name: C:/Users/robinsat/Documents/Courses/CSSE232/IODatapath3/Datapath/control_tb.v
// Project Name: Datapath
// Target Device:
// Tool versions:
// Description:
//
// Verilog Test Fixture created by ISE for module: control
//
// Dependencies:
//
// Revision:
// Revision 0.01 - File Created
// Additional Comments:
//
////////////////////////////////////////////////////////////////////////////////
module control_tb;
// Inputs
reg [4:0] Opcode;
reg CLK;
reg Start;
reg Reset;
// Outputs
wire [2:0] ALUop;
wire [1:0] ALUsrcA;
wire [2:0] ALUsrcB;
wire RegWrite;
wire MemWriteTp;
wire MemWriteTpMod;
wire PCWrite;
wire PCWriteCond;
wire [2:0] PCSrc;
wire [1:0] MemLoad1;
wire MemLoad2;
wire [1:0] MemValSrc;
wire [1:0] ShiftSrc;
wire [1:0] ShiftVal;
wire BranchType;
wire [1:0] shiftType;
wire tpInit;
wire WriteB;
wire WriteE;
wire WriteF;
wire IRwrite;
wire [5:0] current_state;
wire [5:0] next_state;
// Instantiate the Unit Under Test (UUT)
control uut (
.ALUop(ALUop),
.ALUsrcA(ALUsrcA),
.ALUsrcB(ALUsrcB),
.RegWrite(RegWrite),
.MemWriteTp(MemWriteTp),
.MemWriteTpMod(MemWriteTpMod),
.PCWrite(PCWrite),
.PCWriteCond(PCWriteCond),
.PCSrc(PCSrc),
.Opcode(Opcode),
.MemLoad1(MemLoad1),
.MemLoad2(MemLoad2),
.MemValSrc(MemValSrc),
.ShiftSrc(ShiftSrc),
.ShiftVal(ShiftVal),
.BranchType(BranchType),
.shiftType(shiftType),
.tpInit(tpInit),
.WriteB(WriteB),
.WriteE(WriteE),
.WriteF(WriteF),
.IRwrite(IRwrite),
.current_state(current_state),
.next_state(next_state),
.CLK(CLK),
.Start(Start),
.Reset(Reset)
);
initial begin
// Initialize Inputs
Opcode = 0;
CLK = 0;
Start = 1;
Reset = 0;
//
//
//
//
//
//
//
//
#150;
Opcode = 1; // addi
#20
Start = 0;
//
//
//
//
#150; //
Opcode = 2; // and
#20
Start = 0;
//
//
//
//
#150; //
Opcode = 3; // andi
#20
Start = 0;
//
//
//
//
#150; // **
Opcode = 4; // or
#20
Start = 0;
//
//
//
//
#150; // ~~
Opcode = 5; // ori
#20
Start = 0;
//
//
//
//
#150; //
Opcode = 6; // sll
#20
Start = 0;
//
//
//
//
#150; //
Opcode = 7; // sra
#20
Start = 0;
//
//
//
//
#150; //
Opcode = 8; // srl
#20
Start = 0;
//
//
//
//
#150; // **
Opcode = 9; // sub
#20
Start = 0;
//
//
//
//
#150; // ~~
Opcode = 10; // pui
#20
Start = 0;
//
//
//
//
#150; // **
Opcode = 11; // slt
#20
Start = 0;
//
//
//
//
#150;
Opcode = 12; // beq
#20
Start = 0;
//
//
//
//
#150;
Opcode = 13; // bne
#20
Start = 0;
//
//
//
//
#150;
Opcode = 14; // jump
#20
Start = 0;
//
//
//
//
#150;
Opcode = 15; // jal
#20
Start = 0;
//
//
#150;
Opcode = 16; // load
//
//
#20
Start = 0;
//
//
//
//
#150;
Opcode = 17; // pushi
#20
Start = 0;
//
//
//
//
#150;
Opcode = 18; // pushv
#20
Start = 0;
//
//
//
//
#150;
Opcode = 19; // pmv
#20
Start = 0;
//
//
//
//
#150;
Opcode = 20; // pop
#20
Start = 0;
//
//
//
//
#150;
Opcode = 21; // store
#20
Start = 0;
//
//
//
//
#150;
Opcode = 22; // dup
#20
Start = 0;
//
//
//
//
#150;
Opcode = 23; // mtp
#20
Start = 0;
end
endmodule
Datapath
// Verilog test fixture created from schematic
// C:\Users\robinsat\Documents\Courses\CSSE232\UpdatedImplementation\Datapath\Datapath.sch - Sun Nov 09 17:06:40 2014
`timescale 1ns / 1ps
module Datapath_Datapath_sch_tb();
// Inputs
reg CLK;
reg StartInput;
reg memCLK;
// Output
wire [15:0] IR;
wire [15:0] ShiftSource;
wire [15:0] ShiftValue;
wire [15:0] PCout;
wire [15:0] tpCurrent;
wire [2:0] ALUop;
wire [5:0] next_state;
wire [5:0] current_state;
wire [15:0] memoutC;
wire [15:0] memoutD;
wire [15:0] RegEContents;
wire [15:0] ALUoutput;
wire [15:0] ShiftOut;
wire [15:0] memInput;
wire [15:0] memAddr2;
wire [15:0] memAddr1;
wire [15:0] ALUinB;
wire [15:0] ALUinA;
// Bidirs
// Instantiate the UUT
Datapath UUT (
.CLK(CLK),
.IR(IR),
.ShiftSource(ShiftSource),
.ShiftValue(ShiftValue),
.PCout(PCout),
.tpCurrent(tpCurrent),
.ALUop(ALUop),
.next_state(next_state),
.current_state(current_state),
.StartInput(StartInput),
.memoutC(memoutC),
.memoutD(memoutD),
.RegEContents(RegEContents),
.ALUoutput(ALUoutput),
.memCLK(memCLK),
.ShiftOut(ShiftOut),
.memInput(memInput),
.memAddr2(memAddr2),
.memAddr1(memAddr1),
.ALUinB(ALUinB),
.ALUinA(ALUinA)
);
// Initialize Inputs
initial begin
CLK = 0;
memCLK = 0;
StartInput = 1;
#20
StartInput = 0;
end
always
#5 memCLK = !memCLK;
always
#10 CLK = !CLK;
endmodule
Stack Memory
// Verilog test fixture created from schematic
// C:\Users\robinsat\Documents\Courses\CSSE232\UpdatedImplementation\Datapath\StackMem.sch - Sun Nov 09 14:19:30 2014
`timescale 1ns / 1ps
module StackMem_StackMem_sch_tb();
// Inputs
reg CLK;
reg [15:0] DataIn;
reg memWriteTp;
reg memWriteTpMod;
reg [15:0] tpAddr;
reg [15:0] tpModAddr;
// Output
wire [15:0] tpOut;
wire [15:0] tpModOut;
// Bidirs
// Instantiate the UUT
StackMem UUT (
.CLK(CLK),
.DataIn(DataIn),
.memWriteTp(memWriteTp),
.memWriteTpMod(memWriteTpMod),
.tpAddr(tpAddr),
.tpOut(tpOut),
.tpModOut(tpModOut),
.tpModAddr(tpModAddr)
);
// Initialize Inputs
initial begin
CLK = 0;
DataIn = 0;
memWriteTp = 0;
memWriteTpMod = 0;
tpAddr = 0;
tpModAddr = 0;
#100
DataIn = 16;
memWriteTp = 1;
tpAddr = 2;
#10
tpAddr = 4;
DataIn = 24;
#10
tpAddr = 6;
DataIn = 28;
#10
tpAddr = 8;
DataIn = 30;
#10
tpAddr = 10;
DataIn = 31;
#10
DataIn = 28;
tpAddr = 12;
tpModAddr = 8;
#10
memWriteTp = 0;
end
always
#5 CLK = !CLK;
endmodule
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.PrintWriter;
import java.util.Scanner;
import java.util.logging.Level;
import java.util.logging.Logger;
break;
case "sra": // 7
sb.append("00111");
imm = sc.nextInt();
sb.append(intTo11BitSignedBinary(imm));
break;
case "srl": // 8
sb.append("01000");
imm = sc.nextInt();
sb.append(intTo11BitSignedBinary(imm));
break;
case "sub": // 9
sb.append("0100100000000000");
break;
case "pui": // 10
sb.append("01010");
imm = sc.nextInt();
sb.append(intTo11BitSignedBinary(imm));
break;
case "slt": // 11
sb.append("0101100000000000");
break;
case "beq": // 12
sb.append("01100");
if (sc.hasNextInt()) {
// treat as normal address
imm = sc.nextInt();
imm = imm - PC;
sb.append(intTo11BitSignedBinary(imm));
} else if (sc.hasNext()) {
// this is a label, so translate it to an address first
String label = sc.next();
sb.append(labelToAddress(label, true)); // right logic?
}
break;
case "bne": // 13
sb.append("01101");
if (sc.hasNextInt()) {
// treat as normal address
imm = sc.nextInt();
imm = imm - PC;
sb.append(intTo11BitSignedBinary(imm));
} else if (sc.hasNext()) {
// this is a label, so translate it to an address first
String label = sc.next();
sb.append(labelToAddress(label, true)); // right logic?
}
break;
case "jump": // 14
sb.append("01110");
if (sc.hasNextInt()) {
}
break;
case "load": // 16
sb.append("1000000000000000");
break;
case "pushi": // 17
sb.append("10001");
imm = sc.nextInt();
sb.append(intTo11BitSignedBinary(imm));
break;
case "pushv": // 18
sb.append("10010");
imm = sc.nextInt();
sb.append(intTo11BitSignedBinary(imm));
break;
case "pmv": // 19
sb.append("10011");
imm = sc.nextInt();
sb.append(intTo11BitSignedBinary(imm));
break;
case "pop": // 20
sb.append("10100");
imm = sc.nextInt();
sb.append(intTo11BitSignedBinary(imm));
break;
case "store": // 21
sb.append("1010100000000000");
break;
case "dup": // 22
sb.append("1011000000000000");
break;
case "mtp": // 23
sb.append("10111");
imm = sc.nextInt();
sb.append(intTo11BitSignedBinary(imm));
break;
case "jumpto": // 24
sb.append("1100000000000000");
// how to account for changing value of PC?
break;
}
pw.println(sb.toString());
sb.replace(0, sb.length(), "");
}
pw.flush();
pw.close();
sc.close();
} catch (FileNotFoundException ex) {
Logger.getLogger("oops, no file!").log(Level.SEVERE, null, ex);
}
}
/**
* Converts the label in beq and bne to its respective address
* */
public static String labelToAddress(String label, boolean isBranch)
throws Exception {
Scanner scan = new Scanner(new FileReader(input));
String s = scan.next();
int address = 0;
while (scan.hasNext()) {
if (s.equals(label) || s.substring(0, s.length() - 1).equals(label))
{
if (isBranch) {
return intTo11BitUnsignedBinary(PC - address);
} else {
return intTo11BitUnsignedBinary(address);
}
} else {
address++;
scan.nextLine();
if (scan.hasNext()) {
s = scan.next();
}
}
}
return null;
}
/**
* Converts integers
* */
public static String
String binary
StringBuilder