Processor Report

This document describes a stack-based processor called the Stackintosh. It includes details about the registers used, instruction formats, supported instructions, datapath components, and testing procedures. The processor was implemented in Xilinx and can calculate the greatest common divisor (GCD) using Euclid's algorithm. While affordable due to its simple design, the stack-based architecture results in reduced performance compared to non-stack processors due to frequent memory accesses.

CSSE 232 FINAL REPORT:

The Stackintosh
Group 4F

Jack Porter, Alia Robinson, Melissa Thai, Jamie Zhou

Table of Contents
Introduction
Executive Summary
Registers Used
Instruction Format
RTL
Supported Instructions
Common Operations in Assembly
Procedure Call Conventions
Components for Datapath
Control Signals
Datapath Diagram
Finite State Diagram
Xilinx Implementation
Testing Plan
Determining GCD Using Euclid's Algorithm
Performance
Device Utilization Summary
Conclusion
Appendices
Appendix A: Datapath Implementation in Xilinx
Appendix B: Control Module Implementation in Xilinx
Appendix C: Sample Waveform for Xilinx
Appendix D: Test Benches
Appendix E: Java Implementation of Assembler

Introduction
Our processor, the Stackintosh, will perform all operations using a stack. Arguments are pushed
onto the stack before operations are called. These elements are then popped and replaced by the
value resulting from the instruction. The tp register keeps track of the top of this stack, and this
value is automatically adjusted to account for items being pushed and popped. The mtp
instruction can be used to explicitly move this pointer, which is useful for procedure calls. All
operations, unless otherwise specified, will add, remove, and/or manipulate items on the stack.
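
The push-then-operate model described above can be sketched in a few lines of Python. This is only an illustration, not the Stackintosh implementation: the class name StackModel, its method names, the starting value of tp, and the 2-byte slot size (suggested by the tp + 2 and tp - 2 adjustments in the RTL section) are assumptions.

```python
class StackModel:
    """Toy model of a stack machine with a top-of-stack pointer (tp)."""

    SLOT = 2  # assumed: each 16-bit stack entry occupies 2 bytes

    def __init__(self):
        self.mem = {}  # address -> 16-bit value
        self.tp = 0    # assumed starting address of the empty stack

    def push(self, value):
        # tp is adjusted automatically when an item is pushed.
        self.tp += self.SLOT
        self.mem[self.tp] = value & 0xFFFF

    def pop(self):
        # tp is adjusted automatically when an item is popped.
        value = self.mem[self.tp]
        self.tp -= self.SLOT
        return value

    def add(self):
        # The two operands are popped and replaced by their sum.
        top = self.pop()
        below = self.pop()
        self.push(below + top)

    def mtp(self, offset):
        # Explicitly move the top-of-stack pointer.
        self.tp += offset


s = StackModel()
s.push(3)   # arguments are pushed before the operation is called
s.push(4)
s.add()
result = s.mem[s.tp]
```

After the add, the stack holds a single entry and tp points at it.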

Executive Summary
This report introduces a new general-purpose processor with a minuscule instruction set, centered
around a stack-based architecture. It first covers the types and purposes of the registers used, the
instruction formats, and the RTL and descriptions of all available instructions. Once the instruction
set itself is specified, the report shows how to perform common operations such as looping, pushing
a value onto the stack, and conditional statements using this instruction set. Next, it analyzes the
hardware implementation of a datapath that supports these instructions. This includes descriptions
of the components and of the control signals that direct the flow of data, as well as diagrams giving
a high-level overview of the datapath and its corresponding finite state diagram. A description of
how to implement this datapath in the Xilinx ISE follows, along with the detailed testing plan used
to validate the processor. Finally, our assembly implementation of the greatest common divisor
(GCD) using Euclid's Algorithm is shown, followed by performance metrics analyzing the efficiency
of the processor.
For a more detailed look at the design and implementation of our processor, such as the Xilinx
schematic, sample test benches, and the Java implementation of our assembler (with examples of
assembly-to-machine code conversions), refer to the Appendices section of this report.
This report finds that the Stackintosh, if it were to be commercially sold, would be very affordable to
the general public since it only has one dedicated register (tp). However, the primary drawback is
the fact that executing instructions in a stack-based architecture requires many memory accesses,
which reduces the overall performance of the processor.
Recommendations discussed include:
- Getting the result of relative prime to display on the LCD display
- Merging identical control states together
- Reducing the number of control signals
Some limitations of this report include:
- Although our processor successfully calculates relative prime using Euclid's Algorithm, the
extent to which it has been tested for other procedures is limited, except for the simpler
tasks that must be done in order to calculate relative prime.
- All performance data was collected using a simulated version of our processor in Xilinx.
This data may not hold for an actual, pure hardware version of the processor.

Registers Used
tp
Holds the address of the top of the stack.
B, C, D, E, F, G
Keep track of values in the datapath between cycles.
Display Reg.
Holds whatever value (in this case, the solution to relative prime) will be displayed to the LCD
screen of the FPGA board.

Instruction Format
M-type
In an M-type instruction, the 11-bit field refers to an address in memory.
Opcode (5 bits)    Address (11 bits)

I-type
In an I-type instruction, the 11-bit field holds an immediate value.
Opcode (5 bits)    Immediate (11 bits)
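
As a rough illustration of the two formats, the sketch below packs a 5-bit opcode and an 11-bit field into one 16-bit word and unpacks it again. The function names encode and decode are hypothetical and are not taken from the group's Java assembler; the opcodes used are the ones listed in the Supported Instructions section.

```python
OPCODE_BITS = 5
FIELD_BITS = 11
FIELD_MASK = (1 << FIELD_BITS) - 1  # 0x7FF


def encode(opcode, field):
    """Pack a 5-bit opcode and an 11-bit address/immediate into 16 bits."""
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode must fit in 5 bits")
    # Negative immediates are stored in 11-bit two's complement.
    return (opcode << FIELD_BITS) | (field & FIELD_MASK)


def decode(word):
    """Split a 16-bit word back into (opcode, 11-bit field)."""
    return word >> FIELD_BITS, word & FIELD_MASK


pushi_word = encode(0b10001, 0x088)  # pushi 0x088
addi_word = encode(0b00001, -1)      # addi -1
```

With these definitions, pushi 0x088 encodes to 0x8888 and addi -1 encodes to 0x0FFF.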

RTL
Fetch (common to all instructions)
IR = Mem[PC]
PC = PC + 1

The remaining register transfers for each instruction are listed in order, one step per line.
Transfers separated by semicolons occur in the same step.

M-type operation
B = tp - 2; G = tp - 2
C = Mem[tp]; D = Mem[B]
F = C op D
Mem[B] = F

I-type operation
C = Mem[tp]
F = C op SE(Imm) OR ZE(Imm)
Mem[tp] = F

load
C = Mem[tp]
C = Mem[C]
Mem[tp] = C

store
B = tp - 2
C = Mem[tp]; D = Mem[B]; G = tp - 4
F = D + 0
Mem[C] = F

beq/bne
E = PC + SE(IR[10:0])
B = tp - 2
C = Mem[tp]; D = Mem[B]; G = tp - 4
C - D; tp = G; if (beq: (zero==1) || bne: (zero!=1)) PC = E

jump
PC = PC[15:11] || ZE(IR[10:0])

jal
F = PC; PC = PC[15:11] || ZE(IR[10:0])<<1
B = tp + 2; G = tp + 2
Mem[B] = F; tp = G

jumpto
C = Mem[tp]; G = tp - 2
PC = C; tp = G

pmv <imm>
B = tp + SE(IR[10:0])
D = Mem[B]; B = tp + 2; G = tp + 2
F = D + 0; tp = G
Mem[B] = F

pushv <addr>
B = tp + 2; G = tp + 2; C = Mem[ZE(IR[10:0])]
Mem[B] = C; tp = G

pop <addr>
C = Mem[tp]; G = tp - 2
Mem[ZE(IR[7:0])] = C; tp = G

dup
C = Mem[tp]; B = tp + 2; G = tp + 2
Mem[B] = C; tp = G

pui
B = tp + 2; G = tp + 2; F = ZE(IR[10:0])<<11
Mem[B] = F; tp = G

pushi
B = tp + 2
F = SE(Imm)
Mem[B] = F

mtp
G = tp + SE(IR[7:0])
tp = G

sll/sra/srl
C = Mem[tp]
F = C<<ZE(IR[10:0])
Mem[tp] = F

pushIO
G = tp + 2
tp = G
Mem[tp] = I/O; LEDout = I/O

display
LCDout = Mem[tp]
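
To make the register transfers concrete, here is a minimal Python sketch that steps through the M-type and pmv transfers using the same temporary register names (B, C, D, F, G). The function names and the dictionary used as memory are assumptions made for illustration. The final tp = G step of the M-type operation is also an assumption: it is inferred from the instruction descriptions, which say tp is decremented to tp - 2.

```python
def m_type(mem, tp, op):
    """Apply the M-type transfers to mem; returns the new tp."""
    B = tp - 2
    G = tp - 2
    C = mem[tp]
    D = mem[B]
    F = op(C, D) & 0xFFFF  # keep the result to 16 bits
    mem[B] = F
    return G               # assumed: tp = G completes the instruction


def pmv(mem, tp, imm):
    """Apply the pmv transfers (imm already sign-extended); returns the new tp."""
    B = tp + imm
    D = mem[B]
    B = tp + 2
    G = tp + 2
    F = D + 0              # the ALU passes D through unchanged
    mem[B] = F
    return G


stack = {2: 5, 4: 7}       # two operands, tp = 4
tp = m_type(stack, 4, lambda c, d: c + d)
tp_after_pmv = pmv(stack, tp, 0)  # copy the value at tp onto the top
```

After m_type, the sum sits at address 2 and tp equals 2. The pmv call with an offset of 0 then duplicates that value at address 4.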

Supported Instructions
Each entry gives the instruction, its opcode, and its description.

Arithmetic and Logical Instructions

add (opcode 00000)
Adds the first two elements at the top of the stack, decrements tp to tp - 2, and stores sum in
new tp

addi <imm> (opcode 00001)
Adds a sign-extended immediate to element at the top of the stack, stores sum in tp (replacing the
value previously at this position)

and (opcode 00010)
ANDs the first two elements at the top of the stack, decrements tp to tp - 2, and stores result in
new tp

andi <imm> (opcode 00011)
ANDs immediate with element at the top of the stack, stores result in tp (replacing the previous
value)

or (opcode 00100)
ORs the first two elements at the top of the stack, decrements tp to tp - 2, and stores result in
new tp

ori <imm> (opcode 00101)
ORs immediate with element at the top of the stack, stores result in tp (replacing the previous
value)

sll <imm> (opcode 00110)
Performs a logical shift left on the element at the top of the stack by the distance indicated by
immediate, replaces top element with shifted element

slt (opcode 01011)
Performs set less than on the top two elements of the stack, removing them and pushing a 1 onto the
top of the stack if the element in the higher position on the stack is less than the element in the
lower position on the stack. Otherwise, a 0 is pushed onto the stack

sra <imm> (opcode 00111)
Performs an arithmetic shift right on the element at the top of the stack by the distance indicated
by immediate, replaces top element with shifted element

srl <imm> (opcode 01000)
Performs a logical shift right on the element at the top of the stack by the distance indicated by
immediate, replaces top element with shifted element

sub (opcode 01001)
Subtracts the element at tp from the element at tp - 2, decrements tp to tp - 2, and stores the
result in new tp

Branch Instructions

beq <addr> (opcode 01100)
Branches to the specified address if the top two elements on the stack are equal. Decrements tp to
tp - 4

bne <addr> (opcode 01101)
Branches to the specified address if the top two elements on the stack are not equal. Decrements tp
to tp - 4

Jump Instructions

jump <addr> (opcode 01110)
Moves the PC to the specified address

jal <addr> (opcode 01111)
Pushes the return address to the top of the stack and moves the PC to the specified address

jumpto (opcode 11000)
Moves the PC to the address at the top of the stack and decrements tp by 2

Stack Instructions

dup (opcode 10110)
Makes a copy of the value at the top of the stack and pushes it back onto the stack

load (opcode 10000)
Assumes that the value at the top of the stack is a valid address in memory. This address is
replaced with the value stored at this address

mtp <imm> (opcode 10111)
Increases tp by the value of the sign extended immediate. Assumes the user will not give an invalid
immediate

peek (pseudoinstruction, no opcode)
Pseudoinstruction storing the top element to the specified memory address. The assembler reads this
as dup; pop <Address>

pmv <imm> (opcode 10011)
Push memory value takes the value at the address of the tp plus the sign extended immediate and
pushes this value to the top of the stack

pop <addr> (opcode 10100)
Stores the element at the top of the stack to the specified memory address, decrements tp to tp - 2.

pui <imm> (opcode 01010)
Pushes a 16-bit value onto the stack with the upper 5 bits consisting of the last 5 bits of the
11-bit immediate. The lower 11 bits will consist of zeroes.

pushi <imm> (opcode 10001)
Increments tp to tp + 2, and pushes the specified immediate to the new top of the stack.

pushv <addr> (opcode 10010)
Increments tp to tp + 2, and pushes the value at the specified address to the new top of the stack.

store (opcode 10101)
Stores the element at tp - 2 at the address at the top of the stack. (Assumes Mem[tp] is a valid
address in memory)

Input/Output

pushIO (opcode 11001)
Pushes whatever is in the input onto the top of the stack, increments tp to tp + 2

display (opcode 11010)
Writes whatever is at the top of the stack into the LCDout register
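
Because immediates are only 11 bits wide, a full 16-bit constant has to be assembled in two steps. The sketch below models one plausible way to do that with pui followed by ori. It assumes that ori zero-extends its immediate, which the instruction list does not state explicitly, and the helper names are hypothetical.

```python
def pui(imm):
    """Model of pui: low 5 bits of imm become the upper 5 bits of a 16-bit value."""
    return ((imm & 0x1F) << 11) & 0xFFFF


def ori(top, imm):
    """Model of ori, assuming the 11-bit immediate is zero-extended."""
    return top | (imm & 0x7FF)


def split_constant(value):
    """Split a 16-bit constant into (pui operand, ori operand)."""
    return (value >> 11) & 0x1F, value & 0x7FF


upper, lower = split_constant(0x13B0)
rebuilt = ori(pui(upper), lower)  # models: pui 2 followed by ori 0x3B0
```

For the constant 0x13B0, the split gives 2 for the upper 5 bits and 0x3B0 for the lower 11 bits.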

Common Operations in Assembly


#Assume the first instruction is at 0x0000

Conditional statements:
pushv 0x0FF      #push the address of the first value for comparison.
pushv 0x0FF      #push the address of the second value for comparison.
beq <newAddr>    #move the PC to newAddr, remove the top two values of the stack.

Loading an address onto the stack:
pushi 0x088      #Assume the address is 0x088. Pushes the value 0x088 onto the stack

Iteration:
The following code demonstrates a procedure to (inefficiently) multiply a value by ten. Let
AddressA = 0x0002, AddressB = 0x0004, and AddressC = 0x0006

mtp 10          #moves tp up to create storage space
pushi 0         #push 0 for the initial product
pushi 69        #push the number we want to multiply
pushi 10        #initializes the iteration counter
pop 4           #stores the counter at address 4
pop 2           #stores the multiplicand at address 2
pop 6           #stores the product at address 6
timesten:
pushv 2         #pushes the value at address 2
pushv 6         #pushes the value at address 6
add             #pops the top two values and pushes the sum
pop 6           #pops the sum and stores it at address 6
pushv 4         #pushes the iteration counter
addi -1         #subtracts 1 from the iteration counter
dup             #duplicate the iteration counter
pop 4           #store the iteration counter at address 4
pushi 0         #pushes the value 0 (comparing against 1 would stop after only nine additions)
bne timesten    #branches to the address of timesten while the counter is not 0
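
The loop is a do-while: the addition and the decrement both happen before the bne test, so the value the counter is compared against decides how many additions run. The Python sketch below mirrors that control flow. The function name repeated_add and its stop parameter are illustrative assumptions, not part of the report.

```python
def repeated_add(multiplicand, counter, stop):
    """Mirror of the timesten loop: add first, decrement, then test."""
    product = 0
    while True:
        product += multiplicand  # pushv 2; pushv 6; add; pop 6
        counter -= 1             # pushv 4; addi -1; dup; pop 4
        if counter == stop:      # pushi <stop>; bne timesten falls through
            return product


ten_times = repeated_add(69, 10, 0)   # loop ends when the counter reaches 0
nine_times = repeated_add(69, 10, 1)  # loop ends when the counter reaches 1
```

Comparing against 0 performs ten additions and yields 690. Comparing against 1 stops one addition short and yields 621.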

Procedure Call Conventions


Before a function is called, the parameters are pushed onto the stack one at a time. The first
argument specified is the first to go onto the stack. The return address, if applicable is the last thing
to go onto the stack. When the function is complete, the only thing that should be on the stack is
the value returned by the function.
For example, the following code recursively computes the nth fibonacci number:
pushi n      #Push index of the Fibonacci number we're looking for
jal fib      #Jump to the fib method
<next>

fib:
pmv -2       #Push the value of n
pushi 0      #Push 0 for comparison
bne 1        #If n != 0, skip over next instruction
jumpto       #n = 0, so return n.
pmv -2       #Push the value of n
pushi 1      #Push 1 for comparison
bne 1        #If n != 1, skip over next instruction
jumpto       #n = 1, so return n
pmv -2       #Push the value of n
addi -1      #Subtract 1 from n
jal fib      #Call fib(n-1)
pmv -4       #Push the value of n
addi -2      #Subtract 2 from n
jal fib      #Call fib(n-2)
add          #Add the results of fib(n-1) and fib(n-2)
mtp -6       #Move the stack pointer down by 6
pmv 6        #Push the value of fib(n-1)+fib(n-2) where n was
mtp 2        #Move tp to point to the return address
jumpto       #Return fib(n-1)+fib(n-2)
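
The calling convention can be checked with a small interpreter. The Python sketch below is not the Stackintosh itself: the run function, the tuple encoding of instructions, the halt pseudo-op standing in for <next>, the word-indexed PC, and tp starting at 0 are all assumptions made for illustration. Values are not truncated to 16 bits.

```python
def run(program, max_steps=100_000):
    """Interpret (op, arg) pairs; returns the value at tp when halt is reached."""
    mem, tp, pc = {}, 0, 0
    for _ in range(max_steps):
        op, arg = program[pc]
        pc += 1                       # PC = PC + 1 happens during fetch
        if op == "halt":
            return mem[tp]
        if op == "pushi":
            tp += 2
            mem[tp] = arg
        elif op == "pmv":
            value = mem[tp + arg]     # read relative to the current tp
            tp += 2
            mem[tp] = value
        elif op == "addi":
            mem[tp] += arg
        elif op == "add":
            mem[tp - 2] += mem[tp]
            tp -= 2
        elif op == "bne":
            differ = mem[tp] != mem[tp - 2]
            tp -= 4                   # both compared values are removed
            if differ:
                pc += arg             # offset from the already incremented PC
        elif op == "jal":
            tp += 2
            mem[tp] = pc              # return address goes on top of the stack
            pc = arg
        elif op == "jumpto":
            pc = mem[tp]
            tp -= 2
        elif op == "mtp":
            tp += arg
        else:
            raise ValueError(f"unknown op {op!r}")
    raise RuntimeError("step limit exceeded")


FIB = 3  # index of the first instruction of fib in the list below


def fib_program(n):
    return [
        ("pushi", n), ("jal", FIB), ("halt", None),
        ("pmv", -2), ("pushi", 0), ("bne", 1), ("jumpto", None),
        ("pmv", -2), ("pushi", 1), ("bne", 1), ("jumpto", None),
        ("pmv", -2), ("addi", -1), ("jal", FIB),
        ("pmv", -4), ("addi", -2), ("jal", FIB),
        ("add", None), ("mtp", -6), ("pmv", 6), ("mtp", 2), ("jumpto", None),
    ]


results = [run(fib_program(n)) for n in range(8)]
```

Under these assumptions, each call returns with only the result left where the argument n used to be, which is the behaviour the convention requires.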

Components for Datapath


Each entry lists the component, its characteristics, and its description.

Instruction Memory
Characteristics: 16-bit input from PC; 16-bit instruction output
Description: Stores the instructions to be executed. They are retrieved in sequence.

Stack Memory
Characteristics: two 16-bit address inputs; 16-bit data input; two 16-bit data outputs; 1-bit
MemWriteTp signal; 1-bit MemWriteTpMod signal
Description: All memory operations require an address. Stores require data input and one of the
write signals must be 1. Loads will output 16 bits of data.

Instruction Register (IR)
Characteristics: 16-bit input from memory; 16-bit data output
Description: Stores the current 16-bit instruction.

tp Register
Characteristics: 16-bit data input; 16-bit data output; 1-bit RegWrite signal
Description: Stores the address of the current top of the stack

Registers B, C, D, E, F, G, Display Reg.
Characteristics: 16-bit storage
Description: These registers provide storage for values after the corresponding instructions are
retrieved from memory. Each uses a basic 16-bit input and output.

ALU
Characteristics: 16-bit input A; 16-bit input B; 3-bit ALUOp input; 16-bit output; 1-bit overflow
detector; 1-bit zero detector
Description: The ALU performs 5 basic arithmetic and logical operations: and, or, add, sub, and set
less than. The ALU may take input from registers A and B, or it may use immediate values as its
inputs.

Sign Extender (SE)
Characteristics: 11 bit immediate input; 16 bit immediate output
Description: Sign extends the 11 bit immediate input and outputs the 16 bit result

Zero Extender (ZE)
Characteristics: 11 bit immediate input; 16 bit immediate output
Description: Zero extends the 11 bit immediate input and outputs the 16 bit result

Barrel Shifter
Characteristics: 16 bit value input; 16 bit shift amount input; 16 bit value output
Description: Bit shifts the input value by the shift amount so it's aligned and can be added for
jumps or branches.

Control
Characteristics: 6 bit op code (instr format indicator + op code) input; x bit decoded control
signal output
Description: Control unit takes a 6 bit op code and outputs control bits
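
The difference between the two extenders is easiest to see on a negative immediate. The following Python sketch models both on an 11-bit field. The function names are hypothetical, and to_signed_16 is only a helper for reading the 16-bit result as a signed number.

```python
def zero_extend_11(imm):
    """Upper 5 bits of the 16-bit result are always 0."""
    return imm & 0x7FF


def sign_extend_11(imm):
    """Upper 5 bits of the 16-bit result copy bit 10 of the immediate."""
    imm &= 0x7FF
    if imm & 0x400:    # bit 10 is the sign bit of the 11-bit field
        imm |= 0xF800  # fill bits 15..11 with ones
    return imm


def to_signed_16(word):
    """Interpret a 16-bit word as a two's complement number."""
    return word - 0x10000 if word & 0x8000 else word


field = -2 & 0x7FF  # 11-bit encoding of -2, as in pmv -2
as_signed = to_signed_16(sign_extend_11(field))
as_unsigned = zero_extend_11(field)
```

Sign extension recovers -2 from the field, while zero extension of the same bits gives 2046.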

Control Signals
Each entry gives the control signal followed by its description.

PCWrite
Allows the target address to be written to the PC

PCSrc[1:0]
Indicates the source of the value being written to the PC

PCWriteCond
Allows the target address to be written to the PC if the condition is met (that is, if the output
of the ALU is zero)

BranchType
Indicates whether the branch instruction is a bne or a beq

ALUsrcA[1:0]
Sets the source for the first ALU input (a value retrieved from memory, the PC address, tp, etc.)

ALUsrcB[2:0]
Sets the source for the second ALU input (a value retrieved from memory, an immediate value, etc.)

ALUOp[2:0]
Determines the operation that the ALU must perform (and, or, add, sub, slt)

MemLoad1[1:0]
Sets the source for the first address input to the stack memory

MemLoad2
Sets the source for the data input to the stack memory

MemWriteTp
Enables values to be written to the address in the first memory input (tp)

MemWriteTpMod
Enables values to be written to the address in the second memory input (modified tp, usually
tp - 2)

RegWrite
Enables the input to tp to be written to the register

MemValSrc
Determines the source of the data being stored in the F register

ShiftSrc[1:0]
Determines the source input to the shifter (Memory output or a zero extended immediate)

ShiftVal[1:0]
Determines the value that the number must be shifted by (either an immediate value or a constant)

WriteB
Determines whether data is being written to the B register

WriteE
Determines whether data is being written to the E register

WriteF
Determines whether data is being written to the F register

IRWrite
Determines whether data is being written to instruction register

LCDWrite
Determines whether data is being written to LCD screen
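
How PCWrite, PCWriteCond, BranchType, and the ALU zero flag might combine into a single PC write enable is sketched below in Python. The report does not give this logic or the encoding of BranchType, so the function name and the choice of 0 for beq and 1 for bne are assumptions. The condition itself follows the RTL for beq/bne.

```python
def pc_write_enable(pc_write, pc_write_cond, branch_type, zero):
    """Return True when the PC should be written this cycle."""
    # Assumed encoding: branch_type 0 selects beq, 1 selects bne.
    if branch_type == 0:
        condition_met = zero == 1  # beq branches when C - D is zero
    else:
        condition_met = zero != 1  # bne branches when C - D is nonzero
    return bool(pc_write or (pc_write_cond and condition_met))


beq_taken = pc_write_enable(0, 1, 0, zero=1)
bne_taken = pc_write_enable(0, 1, 1, zero=1)
jump_taken = pc_write_enable(1, 0, 0, zero=0)
```

In this model an unconditional PCWrite always wins, and PCWriteCond only takes effect when the branch condition holds.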

Datapath Diagram

Finite State Diagram

Xilinx Implementation
*Assumes that each component has already been created and is functional.
The datapath begins with the program counter (PC), a register that holds the current address. The
PC output is wired to the instruction memory, to the 4 bit ALUInputA mux, and to a concatenator that
takes its top 5 bits (this is used for jump instructions). The instruction memory takes the PC and
outputs a 16 bit instruction, which goes into an instruction register that splits the instruction
into 2 outputs. The top 5 bits represent the opcode and go to the control unit. The last 11 bits go
into a sign extender and a zero extender, both of which feed the ALUInputB mux. The zero extender
output also splits into two separate muxes that feed a barrel shifter, whose output connects to the
ALUInputB mux, to a mux that selects the value to write to memory, and to the address concatenator
that the PC initially connects to.
The tp register output goes to the ALUInputA mux and to the memory block's first address input.
ALUInputA and ALUInputB (which, in addition to the inputs already stated, has 2, -2, and -4
immediate inputs) feed the ALU.
The ALU output branches off into many wires. This output can be written to registers B, E, and G,
or it may be written to memory. Memory accesses two separate values based on two separate address
inputs. The first (top) output goes to the temporary register C, which feeds back into the
ALUInputA mux and the top mux that connects to the barrel shifter. The second (bottom) output
connects to the temporary D register and the ALUInputB mux.

Testing Plan
Initial testing phase - COMPLETE
- Ran through our datapath with various instructions and made sure there weren't any timing
conflicts with component usage, and that every value ended up as it should be (PC ends up as
PC+1, stack memory has the new values stored, etc.)
Component testing - COMPLETE
- Design the individual components necessary in our design, then write test benches for each
one of them individually before combining them. Extra time was devoted to testing our 2 address
access memory, which will be our first major independent design.
Instruction testing - COMPLETE
- After individual components are verified working, we'll start connecting individual components
for instruction testing. We'll start creating larger subsystems to test, those subsystems being:
1. PC + ALU to ensure PC+1 functions
2. PC + instr memory + instr register to make sure instruction codes output correctly
3. Barrel shifter and associated muxes/zero extender to make sure that it will shift correctly
and all the control signals work as intended.
4. Step 2 + tp reg + control + barrel shifter to make sure that each part of the instruction code
splits correctly. Might remove the reg file, but precautionary testing just in case.
5. Step 4 + ALU and its associated input muxes, with immediates. Test basic calculations as
well as ensure tp-2 will store itself back in the reg file successfully. Also test jumping to
make sure address concatenation and PC write control signals work.
6. Step 5 + Memory, completing the datapath. Begin testing more advanced instructions that
require memory accesses, beginning with basic stack modifications (adding and subtracting
values from the stack), working to more advanced instructions like branching and pmv.
Real world testing - COMPLETE
- After each instruction is verified working independently, we'll start writing code for our
processor to see if our instructions will actually work in sequence, to make sure there aren't any
issues with stack management in between instructions.
Control Signal Testing
In order to test our control module, we'll create a test bench that feeds various control codes into
the control module input and monitor what is output to each control signal. The control codes we'll
feed in are add (000001), sub (010001), load (001101), beq (001010), and jump (001100). The
first two are two separate R-type instructions that will test to make sure ALUop controls properly,
and the rest are the other 3 unique control cases. The actual control module will be a state
machine that should output a certain sequence of control signals each clock cycle (refer to our
state diagram for specifics), and we will ensure that the waveforms match the expected. We'll also
feed nonsense control signals into the control module to make sure that it doesn't output any faulty
information.

Determining GCD Using Euclid's Algorithm

#Assume that the proper conventions have been followed when calling this procedure.
#This implies that the argument n has been pushed onto the stack, followed by the
#return address.
relPrime:
pushi 2            #pushes on the initial value of m
whileLoop:
pmv -4             #pushes n to the top of the stack
pmv -2             #pushes m to the top of the stack
jal gcd            #pushes the return address, jumps to gcd
#Assume when this returns, n, m, and ra have been removed.
#The top of the stack points to the returned gcd.
pushi 1            #pushes 1 to top
beq returnPrime    #compare the gcd and 1, go to return if ==
addi 1             #add 1 to m
jump whileLoop     #jump to beginning of loop
returnPrime:
mtp -6             #move tp to the stack position before n
pmv 6              #push m onto the stack where n was
mtp 2              #move the tp to the return address
jumpto             #return from this procedure call

#Assume that the stack now contains the arguments a and b, followed
#by the return address
gcd:
pmv -4             #push a to the top of stack
pushi 0            #push 0 on top of stack
bne loop           #if a != 0 go to the loop

returnB:
mtp -6             #move tp below a
pmv 4              #push b on to overwrite a
pmv 4              #push the return address on and overwrite b
jumpto             #return from this procedure call

loop:
pmv -2             #push b to top of stack
pushi 0            #push 0 to top of stack
beq returnA        #if b == 0 go to returnA to exit loop
pmv -4             #push a onto the top of stack
pmv -4             #push b onto the top of stack
slt                #check if a < b
pushi 1            #push 1 to top of stack
bne else           #check if 1 == (a < b)
pmv -2             #push b to the top of the stack
pmv -6             #push a to the top of the stack
sub                #subtract a - b
mtp -8             #move tp below a
pmv 8              #push a - b to tp + 2 to overwrite a
mtp 4              #move tp back to the return address
jump loop          #jump to loop start

else:
pmv -4             #push a to the top of the stack
pmv -4             #push b to the top of the stack
sub                #subtract b - a
mtp -6             #move tp below the mem address of a
pmv 6              #push b - a to tp + 2
mtp 2              #move tp to the return address
jump loop          #jump to loop start

returnA:
mtp -4             #move tp below mem address of a
pmv 4              #push the return address to mem addr of b
jumpto             #jump to return address to end gcd call
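
For reference, the same control flow can be written as a short high-level sketch. The Python below is not the group's code: it follows the structure of the assembly (return b when a is 0, otherwise subtract the smaller value from the larger until b reaches 0) under the assumption that inputs are non-negative integers. The function names are chosen only to echo the labels.

```python
def gcd(a, b):
    """Subtraction-based Euclid's algorithm, mirroring the gcd label."""
    if a == 0:          # gcd: if a == 0, fall through to returnB
        return b
    while b != 0:       # loop: exit to returnA once b == 0
        if a > b:
            a = a - b   # overwrite a with the difference
        else:
            b = b - a   # overwrite b with the difference
    return a


def rel_prime(n):
    """Smallest m >= 2 with gcd(n, m) == 1, mirroring relPrime."""
    m = 2                     # pushi 2
    while gcd(n, m) != 1:     # whileLoop: call gcd and compare with 1
        m += 1                # addi 1
    return m


answer = rel_prime(0x13B0)
```

For the argument 0x13B0 (5040 in decimal), this sketch returns 11.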

Performance
Euclid's algorithm for determining gcd takes 32 lines of assembly code, which is 64 bytes. The
complete algorithm for finding a relative prime, which includes the gcd function, takes 44 lines of
code, which is 88 bytes. The relPrime function requires 4 bytes of storage space in memory, plus
an additional 10 bytes for its call to the gcd function. Therefore there must be a total of 14 bytes
free in the stack memory to successfully run this algorithm.
When relPrime is called with the value 0x13B0 as the argument, a total of 152968 instructions are
executed. The values in the following table are based solely on running the relPrime algorithm with
this value.
Operation            Number of Cycles    Executions in relPrime    Frequency
M-type operation     5                   20374                     0.133
I-type operation     4                   9                         5.88*10^-5
load                 4                   0                         0
store                5                   0                         0
beq/bne              5                   20404                     0.133
jump                 3                   10196                     0.067
jumpto               4                   11                        7.19*10^-5
jal                  4                   10                        6.54*10^-5
pushv/pui/pop        3                   0                         0
pushi                4                   20405                     0.133
pmv                  5                   61173                     0.400
dup                  3                   0                         0
mtp                  3                   20386                     0.133
sll/sra/srl          4                   0                         0
I/O instruction      2                   0                         0

Our processor takes a total of 683241 cycles to execute this function under this condition.
Therefore, our average CPI is 4.467.
According to the Synthesis Report generated by Xilinx, our cycle time is 67.524ns. Therefore, the
execution time for the algorithm with this input is 46.15 milliseconds. This report also included the
following information relevant to our design.
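
The totals quoted above can be rederived from the per-operation table. The Python sketch below does that arithmetic. The cycle counts, execution counts, and cycle time are taken from this section, and the variable names are arbitrary.

```python
# (cycles per instruction, executions in relPrime) for each operation class
PROFILE = {
    "M-type operation": (5, 20374),
    "I-type operation": (4, 9),
    "load": (4, 0),
    "store": (5, 0),
    "beq/bne": (5, 20404),
    "jump": (3, 10196),
    "jumpto": (4, 11),
    "jal": (4, 10),
    "pushv/pui/pop": (3, 0),
    "pushi": (4, 20405),
    "pmv": (5, 61173),
    "dup": (3, 0),
    "mtp": (3, 20386),
    "sll/sra/srl": (4, 0),
    "I/O instruction": (2, 0),
}
CYCLE_TIME_NS = 67.524  # from the Xilinx synthesis report

instructions = sum(count for _, count in PROFILE.values())
cycles = sum(per_instr * count for per_instr, count in PROFILE.values())
cpi = cycles / instructions                # average cycles per instruction
exec_ms = cycles * CYCLE_TIME_NS / 1e6     # nanoseconds to milliseconds
pmv_share = PROFILE["pmv"][1] / instructions
```

The sums reproduce the 152968 instructions and 683241 cycles, giving a CPI of about 4.467 and an execution time of roughly 46.1 ms, in line with the figures reported above. pmv alone accounts for about 40% of the executed instructions.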

Device Utilization Summary


Selected Device : 3s500efg320-4
Number of Slices:              1918 out of 4656   41%
Number of Slice Flip Flops:     247 out of 9312    2%
Number of 4 input LUTs:        3487 out of 9312   37%
   Number used as logic:       1439
   Number used as RAMs:        2048
Number of IOs:                  182
Number of bonded IOBs:          182 out of 232    78%
   IOB Flip Flops:               14
Number of BRAMs:                  1 out of 20      5%
Number of GCLKs:                  2 out of 24      8%

Conclusion
At the end of the design and implementation process of this project, our group succeeded in creating a
stack-based processor that can perform relative prime calculations using Euclid's Algorithm. However, with a CPI
of 4.467, a cycle time of 67.52 ns, and an algorithm execution time of 46.15 ms, our performance data
demonstrates why such stack machines are rarely used extensively in industry anymore. The primary
performance disadvantage of the Stackintosh is that it requires significantly more memory references than a
non-stack processor, even though our design needs only one dedicated register (tp). Despite this drawback, if the
Stackintosh were to be commercially sold, it would be much more affordable than many of the processors that
are currently on the market, since this processor has only the one dedicated register.
Several improvements can be made to our processor to either improve performance or enhance human
interaction. Namely, these improvements are getting the result of the relative prime calculations to display on
the LCD display of the FPGA boards, merging identical control states together, and reducing the number of
control signals.
Overall, we faced several challenges including minor issues with timing that were not accounted for. Most of
these were fixed with control signals to allow writing to specific registers. Also, creating our own double
memory to simulate a single memory with two outputs raised several issues, as it was difficult to ensure that
inputting a value into the double memory module inputs the same value in each of the single memories that
comprised it. In addition to this, all of the FPGA boards in the CS lab behaved differently with our I/O
components (and one of them even got very hot, and started melting the components), which leads our team to
think that most, if not all, of the FPGA boards are faulty.
In conclusion, despite the relatively low performance speed, we feel that the Stackintosh is a highly capable
processor that can support many different types of instructions.

Appendices
Appendix A: Datapath Implementation in Xilinx

Appendix B: Control Module Implementation in Xilinx

Appendix C: Sample Waveform for Datapath

Appendix D: Test Benches


16-bit ALU
// Verilog test fixture created from schematic
// /home/robinsat/Documents/csse232/1415a-csse232-robinsat/lab06/alu/alu.sch - Wed Oct 29 21:45:11 2014
`timescale 1ns / 1ps
module alu_alu_sch_tb();
// Inputs
reg [2:0] op;
reg [15:0] b;
reg [15:0] a;
// Output
wire ovfl;
wire zero;
wire co;
wire [15:0] R;
// Bidirs
// Instantiate the UUT
alu UUT (
.ovfl(ovfl),
.zero(zero),
.co(co),
.op(op),
.b(b),
.a(a),
.R(R)
);
// Initialize Inputs
initial begin
op = 0;
b = 0;
a = 0;
#100;
op = 0;
a = 0;
b = 0;
#10;
a = 32767;
b = 32767;
#10;
a = 0;
b = 32767;

#10;
a = 32767;
b = 0;
#10;
a = 1;
b = 32767;
#10;
a = 16;
b = 42;
#10;
a = -1;
b = -1;
#10;
a = -5;
b = 16;
#10;
a = -32768;
b = -32768;
#10;
a = -32768;
b = 32767;
#100;
op = 1;
a = 0;
b = 0;
#10;
a = 32767;
b = 32767;
#10;
a = 0;
b = 32767;
#10;
a = 32767;
b = 0;
#10;
a = 1;
b = 32767;
#10;
a = 16;
b = 42;
#10;
a = -1;
b = -1;

#10;
a = -5;
b = 16;
#10;
a = -32768;
b = -32768;
#10;
a = -32768;
b = 32767;
#100;
op = 2;
a = 0;
b = 0;
#10;
a = 32767;
b = 32767;
#10;
a = 0;
b = 32767;
#10;
a = 32767;
b = 0;
#10;
a = 1;
b = 32767;
#10;
a = 16;
b = 42;
#10;
a = -1;
b = -1;
#10;
a = -5;
b = 16;
#10;
a = -32768;
b = -32768;
#10;
a = -32768;
b = 32767;
#100;
op = 7;

a = 0;
b = 0;
#10;
a = 32767;
b = 32767;
#10;
a = 0;
b = 32767;
#10;
a = 32767;
b = 0;
#10;
a = 1;
b = 32767;
#10;
a = 16;
b = 42;
#10;
a = -1;
b = -1;
#10;
a = -5;
b = 16;
#10;
a = -32768;
b = -32768;
#10;
a = -32768;
b = 32767;
#100;
op = 00;
a = 0;
b = 0;
#10;
a = 32767;
b = 32767;
#10;
a = 0;
b = 32767;
#10;
a = 32767;
b = 0;

#10;
a = 1;
b = 32767;
#10;
a = 16;
b = 42;
#10;
a = -1;
b = -1;
#10;
a = -5;
b = 16;
#10;
a = -32768;
b = -32768;
#10;
a = -32768;
b = 32767;
end
endmodule

Barrel Shifter
`timescale 1ns / 1ps
////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date:   16:13:29 11/10/2014
// Design Name:   BarrelShifter
// Module Name:   C:/Users/robinsat/Documents/Courses/CSSE232/UpdatedImplementation/Datapath/barreltb.v
// Project Name:  Datapath
// Target Device:
// Tool versions:
// Description:
//
// Verilog Test Fixture created by ISE for module: BarrelShifter
//
// Dependencies:
//
// Revision:
// Revision 0.01 - File Created
// Additional Comments:
//
////////////////////////////////////////////////////////////////////////////////
module barreltb;
// Inputs
reg [15:0] barrelIn;
reg [15:0] shifter;
// Outputs
wire [15:0] barrelsllOut;
wire [15:0] barrelsrlOut;
wire [15:0] barrelsraOut;
// Instantiate the Unit Under Test (UUT)
BarrelShifter uut (
.barrelIn(barrelIn),
.shifter(shifter),
.barrelsllOut(barrelsllOut),
.barrelsrlOut(barrelsrlOut),
.barrelsraOut(barrelsraOut)
);
initial begin
// Initialize Inputs
barrelIn = 0;
shifter = 0;
// Wait 100 ns for global reset to finish
#100;
// Add stimulus here
barrelIn = -256;
shifter = 4;
end
endmodule

Control
`timescale 1ns / 1ps
////////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date:   20:26:11 11/18/2014
// Design Name:   control
// Module Name:   C:/Users/robinsat/Documents/Courses/CSSE232/IODatapath3/Datapath/control_tb.v
// Project Name:  Datapath
// Target Device:
// Tool versions:
// Description:
//
// Verilog Test Fixture created by ISE for module: control
//
// Dependencies:
//
// Revision:
// Revision 0.01 - File Created
// Additional Comments:
//
////////////////////////////////////////////////////////////////////////////////
module control_tb;
// Inputs
reg [4:0] Opcode;
reg CLK;
reg Start;
reg Reset;
// Outputs
wire [2:0] ALUop;
wire [1:0] ALUsrcA;
wire [2:0] ALUsrcB;
wire RegWrite;
wire MemWriteTp;
wire MemWriteTpMod;
wire PCWrite;
wire PCWriteCond;
wire [2:0] PCSrc;
wire [1:0] MemLoad1;
wire MemLoad2;
wire [1:0] MemValSrc;
wire [1:0] ShiftSrc;
wire [1:0] ShiftVal;
wire BranchType;
wire [1:0] shiftType;
wire tpInit;
wire WriteB;
wire WriteE;
wire WriteF;
wire IRwrite;
wire [5:0] current_state;
wire [5:0] next_state;
// Instantiate the Unit Under Test (UUT)
control uut (
.ALUop(ALUop),
.ALUsrcA(ALUsrcA),
.ALUsrcB(ALUsrcB),
.RegWrite(RegWrite),
.MemWriteTp(MemWriteTp),


.MemWriteTpMod(MemWriteTpMod),
.PCWrite(PCWrite),
.PCWriteCond(PCWriteCond),
.PCSrc(PCSrc),
.Opcode(Opcode),
.MemLoad1(MemLoad1),
.MemLoad2(MemLoad2),
.MemValSrc(MemValSrc),
.ShiftSrc(ShiftSrc),
.ShiftVal(ShiftVal),
.BranchType(BranchType),
.shiftType(shiftType),
.tpInit(tpInit),
.WriteB(WriteB),
.WriteE(WriteE),
.WriteF(WriteF),
.IRwrite(IRwrite),
.current_state(current_state),
.next_state(next_state),
.CLK(CLK),
.Start(Start),
.Reset(Reset)
);
initial begin
// Initialize Inputs
Opcode = 0;
CLK = 0;
Start = 1;
Reset = 0;
//
//
//
//

// Wait 150 ns for global reset to finish


#150;
Opcode = 0; // add
#20
Start = 0;

//
//
//
//

#150;
Opcode = 1; // addi
#20
Start = 0;

//
//
//
//

#150; //
Opcode = 2; // and
#20
Start = 0;

//
//
//
//

#150; //
Opcode = 3; // andi
#20
Start = 0;

//
//
//
//

#150; // **
Opcode = 4; // or
#20
Start = 0;


//
//
//
//

#150; // ~~
Opcode = 5; // ori
#20
Start = 0;

//
//
//
//

#150; //
Opcode = 6; // sll
#20
Start = 0;

//
//
//
//

#150; //
Opcode = 7; // sra
#20
Start = 0;

//
//
//
//

#150; //
Opcode = 8; // srl
#20
Start = 0;

//
//
//
//

#150; // **
Opcode = 9; // sub
#20
Start = 0;

//
//
//
//

#150; // ~~
Opcode = 10; // pui
#20
Start = 0;

//
//
//
//

#150; // **
Opcode = 11; // slt
#20
Start = 0;

//
//
//
//

#150;
Opcode = 12; // beq
#20
Start = 0;

//
//
//
//

#150;
Opcode = 13; // bne
#20
Start = 0;

//
//
//
//

#150;
Opcode = 14; // jump
#20
Start = 0;

//
//
//
//

#150;
Opcode = 15; // jal
#20
Start = 0;

//
//

#150;
Opcode = 16; // load


//
//

#20
Start = 0;

//
//
//
//

#150;
Opcode = 17; // pushi
#20
Start = 0;

//
//
//
//

#150;
Opcode = 18; // pushv
#20
Start = 0;

//
//
//
//

#150;
Opcode = 19; // pmv
#20
Start = 0;

//
//
//
//

#150;
Opcode = 20; // pop
#20
Start = 0;

//
//
//
//

#150;
Opcode = 21; // store
#20
Start = 0;

//
//
//
//

#150;
Opcode = 22; // dup
#20
Start = 0;

//
//
//
//

#150;
Opcode = 23; // mtp
#20
Start = 0;
end

always #10 CLK = !CLK;

endmodule
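
The control test fixture steps the 5-bit Opcode input through the values 0 to 23. In the assembler output listed in Appendix E, that opcode occupies the top five bits of each 16-bit instruction word and the remaining eleven bits hold the immediate. The following Java sketch shows how the two fields can be separated when checking a word by hand. The class and method names are hypothetical, and sign-extension of the immediate is an assumption based on the assembler calling the field "signed".

```java
// Hypothetical helper for splitting a 16-bit instruction word into the
// 5-bit opcode and the 11-bit immediate used by the assembler's encoding.
final class InstructionFields {
    // Top five bits of the 16-bit word.
    static int opcode(int word) {
        return (word >>> 11) & 0x1F;
    }

    // Low eleven bits, sign-extended to a Java int (assumed behavior).
    static int immediate(int word) {
        int raw = word & 0x7FF;
        return (raw << 21) >> 21;
    }

    public static void main(String[] args) {
        // "addi -1" as the assembler would emit it: 00001 followed by eleven 1s.
        int word = Integer.parseInt("0000111111111111", 2);
        System.out.println("opcode    = " + opcode(word));
        System.out.println("immediate = " + immediate(word));
    }
}
```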

Datapath
// Verilog test fixture created from schematic
// C:\Users\robinsat\Documents\Courses\CSSE232\UpdatedImplementation\Datapath\Datapath.sch - Sun Nov 09 17:06:40 2014
`timescale 1ns / 1ps
module Datapath_Datapath_sch_tb();


// Inputs
reg CLK;
reg StartInput;
reg memCLK;
// Output
wire [15:0] IR;
wire [15:0] ShiftSource;
wire [15:0] ShiftValue;
wire [15:0] PCout;
wire [15:0] tpCurrent;
wire [2:0] ALUop;
wire [5:0] next_state;
wire [5:0] current_state;
wire [15:0] memoutC;
wire [15:0] memoutD;
wire [15:0] RegEContents;
wire [15:0] ALUoutput;
wire [15:0] ShiftOut;
wire [15:0] memInput;
wire [15:0] memAddr2;
wire [15:0] memAddr1;
wire [15:0] ALUinB;
wire [15:0] ALUinA;
// Bidirs
// Instantiate the UUT
Datapath UUT (
.CLK(CLK),
.IR(IR),
.ShiftSource(ShiftSource),
.ShiftValue(ShiftValue),
.PCout(PCout),
.tpCurrent(tpCurrent),
.ALUop(ALUop),
.next_state(next_state),
.current_state(current_state),
.StartInput(StartInput),
.memoutC(memoutC),
.memoutD(memoutD),
.RegEContents(RegEContents),
.ALUoutput(ALUoutput),
.memCLK(memCLK),
.ShiftOut(ShiftOut),
.memInput(memInput),
.memAddr2(memAddr2),
.memAddr1(memAddr1),
.ALUinB(ALUinB),
.ALUinA(ALUinA)
);
// Initialize Inputs
initial begin
CLK = 0;
memCLK = 0;
StartInput = 1;


#20
StartInput = 0;
end
always
#5 memCLK = !memCLK;
always
#10 CLK = !CLK;
endmodule

Stack Memory
// Verilog test fixture created from schematic
// C:\Users\robinsat\Documents\Courses\CSSE232\UpdatedImplementation\Datapath\StackMem.sch - Sun Nov 09 14:19:30 2014
`timescale 1ns / 1ps
module StackMem_StackMem_sch_tb();
// Inputs
reg CLK;
reg [15:0] DataIn;
reg memWriteTp;
reg memWriteTpMod;
reg [15:0] tpAddr;
reg [15:0] tpModAddr;
// Output
wire [15:0] tpOut;
wire [15:0] tpModOut;
// Bidirs
// Instantiate the UUT
StackMem UUT (
.CLK(CLK),
.DataIn(DataIn),
.memWriteTp(memWriteTp),
.memWriteTpMod(memWriteTpMod),
.tpAddr(tpAddr),
.tpOut(tpOut),
.tpModOut(tpModOut),
.tpModAddr(tpModAddr)
);
// Initialize Inputs
initial begin
CLK = 0;
DataIn = 0;
memWriteTp = 0;
memWriteTpMod = 0;
tpAddr = 0;


tpModAddr = 0;
#100
DataIn = 16;
memWriteTp = 1;
tpAddr = 2;
#10
tpAddr = 4;
DataIn = 24;
#10
tpAddr = 6;
DataIn = 28;
#10
tpAddr = 8;
DataIn = 30;
#10
tpAddr = 10;
DataIn = 31;
#10
DataIn = 28;
tpAddr = 12;
tpModAddr = 8;
#10
memWriteTp = 0;
end
always
#5 CLK = !CLK;
endmodule

Appendix E: Java Implementation of Assembler


import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.PrintWriter;
import java.util.Scanner;
import java.util.logging.Level;
import java.util.logging.Logger;

public class Assembler {


final static String base = System.getProperty("user.dir");
final static String separator = "\\";
final static String input = base + separator + "assembler.in";


final static String output = base + separator + "assembler.mif";


static int PC = 0;
public static void main(String[] args) throws Exception {
try {
Scanner sc = new Scanner(new FileReader(input));
PrintWriter pw = new PrintWriter(output);
StringBuilder sb = new StringBuilder();
// translate assembly to machine code
while (sc.hasNext()) {
String instruction = sc.next();
// skip over comments
if (instruction.charAt(0) == '#') {
if (sc.hasNext()) {
sc.nextLine();
}
if (sc.hasNext()) {
instruction = sc.next();
}
}
int imm;
PC++; // increment PC
switch (instruction) {
case "add": // 0
sb.append("0000000000000000");
break;
case "addi": // 1
sb.append("00001");
imm = sc.nextInt();
sb.append(intTo11BitSignedBinary(imm));
break;
case "and": // 2
sb.append("0001000000000000");
break;
case "andi": // 3
sb.append("00011");
imm = sc.nextInt();
sb.append(intTo11BitSignedBinary(imm));
break;
case "or": // 4
sb.append("0010000000000000");
break;
case "ori": // 5
sb.append("00101");
imm = sc.nextInt();
sb.append(intTo11BitSignedBinary(imm));
break;
case "sll": // 6
sb.append("00110");
imm = sc.nextInt();
sb.append(intTo11BitSignedBinary(imm));


break;
case "sra": // 7
sb.append("00111");
imm = sc.nextInt();
sb.append(intTo11BitSignedBinary(imm));
break;
case "srl": // 8
sb.append("01000");
imm = sc.nextInt();
sb.append(intTo11BitSignedBinary(imm));
break;
case "sub": // 9
sb.append("0100100000000000");
break;
case "pui": // 10
sb.append("01010");
imm = sc.nextInt();
sb.append(intTo11BitSignedBinary(imm));
break;
case "slt": // 11
sb.append("0101100000000000");
break;
case "beq": // 12
sb.append("01100");
if (sc.hasNextInt()) {
// treat as normal address
imm = sc.nextInt();
imm = imm - PC;
sb.append(intTo11BitSignedBinary(imm));
} else if (sc.hasNext()) {
// this is a label, so translate it to an address first
String label = sc.next();
sb.append(labelToAddress(label, true)); // right logic?
}
break;
case "bne": // 13
sb.append("01101");
if (sc.hasNextInt()) {
// treat as normal address
imm = sc.nextInt();
imm = imm - PC;
sb.append(intTo11BitSignedBinary(imm));
} else if (sc.hasNext()) {
// this is a label, so translate it to an address first
String label = sc.next();
sb.append(labelToAddress(label, true)); // right logic?
}
break;
case "jump": // 14
sb.append("01110");
if (sc.hasNextInt()) {
// treat as normal address
imm = sc.nextInt();
sb.append(intTo11BitSignedBinary(imm));
} else if (sc.hasNext()) {
// this is a label, so translate it to an address first
String label = sc.next();
sb.append(labelToAddress(label, false));
}
break;
case "jal": // 15
sb.append("01111");
if (sc.hasNextInt()) {
// treat as normal address
imm = sc.nextInt();
sb.append(intTo11BitSignedBinary(imm));
} else if (sc.hasNext()) {
// this is a label, so translate it to an address first
String label = sc.next();
sb.append(labelToAddress(label, false));
}
break;
case "load": // 16
sb.append("1000000000000000");
break;
case "pushi": // 17
sb.append("10001");
imm = sc.nextInt();
sb.append(intTo11BitSignedBinary(imm));
break;
case "pushv": // 18
sb.append("10010");
imm = sc.nextInt();
sb.append(intTo11BitSignedBinary(imm));
break;
case "pmv": // 19
sb.append("10011");
imm = sc.nextInt();
sb.append(intTo11BitSignedBinary(imm));
break;
case "pop": // 20
sb.append("10100");
imm = sc.nextInt();
sb.append(intTo11BitSignedBinary(imm));
break;
case "store": // 21
sb.append("1010100000000000");
break;
case "dup": // 22
sb.append("1011000000000000");
break;
case "mtp": // 23
sb.append("10111");
imm = sc.nextInt();
sb.append(intTo11BitSignedBinary(imm));
break;
case "jumpto": // 24
sb.append("1100000000000000");
// how to account for changing value of PC?
break;
}
pw.println(sb.toString());
sb.replace(0, sb.length(), "");
}
pw.flush();
pw.close();
sc.close();
} catch (FileNotFoundException ex) {
Logger.getLogger("oops, no file!").log(Level.SEVERE, null, ex);
}
}
/**
* Converts a label to its address: a PC-relative offset for branches (beq,
* bne) or an absolute address for jumps (jump, jal)
* */
public static String labelToAddress(String label, boolean isBranch)
throws Exception {
Scanner scan = new Scanner(new FileReader(input));
String s = scan.next();
int address = 0;
while (scan.hasNext()) {
if (s.equals(label) || s.substring(0, s.length() - 1).equals(label))
{
if (isBranch) {
return intTo11BitUnsignedBinary(PC - address);
} else {
return intTo11BitUnsignedBinary(address);
}
} else {
address++;
scan.nextLine();
if (scan.hasNext()) {
s = scan.next();
}
}
}
return null;
}
/**
* Converts integers to its respective signed 11-bit equivalent
* */
public static String intTo11BitSignedBinary(int n) {
String binary = Integer.toBinaryString(n);
StringBuilder sb = new StringBuilder(binary);
while (sb.length() < 11) {
sb.insert(0, "0");
}
while (sb.length() > 11) {
sb.delete(0, 1);
}
return sb.toString();
}
/**
* Converts integers to its respective unsigned 11-bit equivalent; assumes
* input is a positive integer
* */
public static String intTo11BitUnsignedBinary(int n) {
String binary = Integer.toBinaryString(n);
StringBuilder sb = new StringBuilder(binary);
while (sb.length() < 11) {
sb.insert(0, "0");
}
return sb.toString();
}
}
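
One property of intTo11BitSignedBinary in the listing above is that values outside the 11-bit two's-complement range are silently truncated: any bits beyond the low eleven are deleted, so an immediate of 2048, for example, would be emitted as all zeros. The following is a minimal sketch of an alternative that rejects such values instead. It is not part of the submitted assembler; the class name, method name, and the choice to throw an exception are assumptions made for illustration.

```java
// Hypothetical range-checked variant of the 11-bit signed encoding.
final class Imm11 {
    static final int MIN = -1024; // -(2^10)
    static final int MAX = 1023;  // 2^10 - 1

    // Returns the 11-bit two's-complement bit string for n,
    // or throws if n does not fit in eleven bits.
    static String encodeSigned(int n) {
        if (n < MIN || n > MAX) {
            throw new IllegalArgumentException(
                "immediate " + n + " does not fit in 11 signed bits");
        }
        String bits = Integer.toBinaryString(n & 0x7FF);
        // Left-pad with zeros to exactly eleven characters.
        return String.format("%11s", bits).replace(' ', '0');
    }

    public static void main(String[] args) {
        System.out.println(encodeSigned(5));
        System.out.println(encodeSigned(-1));
        System.out.println(encodeSigned(-1024));
    }
}
```

For in-range inputs this produces the same bit strings as the original method, so it could be swapped in without changing the output for programs whose immediates already fit.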
