Eee446 Lab Manual Spring2015

This document describes an experiment involving designing an 8-bit integer arithmetic processor datapath. Students are tasked with designing modular components like an ALU, registers, and multiplexers in VHDL and connecting them to build the full datapath. They are also asked to modify an existing Booth multiplier control unit to support the new datapath signals and operations. The goal is to simulate and validate the design before implementing it on an FPGA development board.

METU NORTHERN CYPRUS CAMPUS

Computer Architecture II
EEE446 LABORATORIES
Ali Muhtaroğlu

Spring 2015

Regulations:

Students are not permitted to perform an experiment without doing the preliminary work before
coming to the laboratory. It is not allowed to do the preliminary work at the laboratory during the
experiment.

No food or drink in the lab.

Initial experiments will be done individually. You will form groups in the CPU design portions.

Cheating is not tolerated in this laboratory. Plagiarism is a form of cheating, as is using someone
else's written word with minor changes and no attribution. If you are caught cheating, you will, at the
very least, receive a zero for that preliminary work.

Students who miss the lab 3 times without a valid excuse receive a zero for the EEE-446 laboratory
portion of the course grade.

Those who fail to get a satisfactory score from the laboratory portion will fail the class. This score
will be finalized later, but is expected to be around 70%.

EXPERIMENT #1
INTRODUCTION TO COMPUTER ARCHITECTURE LABORATORY:
8-BIT BOOTH'S MULTIPLIER DESIGN
1.1 OBJECTIVE
The purpose of the first laboratory exercise is to implement an 8-bit signed integer multiplier in
hardware using Booth's algorithm. The lab exercise will also be used as a vehicle for getting familiar
with Altera Quartus II Software, Cyclone II Field Programmable Gate Array (FPGA) family,
programming interface, switch inputs, LED outputs, and clocking interface of the DE2-70 development
board.
1.2 PRELIMINARY WORK
1.2.1

Download Quartus II Web Edition Software (Free).

1.2.2

Launch Quartus II, then select and complete the Quartus II Interactive Tutorial, which is one of the
opening menu options. There are plenty of materials for self-training under the Training and
Online Demos selections in the opening menu. Get familiar with creating a hierarchical design
using the schematic editor and VHDL, and with compiling, simulating, and configuring a design, by
reading the attached handouts. You can get to useful documentation at any time through the
Quartus II Help menu.

1.2.3

Browse through the DE2-70 User Manual at the link below to understand the general
functionality, and some of the features that will be useful in this introductory laboratory. You may
want to read all of Chapters 1-5. Pay special attention to the use of LEDs, Switches, and
7-Segment Displays explained in Chapter 5.
http://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=English&CategoryNo=53&No=226&PartNo=4

We will have a copy of the User Manual in the lab. But you may want to download your own
copy as well.
1.2.4

Draw out the 8-bit Booth multiplier datapath for signed multiplication, labeling all blocks and
signals clearly.

1.2.5

Do the design entry of the 8-bit multiplier datapath (1.2.4) using VHDL, and simulate to ensure
correct operation. You can implement the top level block using schematic editor if you prefer.

1.2.6

Draw out a state diagram for the Booth multiplier control unit, given your datapath design from
1.2.5. A START signal (preferably one of the push-buttons on the board) should start off the
multiplication by loading the two 8-bit numbers from the DE2-70 toggle-switches into the proper
registers. The multiplication should be executed only once and stop until the START button is
released and pushed again.

1.2.7

Design the control unit using VHDL, and simulate to ensure correct operation.

1.2.8

Connect the datapath and the control unit blocks using the schematic editor. Make sure the
FPGA device selected in your project is Cyclone II EP2C70F896C6. Use the pin mapping tool to
map the 8-bit binary multiplication inputs to toggle-switches, hexadecimal output to 7-segment
displays, START and CLK signals to push-buttons. It is a good practice to also have a RESET
input (toggle-switch or push-button) to initialize all flip-flops in your design. Simulate the design
to ensure correct operation. In addition to functional simulations, do the timing simulations as
well to ensure no timing problems.
1.2.9

Report the maximum clock speed (minimum clock period) for your design.

1.2.10 Try to optimize your design to minimize the number of clock cycles it takes to execute one
multiplication operation (without degrading the clock period significantly.)
1.2.11 Submit a preliminary work report including
an objective statement,
your drawings from 1.2.4 and 1.2.6,
printouts for all of your VHDL and schematic design files,
Booth multiplier timing simulation results showing the correct operation for
multiplications: 14x6, -14x6, -6x14, -6x-14.
the summary from the Timing Analyzer showing the maximum clock frequency you
reported in 1.2.9,
Any optimizations you came up with in 1.2.10 (bonus).

Bring the project and design files you created in your preliminary work to the lab in order to
save time.
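
For checking the four multiplications requested in 1.2.11, a behavioral reference can be handy in a simulation testbench. The following is a minimal sketch under stated assumptions, not the datapath or control unit asked for in 1.2.4 through 1.2.7: it assumes the usual formulation of Booth's algorithm with an 8-bit accumulator A, an 8-bit multiplier register Q, and a one-bit Qm1, and the names booth_ref and booth_mul are hypothetical. Because A is only 8 bits wide here, a multiplicand of -128 is not handled correctly.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical reference package (names are illustrative, not from the lab handout)
package booth_ref is
    function booth_mul (m, q_in : SIGNED(7 downto 0)) return SIGNED;
end package booth_ref;

package body booth_ref is
    -- Returns the 16-bit product m * q_in, computed with 8 Booth steps
    function booth_mul (m, q_in : SIGNED(7 downto 0)) return SIGNED is
        variable a   : SIGNED(7 downto 0) := (others => '0');
        variable q   : SIGNED(7 downto 0) := q_in;
        variable qm1 : STD_LOGIC := '0';
    begin
        for step in 1 to 8 loop
            -- Inspect the bit pair (Q(0), Qm1): "10" subtracts M, "01" adds M
            if q(0) = '1' and qm1 = '0' then
                a := a - m;
            elsif q(0) = '0' and qm1 = '1' then
                a := a + m;
            end if;
            -- Arithmetic shift right of the concatenation A, Q, Qm1
            qm1 := q(0);
            q   := a(0) & q(7 downto 1);
            a   := a(7) & a(7 downto 1);
        end loop;
        return a & q;  -- high byte in A, low byte in Q
    end function;
end package body booth_ref;
```

In a testbench, a check such as `assert booth_mul(to_signed(-14, 8), to_signed(6, 8)) = to_signed(-84, 16) report "mismatch" severity error;` compares the expected product against this reference; the same comparison can then be made against the simulated hardware output.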
1.3 EXPERIMENTAL WORK
1.3.1

Experimental Setup
Ensure you have a DE2-70 board connected through USB to your PC terminal, and to a power
supply. Power up the board and observe flashing LEDs and cycling numbers on the 7-segment
displays. The LCD display should read: Welcome to the Altera DE2-70.

1.3.2

Control Panel Checkout
Execute the DE2-70 Control Panel executable on your desktop. Download code to either
SDRAM-U2 or SSRAM. Connect to the board by pushing the Connect button unless the
communication with the board has already been established.
Try features in the Control Panel to ensure a healthy interface on the DE2-70 board. All
toggle switches, LEDs, push-buttons, 7-segment displays, and the LCD display should be
checked.

1.3.3

Design Validation
i) Load your project and ensure the correct FPGA family is selected in the Set project and
Compiler Settings menu item under Assign Constraints. Device Category should indicate
Cyclone II EP2C70F896C6.
ii) Enter pin assignments if you have not done this already. Compile and Program Device.
iii) Use carefully picked input vectors to test your design by stepping the CLOCK through the
push-button. If the design does not work, re-run timing simulations with the same input
vectors to ensure your design entry is correct. The multiplier output as observed at the
7-segment displays should match your simulation result for each cycle.
iv) Debug any problems you may run into through a divide and conquer approach.
v) If the design works:
a. Demonstrate to your lab instructor.
b. How many clock cycles does it take to complete one multiplication operation? Can
you think of any performance enhancements in your architecture to improve CPI?
c. Remap your CLOCK input pin to assign each of the free running clocks to your
CLOCK input progressively from slowest to fastest. Does your design work for the
fastest clock signal available on the DE2-70 board?
vi) If you have not completed within the allocated time, demonstrate how far you were able to
get, and explain your debug and root-causing process in order to get partial credit.

EXPERIMENT #2
8-bit INTEGER ARITHMETIC PROCESSOR DATAPATH DESIGN
2.1 OBJECTIVE
In the first laboratory exercise you got familiar with the Altera Quartus II CAD and DE2-70
prototyping environment, while delivering an 8-bit Booth multiplier with split Datapath and Control Units.
We will extend the multiplier datapath to an 8-bit integer Arithmetic Processor (AP) datapath in this
experiment. The Booth multiplier Control Unit from the last experiment will be modified to support the
new datapath. This AP will make up the processing power of the CPU core you will complete in Lab 3.
2.2 PRELIMINARY WORK
2.2.1

Since this experiment builds upon the results you got in the first lab, it is important that you
finish any incomplete parts of the Booth multiplier design in that experiment before you start this
preliminary work.

2.2.2

Read through the AP specifications (Section 2.3).

2.2.3

You will follow a modular design approach as in the first experiment. Design the ALU, register,
and multiplexer blocks as separate VHDL modules. After making sure they individually satisfy
the provided functional requirements, instantiate and connect them in the schematic editor to
build the AP datapath. Save this as your APdatapath design file. Run timing simulations to
ensure full functionality.

2.2.4

Modify the Booth multiplier control unit (FSM) to support the new set of signals in Table 1, and
connect it to APdatapath. Save this as your new boothmultiplier design file. Run timing
simulations to ensure multiplication still works correctly.

2.2.5

Submit a preliminary work report that includes:


An objective statement;
printouts for all of your VHDL and schematic design files from 2.2.3 and 2.2.4,
AP datapath timing simulations from 2.2.3 showing all of the main AP micro-operations,
documented in Table 1, with realistic signal delays (you may want to submit a few distinct
simulations for clarity, instead of packing all the functions into one simulation run),
the Booth multiplier control unit state diagram enhanced from Experiment 1 to support the
new signals provided in Table 1,
timing simulations from 2.2.4 showing an example multiplication operation after the modified
Booth control unit is tied to the new datapath,
a summary from the Timing Analyzer showing the maximum clock frequency of your design.

Note: It is crucial that your design is in good health, as demonstrated by your
simulations, before you come to the scheduled lab session. The valuable lab time and
resources should not be used for design work. You will use the lab session for hardware
prototyping, debug, and validation work. If you cannot demonstrate successful simulations
in preliminary work, you may not be accepted to perform this lab.

2.3 Arithmetic Processor Datapath Specifications


2.3.1

Top Level Datapath


The AP top level datapath is depicted in Figure 1. Note this is the enhanced version of the
datapath designed in the first experiment in order to support the new micro-operations in Table
1. The clk signal is not shown in the datapath. The three registers and the Qm1 flip-flop are
triggered on the clk rising edge.
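[Figure 1 block diagram. Labeled blocks: M register, A register, Q register, Qm1 flip-flop, ALU, ASEL and BSEL operand multiplexers, SRSEL shift-in multiplexer. Labeled signals: DAT1(7:0), DAT2(7:0), DAT3(7:0), LDM, LDA, LDQ, ASEL(1:0), BSEL(1:0), AOP(2:0), CI, CO, OVF, Z, N, F, SRSEL, SR, SL, SIR, SIL, SOR, SOL, A(7), Qm1, RST, ALUOUT(7:0), A(7:0), Q(7:0).]
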

Figure 1. Arithmetic Processor datapath top level block diagram


Table 1. Arithmetic Processor Micro-Operations

Micro-operation                           LDM  LDA  LDQ  ASEL  BSEL  AOP[2:0]  SRSEL  SR  SL  RST
Q ← 0; Qm1 ← 0; A ← 0; M ← 0              X    X
M ← DAT1                                  1    0
Q ← DAT2; Qm1 ← 0                         0    0
A(7:0),Q(7:0),Qm1 ← A(6:0),Q(7:0),Qm1,0   0    0
A(7:0),Q(7:0),Qm1 ← 0,A(7:0),Q(7:0)       0    0
A(7:0),Q(7:0),Qm1 ← A(7),A(7:0),Q(7:0)    0    0
A ← 0 AOP 0                               0    1    0    0     0     AOP       X      0   0   0
A ← 0 AOP 1                               0    1    0    0     1     AOP       X      0   0   0
A ← 0 AOP M                               0    1    0    0     2     AOP       X      0   0   0
A ← 0 AOP DAT3                            0    1    0    0     3     AOP       X      0   0   0
A ← 1 AOP 0                               0    1    0    1     0     AOP       X      0   0   0
A ← 1 AOP 1                               0    1    0    1     1     AOP       X      0   0   0
A ← 1 AOP M                               0    1    0    1     2     AOP       X      0   0   0
A ← 1 AOP DAT3                            0    1    0    1     3     AOP       X      0   0   0
A ← Q AOP 0                               0    1    0    2     0     AOP       X      0   0   0
A ← Q AOP 1                               0    1    0    2     1     AOP       X      0   0   0
A ← Q AOP M                               0    1    0    2     2     AOP       X      0   0   0
A ← Q AOP DAT3                            0    1    0    2     3     AOP       X      0   0   0
A ← A AOP 0                               0    1    0    3     0     AOP       X      0   0   0
A ← A AOP 1                               0    1    0    3     1     AOP       X      0   0   0
A ← A AOP M                               0    1    0    3     2     AOP       X      0   0   0
A ← A AOP DAT3                            0    1    0    3     3     AOP       X      0   0   0

AOP: 0 to 7

2.3.2

ALU
The block diagram of the simple combinational ALU in the AP datapath is shown in Figure 2.

[Figure 2 block diagram. ALU inputs: A(7:0), B(7:0), CI (1 bit), AOP (3 bits). ALU outputs: F(7:0), CO, OVF, Z, and N, where N is F(7).]
Figure 2. ALU inputs and outputs
Its functions and controls are defined in Table 2 below. The Carry-In (CI) input and Carry-Out
(CO) output of the ALU are only used for add/subtract operations. In addition to the CO bit, the
ALU has three other status output bits: Overflow (OVF), MSB of the ALU result (N), and Zero
(Z). The status bits and the significance of each are summarized in Table 3.

Table 2. ALU Function Control

Mnemonic  AOP[2:0]  ALU Function          Symbol
ADD       0         A Plus B Plus CI      CO,F ← A + B + CI
SUBB      1         A Minus B Minus !CI   CO,F ← A - B - !CI
SUBA      2         B Minus A Minus !CI   CO,F ← B - A - !CI
OR        3         A OR B                F ← A + B
AND       4         A AND B               F ← A . B
NOTAB     5         !A AND B              F ← !A . B
XOR       6         A XOR B               F ← A ⊕ B
XNOR      7         A XNOR B              F ← !(A ⊕ B)
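
The function encoding above can be sketched in VHDL as follows. This is one possible reading, not a reference solution: the entity name alu8 is hypothetical, subtraction is assumed to be implemented as an addition of the complemented operand with CI as the adder carry-in (so that A - B - !CI equals A + !B + CI), and CO is taken directly from that adder. Whether the lab expects CO to be inverted into a borrow for subtraction is not stated in the text, so treat that as an open assumption.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity alu8 is  -- hypothetical name
    port ( A, B : in  STD_LOGIC_VECTOR(7 downto 0);
           CI   : in  STD_LOGIC;
           AOP  : in  STD_LOGIC_VECTOR(2 downto 0);
           F    : out STD_LOGIC_VECTOR(7 downto 0);
           CO, OVF, Z, N : out STD_LOGIC );
end alu8;

architecture Behavioral of alu8 is
begin
    process (A, B, CI, AOP)
        variable x, y : STD_LOGIC_VECTOR(7 downto 0);  -- adder operands
        variable cin  : UNSIGNED(0 downto 0);
        variable sum  : UNSIGNED(8 downto 0);          -- 9 bits to capture the carry
        variable res  : STD_LOGIC_VECTOR(7 downto 0);
        variable c, v : STD_LOGIC;
    begin
        x := A;  y := B;
        c := '0';  v := '0';      -- CO and OVF are 0 for logic operations
        cin(0) := CI;
        case AOP is
            when "000" | "001" | "010" =>
                if AOP = "001" then          -- SUBB: A + !B + CI
                    y := not B;
                elsif AOP = "010" then       -- SUBA: B + !A + CI
                    x := B;  y := not A;
                end if;
                sum := ('0' & unsigned(x)) + ('0' & unsigned(y)) + cin;
                res := std_logic_vector(sum(7 downto 0));
                c   := sum(8);
                -- Signed overflow: operands share a sign that the result lacks
                -- (equivalent to XOR of the two most significant carries)
                v   := (x(7) xnor y(7)) and (x(7) xor res(7));
            when "011"  => res := A or B;
            when "100"  => res := A and B;
            when "101"  => res := (not A) and B;
            when "110"  => res := A xor B;
            when others => res := A xnor B;
        end case;
        F   <= res;
        CO  <= c;
        OVF <= v;
        N   <= res(7);
        if res = "00000000" then Z <= '1'; else Z <= '0'; end if;
    end process;
end Behavioral;
```

Sharing one adder for ADD, SUBB, and SUBA keeps the three arithmetic functions consistent and makes the overflow expression identical for all of them.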

Table 3. ALU Status Specifications

Status  Description
CO      1 if there is a Carry Out from add and subtract operations; 0 for logic operations
OVF     1 if the add or subtract operation results in overflow (XOR of most significant two carry bits); 0 for all logic operations
Z       1 if the value of F(7:0) is 0; applies to all operations
N       1 if the MSB of F(7:0) is 1; applies to all operations

2.3.3

Registers
The M register in Figure 1 has synchronous reset (RST) and load (LDM) functions. Q and A
registers have shift-right (SR) and shift-left (SL) functions with corresponding shift-in (SIR, SIL)
and shift-out (SOR, SOL) bits in addition to the reset (RST) and load (LDQ, LDA). Qm1 flip-flop
has reset (RST), load (LDQ) control inputs. It resets when either RST or LDQ is asserted. When
neither RST nor LDQ is asserted, it loads the least significant bit (SOR) from the Q register. The
specifications of the synchronous logic blocks in the AP datapath are summarized in Tables 4, 5,
and 6.
Table 4. M functions

RST  LDM  Operation
1    X    M ← 0
0    0    Retain
0    1    M ← DAT1

Table 5. Q (or A) functions

RST  LDQ (LDA)  SL  SR  Operation
1    X          X   X   Q ← 0 (A ← 0)
0    0          0   0   Retain
0    1          X   X   Q ← DAT2 (A ← ALU)
0    0          1   X   Q ← Q(6:0),SIL (A ← A(6:0),SIL)
0    0          0   1   Q ← SIR,Q(7:1) (A ← SIR,A(7:1))
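
The Q (or A) register behavior can be sketched as a single process with a priority chain. This is a minimal sketch, assuming the priority order reset, then load, then shift-left, then shift-right, which is how the table is read here; the entity name shiftreg8 and the generic port names LD and D are hypothetical stand-ins for LDQ/LDA and DAT2/ALU output.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity shiftreg8 is  -- hypothetical name; usable for both Q and A
    port ( clk, RST, LD, SL, SR : in  STD_LOGIC;
           SIL, SIR : in  STD_LOGIC;                     -- shift-in bits
           D        : in  STD_LOGIC_VECTOR(7 downto 0);  -- parallel load data
           Q        : out STD_LOGIC_VECTOR(7 downto 0);
           SOL, SOR : out STD_LOGIC );                   -- shift-out bits
end shiftreg8;

architecture Behavioral of shiftreg8 is
    signal r : STD_LOGIC_VECTOR(7 downto 0);
begin
    process (clk)
    begin
        if rising_edge(clk) then
            if RST = '1' then                 -- synchronous reset
                r <= (others => '0');
            elsif LD = '1' then               -- parallel load
                r <= D;
            elsif SL = '1' then               -- shift left, SIL enters at bit 0
                r <= r(6 downto 0) & SIL;
            elsif SR = '1' then               -- shift right, SIR enters at bit 7
                r <= SIR & r(7 downto 1);
            end if;                           -- otherwise retain
        end if;
    end process;

    Q   <= r;
    SOL <= r(7);   -- bit that leaves on a left shift
    SOR <= r(0);   -- bit that leaves on a right shift
end Behavioral;
```

Keeping the stored value in an internal signal avoids reading an output port, which older VHDL revisions do not allow.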

Table 6. Qm1 functions

RST  LDQ  SL  SR  Operation
1    X    X   X   Qm1 ← 0
0    0    0   0   Retain
0    1    X   X   Qm1 ← 0
0    0    1   X   Qm1 ← 0
0    0    0   1   Qm1 ← Q(0)
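
A sketch of the Qm1 flip-flop follows. It assumes the reading of Table 6 in which Qm1 clears on RST, LDQ, or a left shift, captures Q(0) on a right shift, and otherwise retains its value; the entity name qm1_ff and the port name Q0 are hypothetical.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity qm1_ff is  -- hypothetical name
    port ( clk, RST, LDQ, SL, SR : in  STD_LOGIC;
           Q0  : in  STD_LOGIC;   -- Q(0), the SOR output of the Q register
           Qm1 : out STD_LOGIC );
end qm1_ff;

architecture Behavioral of qm1_ff is
    signal r : STD_LOGIC := '0';
begin
    process (clk)
    begin
        if rising_edge(clk) then
            if RST = '1' or LDQ = '1' or SL = '1' then
                r <= '0';         -- clear on reset, on Q load, and on left shift
            elsif SR = '1' then
                r <= Q0;          -- capture the bit shifted out of Q
            end if;               -- otherwise retain
        end if;
    end process;

    Qm1 <= r;
end Behavioral;
```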

2.4 EXPERIMENTAL WORK


2.4.1

Arithmetic Processor Datapath Validation


i)

Set your APdatapath design file as the top level, and make sure the correct device is
selected in your project (Cyclone II EP2C70F896C6).
ii) Enter pin assignments for the AP datapath in order to assert the input data and control
signals using the available toggle switches and push-buttons, and monitor the outputs using
the 7-segment displays and LEDs. Do not forget the CI input bit to the ALU in addition to the
control signals in Table 1. Since the number of switches on the DE2-70 board is limited, you
may want to connect only 2 or 3 bits of the input data buses to the switches and hard-wire
the rest to GND in your design. Clock and RST signals can use the push-buttons. You can
take advantage of the remaining two push buttons for the other signals.
iii) Debug any problems you run into. Validate your datapath using the previously simulated
vectors to ensure you get the same results as your simulations.
iv) Demo 1: Demonstrate your validation results to the lab instructor.
2.4.2

Booth Multiplier Validation with the Arithmetic Processor Datapath


v) Set your new boothmultiplier design file as the top level.
vi) Make sure you have the new pin assignments. Since the control signals to the AP datapath
will be coming from the control unit in the design, you do not need to tie these to the
switches any more.
vii) Debug any problems. Validate the Booth multiplier operation using previously simulated
vectors to ensure you get the same results as your working simulations.
viii) Demo 2: Demonstrate your validation results to the lab instructor.

EXPERIMENT #3
CPU ISA & DATAPATH DESIGN
3.1 OBJECTIVE
In this lab exercise you will define and design the datapath of a CPU based on provided high level
specifications and constraints. As part of the experiment, you will
a. need to plan out the instructions, and instruction formats (ISA) supported by your CPU;
b. complete the missing pieces of the provided CPU core template in order to design a capable
general purpose computer CPU datapath,
c. do the design entry and validation using the Altera Quartus II CAD and DE2-70 prototyping
environment.
Even though you have 3 weeks to complete this experiment, you are expected to submit a portion
of the preliminary work in parts (a) and (b) at the end of the first 1.5 weeks.
3.2 PRELIMINARY WORK
3.2.1

Define an ISA for the CPU with the below requirements:


The 8-bit core processing datapath of the CPU is depicted in Figure 3.1. Note this datapath
has some similarities with the design you have delivered in Lab 2. It also has some
differences due to the addition of a Register File block. Also note that this is not a complete
CPU datapath; you will need to enhance it to be able to implement the instruction
fetch-decode-execute cycle, and support all instruction types.
The ISA needs to support general purpose computing, and therefore will contain different
instructions for arithmetic-logic, memory load-store, and program control operations. There
should be sufficient variety of addressing modes to support not only simple constants and
variables, but also more complex data structures like indexed arrays.
You will need to analyze the capabilities of the provided CPU core datapath in order to
identify the arithmetic-logic instructions you would like to support. It is required that you have
at least one multiplication and one division instruction in your arithmetic-logic instruction list.
The architecture will use memory-mapped I/O, so there is no need to define distinct I/O
instructions.
Both Instruction and Data Memory have 10 bits of addressing space, and 16-bit words at
each address i.e. you have 2 kB of accessible Instruction Memory and Data Memory.
You will submit a report with the details of your ISA specifications in 1.5 weeks. The
report should contain a description of different instruction formats and details of each
instruction in your ISA. Remember to come up with a name for your ISA (e.g. we previously
studied the details of MIPS ISA, Motorola 68HC11 ISA, generally discussed Intel x86 ISA
etc.). Add a discussion in your report on different choices you made associated with your
ISA design. e.g. How many bits did you assign to your opcode and why? What are the
different addressing modes and why? What are the different types of branches/jumps and
why? When you make these choices, think about the type of things a high-level programmer
(e.g. a C-programmer) is interested in doing while writing a general purpose program, in
addition to paying attention to the principles of ISA design we studied last semester:
o Simplicity favors regularity
o Smaller is faster
o Make the common case fast
o Good design demands good compromises

[Figure 3.1 block diagram. Labeled blocks: 8x8-bit register file (RF), ALU, A register, Q register, Qm1 flip-flop, and the DSEL, ASEL, BSEL, and SRSEL multiplexers. Labeled signals: DAT1(7:0), DAT2(7:0), DAT3(7:0), DSEL(1:0), DataIn(7:0), SrcAdrA(2:0), SrcAdrB(2:0), DstAdr(2:0), WE, DataOutA(7:0), DataOutB(7:0), ASEL(1:0), BSEL(1:0), AOP(2:0), CI, CO, OVF, Z, N, SRSEL, SR, SL, SIR, SIL, SOR, SOL, A(7), Qm1, LDA, LDQ, RST, ALUOUT(7:0), A(7:0), Q(7:0).]

Figure 3.1. 8-bit CPU Core Datapath

3.2.2

Design a CPU datapath for a multi-cycle computer to support your ISA description above.
Think about the essential functional blocks required in the datapath by the von Neumann
Fetch-Decode-Execute cycle.
You will need to do the design along with your ISA definition. You should develop the
sequence (cycle-by-cycle) of how each of the instruction types in your ISA is going to
execute through the datapath.
Along with the ISA specification report described in 3.2.1, you will submit a CPU datapath
drawing in 1.5 weeks, showing the details of all registers, flip-flops, signals, and
signal buses in your CPU, an extended version of Figure 3.1. It is crucial that this picture
is consistent with your ISA definition. For this lab exercise, you do not need to worry about:
o Control Unit (You should, however, have a good, workable idea of how you would
design the Control Unit (or FSM) for your datapath.)
o Instruction and Data Memory: Please show these as abstracted logic blocks in your
datapath drawing. Assume you will have a 10-bit one-way address bus, and a 16-bit two-way
(bi-directional) data bus between your CPU datapath and the Memory Units. We will
worry about the control signals associated with the Memory in a later lab exercise.
3.2.3

Review the Register File design and implementation discussion in Section 3.3.

3.2.4

Follow a modular design approach to do the design entry in VHDL for all the blocks in your
CPU datapath (designed in 3.2.2) including the Register File and any other components you
deem necessary. Connect the blocks using schematic capture tool.
Note: As your design increases in size, it becomes necessary to build some organization skills
(like doing artwork) to ensure your schematic is easy for anybody to follow, i.e. you should not
have a lot of twisting and twirling, or snaking signals around your blocks. Plan out your signal
routing to be as straight and clean as possible, similar to the way signals look in a drawing like
Figure 3.1. If it will help to build hierarchies by combining multiple blocks under a simpler
symbol, for example combining the registers and muxes around the ALU under a single symbol
along with the ALU, you are encouraged to do so. Your top level schematic should represent
the hierarchy of main functions in your CPU datapath.

3.2.5

Run a functional simulation of your datapath for each instruction in your ISA. Use a relaxed
clock frequency when defining your simulation waveforms (e.g. 50ns period). Do not add more
than two or three instructions to each simulation waveform. Since the Control Unit, Instruction
Memory, and Data Memory do not yet exist in your design, you will enter the control and
memory interface signals through the waveform editor manually using the correct timing, as if
these blocks exist.

3.2.6

After you achieve functionality, run timing simulations to ensure your design does not have
timing problems. Before you run the timing simulations save your top level design under a
different name and do appropriate pin assignments by using the available LEDs, 7-segment
displays, push-buttons, and toggle-switches efficiently on the DE2-70 board. Do not use the
problematic switch, SW7, for now. The pin assignment will allow your timing simulations to be
accurate. Do not leave any of the unconnected inputs floating in your design (tie to 0 or 1).

3.2.7

Submit a preliminary work report when you arrive at the lab (in 3 weeks) that includes:
An objective statement;
printouts for all of your VHDL, schematic, and timing simulation files from 3.2.4 and 3.2.5 (do
not submit meaningless simulations which do not demonstrate results clearly),
a summary from the Timing Analyzer showing the maximum clock frequency of your design.

Note: It is crucial that your design is in good health, as demonstrated by your
simulations, before you come to the scheduled lab session. The valuable lab time and
resources should not be used for design work. You will use the lab session for hardware
prototyping, debug, and validation work. If you cannot demonstrate successful simulations
in preliminary work, you may not be accepted to perform this lab.

3.3 Register Files


3.3.1

Overview
A register file (RF) is the central storage of a CPU. Most operations involve using or modifying
data stored in the register file. The register file depicted in Figure 3.2 has 4 locations
(00,01,10,11), and is a 4x4-bit RF. An RF can use a Latch or a Flip-Flop as the 1-bit cell.
In this example, each cell of the register file is constructed using a D-FF with synchronous
RST. Therefore, a register is represented by a row of the 2-dimensional D-FF array shown in
the figure. Each of the four registers has a clock input (positive edge triggered), a data input,
and a data output. The collection of registers is managed by one 2-to-4 decoder at the input
stage, and two 4-to-1 multiplexors at the output stage. There are two output ports (PortA and
PortB) for the register file. The output of each register is connected to two multiplexors. The two
registers to be read at the output ports are selected through SrcAdrA and SrcAdrB. The register
to be written is selected through the DstAdr input to the 2-to-4 decoder. The Write-Enable (WE)
signal gates the output of the 2-to-4 decoder, enabling the RF for writing. The enabled register
decoder line is further gated with the CLK. Thus, the write operation only happens on the rising
edge of the clock. The read operation is asynchronous so that DataOutA and DataOutB are
available as soon as the SrcAdrA and SrcAdrB are set respectively.
[Figure 3.2 block diagram. Labeled blocks: a 4x4 array of D flip-flops (D, Q, RST), a 2-to-4 decoder, and two 4-to-1 multiplexers. Labeled signals: CLK, RST, WE, DstAdr[1:0], DataIn[3:0], SrcAdrA[1:0], SrcAdrB[1:0], DataOutA[3:0], DataOutB[3:0].]

Figure 3.2. A Register File block diagram


3.3.2

Implementing Variable Length Arrays in VHDL


VHDL allows the generated hardware size (e.g. number of bits in a register, number of
registers in a register file) to be controlled through a parameterized description. This provides
full flexibility in the VHDL code, where hardware can be scaled easily by changing the value of a
few parameters declared through generic statements. Figure 3.3 contains an example of a
variable-sized register coded for RF applications. The generic integer parameter w controls the
size of the register (set to 4 bits in this case). The number of bits stored by the register can
therefore be easily changed by modifying the 4th line of the code. Note the use of a temporary
loop variable i in the body of the code without any prior declaration. Using a for loop similar to
that of a high level programming language, the variable allows the generation of the Flip-Flops
in the register one bit at a time.

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
entity RFregister is
    generic (w : INTEGER := 4);  -- w: register width in bits
    Port ( RST, LOAD : in STD_LOGIC;
           clk : in STD_LOGIC;
           D : in STD_LOGIC_VECTOR(w-1 DOWNTO 0);
           O : out STD_LOGIC_VECTOR(w-1 DOWNTO 0));
end RFregister;
architecture Behavioral of RFregister is
begin
    -- Rising-edge register with synchronous reset and load enable
    p1: process(clk)
    begin
        if (clk'EVENT AND clk='1') then
            for i in w-1 downto 0 loop
                if RST='1' then
                    O(i) <= '0';
                elsif LOAD='1' then
                    O(i) <= D(i);
                end if;
            end loop;
        end if;
    end process p1;
end Behavioral;

Figure 3.3. Sample VHDL code for parameterized (w-bit) register design entry
Similarly, a parameterized register file can be generated using the VHDL code in Figure 3.4. This
particular example corresponds to Figure 3.2. The generic declaration at the beginning defines
all parameters. Note that even though the number of bits required for the register address is
declared as an independent parameter (m) in the example, the value of m directly depends on the
value of l, and could have been calculated in the code using a (log2 l) function. The generate
statement in the first part of the architecture description is used to repeatedly instantiate the
RFregister (from above), building the array of D-FFs one row at a time. The next process
declarations describe the decoder and two multiplexors respectively in a parameterized way.
Note that in the body of these declarations, i=conv_integer('0'&SrcAdrB) converts a
STD_LOGIC_VECTOR type into an unsigned integer so that it can be compared with the
temporary integer variable i. (Variables of different types cannot be directly compared.)
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity RF is
    generic (l : INTEGER := 4;   -- l: number of registers in RF
             w : INTEGER := 4;   -- w: register width (or # bits in a register)
             m : INTEGER := 2);  -- m: # bits in register address
    Port ( RST, WE : in STD_LOGIC;
           SrcAdrA, SrcAdrB, DstAdr : in STD_LOGIC_VECTOR(m-1 DOWNTO 0);
           clk : in STD_LOGIC;
           DataIn : in STD_LOGIC_VECTOR(w-1 DOWNTO 0);
           DataOutA, DataOutB : out STD_LOGIC_VECTOR(w-1 DOWNTO 0));
end RF;

architecture Behavioral of RF is
    -- following are defined for the outputs of the D-FF array
    signal tmpq: std_logic_vector(w*l-1 downto 0);
    -- following are the load signals for individual registers
    signal load: std_logic_vector(l-1 downto 0);
    component RFregister is
        generic (w : INTEGER := 4);  -- declared so the width can be passed down
        Port ( RST, LOAD : in STD_LOGIC;
               clk : in STD_LOGIC;
               D : in STD_LOGIC_VECTOR(w-1 DOWNTO 0);
               O : out STD_LOGIC_VECTOR(w-1 DOWNTO 0));
    end component;
begin
    -- Generation of correct number of registers:
    genreg1: for i in l-1 downto 0 generate begin
        registers: RFregister
            generic map (w => w)  -- keep register width equal to the RF width
            port map(
                RST => RST,
                LOAD => load(i),
                clk => clk,
                D => DataIn(w-1 DOWNTO 0),
                O => tmpq((i+1)*w-1 DOWNTO i*w)
            );
    end generate genreg1;
    -- Parameterized DstAdr Decoder:
    p1: process (WE, DstAdr) begin
        for i in l-1 downto 0 loop
            if ((i=conv_integer('0'&DstAdr)) AND (WE='1')) then
                load(i) <= '1';
            else
                load(i) <= '0';
            end if;
        end loop;
    end process p1;
    -- Parameterized SrcAdrA MUX (tmpq is in the sensitivity list so the
    -- output follows register updates; the default avoids an inferred latch):
    p2: process (SrcAdrA, tmpq) begin
        DataOutA <= (others => '0');
        for i in l-1 downto 0 loop
            if i=conv_integer('0'&SrcAdrA) then
                DataOutA <= tmpq((i+1)*w-1 DOWNTO i*w);
            end if;
        end loop;
    end process p2;
    -- Parameterized SrcAdrB MUX:
    p3: process (SrcAdrB, tmpq) begin
        DataOutB <= (others => '0');
        for i in l-1 downto 0 loop
            if i=conv_integer('0'&SrcAdrB) then
                DataOutB <= tmpq((i+1)*w-1 DOWNTO i*w);
            end if;
        end loop;
    end process p3;
end Behavioral;

Figure 3.4. Sample VHDL code for parameterized (l x w) register file
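
The text notes that m could be derived from l with a log2 function. A minimal sketch of such a helper is given here; the package name rf_util and function name clog2 are hypothetical, and the function returns the number of address bits needed for n registers (the ceiling of log2 n).

```vhdl
-- Hypothetical helper package (not part of the lab handout)
package rf_util is
    function clog2 (n : POSITIVE) return NATURAL;
end package rf_util;

package body rf_util is
    -- Smallest number of bits b such that 2**b >= n
    function clog2 (n : POSITIVE) return NATURAL is
        variable bits : NATURAL  := 0;
        variable span : POSITIVE := 1;  -- number of addresses covered by 'bits' bits
    begin
        while span < n loop
            span := span * 2;
            bits := bits + 1;
        end loop;
        return bits;
    end function;
end package body rf_util;
```

With `use work.rf_util.all;` placed before the entity, the address ports could be declared as `STD_LOGIC_VECTOR(clog2(l)-1 DOWNTO 0)` in place of the separate generic m. Note that clog2(1) returns 0, which would give a null address range, so a single-register file would need special handling.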

3.4 EXPERIMENTAL WORK


CPU Datapath Validation
i) Program the version of the design with pin assignments to the Cyclone II FPGA on DE2-70.
ii) Go through the combination of tests you simulated to verify functionality of your CPU.
Ensure each instruction in your ISA is covered. Debug any problems you run into. Note any
discrepancies with simulations.
iii) Demonstrate your validation results to the lab instructor.
iv) After achieving functionality with your design, use the ALTPLL megafunction in the Altera
component library to replace your clock input with a variable frequency clock. Check with the
instructor on how to do this. Validate the maximum frequency your design can achieve at
room temperature.

EXPERIMENT #4 & #5
MULTI-CYCLE CPU DESIGN w/ SPLIT INSTRUCTION AND DATA MEMORY
4.1 OBJECTIVE
You will complete and integrate the few missing building blocks of your multi-cycle CPU in this lab
exercise using a split-memory architecture:
a. Control Unit and Front Panel;
b. Instruction Memory,
c. Data Memory.
You will build and test the whole of the CPU within two lab sessions by running each of the
instructions in your ISA (from Experiment 3) after broadside loading the Instruction Memory during the
programming of the Cyclone II FPGA on the DE2-70 Development board.
4.2 PRELIMINARY WORK
4.2.1

Control Unit and Front Panel


Define and design a hardwired (not microcoded) Control Unit based on the register transfer
sequences you identified during the datapath and ISA design phase for each instruction:

Remember the average CPI performance of your design will directly depend on the number
of FSM states reserved for each instruction type. Try to minimize the CPI.

Depending on your prior design of the ISA and datapath, you may have some instructions
(e.g. multiplication) which take many unique cycles (or FSM states) to complete. One
strategy would be to first design and validate the control associated with these instructions
as separate FSMs based on previous labs, and integrate them into the main FSM afterwards.

Before you can complete the FSM, you will need to make sure the memory interface has the
correct timing. Go through the next two sections to generate the Instruction and Data
Memory modules and ensure their timing is consistent with the Control Unit outputs.

Implement the user accessible inputs shown in Table 4.1 at the interface to the Control Unit
in order to facilitate the start, stop, and debug of your machine.
Table 4.1. Front Panel Inputs

RUN (toggle-switch)
    When RUN=1, FETCH the instruction from the Instruction Memory address as stored in the PC
    (e.g. address 0), and continue FETCH-DECODE-EXECUTE cycle for the rest of the program.
    When RUN=0, pause the execution after the completion of the current instruction execution
    (When RUN is reasserted, the execution should continue from the next instruction whose
    address is in the PC).

CLR/INIT (push-button)
    Initialize the PC to its default value (the location of the 1st instruction in the
    instruction memory), and clear the machine state (reset all architectural and user
    registers).

AUTO(0)/MANUAL(1) (toggle-switch)
    When in MANUAL mode, switch the clock input over to a push-button on the DE2-70 board.
    When in AUTO mode, a free running clock is used either directly from the board or from
    the output of a Phase Locked Loop integrated to your design.

MAN_CLK
    Push-button clock used only when in MANUAL mode.
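As a starting point for the RUN and CLR/INIT behavior in Table 4.1, a minimal sketch of a control FSM skeleton is given below. It is only an illustration under stated assumptions: the entity name, port names, state names, and the single EXECUTE state are hypothetical; CLR/INIT is assumed to be active high and asynchronous so that it also takes effect in MANUAL clocking mode without a clock pulse; and the state in which the IR and PC are written must be adapted to the actual timing of your Instruction Memory.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical skeleton: names, the state set, and the output timing are
-- placeholders for illustration, not the required design.
entity control_fsm_sketch is
  port (
    clk     : in  std_logic;  -- free-running (AUTO) or push-button (MANUAL) clock
    clr     : in  std_logic;  -- CLR/INIT, assumed active high here
    run     : in  std_logic;  -- RUN toggle switch
    pc_we   : out std_logic;  -- PC write enable
    ir_we   : out std_logic;  -- IR write enable
    running : out std_logic   -- can drive the RUN indicator LED
  );
end entity control_fsm_sketch;

architecture rtl of control_fsm_sketch is
  type state_t is (S_IDLE, S_FETCH, S_DECODE, S_EXECUTE);
  signal state : state_t := S_IDLE;
begin
  -- State register with asynchronous clear.
  process (clk, clr)
  begin
    if clr = '1' then
      state <= S_IDLE;
    elsif rising_edge(clk) then
      case state is
        when S_IDLE =>
          -- Wait here until RUN is asserted.
          if run = '1' then
            state <= S_FETCH;
          end if;
        when S_FETCH =>
          state <= S_DECODE;
        when S_DECODE =>
          state <= S_EXECUTE;
        when S_EXECUTE =>
          -- RUN is sampled only at the end of an instruction, so a pause
          -- never interrupts the instruction in progress.
          if run = '1' then
            state <= S_FETCH;
          else
            state <= S_IDLE;
          end if;
      end case;
    end if;
  end process;

  -- Moore outputs: each depends on the present state only.
  ir_we   <= '1' when state = S_FETCH else '0';
  pc_we   <= '1' when state = S_FETCH else '0';
  running <= '0' when state = S_IDLE else '1';
end architecture rtl;
```

In a real design S_EXECUTE would branch on the decoded opcode into one or more instruction-specific states, and each additional state adds one cycle to the CPI of that instruction type. Note that `running` in this sketch stays on until the current instruction completes, which may differ slightly from the indicator behavior you choose to implement.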

Implement the user visible outputs at the interface to the Control Unit, as depicted in Table
4.2, in order to facilitate the debug of your machine.
Table 4.2. Front Panel Outputs

RUN Indicator (green LED)
    On when the machine is in RUNning state. It turns off when RUN=0.

CLR/INIT Indicator (green LED)
    Turns on when the CLR/INIT button is pressed. Turns back off when the button is released.

AUTO/MANUAL Indicator (green LED)
    Turns on when in AUTO mode and off when in MANUAL mode.

Critical Debug Signals (red LEDs)
    Bring out the following signals to an LED in order to ease debug:
        PC Write Enable
        IR Write Enable
        Register File Write Enable
        Register A and Q Write Enables
        Data Memory Write Enable
        (And any other important signals you would like to monitor)

Control State (7-Segment Display)
    Decode the (present) state information from your FSM and display using hexadecimal
    digits on your DE2-70 board.

4.2.2

Instruction Memory
The instruction memory will be implemented as a 2 kB ROM with 1024 words and 16 bits
per word, as described in Experiment 3. The ROM can be programmed either while uploading
the design bit file to the Cyclone II FPGA, or during an interactive debug session using the
In-System Memory Content Editor under the Quartus II Tools menu. Follow the steps below to
instantiate a ROM in your top level schematic design:
a. Double click on empty space to launch the library menu.
b. Pick the altsyncram component under quartus/libraries/megafunctions/storage (or
just type in the name to find it).
c. Pick VHDL as the output file type in the first window of the MegaWizard Plug-In Manager.
Make sure the output file will be created under your own project directory.
d. Next Window: Pick "With one read port (ROM mode)" at the top of the menu. Also indicate
that you want to specify the memory size "As a number of words" in the menu below.
e. Next Window: Select 16 bits for the width of the Read/Write Ports and select 1024 for the
number of 16-bit words. You can leave "Auto" for the memory block type and maximum
block depth.
f. Next Window: Select "Single clock" for the clocking mode.
g. Next Window: Deselect all optional port registers, i.e. you want the minimum amount of
registering so that a read operation does not take many clock cycles. The read input ports are
registered by default, and you have no control over those because of the Cyclone II FPGA
internal memory architecture. No clock enable or clr signals are needed, but you can add them
if you prefer.
h. Next Window: Pick "Yes, use this file for the memory content data". In the provided
space type in a sensible file name, with the .mif suffix, which you will use to store your
program in hexadecimal format, e.g. test_program_instructions.mif. Also check the option at
the bottom to "Allow In-System Memory Content Editor to capture and update content" and
enter a short word for the Instance ID, which will be used later in the In-System Memory
Content Editor to refer to this particular ROM, e.g. rom1.
i. For the next set of windows you do not have to enter anything. Finish creating the block.
Once you are done, you will find a .JPEG file generated in your project directory which specifies
the pin timing of the memory module you have just created. Study this file to ensure the
timing of the input and output signals with respect to the clock matches your expectations. If
not, you may need to modify your Control FSM a bit.
j. Connect the ROM as the Instruction Memory to the rest of your design.
You can modify the contents of the ROM to be programmed by opening the .mif file
created in your project directory in a text editor, e.g. Wordpad. The content of this file is
integrated into the bit file used for programming the FPGA during the design
compilation stage. You can also use the same file to modify the ROM contents while the
system is running on the DE2-70 board using the In-System Memory Content Editor.
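
For orientation, a minimal sketch of what the .mif file for the 1024 x 16 ROM configured above might contain is shown below. The two instruction words at addresses 000 and 001 are placeholders only, not encodings from any actual ISA; replace them with the machine code of your own program.

```
-- Hypothetical contents: the data values below are placeholders.
WIDTH=16;
DEPTH=1024;

ADDRESS_RADIX=HEX;
DATA_RADIX=HEX;

CONTENT BEGIN
    000 : 1234;          -- word fetched first if the PC is initialized to 0
    001 : 5678;
    [002..3FF] : 0000;   -- fill the remaining words
END;
```

WIDTH and DEPTH must agree with the word size and the number of words selected in step e.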

4.2.3

Data Memory
The data memory will be implemented as a 2 kB Static RAM with 1024 words and 16 bits
per word, as described in Experiment 3. The steps required to create the RAM are very similar to
the steps followed above for the ROM, except that a write port is required. Only the
modifications to the previous steps are listed below.
b. Again use the same altsyncram component from the megafunction libraries.
d. Pick "With one read/write port (Single-port mode)" in the parameter settings general
menu. Also indicate that you want to specify the memory size "As a number of words" in the
menu below.
e. Next Window: Select 16 bits for the width of the Read/Write Ports and select 1024 for the
number of 16-bit words. You can leave "Auto" for the memory block type and maximum
block depth.
h. Next Window: Pick "Yes, use this file for the memory content data". In the provided
space type in a different name than the one you used for the ROM. You can use this .mif file to
initialize the data memory with the desired data. Remember to check the option at the bottom to
"Allow In-System Memory Content Editor to capture and update content" and enter a
short word for the Instance ID, which will be used later in the In-System Memory Content
Editor to refer to this particular RAM, e.g. ram1.
i. For the next set of windows you do not have to enter anything. Just finish creating the block.
j. Connect the RAM as the Data Memory to the rest of your design.
You can modify the initial contents of the RAM by opening the .mif file created in your
project directory in a text editor, e.g. Wordpad. You can also use the same file to
modify the RAM contents during a debug session using the In-System Memory
Content Editor.
Special Note: Cyclone II has a problem associated with the M4K memory feature and the
In-System Memory Content Editor, which will prevent your design from compiling once
you add the RAM component with step h configuration above. Add the following line to
the very end of your project .QSF file (in your project directory) and save using a text
editor before running compilation:
set_global_assignment -name CYCLONEII_M4K_COMPATIBILITY OFF

As your design gets larger, it is more important to modularize different parts (e.g. Memory
subsystem, RF and ALU subsystem, Control subsystem, etc.) for your top level schematic to be
readable. There is a tradeoff in doing this: Some of your internal signals lose visibility at the top
level as you push them into modules, and therefore debug may require a bit more work when
you need to look at these signals.

4.3 EXPERIMENTAL WORK


Preliminary Work Report
i) Submit a report containing your simulations of:
   a. Control Unit showing state transitions for each instruction type
   b. Full simulations of following nature showing critical control and data signals to
      demonstrate functionality (fully annotate simulation waveforms to explain):
      1. Load immediate values to two registers
      2. Do an arithmetic operation between the two registers and store the result into a
         register
      3. Store the register value to a memory address
      4. Load from the same memory address into another register
      5. Run a conditional and an unconditional branch (jump)
ii) Also be ready to submit a soft copy of your full design with simulation waveform files.

Multi-Cycle CPU Validation
i) Use the Front Panel features to validate your Control Unit first by storing each of your
OPCODEs to the Instruction Memory, and going through the FSM cycle-by-cycle in
MANUAL clocking mode to ensure all state transitions are as expected. Also pay attention to
the LEDs to ensure critical control signals are asserted in the expected FSM states. It is very
important that your PC will be initialized with a known address value (e.g. 0x0000) during
startup, because this will determine the address of the first instruction to be fetched from the
Instruction Memory.
ii) Write simple code segments to load some registers, run arithmetic/logic operations on them,
and store the results to the data memory. You can check the final state of the data memory
in the lab by using the In-System Memory Content Editor.
iii) Validate the more sophisticated instructions which consume many clock cycles to execute
such as multiplication, division, shift, etc.
iv) Finally write program segments to validate conditional and unconditional branches/jumps in
your ISA.
v) Demonstrate each of your instruction types to your lab instructor.

EXPERIMENT #6 & #7: FINAL


PROGRAMMING AND VALIDATION OF A CPU
6.1 OBJECTIVE
In this experiment you will complete a set of macro-code programs for the CPU you designed in the
previous labs. You are expected to write an assembler in order to convert your mnemonic based
assembly language into binary machine code (similar to one of the assignments you completed last
semester for the MIPS architecture). Programming in machine language would be too cumbersome.
You will upload your programs to the instruction memory one at a time, and execute to validate full
functionality of your CPU.
6.2 PRELIMINARY WORK
Write code to execute each of the following tasks using your CPU.
6.2.1

n-long Array Computation


You will be provided with Data Memory entries laid out as in Table 6.1. The first
number n represents the array length. The remaining entries contain the arrays A, B, and C,
each n memory words long. Your program will read the 3 input arrays and perform the
computation below on the array elements, which are 8-bit signed integers. The program will then
store the result array Z back to the Data Memory, starting at the next available entry
(address 3n+1 in Table 6.1).
Z[i] = A[i] + B[i] - C[i]
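
With 8-bit signed operands, A[i] + B[i] - C[i] can range from -383 to +382, which does not fit in 8 bits. The sketch below is one possible reference function for checking expected values in a simulation testbench. It is an assumption made for illustration only: the package and function names are hypothetical, and the 10-bit result width is simply the smallest signed width that holds the full range. How your CPU represents or truncates the stored result is your own design decision.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical testbench helper, not part of the CPU itself.
package array_ref_pkg is
  -- Returns A + B - C as a 10-bit two's complement value.
  function ref_z (a, b, c : std_logic_vector(7 downto 0))
    return std_logic_vector;
end package array_ref_pkg;

package body array_ref_pkg is
  function ref_z (a, b, c : std_logic_vector(7 downto 0))
    return std_logic_vector is
    variable wide : signed(9 downto 0);
  begin
    -- Sign-extend every operand before the arithmetic so that no
    -- intermediate result can overflow.
    wide := resize(signed(a), 10) + resize(signed(b), 10)
            - resize(signed(c), 10);
    return std_logic_vector(wide);
  end function ref_z;
end package body array_ref_pkg;
```

A testbench can compare this value against the word your program writes to the Data Memory for each Z[i], after applying whatever sign extension or truncation your ISA defines.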
Table 6.1. Data memory entries for nx1 array computation

Address  Entry     Address  Entry     Address  Entry     Address  Entry
0        n         n+1      B[0]      2n+1     C[0]      3n+1     Z[0]
1        A[0]      n+2      B[1]      2n+2     C[1]      3n+2     Z[1]
2        A[1]      ...      ...       ...      ...       ...      ...
...      ...       2n       B[n-1]    3n       C[n-1]    4n       Z[n-1]
n        A[n-1]

6.2.2

Text Parser
This program will read ASCII encoded text from the Data Memory starting at data memory
address 2, count the number of alphanumeric and space characters (backspace, tab, line feed,
form feed, carriage return), and replace any consecutive identical space characters with a single
space character. The program will stop execution when it runs into a null character in the text,
report the number of alphanumeric characters detected at Data Memory address 0, and the
number of space characters at address 1.


6.2.3

Multiplication
The program will read two 8-bit signed numbers from Data Memory addresses 0 and 1, multiply
them, and store the result to address 2.

6.2.4

Division
The program will read two 8-bit signed numbers from Data Memory addresses 0 and 1, divide
the first number by the second number, and store the result to address 2.
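
Expected results for the multiplication and division programs can be generated in a simulation testbench with the numeric_std operators. The following is a minimal sketch under stated assumptions: the package and function names are hypothetical, the stored division result is assumed to be the quotient, and the quotient is widened to 9 bits only so that the corner case -128 / -1 = +128 remains representable.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical reference functions for testbench use only.
package muldiv_ref_pkg is
  -- 8-bit signed x 8-bit signed, 16-bit signed product.
  function ref_mul (a, b : std_logic_vector(7 downto 0))
    return std_logic_vector;
  -- 8-bit signed / 8-bit signed, 9-bit signed quotient.
  function ref_div (a, b : std_logic_vector(7 downto 0))
    return std_logic_vector;
end package muldiv_ref_pkg;

package body muldiv_ref_pkg is
  function ref_mul (a, b : std_logic_vector(7 downto 0))
    return std_logic_vector is
  begin
    -- numeric_std "*" returns a'length + b'length = 16 bits.
    return std_logic_vector(signed(a) * signed(b));
  end function ref_mul;

  function ref_div (a, b : std_logic_vector(7 downto 0))
    return std_logic_vector is
  begin
    -- Operands are widened to 9 bits before dividing.
    -- The caller must not pass b = 0.
    return std_logic_vector(resize(signed(a), 9) / resize(signed(b), 9));
  end function ref_div;
end package body muldiv_ref_pkg;
```

For example, ref_mul(x"FF", x"02") corresponds to -1 x 2 and returns x"FFFE". The numeric_std division truncates the quotient toward zero, so if your divider rounds differently or also produces a remainder, adjust the reference accordingly.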

6.2.5

Infinite Loop
You will write the shortest program possible using your ISA which loops onto itself forever. The
purpose of this program will be to look at performance and power dissipation.

6.3 EXPERIMENTAL WORK


i. Debug, validate, and demonstrate the correct execution of each of the above programs to
the lab instructor as well as the CPU Front Panel functions outlined in Lab 5.
ii. What is the average CPI for each of your test programs? What is the execution time (CPU
time only) for each?
iii. Check with your lab instructor about measuring the consumed average power during the
execution of the Infinite Loop test. Please do not do this by yourself, since the measurement
requires the removal of the DE2-70 top lid to access the power lines.
iv. BONUS: If you can demonstrate that your machine is capable of handling 16-bit data
memory words, i.e. two 8-bit numbers per memory word, BONUS points will be
awarded.
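
For item ii, the average CPI and the CPU time follow from the cycle and instruction counts of a program run. The relations below are the standard definitions; the symbols N_cycles (total clock cycles), N_instr (instructions executed), T_clk (clock period) and f_clk (clock frequency) are introduced here only for illustration, and the numbers in the worked line are hypothetical, not measurements.

```latex
\mathrm{CPI}_{\text{avg}} = \frac{N_{\text{cycles}}}{N_{\text{instr}}},
\qquad
T_{\text{CPU}} = N_{\text{instr}} \times \mathrm{CPI}_{\text{avg}} \times T_{\text{clk}}
               = \frac{N_{\text{cycles}}}{f_{\text{clk}}}

% Hypothetical example: 100 instructions in 420 cycles at f_clk = 50 MHz
\mathrm{CPI}_{\text{avg}} = \frac{420}{100} = 4.2,
\qquad
T_{\text{CPU}} = \frac{420}{50 \times 10^{6}\ \text{Hz}} = 8.4\ \mu\text{s}
```

N_cycles can be obtained by summing the number of FSM states visited by each executed instruction, which can be counted in simulation or by stepping through the program in MANUAL clocking mode.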
6.4 FINAL REPORT
You will prepare a final report for the semester which has all the critical information about the
CPU you designed during the semester:
- The CPU name and the design team
- Objectives and design approach
- The final ISA, and the main instruction formats (entered electronically not handwritten)
- The final design files (VHDL and schematic)
- The code for each of the five programs described above (in assembly and binary)
- Sample input and output files showing the relevant portions of the data memory before and
  after each program is run
- Performance measurements and power estimates
- Conclusions

You will submit one copy of the final report by June 1st. Please type neatly and bind it into a
formal report booklet. It is recommended that you keep a copy of the report for yourself for
future reference.
