Ca Manual
Lab Manual
Submitted by:
2020-EE-509
2020-EE-551
Submitted to:
Engr. Saira Arif
The Basys-3 is an entry-level FPGA development board designed exclusively for the Vivado® Design Suite,
featuring the Xilinx® Artix®-7 FPGA architecture. Basys-3 is the newest addition to the popular Basys line of
FPGA development boards for students or beginners just getting started with FPGA technology. The Basys-3
includes the standard features found on all Basys boards: complete ready-to-use hardware, a large collection
of on-board I/O devices, all required FPGA support circuits, and a free version of the development tools, all at a
student-level price point. The important components of the FPGA board are shown in Fig. 1.
[Figure: gate-level circuit for y = a.b + c, with AND gate a1 (inputs a, b) feeding OR gate o1 (second input c)]
1. From Quick Start click on Create Project. A new project dialog box will appear as shown
3. The Add Sources dialog box will appear. Click on Next (we will add the sources later). A
constraints file dialog box will appear. Click on Next.
4. A Default Part dialog box will appear. Select the same family, package and speed of the board as
shown in Fig. 5. Then, from the parts shown select xc7a100tcg324-1 and click on Next.
input a, b, c
);
and a1 (o, a, b);
or o1 (y, o, c);
endmodule
3.1.2.4 Click on Schematic to check your gate level design as shown in Fig. 7.
Fig. 7: Vivado Gate Level Schematic
3.1.5.2 Vivado will take a few seconds to connect to the FPGA. Once done, click on Program device;
a dialog box appears. Click on Program to program your FPGA (Fig. 10).
The Verilog code will be implemented on the FPGA. To check the behavior of the LED, make
the truth table of a.b+c and check all the possible combinations of inputs.
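The truth-table check described above can also be run in simulation before programming the board. The following is a minimal sketch, assuming the circuit is the AND-OR network y = a.b + c from the gate-level listing. The module names and_or_sketch and and_or_sketch_tb are placeholders, not names from the lab files.

```verilog
// Hypothetical stand-in for the lab's gate-level design: y = a.b + c
module and_or_sketch (output y, input a, b, c);
    wire o;               // output of the AND gate
    and a1 (o, a, b);
    or  o1 (y, o, c);
endmodule

// Self-checking test bench that walks through all 8 input combinations
module and_or_sketch_tb;
    reg a, b, c;
    wire y;
    reg [3:0] i;          // one bit wider than needed so the loop can terminate

    and_or_sketch uut (.y(y), .a(a), .b(b), .c(c));

    initial begin
        $display("a b c | y");
        for (i = 0; i < 8; i = i + 1) begin
            {a, b, c} = i[2:0];
            #10;
            $display("%b %b %b | %b", a, b, c, y);
            // Compare against the Boolean expression from the text
            if (y !== ((a & b) | c))
                $display("MISMATCH for inputs %b%b%b", a, b, c);
        end
        $finish;
    end
endmodule
```

The printed rows can be compared line by line with the hand-written truth table and with the LED observed on the board.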
Scroll down the box, click on Unconstrained Paths, then None to None, and then Setup. The window
will show the maximum-delay path along with its name. In this example the maximum delay is from a[0]
to c[1], with a delay of 6.778 ns, as shown in Fig. 12.
Fig. 12: Maximum delay for a[0] to c[1] with the delay of 6.778 ns.
ab
TRUTH TABLE:
a b y
0 0 1
0 1 1
1 0 1
1 1 1
CODE:
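module gatelevel(output y, input a, b);
wire c, d, e, f;      // internal nets, declared explicitly
assign c = a & b;
assign d = a ^ b;
assign e = a & b;
assign f = ~(d & e);  // d & e is always 0, so f is always 1
assign y = c | f;     // hence y = 1 for every input, as in the truth table
endmodule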
SCHEMATIC:
TESTBENCH:
module gl_tb();
reg a;
reg b;
wire y;
gatelevel uut ( .a(a),.b(b),.y(y));
initial begin
#10 a=1'b0;b=1'b0;
#10 a=1'b0;b=1'b1;
#10 a=1'b1;b=1'b0;
#10 a=1'b1;b=1'b1;
#10$stop;
end
endmodule
SIMULATION:
INSTRUCTOR: ENGR. SAIRA ARIF
LAB TASKS
This is the introductory lab, which covers the installation and usage of the tools and a brief revision of Verilog HDL. This lab has
been subdivided into two parts. In the first part, a tutorial file is provided to install Xilinx Vivado 2019.2 on your
system, while the second part consists of a Verilog-based task to be performed in Vivado.
DELIVERABLES
➢ Prepare a report for everything that you have done in “Lab Tasks”. Explain all the steps and observations in the report.
TOOLS INSTALLATION
You can use the manual at this link to install Vivado in your system.
PROBLEM SET
HALF ADDER
➢ A half adder is a combinational arithmetic circuit that adds two binary digits and produces a sum bit (S) and a carry bit (C) as
the output. If A and B are the input bits, then the sum bit (S) is the XOR of A and B and the carry bit (C) is the AND of
A and B. The half adder is the simplest of all adder circuits, but it has a major disadvantage: it can add only
two inputs (A and B) and has no input for a carry coming from a previous stage. Its truth table, module schematic,
and gate-level realization are shown in Figure 1.
➢ You are requested to write Verilog code for the circuit using structural modeling (using primitives such as AND, OR,
etc., gates). Your inputs are two 2-bit numbers.
➢ Synthesize the circuit for the Basys-3 100T FPGA Board and simulate it by writing a test bench module for the half adder
covering all possible input combinations.
➢ Read the synthesis report of your circuit and extract the useful information related to maximum combinational delay,
resources in the FPGA used (like lookup tables (LUTs), input/output (IOs), etc.), timing information, power usage, etc.
Report this information in your lab report.
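The structural style requested above can be sketched with built-in gate primitives. This is an illustrative sketch for single-bit inputs, not the submitted solution; the names half_adder_structural and half_adder_structural_tb are assumptions made for this example.

```verilog
// Structural half adder: one XOR gate for the sum, one AND gate for the carry
module half_adder_structural (output sum, output carry, input a, input b);
    xor x1 (sum, a, b);
    and a1 (carry, a, b);
endmodule

// Test bench covering all four input combinations
module half_adder_structural_tb;
    reg a, b;
    wire sum, carry;
    reg [2:0] i;

    half_adder_structural uut (.sum(sum), .carry(carry), .a(a), .b(b));

    initial begin
        for (i = 0; i < 4; i = i + 1) begin
            {a, b} = i[1:0];
            #10;
            // {carry, sum} must equal the 2-bit arithmetic sum a + b
            if ({carry, sum} !== ({1'b0, a} + {1'b0, b}))
                $display("MISMATCH: a=%b b=%b sum=%b carry=%b", a, b, sum, carry);
            else
                $display("ok: a=%b b=%b sum=%b carry=%b", a, b, sum, carry);
        end
        $finish;
    end
endmodule
```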
CODE:
module half_adder(a, b, sum, carry);
input a;
input b;
output sum;
output carry;
assign carry=a&b;
assign sum=a^b;
endmodule
SCHEMATIC:
TEST BENCH:
module half_adder_tb();
reg a;
reg b;
wire sum;
wire carry;
half_adder uut ( .a(a),.b(b),.sum(sum), .carry(carry));
initial begin
#10 a=1'b0;b=1'b0;
#10 a=1'b0;b=1'b1;
#10 a=1'b1;b=1'b0;
#10 a=1'b1;b=1'b1;
#10$stop;
end
endmodule
SIMULATION:
FULL ADDER
➢ The main difference between a half adder and a full adder is that the full adder has three inputs and two outputs. The
first two inputs are A and B and the third input is an input carry designated as CIN. Its truth table, module schematic,
and gate-level realization are shown in Figure 2.
➢ You are requested to write Verilog code for the circuit using the half adder module which you developed in Problem
1. Your inputs (A & B) are two 2-bit numbers, while the carry CIN is also 2 bits wide but its MSB is zero.
➢ Synthesize the circuit for the Basys-3 100T FPGA Board and simulate it by writing a test bench module for the full adder
covering all possible input combinations.
➢ Read the synthesis report of your circuit and extract the useful information related to maximum combinational delay,
resources in the FPGA used (like lookup tables (LUTs), input/output (IOs), etc.), timing information, power usage, etc.
Report this information in your lab report.
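The construction asked for above, a full adder built from two half adders and an OR gate, can be sketched as follows for single-bit inputs. The module names half_adder_sketch and full_adder_from_ha are assumptions made for this example; a real solution would reuse the half adder from Problem 1.

```verilog
// Half adder used as the building block (hypothetical name)
module half_adder_sketch (input a, input b, output sum, output carry);
    xor (sum, a, b);
    and (carry, a, b);
endmodule

// Full adder: the first half adder adds a and b, the second adds cin to that sum.
// A carry out is produced if either half adder generates a carry.
module full_adder_from_ha (input a, input b, input cin, output sum, output cout);
    wire s1, c1, c2;
    half_adder_sketch ha1 (.a(a),  .b(b),   .sum(s1),  .carry(c1));
    half_adder_sketch ha2 (.a(s1), .b(cin), .sum(sum), .carry(c2));
    or (cout, c1, c2);
endmodule

// Exhaustive test bench over all 8 input combinations
module full_adder_from_ha_tb;
    reg a, b, cin;
    wire sum, cout;
    reg [3:0] i;

    full_adder_from_ha uut (.a(a), .b(b), .cin(cin), .sum(sum), .cout(cout));

    initial begin
        for (i = 0; i < 8; i = i + 1) begin
            {a, b, cin} = i[2:0];
            #10;
            if ({cout, sum} !== ({1'b0, a} + {1'b0, b} + {1'b0, cin}))
                $display("MISMATCH: a=%b b=%b cin=%b", a, b, cin);
        end
        $display("full adder check finished");
        $finish;
    end
endmodule
```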
CODE:
module full_adder(a, b, c, sum, carry);
input a;
input b;
input c;
output sum;
output carry;
wire d,e,f;
xor(sum,a,b,c);
and(d,a,b);
and(e,b,c);
and(f,a,c);
or(carry,d,e,f);
endmodule
SCHEMATIC:
TEST BENCH:
module full_adder_tb();
reg a;
reg b;
reg c;
wire sum;
wire carry;
full_adder uut ( .a(a), .b(b),.c(c),.sum(sum),.carry(carry) );
initial begin
#10 a=1'b0;b=1'b0;c=1'b0;
#10 a=1'b0;b=1'b0;c=1'b1;
#10 a=1'b0;b=1'b1;c=1'b0;
#10 a=1'b0;b=1'b1;c=1'b1;
#10 a=1'b1;b=1'b0;c=1'b0;
#10 a=1'b1;b=1'b0;c=1'b1;
#10 a=1'b1;b=1'b1;c=1'b0;
#10 a=1'b1;b=1'b1;c=1'b1;
#10$stop;
end
endmodule
SIMULATION:
CODE:
module binary_multiplier (input [3:0] a, input [3:0] b, output reg [7:0] p);
reg [3:0] i;
reg [7:0] temp;
always @* begin
temp = 0;
for (i = 0; i <= 3; i = i + 1) begin
if (b[i] == 1) temp = temp + (a << i);
end
p = temp;
end
endmodule
TESTBENCH:
module binary_multiplier_tb;
reg [3:0] a;
reg [3:0] b;
wire [7:0] p;
binary_multiplier UUT (a, b, p);
initial begin
// Test cases
a = 4'b0001; b = 4'b0010; #10;
a = 4'b0010; b = 4'b0001; #10;
a = 4'b0011; b = 4'b0010; #10;
a = 4'b0100; b = 4'b1000; #10;
// End of test
$finish;
end
endmodule
SIMULATION:
Lab Manual
Tools
Introduction
This is the introductory lab which will cover the concepts of Finite State Machines, Datapath and Controller design.
Deliverables
• Read the Tutorial provided with the manual handout (also available here). Complete the
lab and prepare a report for everything that you have done in “Problem Set”. Explain all
the steps and observations in the report.
Problem Set
Bubble sort is an algorithm to sort a list of numbers in ascending or descending order. Following is the algorithm of
bubble sort in C language (adapted from this link):
int c, d, swap;
// Initial assignment of unsorted numbers
array[0] = 1; array[1] = 6; array[2] = 2; array[3] = 9;
swap = array[d];
array[d] = array[d+1];
array[d+1] = swap;
return 0;
}
Listing 1: Code for bubble sort, can be seen in action on http://goo.gl/h6ij6d
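In a datapath-and-controller design, the swap step of the listing maps naturally onto a small combinational block. The sketch below is one possible compare-and-swap unit, assuming ascending order (swap when the first element is larger). The module name compare_swap_sketch, its ports and the WIDTH parameter are hypothetical and not part of the lab handout.

```verilog
// Compare two adjacent elements and output them in ascending order
module compare_swap_sketch #(parameter WIDTH = 8) (
    input  wire [WIDTH-1:0] x,
    input  wire [WIDTH-1:0] y,
    output wire [WIDTH-1:0] lo,   // smaller of the two
    output wire [WIDTH-1:0] hi    // larger of the two
);
    wire swap = (x > y);          // assumed condition: unsigned, ascending sort
    assign lo = swap ? y : x;
    assign hi = swap ? x : y;
endmodule

module compare_swap_sketch_tb;
    reg  [7:0] x, y;
    wire [7:0] lo, hi;

    compare_swap_sketch #(.WIDTH(8)) uut (.x(x), .y(y), .lo(lo), .hi(hi));

    initial begin
        x = 8'd6; y = 8'd2;       // array[1] and array[2] from the listing
        #1 $display("lo=%0d hi=%0d", lo, hi);
        $finish;
    end
endmodule
```

A controller (finite state machine) would step the index d through the array and write lo and hi back to positions d and d+1.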
Code
module shift_register (
input clk, rst, en, // clock, reset, and enable inputs
input [7:0] data_in, // 8-bit parallel data input
output reg serial_out // serial data output
);
endmodule
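The module body is not shown above. As an illustration only, the sketch below shows one common way to build a parallel-in, serial-out shift register. It is not the author's implementation: it adds a separate load input that the original port list does not have, and the names piso_sketch and piso_sketch_tb are hypothetical.

```verilog
// Parallel-in, serial-out shift register, MSB first (illustrative sketch)
module piso_sketch (
    input  wire       clk,
    input  wire       rst,        // synchronous, active-high reset
    input  wire       load,       // assumed extra input: capture data_in
    input  wire       en,         // shift one position per clock when high
    input  wire [7:0] data_in,
    output wire       serial_out
);
    reg [7:0] shift_reg;
    assign serial_out = shift_reg[7];

    always @(posedge clk) begin
        if (rst)       shift_reg <= 8'b0;
        else if (load) shift_reg <= data_in;
        else if (en)   shift_reg <= {shift_reg[6:0], 1'b0};
    end
endmodule

module piso_sketch_tb;
    reg clk = 1'b0, rst = 1'b1, load = 1'b0, en = 1'b0;
    reg [7:0] data_in = 8'b0;
    reg [7:0] received;
    wire serial_out;
    integer k;

    piso_sketch dut (.clk(clk), .rst(rst), .load(load), .en(en),
                     .data_in(data_in), .serial_out(serial_out));

    always #5 clk = ~clk;

    initial begin
        @(negedge clk); rst = 1'b0; load = 1'b1; data_in = 8'b10101010;
        @(negedge clk); load = 1'b0; en = 1'b1;
        // Sample one bit per clock, between rising edges
        for (k = 0; k < 8; k = k + 1) begin
            received = {received[6:0], serial_out};
            @(negedge clk);
        end
        if (received === 8'b10101010) $display("PASS");
        else $display("FAIL: received %b", received);
        $finish;
    end
endmodule
```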
Testbench
module shift_register_tb;
reg clk, rst, en;
reg [7:0] data_in;
wire serial_out;
shift_register dut (
.clk(clk),
.rst(rst),
.en(en),
.data_in(data_in),
.serial_out(serial_out)
);
initial begin
clk = 0;
rst = 1;
en = 0;
data_in = 8'b0;
#10 rst = 0; // deassert reset after 10 time units
end
always #5 clk = ~clk; // toggle clock every 5 time units
always @(*) begin
if (clk) begin
if (!rst) begin
// Test 1: Load data and shift
en = 1;
data_in = 8'b10101010;
end else begin
en = 0;
data_in = 8'b0;
end
end
end
initial begin
// Wait for some cycles to observe the output
#100;
// Check output for Test 1
if (serial_out !== 1'b1) begin
$display("Test 1 failed! Expected serial_out = 1, actual = %b", serial_out);
$finish;
end
// Test complete
$display("All tests passed!");
$finish;
end
endmodule
Schematic
Simulation
// Internal registers
reg [7:0] data_reg;
reg load_reg;
reg transmit_reg;
reg [9:0] baud_counter;
reg tx_reg;
// Datapath
always @ (posedge clk or negedge rst_n) begin
if (~rst_n) begin
data_reg <= 8'h00;
load_reg <= 1'b0;
transmit_reg <= 1'b0;
baud_counter <= 10'h000;
tx_reg <= 1'b1;
end else begin
if (load) begin
data_reg <= data_in;
load_reg <= 1'b1;
end else begin
load_reg <= 1'b0;
end
if (transmit) begin
transmit_reg <= 1'b1;
end
if (baud_counter == BAUD_DIV) begin
baud_counter <= 10'h000;
tx_reg <= 1'b0;
end else begin
baud_counter <= baud_counter + 1;
end
TEST BENCH:
module uart_module_tb();
reg clk;
reg rst_n;
reg [7:0] data_in;
reg load;
reg transmit;
wire tx_out;
uart_module uart_inst (
.clk(clk),
.rst_n(rst_n),
.data_in(data_in),
.load(load),
.transmit(transmit),
.tx_out(tx_out)
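);
// Free-running clock (period of 10 time units); the DUT is clocked, so it needs one
always #5 clk = ~clk;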
initial begin
clk = 1'b0;
rst_n = 1'b0;
data_in = 8'h00;
load = 1'b0;
transmit = 1'b0;
#100 rst_n = 1'b1;
#100;
// Load data
data_in = 8'h55;
load = 1'b1;
#100 load = 1'b0;
// Transmit data
transmit = 1'b1;
#100 transmit = 1'b0;
// Wait for transmission
#100;
// End simulation
$finish;
end
endmodule
SCHEMATIC:
SIMULATION:
Shift Register:
Lab Manual
The Lab Resources will be available at the following link. All reference books, related tutorials, and assessment rubrics
will be updated here: Resources EE475
Tools
Deliverables
Implement a single-cycle RV processor that supports all instructions of the RISC-V ISA. The processor must have a fetch
unit, decode logic, functional units, a register file, I/O support and access to memory. You will be implementing the
datapath while designing a single-cycle RISC-V processor, which works in five main stages:
• Instruction Fetch
• Operand Fetch
• Execution
• Memory Access
• Write back the result
It must contain a file that will be used as Random Access Memory. The memory is 8 bits wide, but the processor
accesses 32 bits (4 B) per operation. It has 32 registers working as general-purpose registers, while one
special-purpose register (the Program Counter) holds the address of the instruction. Each register must be 32 bits
wide.
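The memory organization described above (byte-wide storage, 32-bit accesses) can be sketched as follows. This is a minimal illustration that assumes little-endian byte order, as used by RISC-V, and ignores sub-word accesses. The module name byte_memory_sketch, its ports and the DEPTH parameter are assumptions, not part of the lab handout.

```verilog
// Byte-addressable memory that is read and written one 32-bit word at a time
module byte_memory_sketch #(parameter DEPTH = 256) (
    input  wire        clk,
    input  wire        we,        // write enable
    input  wire [31:0] addr,      // byte address of the least significant byte
    input  wire [31:0] wdata,
    output wire [31:0] rdata
);
    reg [7:0] mem [0:DEPTH-1];    // each location holds one byte

    // Little-endian read: the byte at addr is the least significant byte
    assign rdata = {mem[addr + 3], mem[addr + 2], mem[addr + 1], mem[addr]};

    // A word write stores four consecutive bytes
    always @(posedge clk) begin
        if (we) begin
            mem[addr]     <= wdata[7:0];
            mem[addr + 1] <= wdata[15:8];
            mem[addr + 2] <= wdata[23:16];
            mem[addr + 3] <= wdata[31:24];
        end
    end
endmodule
```

The combinational read matches the single-cycle requirement that a load completes in the same cycle in which the address is computed.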
Figure 1 Instruction Formats for four different classes of Instruction
The S-type instruction format is used for store instructions. The register rs1 is the base register that is added to the 12-bit
immediate field to form the memory address. (The immediate field is split into a 7-bit piece and a 5-bit piece.) Field
rs2 is the source register whose value should be stored into memory. The SB-type instruction format is used for conditional
branches. The registers rs1 and rs2 are compared. The 12-bit immediate address field is sign-extended, shifted left 1 bit,
and added to the PC to compute the branch target address.
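The way the immediate is reassembled from the instruction bits can be sketched as below, following the standard RISC-V bit positions for the S and SB formats. The module name imm_gen_sketch and the is_branch select input are hypothetical; an actual design would typically derive the select from the control unit.

```verilog
// Immediate generator for S-type (store) and SB-type (branch) instructions
module imm_gen_sketch (
    input  wire [31:0] instr,
    input  wire        is_branch,   // assumed select: 1 = SB-type, 0 = S-type
    output wire [31:0] imm
);
    // S-type: imm[11:5] = instr[31:25], imm[4:0] = instr[11:7], sign-extended
    wire [31:0] imm_s  = {{20{instr[31]}}, instr[31:25], instr[11:7]};

    // SB-type: imm[12] = instr[31], imm[11] = instr[7], imm[10:5] = instr[30:25],
    // imm[4:1] = instr[11:8]. The trailing 1'b0 is the "shift left 1 bit".
    wire [31:0] imm_sb = {{19{instr[31]}}, instr[31], instr[7],
                          instr[30:25], instr[11:8], 1'b0};

    assign imm = is_branch ? imm_sb : imm_s;
endmodule
```

For a store, imm is added to rs1 to form the memory address. For a branch, imm is added to the PC to form the target address.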
The datapath with all necessary multiplexers and all control lines identified is shown in Figure 2. The control lines
are shown in color. The ALU control block has also been added, which depends on the funct3 field and part of the
funct7 field. The complete diagram of the datapath with controller is shown in Figure 3. The input to the control
unit is the 7-bit opcode field and the 3-bit funct3 and 7-bit funct7 fields from the instruction. The outputs of the control unit
consist of two 1-bit signals that are used to control multiplexers (ALUSrc and MemtoReg), three signals for controlling
reads and writes in the register file and data memory (RegWrite, MemRead, and MemWrite), a 1-bit signal used
in determining whether to possibly branch (Branch), and a 4-bit control signal for the ALU (ALUOp). An AND gate
is used to combine the branch control signal and the Zero output from the ALU; the AND gate output controls the
selection of the next PC.
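A possible decoder for the 1-bit control signals named above is sketched below for four instruction classes (R-type, load, store, beq), using the standard RV32I opcodes. It is an illustration only: the ALUOp output is omitted because its encoding is not given here, and the module name main_control_sketch and the output name PCSrc are assumptions.

```verilog
// Main control decoder sketch for R-type, load, store and branch-if-equal
module main_control_sketch (
    input  wire [6:0] opcode,
    input  wire       zero,       // Zero flag from the ALU
    output reg        ALUSrc,
    output reg        MemtoReg,
    output reg        RegWrite,
    output reg        MemRead,
    output reg        MemWrite,
    output reg        Branch,
    output wire       PCSrc       // hypothetical name for the next-PC select
);
    localparam [6:0] OP_RTYPE  = 7'b0110011,
                     OP_LOAD   = 7'b0000011,
                     OP_STORE  = 7'b0100011,
                     OP_BRANCH = 7'b1100011;

    always @(*) begin
        // Safe defaults: nothing is written, no branch
        {ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch} = 6'b000000;
        case (opcode)
            OP_RTYPE:  RegWrite = 1'b1;
            OP_LOAD:   begin
                           ALUSrc = 1'b1; MemtoReg = 1'b1;
                           RegWrite = 1'b1; MemRead = 1'b1;
                       end
            OP_STORE:  begin ALUSrc = 1'b1; MemWrite = 1'b1; end
            OP_BRANCH: Branch = 1'b1;
            default:   ;          // unsupported opcodes keep the defaults
        endcase
    end

    // The AND gate from the text: branch only if Branch is set and the ALU reports zero
    assign PCSrc = Branch & zero;
endmodule
```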
CODE
Verilog module for a single-cycle implementation of a RISC-V CPU. The module contains several sub-modules, each
implementing a specific part of the CPU functionality.
The main module, Single_Cycle_Top, has two input ports: clk and rst. The output ports are:
• PC_Top: the current value of the program counter
• RD_Instr: the current instruction being executed, read from the instruction memory
• RD1_Top: the value of the first source register read from the register file
• Imm_Ext_Top: the immediate value, sign-extended to 32 bits
• ALUResult: the result of the ALU operation
• ReadData: the data read from the data memory
• PCPlus4: the value of the program counter incremented by 4
• RD2_Top: the value of the second source register read from the register file or the data read from
the data memory
• SrcB: the second operand of the ALU, selected from either RD2_Top or Imm_Ext_Top depending
on the value of the ALUSrc control signal
• Result: the result of the instruction, written back to the register file
• RegWrite: a control signal that enables register write
• MemWrite: a control signal that enables data memory write
• ALUSrc: a control signal that selects the second operand of the ALU
• ResultSrc: a control signal that selects the result to be written back to the register file
• ImmSrc: a control signal that selects the source of the immediate value
• ALUControl_Top: the control signal for the ALU operation
The main module instantiates the following sub-modules:
• PC_Module: implements the program counter
• PC_Adder: adds 4 to the current value of the program counter to get the next instruction address
• Instruction_Memory: implements the instruction memory
• Register_File: implements the register file
• Sign_Extend: sign-extends the immediate value to 32 bits
• Mux_Register_to_ALU: selects the second operand of the ALU
• ALU: implements the ALU operation
• Control_Unit_Top: generates the control signals for the CPU
• Data_Memory: implements the data memory
• Mux_DataMemory_to_Register: selects the result to be written back to the register file.
Overall, this Verilog module represents a basic implementation of a RISC-V CPU using the single-cycle approach.
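As an example of what one of these sub-modules might contain, the sketch below shows a register file with 32 registers of 32 bits, two combinational read ports and one clocked write port. It is not the source of the Register_File module described above; the module name register_file_sketch and the port names are assumptions.

```verilog
// Register file sketch: 32 x 32-bit registers, x0 always reads as zero
module register_file_sketch (
    input  wire        clk,
    input  wire        we,        // write enable (RegWrite)
    input  wire [4:0]  a1,        // read address 1 (rs1)
    input  wire [4:0]  a2,        // read address 2 (rs2)
    input  wire [4:0]  a3,        // write address (rd)
    input  wire [31:0] wd3,       // write data
    output wire [31:0] rd1,
    output wire [31:0] rd2
);
    reg [31:0] regs [0:31];

    // Combinational reads; register 0 is hardwired to zero in RISC-V
    assign rd1 = (a1 == 5'd0) ? 32'b0 : regs[a1];
    assign rd2 = (a2 == 5'd0) ? 32'b0 : regs[a2];

    // Clocked write; writes to register 0 are ignored
    always @(posedge clk) begin
        if (we && (a3 != 5'd0))
            regs[a3] <= wd3;
    end
endmodule
```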
TEST BENCH
A test bench for a single-cycle RISC processor would typically involve the following steps:
1. Load an assembly program into the instruction memory of the processor.
2. Set the inputs to the processor such as reset, clock, and any necessary input signals for the program.
3. Run the clock for a number of cycles, allowing the processor to execute the program.
4. Monitor the outputs of the processor, including the values of the registers and any output signals.
5. Compare the expected output values with the actual output values to verify the correctness of the
processor implementation.
The test bench would need to cover a wide range of test cases to ensure that the processor implementation is
correct and robust. Test cases could include various combinations of instructions, different data values,
and edge cases such as overflow conditions or branching. The test bench would need to be carefully
designed to ensure that it thoroughly tests the processor and detects any issues that may arise.
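Steps 2 to 5 above can be sketched as a self-checking test bench. To keep the example self-contained, the processor is replaced by a hypothetical stand-in, pc_stub, which only increments a program counter by 4 each cycle; with a real design, the instruction memory would first be loaded (step 1), for example with $readmemh, and the checks would look at registers and memory instead.

```verilog
// Hypothetical stand-in for the processor: a PC that advances by 4 per cycle
module pc_stub (input wire clk, input wire rst, output reg [31:0] pc);
    always @(posedge clk) begin
        if (rst) pc <= 32'd0;
        else     pc <= pc + 32'd4;
    end
endmodule

module processor_tb_sketch;
    reg clk = 1'b0;
    reg rst = 1'b1;               // step 2: start with reset asserted
    wire [31:0] pc;

    pc_stub dut (.clk(clk), .rst(rst), .pc(pc));

    always #5 clk = ~clk;         // step 2: free-running clock

    initial begin
        repeat (2) @(negedge clk);
        rst = 1'b0;               // release reset between clock edges
        repeat (5) @(negedge clk);   // step 3: run for five cycles
        // steps 4 and 5: observe an output and compare it with the expected value
        if (pc === 32'd20) $display("PASS: pc = %0d", pc);
        else               $display("FAIL: pc = %0d, expected 20", pc);
        $finish;
    end
endmodule
```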
SCHEMATIC
SIMULATION
In this streamlined architecture, each instruction is fetched from memory, decoded, and executed
within a single clock cycle. This contrasts with multi-cycle data paths and offers simplicity with a
fixed cycle time. However, the fixed cycle time must accommodate the slowest instruction, which limits the
processor's clock speed and, consequently, the system's overall performance.
The single-cycle RISC-V data path comprises the Instruction Memory (IM), Program Counter (PC), Instruction
Register (IR), Register File (RF), Arithmetic Logic Unit (ALU), Data Memory (DM), and Control Unit (CU),
which together carry out instruction execution and data manipulation.
Execution of an instruction proceeds through a series of sequential steps: instruction fetch, decode, operand
fetch, execute, memory access, and write-back. All of these phases complete within a single clock
cycle.
Despite its simplicity, the single-cycle RISC-V data path is a useful design. Its uncomplicated
structure makes it well suited to applications that favor a simple, easily understood data path.
The data path of a single cycle RISC V processor consists of the following components:
➢ Instruction memory (IM): This component is responsible for storing the instructions that are
fetched by the processor. The instruction memory is typically implemented using SRAM or
DRAM.
➢ Program Counter (PC): This component is responsible for storing the address of the next
instruction to be fetched. The PC is incremented by 4 after each instruction fetch.
➢ Instruction Register (IR): This component is responsible for holding the current instruction that
is being executed by the processor. The instruction register is loaded with the instruction fetched
from the instruction memory.
➢ Register File (RF): This component is responsible for storing the data values that are used by the
processor. The register file typically has 32 general-purpose registers, each of which is 32 bits
wide.
➢ ALU (Arithmetic Logic Unit): This component is responsible for performing arithmetic and logical
operations on the data values stored in the register file. The ALU takes two input values and
produces a single output value.
➢ Data Memory (DM): This component is responsible for storing the data values that are used by
the processor. The data memory is typically implemented using SRAM or DRAM.
➢ Control Unit (CU): This component is responsible for controlling the operation of the processor.
The control unit generates control signals that are used to control the other components of the
data path.
➢ MUX (Multiplexer): It selects between different inputs to provide the required data or control
signals to different components of the data path.
➢ Sign Extend: It extends the sign bit of an immediate value to 32 bits.
➢ Shift Left 1: It shifts the value of a register or immediate value to the left by one bit.
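As an illustration of the ALU component listed above (two inputs, one output), the sketch below implements a few common operations together with the Zero flag used for branches. The 3-bit alu_control encoding, the module name alu_sketch and the choice of operations are assumptions made for this example.

```verilog
// ALU sketch: two 32-bit operands in, one 32-bit result out, plus a Zero flag
module alu_sketch (
    input  wire [31:0] a,
    input  wire [31:0] b,
    input  wire [2:0]  alu_control,   // hypothetical operation encoding
    output reg  [31:0] result,
    output wire        zero
);
    always @(*) begin
        case (alu_control)
            3'b000:  result = a + b;
            3'b001:  result = a - b;
            3'b010:  result = a & b;
            3'b011:  result = a | b;
            3'b100:  result = a ^ b;
            // set-less-than, signed comparison
            3'b101:  result = ($signed(a) < $signed(b)) ? 32'd1 : 32'd0;
            default: result = 32'd0;
        endcase
    end

    // Zero is high when the result is 0, e.g. after a - b for equal operands
    assign zero = (result == 32'd0);
endmodule
```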
The execution of one instruction proceeds through the following steps:
➢ Instruction fetch: The processor fetches the instruction from the instruction memory by reading
the instruction at the address stored in the program counter (PC). The PC is incremented by 4 to
point to the next instruction.
➢ Instruction decode: The processor decodes the instruction by examining the opcode and
operands of the instruction.
➢ Operand fetch: The processor fetches the operands of the instruction from the register file or
the data memory.
➢ Execute: The processor performs the operation specified by the instruction using the ALU.
➢ Memory access: If the instruction requires a memory access, the processor accesses the data
memory to read or write the data.
➢ Write back: The processor writes the result of the operation back to the register file.
➢ Control unit: The control unit generates control signals that are used to control the operation of
the data path components.
➢ PC Update: The processor updates the PC with the address of the next instruction.
This completes one cycle of the single cycle RISC V data path, and the processor proceeds to fetch the
next instruction from the instruction memory.
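The PC update step can be sketched as a register whose next value is either PC + 4 or a branch target. This is a minimal illustration; the module name pc_sketch and the port names pc_src and branch_offset are assumptions, and the offset is taken to be already sign-extended and shifted.

```verilog
// Program counter sketch with next-PC selection
module pc_sketch (
    input  wire        clk,
    input  wire        rst,            // synchronous reset to address 0
    input  wire        pc_src,         // 1 = take the branch target
    input  wire [31:0] branch_offset,  // sign-extended, already shifted offset
    output reg  [31:0] pc
);
    wire [31:0] pc_plus4  = pc + 32'd4;          // sequential next instruction
    wire [31:0] pc_target = pc + branch_offset;  // branch target address

    always @(posedge clk) begin
        if (rst) pc <= 32'd0;
        else     pc <= pc_src ? pc_target : pc_plus4;
    end
endmodule
```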
Note that the single-cycle RISC-V data path is simple and straightforward, but it has a long cycle time because
the clock period must cover the slowest instruction's path through every stage of the data path. This limits the
clock speed of the processor, and hence the overall performance of the system.
DIAGRAM:
BRANCH EQUAL:
CONCLUSION:
We can conclude that the single-cycle RISC-V data path is a simple and efficient hardware design.
Its fixed cycle time can limit the clock speed of the processor,
but it allows each instruction to execute in a single clock cycle.
The single-cycle RISC-V data path includes key components such as the Instruction Memory, Program
Counter, Instruction Register, Register File, Arithmetic Logic Unit, Data Memory, and Control Unit. These
components work together to fetch, decode, and execute instructions within a single clock cycle.
SCHEMATIC
SIMULATION
In the three-stage pipeline, some data hazards can be resolved by forwarding the result of the
Memory-Writeback stage to the Decode-Execute stage, which is done by adding forwarding
multiplexers. Forwarding is used when the destination register in the Memory-Writeback stage matches
either of the source registers in the Decode-Execute stage. This requires two forwarding
multiplexers and a forwarding unit, which takes the whole instructions in the two pipeline stages as
inputs and produces the select signals of the two multiplexers as outputs. This is illustrated in Figure 8.1.
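The matching rule described above can be sketched as a small combinational unit. This is a simplified illustration: it adds a reg_write_mw input (an assumption, so that instructions that do not write a register never forward), it excludes register x0, and it does not check whether the Decode-Execute instruction actually uses both source fields. All names are hypothetical.

```verilog
// Forwarding unit sketch for a three-stage pipeline
module forwarding_unit_sketch (
    input  wire [31:0] instr_de,      // instruction in the Decode-Execute stage
    input  wire [31:0] instr_mw,      // instruction in the Memory-Writeback stage
    input  wire        reg_write_mw,  // assumed: Memory-Writeback instruction writes rd
    output wire        forward_a,     // select for the rs1 forwarding mux
    output wire        forward_b      // select for the rs2 forwarding mux
);
    // Standard RISC-V register field positions
    wire [4:0] rs1_de = instr_de[19:15];
    wire [4:0] rs2_de = instr_de[24:20];
    wire [4:0] rd_mw  = instr_mw[11:7];

    // x0 is never forwarded because it always reads as zero
    wire rd_valid = reg_write_mw && (rd_mw != 5'd0);

    assign forward_a = rd_valid && (rd_mw == rs1_de);
    assign forward_b = rd_valid && (rd_mw == rs2_de);
endmodule
```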
Forwarding is not sufficient for load instructions, which can have multi-cycle latency, so
their results cannot be forwarded in time. The only remaining solution is to stall the pipeline until the result has
been written to the register file. When a stage is stalled, all the previous stages must also be stalled in
order to avoid instruction loss. For this purpose, we add stalling capability to the forwarding unit, making
it the forward stall unit. This adds stall signals to all the pipeline registers, as illustrated
in Figure 8.2.
For taken branches as well as jumps, the following instruction (which has already been fetched) should not be
executed. Instead, it should be flushed from the pipeline while the program counter is updated to the new
address. To do this, we flush the Decode-Execute stage by setting the
instruction pipeline register between the Fetch stage and the Decode-Execute stage to nop. We therefore
modify our forward stall module to add the br_taken flag as an input and the flush signal as
an output. These changes can be observed in Figure 9.3.
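The effect of the stall and flush signals on the instruction pipeline register can be sketched as below. The nop used here is addi x0, x0, 0 (32'h00000013), the canonical RISC-V nop. Giving flush priority over stall is an assumption of this sketch, and the module and port names are hypothetical.

```verilog
// Instruction pipeline register between Fetch and Decode-Execute
module fetch_pipeline_reg_sketch (
    input  wire        clk,
    input  wire        rst,
    input  wire        stall,     // hold the current instruction
    input  wire        flush,     // replace the fetched instruction with a nop
    input  wire [31:0] instr_f,   // instruction coming from the Fetch stage
    output reg  [31:0] instr_de   // instruction seen by Decode-Execute
);
    localparam [31:0] NOP = 32'h0000_0013;   // addi x0, x0, 0

    always @(posedge clk) begin
        if (rst || flush)
            instr_de <= NOP;       // taken branch or jump: drop the fetched instruction
        else if (!stall)
            instr_de <= instr_f;   // normal operation: advance the pipeline
        // when stall is high and flush is low, instr_de keeps its value
    end
endmodule
```

In the forward stall unit, flush would be driven from br_taken, so that a taken branch or jump discards the instruction fetched after it.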
Lab 10
Introduction and Interfacing of BASYS3, NEXYS A7 and GPIO testing of Basys-3
Introduction:
The BASYS 3 is one of the best boards on the market for getting started with FPGA. It is an entry-level
development board built around a Xilinx Artix-7 FPGA.
As a complete and ready-to-use digital circuit development platform, it includes enough switches, LEDs, and
other I/O devices to allow a large number of designs to be completed without the need for any additional
hardware. There are also enough uncommitted FPGA I/O pins to allow designs to be expanded using Digilent
Pmods or other custom boards and circuits, and all of this at a student-friendly price point.
The BASYS 3 is designed exclusively for Xilinx’s Vivado Design Suite, and the WebPACK edition is available as a
free download from Xilinx.
Guides and demos are available to help users get started quickly with the BASYS 3. These can be found through the
Support Materials tab.
https://digilent.com/shop/basys-3-artix-7-fpga-trainer-board-recommended-for-introductory-users/
Figure 1: BASYS 3
There are two methods for assigning package pins which are given as follows:
Click on Sources in the project manager, then select Add or Create Constraints, and then
select Create File. The constraints file will be created.
Type the following code as shown in Listing 1 and then save that file. The pins will be assigned in
the same manner as done in the Section above.
Vivado will take a few seconds to connect to the FPGA. Once done, click on Program device; a
dialog box appears. Click on Program to program your FPGA (Figure 3). The Verilog code will be
implemented on the FPGA. To check the behavior of the LED, make the truth table of a.b+c and check
all the possible combinations of inputs.
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
--counter.
use IEEE.std_logic_unsigned.all;
entity GPIO_demo is
CLK : in STD_LOGIC;
);
end GPIO_demo;
component UART_TX_CTRL
Port(
SEND : in std_logic;
Lab 11:
Problem Set:
1. Half adder:
A half adder is a combinational arithmetic circuit that adds two binary digits and produces a sum bit (S) and
a carry bit (C) as the output. If A and B are the input bits, then the sum bit (S) is the XOR of A and B and the
carry bit (C) is the AND of A and B. The half adder is the simplest of all adder circuits, but it has a major
disadvantage: it can add only two inputs (A and B) and has no input for a carry coming from a
previous stage. Its truth table, module schematic, and gate-level realization are shown in
Figure 1.
➢ You are requested to write Verilog code for the circuit using structural modeling (using
primitives such as AND, OR, etc., gates). Your inputs are two 2-bit numbers.
➢ Synthesize the circuit for the Nexys A7 100T FPGA Board and simulate it by writing a
testbench module for the half adder covering all possible input combinations.
➢ Read the synthesis report of your circuit and extract the useful information related to
maximum combinational delay, resources in the FPGA used (like lookup tables (LUTs),
input/output (IOs), etc.), timing information, power usage, etc. Report this information in
your lab report.
Output:
Constraints:
When we save the constraints in the above Section, we are indirectly creating a constraint file. But we
can directly create that file using the following method:
Click on Sources in the project manager, then select Add or Create Constraints, and then
select Create File. The constraints file will be created.
Type the following code and then save that file. The pins will be assigned
#======= LEDS ========#
set_property PACKAGE_PIN R2 [get_ports a]