ASIC Project ckpt4
Contents
1 Introduction
10 Grading
Version 3.3 April 30, 2018 2
1 Introduction
The primary goal of this project is to familiarize students with the methods and tools of digital design.
In order to make the project both interesting and useful, we will guide you through the implementation
of a CPU that is intended to be integrated on a modern SOC. Working alone or in teams of 2, you will
be designing a simple 3-stage CPU that implements the RISC-V ISA, developed here at UC Berkeley.
If you work in a team, you both must work on the project together (i.e. you are not allowed to divide up
the work), and you will both receive the same grade.
Your first and most important goal is to write a functional implementation of your processor. To
better expose you to real design decisions, you will also be tasked with improving the performance of
your processor. You will be required to meet a minimum performance to be specified later in the project.
You will use Verilog HDL to implement this system. You will be provided with some testbenches
to verify your design, but you will be responsible for creating additional testbenches to exercise your
entire design. Your target implementation technology will be the Synopsys 28nm Educational Design
Kit, a predictive model technology used for instruction. The project will give you experience designing
synthesizable RTL (Register Transfer Level) code, resolving hazards in a simple pipeline, building
interfaces, and approaching system-level optimization.
Your first step will be to map our high level specification to a design which can be translated into
a hardware implementation. You will then generate and debug that implementation in Verilog. These
steps may take significant time if you do not put effort into your system architecture before attempting
implementation. After you have built a working design, you will be optimizing it for speed in the 28nm
technology that we have been using this semester.
1.1 RISC-V
The final project for this class will be a VLSI implementation of a RISC-V (pronounced risk-five) CPU.
RISC-V is a new instruction set architecture (ISA) developed here at UC Berkeley. It was originally
developed for computer architecture research and education purposes, but recently there has been a
push towards commercialization and industry adoption. For the purposes of this lab, you don’t need to
delve too deeply into the details of RISC-V. However, it may be good to familiarize yourself with it, as
this will be at the core of your final project. Check out the official Instruction Set Manual and explore
http://riscv.org for more information.
• Read through sections 2.2 and 2.3 starting on page 11 in the RISC-V Instruction Set Manual to
understand how the different types of instructions are encoded. Most of this should be familiar as
it is similar to MIPS.
• Read through sections 2.4, 2.5, 2.6 and 2.8 starting on page 13 in the Instruction Set Manual and
think about how each of the instructions will use the ALU.
You do not need to read 2.7, as you will not be implementing those instructions in the project.
of your processor that is independent of technology (there are no standard cells yet). You have 5 weeks
to complete the first phase, but you are highly encouraged to try to finish early. Everything will take
much longer than you expect, and finishing early gives you more time to improve your QOR (Quality
of Results, e.g. clock period).
In the second phase (back-end), you will implement your front-end design in the Synopsys 28nm
kit using the VLSI tools you used in lab. When you have finished phase 2, you will have a design that
could actually be fabricated if this were a real process. You will have another 2 weeks to complete the
second phase.
Within each phase, you will have multiple checkpoints (nominally one per week) that will ensure
you are making consistent progress. These checkpoints will contribute (although not significantly) to
your final grade. You are free to make design changes after they have been checked off if they will help
subsequent phases or improve QOR.
1.3 Philosophy
This document is meant to describe a high-level specification for the project and its associated support
hardware. You can also use it to help lay out a plan for completing the project. As with any design
you will encounter in the professional world, we are merely providing a framework within which your
project must fit.
You should consider the GSI(s) a source of direction and clarification, but it is up to you to produce a
fully functional design, as well as a physical implementation. We will attempt to help, when possible,
but ultimately the burden of designing and debugging your solution lies on you.
Design your modules well; make sure you understand what you want before you begin to code.
Code exactly what you designed; do not try to add features without redesigning.
Simulate thoroughly; writing a good testbench is as much a part of creating a module as actually
coding it.
Debug completely; anything which can go wrong with your implementation will.
Document your project thoroughly as you go. Your design review documents will help, but you
should never forget to comment your Verilog and to keep your diagrams up to date. Aside from the final
project report (you will need to turn in a report documenting your project), you can use your design
documents to help the debugging process. Finish the required features first. Attempt extra features after
everything works well.
This project is divided into checkpoints. Each checkpoint will be due 2 weeks after its release,
and the releases will occur each week. Use this to your advantage: try to get ahead so that you have
additional time to debug.
The most important goal is to design a functional processor; this alone is 50-60% of the final grade,
and you must have it working completely to receive any credit for performance.
• Checkpoint 1: ALU design and pipeline diagram (due Friday, March 23, 2018)
• Checkpoint 3: Core + memory system implementation (due Wednesday, April 25, 2018)
Before you start your project, you must post your group information as a private note on Piazza.
Please provide each group member’s name, student ID number, and instructional account name for all
group members (e.g. eecs151-aa). Please do this even if you are working alone, as these git repos will
be used for part of the final checkoff. Once it is set up you will be given a team number, and you will
be given a repo hosted on the servers for version control for the project. You should be able to add the
remote host of “geecs151:teamXX” where “XX” is the team number that you are assigned. An example
working flow to be able to pull from the skeleton as well as push/pull with your team repository is
shown below:
Then to pull changes from the skeleton, you would need to type:
Next, to push the template into your team repository, you would type:
Now your team repository should be set. You can now use this remote repository to maintain your
work during the project. Please contact your GSI if you run into any difficulties.
31    27 26 25   24  20   19  15   14  12   11         7   6      0
funct7           rs2      rs1      funct3   rd             opcode   R-type
imm[11:0]                 rs1      funct3   rd             opcode   I-type
imm[11:5]        rs2      rs1      funct3   imm[4:0]       opcode   S-type
imm[12|10:5]     rs2      rs1      funct3   imm[4:1|11]    opcode   SB-type
imm[31:12]                                  rd             opcode   U-type
imm[20|10:1|11|19:12]                       rd             opcode   UJ-type
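To check your understanding of the encodings above, it can help to slice an instruction word in software before writing any Verilog. The following is a minimal Python sketch, not part of the provided skeleton; the helper names (bits, sign_extend, decode_fields, imm_i, imm_s, imm_sb) are made up for illustration.

```python
def bits(word, hi, lo):
    """Return word[hi:lo] (inclusive), like a Verilog part-select."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend(value, width):
    """Interpret a width-bit value as a two's complement number."""
    sign = 1 << (width - 1)
    return (value ^ sign) - sign


def decode_fields(inst):
    """Split a 32-bit instruction into its fixed-position fields."""
    return {
        "opcode": bits(inst, 6, 0),
        "rd": bits(inst, 11, 7),
        "funct3": bits(inst, 14, 12),
        "rs1": bits(inst, 19, 15),
        "rs2": bits(inst, 24, 20),
        "funct7": bits(inst, 31, 25),
    }


def imm_i(inst):
    """I-type immediate: imm[11:0] lives in inst[31:20]."""
    return sign_extend(bits(inst, 31, 20), 12)


def imm_s(inst):
    """S-type immediate: imm[11:5] in inst[31:25], imm[4:0] in inst[11:7]."""
    return sign_extend((bits(inst, 31, 25) << 5) | bits(inst, 11, 7), 12)


def imm_sb(inst):
    """SB-type immediate: imm[12|10:5] in inst[31:25], imm[4:1|11] in inst[11:7]."""
    raw = ((bits(inst, 31, 31) << 12) | (bits(inst, 7, 7) << 11)
           | (bits(inst, 30, 25) << 5) | (bits(inst, 11, 8) << 1))
    return sign_extend(raw, 13)
```

For example, decode_fields(0x00500093) describes addi x1, x0, 5: the opcode is 0x13, rd is 1, and imm_i returns 5.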
For these two modules, the inputs and outputs that you care about are opcode, funct, add_rshift_type,
A, B and Out. To test your design thoroughly, you should work through every possible opcode,
funct, and add_rshift_type that you care about, and verify that the correct Out is generated
from the A and B that you pass in.
The test bench generates random values for A and B and computes REFout = A + B. It also
contains calls to checkOutput for load and store instructions, for which the ALU should perform
addition. It will be up to you to write tests for the remaining combinations of opcode, funct, and
add_rshift_type to test your other instructions.
Remember to restrict A and B to reasonable values (e.g. masking them, or making sure that they are
not zero) if necessary to guarantee that a function is sufficiently tested. Please also write tests where
the inputs and the output are hard-coded. These should be corner cases that you want to be certain are
stressed during testing.
[106:100] = opcode
[99:97] = funct
[96] = add_rshift_type
[95:64] = A
[63:32] = B
[31:0] = REFout
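If you want to sanity-check a hand-typed vector, the bit layout above can be modeled in a few lines of Python. This is only a sketch under the stated layout; pack_vector and unpack_vector are hypothetical helper names and are not part of the provided scripts.

```python
MASK32 = 0xFFFFFFFF


def pack_vector(opcode, funct, add_rshift_type, a, b, refout):
    """Build the 107-character binary string, opcode in the left-most bits."""
    return (format(opcode & 0x7F, "07b")
            + format(funct & 0x7, "03b")
            + format(add_rshift_type & 0x1, "01b")
            + format(a & MASK32, "032b")
            + format(b & MASK32, "032b")
            + format(refout & MASK32, "032b"))


def unpack_vector(line):
    """Split a 107-bit vector string back into its six fields."""
    v = int(line, 2)
    return (
        (v >> 100) & 0x7F,   # [106:100] opcode
        (v >> 97) & 0x7,     # [99:97]   funct
        (v >> 96) & 0x1,     # [96]      add_rshift_type
        (v >> 64) & MASK32,  # [95:64]   A
        (v >> 32) & MASK32,  # [63:32]   B
        v & MASK32,          # [31:0]    REFout
    )
```

Round-tripping a vector through pack_vector and unpack_vector is a quick way to confirm that each field sits at the bit positions listed above.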
Open up the skeleton provided to you in ALUTestVectorTestbench.v. You need to complete the
module by making use of $readmemb to read in the test vector file (named testvectors.input),
writing some assign statements to assign the parts of the test vectors to registers, and writing a for loop
to iterate over the test vectors.
The syntax for a for loop can be found in ALUTestbench.v. $readmemb takes as its arguments
a filename and a reg vector, e.g.:
Test vectors are of the format specified above, with the 7 opcode bits occupying the left-most
bits. Open up the file vcs-sim-rtl/testvectors.input and add test vectors for the following
instructions to the end (i.e. manually type the 107 zeros and ones required for each test vector): SLT,
SLTU, SRA, and SRL.
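When typing these vectors by hand, you need the expected REFout for each one. The Python sketch below models the four operations on 32-bit values; it assumes A is the value being compared or shifted and the low 5 bits of B are the shift amount, which may differ from your implementation. The function names are illustrative only.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def slt(a, b):
    """Set-less-than, signed comparison."""
    return int(to_signed(a) < to_signed(b))


def sltu(a, b):
    """Set-less-than, unsigned comparison."""
    return int((a & MASK32) < (b & MASK32))


def srl(a, b):
    """Logical right shift: zeros are shifted in from the left."""
    return (a & MASK32) >> (b & 0x1F)


def sra(a, b):
    """Arithmetic right shift: the sign bit is replicated from the left."""
    return (to_signed(a) >> (b & 0x1F)) & MASK32
```

Inputs with the top bit set make good corner cases, since they are where SLT differs from SLTU and SRA differs from SRL.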
In the same directory, we’ve also provided a test vector generator written in Python, which is a
popular language used for scripting. We used this generator to generate the test vectors provided to you.
If you’re curious, you can read the next paragraph and poke around in the file. If not, feel free to skip
ahead to the next section.
The script ALUTestGen.py is located in vcs-sim-rtl. Run it so that it generates a test vector
file in the vcs-sim-rtl folder. Keep in mind that this script makes a couple of assumptions that aren’t
necessary and may differ from your implementation:
• Jump, branch, load and store instructions will use the ALU to compute the target address.
• For all shift instructions, A is shifted by B. In other words, B is the shift amount.
• For the LUI instruction, the value to load into the register is fed in through the B input.
You can either match these assumptions or modify the script to fit with your implementation. All the
methods to generate test vectors are located in the two Python dictionaries opcodes and functs.
The lambda methods contained (separated by commas) are respectively: the function that the operation
should perform, a function to restrict the A input to a particular range, and a function to restrict the B
input to a particular range.
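To make the three-lambda layout concrete, here is a small sketch of how such a dictionary entry could be used. It is not copied from ALUTestGen.py: the dictionary name example_functs, its two entries, and the make_case helper are all hypothetical.

```python
import random

MASK32 = 0xFFFFFFFF

# Hypothetical table in the same spirit as the script's dictionaries:
# (operation, restriction on A, restriction on B).
example_functs = {
    "ADD": (lambda a, b: (a + b) & MASK32, lambda a: a, lambda b: b),
    # Assumes B is the shift amount, so B is masked to 5 bits.
    "SLL": (lambda a, b: (a << b) & MASK32, lambda a: a, lambda b: b & 0x1F),
}


def make_case(name, rng):
    """Draw random operands, apply the restrictions, compute the reference."""
    operation, restrict_a, restrict_b = example_functs[name]
    a = restrict_a(rng.getrandbits(32))
    b = restrict_b(rng.getrandbits(32))
    return a, b, operation(a, b)
```

Calling make_case("SLL", random.Random(0)) returns an (A, B, REFout) triple whose B is always in the range 0 to 31.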
If you modify the Python script, run the generator to make new test vectors. This will overwrite
the file, so if you want to save your handwritten test vectors, rename the file before running the script,
then append them once the file has been generated.
% python ALUTestGen.py
This will write the test vectors into the file testvectors.input. Use this file as the target test vector
file when loading the test vectors with $readmemb.
always@(*) begin
case(foo)
3'b000: // something happens here
3'b001: // something else happens here
3'b010, 3'b011: // you can have more than
// one case do the same thing
default: // everything else
endcase
end
To make your job easier, we have provided two Verilog header files: Opcode.vh and ALUop.vh.
They provide, respectively, macros for the opcodes and functs in the ISA and macros for the different
ALU operations. You should feel free to change ALUop.vh to optimize the ALUop encoding, but if
you change Opcode.vh, you will break the test bench skeleton provided to you. You can use these
macros by placing a backtick in front of the macro name, e.g.:
case(opcode)
`OPC_STORE:
is equivalent to:
case(opcode)
7'b0100011:
alu_tb = ALUTestbench
This variable is used to select which ALU testbench you use. You may change it to
ALUTestVectorTestbench to use the test vector testbench.
Once you have a working design, you should see the following output when you run either of the
given testbenches:
1. List of the modules involved in the test bench. You can select one of these to have its signals
show up in the object window.
2. Object window - this lists all the wires and regs in your module. You can add signals to the
waveform view by selecting them, right-clicking, and doing Add > To Wave > Selected Signals.
3. Waveform viewer - The signals that you add from the object window show up here. You can
navigate the waves by searching for specific values, or going forward or backward one transition
at a time.
As an example of how to use the waveform viewer, suppose you get the following output when you run
ALUTestbench:
The $display() statement actually already tells you everything you need to know to fix your bug,
but you’ll find that this is not always the case. For example, if you have an FSM and you need to look
at multiple time steps, the waveform viewer presents the data in a much neater format. If your design
had more than one clock domain, it would also be nearly impossible to tell what was going on with only
$display() statements.
Add all the signals from ALUTestbench to the waveform viewer and you will see the following
window. The two highlighted boxes contain the tools for navigation and zoom. You can hover over the
icons to find out more about what each of them does. You can find the location (time) in the waveform
viewer where the test bench failed by searching for the value of DUTout output by the $display()
statement above (in this case, 0x490a9a92):
1. Selecting DUTout
2. Clicking Edit > Wave Signal Search > Search for Signal Value > 0x490a9a92
Now you can examine all the other signal values at this time. Compare the DUTout and REFout
values at this time, and you should see that they are similar but not quite the same. From the opcode,
funct, and add_rshift_type, you know that this is supposed to be an SRA instruction, but it
looks like your ALU performed a SRL instead. However, you wrote
That looks like it should work, but it doesn’t! It turns out you need to tell Verilog to treat B as a signed
number for SRA to work as you wish. You change the line to say:
After making this change, you run the tests again and cross your fingers. Hopefully, you will see the
line:
If not, you will need to debug your module until all tests from the test vector file and the hard-coded test
cases pass.
4.2 Details
Your job is to implement the core of the 3-stage RISC-V CPU.
cd vcs-sim-rtl
make run-asm-tests
This will generate .out files in the output/ directory, and summarize which tests passed and failed.
If you would like to generate waveforms for a single test:
cd vcs-sim-rtl
make output/rv32ui-p-simple.vpd
where ’simple’ is replaced with any of the available tests defined in the Makefile.
You can read the assembly code of the programs by looking at the dump file. Comments in the code
will help you understand what is happening.
cd tests/isa/
vim rv32ui-p-addi.dump
Last, you can see the hex code that is loaded directly into the memory by looking at the hex file.
cd tests/isa/
vim rv32ui-p-addi.hex
Congratulations! You’ve started the design of your datapath by implementing your pipeline diagram,
and you have written and thoroughly tested a key component of your processor. You should now be
well-versed in testing Verilog modules. Please answer the following questions to be checked off by a TA.
A         address
CE        clock edge
OEB       output enable bar (tie this to 0)
WEB       write enable bar (1 is a read, 0 is a write)
CSB       chip select bar (tie this to 0)
BYTEMASK  write byte mask
I         write data
O         read data
You should use cache lines that are 512 bits (16 words) for this project. The memory interface is
128 bits, meaning that you will require multiple (4) cycles to perform memory transactions.
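The arithmetic behind the four-cycle figure, and how a CPU byte address maps onto a line, can be sketched as follows. The constant and function names are illustrative, and the assumption that beats are ordered from the lowest-addressed words upward is mine, not something the skeleton guarantees.

```python
LINE_BITS = 512   # cache line size from this section
MEM_BITS = 128    # width of the main memory interface
WORD_BITS = 32    # CPU word size

BEATS_PER_LINE = LINE_BITS // MEM_BITS   # memory transfers per line
WORDS_PER_LINE = LINE_BITS // WORD_BITS
WORDS_PER_BEAT = MEM_BITS // WORD_BITS
LINE_BYTES = LINE_BITS // 8


def locate(byte_addr):
    """Find where a CPU byte address falls inside its cache line."""
    offset = byte_addr % LINE_BYTES
    word = offset // 4
    return {
        "line_base": byte_addr - offset,
        "word_in_line": word,
        # Assumption: beat 0 carries the lowest-addressed four words.
        "beat": word // WORDS_PER_BEAT,
        "word_in_beat": word % WORDS_PER_BEAT,
    }
```

With these numbers, BEATS_PER_LINE is 4 and WORDS_PER_LINE is 16, which matches the figures quoted above.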
Below is a description of each signal in Cache.v:
clk                 clock
reset               reset
cpu_req_valid       The CPU is requesting a memory transaction
cpu_req_rdy         The cache is ready for a CPU memory transaction
cpu_req_addr        The address of the CPU memory transaction
cpu_req_data        The write data for a CPU memory write (ignored on reads)
cpu_req_write       The 4-bit write mask for a CPU memory transaction (each bit corresponds to the
                    byte address within the word). 4’b0000 indicates a read.
cpu_resp_val        The cache has output valid data to the CPU after a memory read
cpu_resp_data       The data requested by the CPU
mem_req_val         The cache is requesting a memory transaction to main memory
mem_req_rdy         Main memory is ready for the cache to provide a memory address
mem_req_addr        The address of the main memory transaction from the cache. Note that this address
                    is narrower than the CPU byte address since main memory has wider data.
mem_req_rw          1 if the main memory transaction is a write; 0 for a read.
mem_req_data_valid  The cache is providing write data to main memory.
mem_req_data_ready  Main memory is ready for the cache to provide write data.
mem_req_data_bits   Data to write to main memory from the cache (128 bits/4 words).
mem_req_data_mask   Byte-level write mask to main memory. May be 16’hFFFF for a full write.
mem_resp_val        The main memory response data is valid.
mem_resp_data       Main memory response data to the cache (128 bits/4 words).
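The behavior of the 4-bit CPU write mask can be modeled in Python as a per-byte merge. This sketch assumes that mask bit i selects byte i of the word, counting from the least significant byte; the function names are hypothetical.

```python
def is_read(mask):
    """A mask of 4'b0000 indicates a read."""
    return (mask & 0xF) == 0


def apply_write_mask(old_word, new_word, mask):
    """Merge new_word into old_word, one byte lane per set mask bit."""
    result = old_word & 0xFFFFFFFF
    for byte in range(4):
        if (mask >> byte) & 1:
            lane = 0xFF << (8 * byte)  # bits covered by this byte lane
            result = (result & ~lane) | (new_word & lane)
    return result & 0xFFFFFFFF
```

For example, writing 0xAABBCCDD over 0x11223344 with mask 0b0101 replaces only bytes 0 and 2, giving 0x11BB33DD.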
To design your cache, start by outlining where the SRAMs should go. You should include an SRAM
per way for data, and a separate SRAM per way for the tags. Depending on your implementation, you
may want to implement the valid bits in flip flops or as part of the tag SRAM.
Next you should develop a state machine that covers all the events that your cache needs to handle
for both hits and misses. Keep in mind you will need to write any valid data back to main memory
before you start refilling the cache. Both of these transactions will take multiple cycles.
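Before drawing the state machine, it may help to prototype the state transitions in software. The Python sketch below is one possible three-state arrangement, not a required design: the state names, the next_state function, and the simplification that main memory accepts or returns one 128-bit beat every cycle (ignoring the ready/valid handshakes) are all assumptions.

```python
IDLE, WRITEBACK, REFILL = "IDLE", "WRITEBACK", "REFILL"
BEATS = 4  # 512-bit line over a 128-bit memory interface


def next_state(state, beat, req_valid, hit, victim_needs_writeback):
    """Return (next_state, next_beat) for one clock cycle.

    Assumes main memory never stalls, so every cycle moves one beat.
    """
    if state == IDLE:
        if req_valid and not hit:
            # On a miss, write the old line back first if it holds valid data.
            return (WRITEBACK if victim_needs_writeback else REFILL), 0
        return IDLE, 0
    if state == WRITEBACK:
        if beat == BEATS - 1:
            return REFILL, 0
        return WRITEBACK, beat + 1
    if state == REFILL:
        if beat == BEATS - 1:
            return IDLE, 0
        return REFILL, beat + 1
    raise ValueError("unknown state: " + str(state))
```

In this model a miss that requires a write-back spends four cycles in WRITEBACK and four in REFILL before returning to IDLE. A real design would also have to wait on the memory handshake signals in each state.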
The resulting cache is instantiated in Memory141.v. Take a look at how it interacts with the core you
designed. To access the instruction cache, you must provide icache_addr (the byte address of the
instruction) and icache_re (the read enable signal of the cache). The cache will return icache_dout
(the 32-bit instruction from the memory). If there is a cache miss, stall will go high, indicating that
the memory request from the cache failed, and the pipeline should not advance to the next state. Note
that the memory is a synchronous read: after the clock edge, the data from the address provided right
before the clock edge will be provided. This means that there should not be a pipeline stage right before
1. Show that all of the assembly tests and the final test pass using the cache
2. Show the block diagram of your cache
3. What was the difference in the cycle count for the final test with the perfect memory and the
cache?
4. Show your final pipeline diagram, updated to match the code
dc scripts folder contains scripts for the tool, and the setup folder contains other files shared
amongst the other tools.
For this checkpoint you will not have to modify any of these files; you will simply need to run the
following commands:
cd dc-syn
make
Be sure to look at the file dc-syn/current dc/log/dc.log, since this is a log of the run
and will contain useful information. For later, the clock period is defined in the Makefrag file in the
base directory, but do not edit it for this checkpoint since all it will do is slow down your runs.
`ifdef GATE_LEVEL
assign x = top.some_gate_level_node;
`else
assign x = top.some.rtl.node;
`endif
If you do this, be sure to pass the GATE_LEVEL macro into VCS using +define+GATE_LEVEL in
your vcs-sim-gl-syn/Makefile. To run the tests, use the following commands:
cd vcs-sim-gl-syn
make run-asm-tests
This will run the same assembly tests as before, but on the post-synthesis netlist. For this simulation,
we have turned off timing, so this is just checking to make sure that your design is functionally
equivalent. If there are any errors in the synthesis process these tests will fail, so be sure to check the
output logs and results from Design Compiler before trying to run these simulations. Otherwise, you
can debug the same way as before, with the print statements and the waveforms. Please be sure to
update your print statements, since the names of the signals will most likely have changed.
1. Show that all of the assembly tests pass after running the design through synthesis.
vcs_clock_period = 1.6
dc_clock_period = 1.2
icc_clock_period = 1.5
Changing the first three variables sets different clock period targets for simulation, synthesis, and
place-and-route. Beyond changing clock targets for DC and ICC, there are many other ways to improve
your maximum clock frequency. One major way to improve clock frequency is to improve the design
floorplan.
8.2 Floorplanning
If you look at the floorplan/floorplan.tcl file, you can change how the floorplan is created.
The current text is:
create_floorplan \
-core_utilization 0.1 \
-flip_first_row \
-start_first_row \
-left_io2core 10 \
-bottom_io2core 10 \
-right_io2core 10 \
-top_io2core 10 \
-row_core_ratio 1
This uses an automatic floorplan just based on the core utilization. You can change the utilization
and it will change the size of the floorplan, or you can look at the documentation and find how to
set it based on other parameters. The utilization target is important, so please experiment. With too
high a utilization, the tool will be unable to route every wire successfully. With too low a utilization,
the standard cells will be spaced too far apart and unnecessary wiring will decrease your maximum
operating frequency. A utilization of 0.7 is a realistic target.
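To get a feel for how the utilization target changes the floorplan size, you can estimate the core area by hand. The sketch below uses the usual definition of utilization as cell area divided by core area; it ignores SRAM macros and any tool-specific details, and the function names and example numbers are hypothetical.

```python
import math


def core_area(cell_area_um2, utilization):
    """Core area needed so that cells occupy the given fraction of it."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return cell_area_um2 / utilization


def square_side(area_um2):
    """Side length of a square core with the given area."""
    return math.sqrt(area_um2)
```

For a hypothetical 70,000 square microns of standard cells, a utilization of 0.1 needs about 700,000 square microns of core, while 0.7 needs about 100,000, so the default setting produces a floorplan roughly seven times larger.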
You can also change the placement of the SRAM macros. They are currently placed automatically,
but using the commands that we discussed in Lab 7, you should be able to specify where the SRAM
macros are placed and in what orientation.
To run through ICC, you simply need to issue the following commands:
cd icc-par
make
Be sure to run through design compiler in the dc-syn folder before doing this.
cd vcs-sim-gl-par
make run-asm-tests
This will run the same assembly tests as before, but on the post-place-and-route netlist. As before,
this will fail if Synthesis or Place-and-Route have failed, so always check your logs before trying to run
a simulation.
cd vcs-sim-gl-par
make run-bmarks
These tests are included in the post-place-and-route testing folder as well as the RTL folder in case
there are extra corner cases that your Verilog may not handle properly even before synthesis. It is highly
recommended that you run the tests in the vcs-sim-rtl folder before trying to do so after running
the tools.
• src/*.v
• icc-par/current-icc/reports/*
• icc-par/current-icc/results/top.output.v
• icc-par/current-icc/results/top.output.sdf
These files will be used to check processor functionality and will show us your critical path, maxi-
mum operating frequency and area. During the final lab sessions (Friday, May 4, 2018), the professor
and GSIs will be interviewing each team to gauge understanding of various concepts learned in the
project, understand more about each team’s design process, and provide feedback. Your final report
does not need to be long, but needs to answer the following questions:
2. What is the post-synthesis critical path length? What sections of the processor does the critical
path pass through? Why is this the critical path?
4. What is the post-place-and-route critical path length? What sections of the processor does the
critical path pass through? Why is this the critical path? If it is different than the post-synthesis
critical path, why?
5. Show a screenshot of the final clock tree. What is the insertion delay? What is the skew?
7. What is the number of cycles that your design takes to run the benchmarks? What changes/optimizations
have you made to try to improve performance on these tests?
8. Is there anything you would like to tell the staff before we grade your project?
If you worked with a partner you do not need separate reports. If you are having issues with your
partner please contact the GSI privately as soon as possible.
10 Grading
70% Functionality at project due date: Your design will be subjected to a comprehensive test suite and
your score will reflect how many of the tests your implementation passes.
25% Final Report and Final Interview: If your design is not 100% functional, this is your opportunity
to explain your bugs and recoup points.
5% Checkpoints: Each check-off is worth 1.25%. If you accomplished all of your checkpoints on time,
you will receive full credit in this category.
Bonus 5% Performance at project due date: You must have a fully working design to score points in
this section. You will receive up to 5 bonus points as your performance improves relative to your
peers. Performance will be calculated using the Iron Law: IPC * F
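As a rough illustration of the IPC * F metric, the sketch below computes it from an instruction count, a cycle count, and a clock period. The function name and the example numbers are hypothetical, and the exact units used for grading are not specified here.

```python
def performance(instructions, cycles, clock_period_ns):
    """Return IPC * F, in instructions per nanosecond."""
    ipc = instructions / cycles
    freq_ghz = 1.0 / clock_period_ns  # cycles per nanosecond
    return ipc * freq_ghz
```

For example, 1000 instructions in 1250 cycles at a 1.6 ns clock period gives 0.8 * 0.625 = 0.5 instructions per nanosecond. A shorter clock period only helps if it does not cost a proportionally larger number of cycles.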