Design A 5-Stage Pipeline RISC-V CPU and Optimise
DOI: 10.54254/2755-2721/34/20230334
Lifu Deng
Glasgow College, University of Electronic Science and Technology of China, 611731
Chengdu, China
Abstract. The RISC-V instruction set has advanced and expanded significantly in recent years.
It is an open instruction set architecture (ISA) based on the concept of Reduced Instruction Set
Computing (RISC). This article uses Verilog to design a 5-stage pipeline CPU based on the RISC-V
architecture in Vivado 2022.2. The CPU can execute 38 instructions, and its arithmetic logic unit
(ALU) is optimised through improved adders, shifters, and multipliers. A testbench is then written
in the simulation software to verify the functionality of the CPU. RTL diagrams and reports are
generated to verify the design structure and evaluate resource allocation. Finally, the CPU
successfully executes the instructions and obtains the correct results, and the LUT usage of the
shifter is reduced. This work serves as an important
reference for system-on-chip (SoC) and computer design in general. It not only highlights the
potential of the RISC-V architecture but also demonstrates the success of optimisation efforts.
This paves the way for more powerful and efficient computing systems.
1. Introduction
RISC-V is an open and free instruction set architecture (ISA). It has generated considerable interest in
computer science because it has the potential to transform the way that processors and systems are
designed [1]. RISC-V has a wide range of applications thanks to its
versatility and flexibility. These applications include low-power embedded devices, high-performance
computation, computer security, and machine learning [1-3]. It is likely that RISC-V's influence on the
global chip industry, as well as the wider technological environment, will increase as it continues to
flourish [3]. This heralds a future that is both innovative and exciting.
This paper uses the Verilog hardware description language to design a five-stage pipeline CPU. Using
Vivado 2022.2, the CPU is designed to support 37 RV32I base instructions and one multiplication
extension instruction. The ALU module is optimised with carry look-ahead adders, barrel shifters, and a
multiplier that combines the Booth algorithm with a Wallace tree. A testbench is written to run a
simulation, and the resulting waveform is checked to confirm that the instructions execute correctly. In
addition, synthesis generates RTL diagrams, which are used to judge whether the design structure meets
requirements, and resource usage is analysed from the synthesis report. The CPU offers guidance for
processor design in embedded and Internet of Things devices and also provides a new design idea for
optimising the pipeline and ALU in a processor.
© 2023 The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(https://creativecommons.org/licenses/by/4.0/).
Proceedings of the 2023 International Conference on Machine Learning and Automation
I-Type | addi, slti, sltiu, ori, xori, andi, slli, srli, srai | Performs arithmetic and logical operations on a register (rs1) and an immediate value and stores the result in rd.
U-Type | lui, auipc | Loads a value into the highest 20 bits of a register (lui) or adds a value to the PC (auipc).
Table 1. (continued).
B-Type | beq, bne, blt, bltu, bge, bgeu | Performs conditional branches based on the comparison of two registers; an immediate value gives the branch offset.
S-Type | sw, sh, sb, lw, lh, lb, lhu, lbu | Loads register data from memory or stores it to memory. The effective address is calculated from a base register and an immediate value.
There are six different types of instruction. Verilog is then used to design a single-cycle CPU in
Vivado 2022.2 that can execute the above instructions [4, 5]. The overall CPU structure is shown in
Figure 1.
[Figure 1. CPU structure. Blocks: Control Unit; Data Path; Instruction Memory (ROM); Data Memory (RAM); Decoder; ALU; Registers; Branch.]
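To illustrate what the decoder in Figure 1 has to distinguish, the following Python sketch classifies a 32-bit instruction word into one of the six formats using the standard RV32I major opcodes (bits 6:0). It is a behavioural illustration only, not the Verilog used in this design; the names OPCODE_FORMAT and instruction_format are hypothetical. Note that the base encoding places loads in the I format, whereas Table 1 groups them with stores.

```python
# Standard RV32I major opcodes (instruction bits [6:0]) and their encoding formats.
# OPCODE_FORMAT and instruction_format are illustrative names, not from the paper.
OPCODE_FORMAT = {
    0b0110011: "R",  # register-register ALU operations, including mul
    0b0010011: "I",  # ALU operations with an immediate (addi, slti, ...)
    0b0000011: "I",  # loads (lw, lh, lb, lhu, lbu)
    0b1100111: "I",  # jalr
    0b0100011: "S",  # stores (sw, sh, sb)
    0b1100011: "B",  # conditional branches
    0b0110111: "U",  # lui
    0b0010111: "U",  # auipc
    0b1101111: "J",  # jal
}


def instruction_format(word: int) -> str:
    """Return the format letter for a 32-bit instruction word, or 'unknown'."""
    opcode = word & 0x7F  # the low seven bits select the major opcode
    return OPCODE_FORMAT.get(opcode, "unknown")
```

For example, instruction_format(0x00500093) classifies the word for addi x1, x0, 5 as "I".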
[Figure: 32-bit BCLA. Labelled signals: x0, x1, C16, C32.]
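The carry look-ahead idea behind the adder can be sketched in Python as follows. This is a behavioural model under stated assumptions, not the Verilog of this design: the 4-bit block size and the names cla4 and add32 are hypothetical, and the eight blocks here simply pass the carry from one block to the next, whereas the BCLA in the figure appears to add a further look-ahead level (the C16 and C32 signals).

```python
def cla4(a, b, cin):
    """4-bit carry look-ahead add. Returns (4-bit sum, carry out)."""
    g = [((a & b) >> i) & 1 for i in range(4)]  # generate: both input bits set
    p = [((a ^ b) >> i) & 1 for i in range(4)]  # propagate: exactly one bit set
    c0 = cin & 1
    # Each carry is a two-level AND-OR function of g, p and c0 only,
    # so no carry has to wait for the previous one to be computed.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    total = 0
    for i in range(4):
        total |= (p[i] ^ carries[i]) << i  # sum bit = propagate XOR carry in
    return total, c4


def add32(a, b, cin=0):
    """32-bit add built from eight 4-bit CLA blocks. Returns (sum, carry out)."""
    result, carry = 0, cin
    for block in range(8):
        nibble_a = (a >> (4 * block)) & 0xF
        nibble_b = (b >> (4 * block)) & 0xF
        nibble_sum, carry = cla4(nibble_a, nibble_b, carry)
        result |= nibble_sum << (4 * block)
    return result, carry
```

For example, add32(0xFFFFFFFF, 1) returns (0, 1): the sum wraps to zero and the carry out is set.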
A barrel shifter can rotate or shift a data word by any number of bits in a single clock cycle. It
connects the input data word directly to the correct position of the output without cascading shifts over
multiple clock cycles. This increases the speed of the shifter and reduces latency [9]. The barrel shifter
can also realise different shift or rotation functions according to the control signal, such as logical left
shift, arithmetic right shift, etc., which can meet the needs of the CPU and reduce time delay. The
integration of multiple shift functions also makes it highly versatile.
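The behaviour described above can be sketched in Python as a five-layer structure in which each layer either passes the word through or shifts it by a fixed power of two, selected by one bit of the shift amount. This layered arrangement is a common way to build a barrel shifter and is assumed here; the paper does not give the internal structure of its shifter. The function name barrel_shift and the mode strings are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def barrel_shift(value, shamt, mode):
    """Shift a 32-bit word by 0..31 bits.

    mode is 'sll' (logical left), 'srl' (logical right) or 'sra'
    (arithmetic right). All five layers are combinational in hardware,
    so the whole shift completes in one clock cycle.
    """
    if mode not in ("sll", "srl", "sra"):
        raise ValueError("unknown shift mode: " + mode)
    value &= MASK32
    shamt &= 31
    # For an arithmetic right shift the vacated bits copy the sign bit.
    fill = (value >> 31) & 1 if mode == "sra" else 0
    for stage in range(5):
        step = 1 << stage  # layer shifts by 1, 2, 4, 8 or 16 bits
        if (shamt >> stage) & 1:  # this bit of shamt enables the layer
            if mode == "sll":
                value = (value << step) & MASK32
            else:
                fill_bits = (MASK32 << (32 - step)) & MASK32 if fill else 0
                value = (value >> step) | fill_bits
    return value
```

For example, barrel_shift(0x80000000, 31, "srl") gives 1, while the same call with "sra" gives 0xFFFFFFFF because the sign bit is replicated.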
The multiplier is optimised by combining the Booth algorithm with a Wallace tree and the CLA designed
earlier to reduce the time delay. The Booth algorithm is a multiplication algorithm for two's complement
operands that reduces the number of partial products, thereby reducing the complexity and delay of the
multiplier [10]. A Wallace tree is a multiplier structure built from compressors: it splits the addition
of the partial products into several levels and uses compressors at each level to reduce the number of
partial-product rows in parallel. Wallace trees are an efficient way to decrease the multiplier's size
and power requirements while increasing its speed. Figure 3 illustrates the construction of the whole
Wallace tree, which uses multiple 3-2 compressors and a 4-2 compressor for the reduction.
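The multiplier flow described above can be sketched in Python. This is a behavioural model under assumptions, not the design in Figure 3: it assumes radix-4 (modified) Booth recoding, which halves the 32 partial products to 16, and it reduces the rows with 3-2 compressors only, whereas the paper's tree also uses a 4-2 compressor. All function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(x):
    """Interpret the low 32 bits of x as a two's complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def booth_radix4_partial_products(a, b):
    """Return 16 partial products (64-bit two's complement) for signed a * b."""
    a_signed = to_signed32(a)
    b &= MASK32
    products = []
    prev = 0  # implicit bit to the right of b[0]
    for i in range(0, 32, 2):
        b0 = (b >> i) & 1
        b1 = (b >> (i + 1)) & 1
        digit = b0 + prev - 2 * b1  # Booth digit in {-2, -1, 0, 1, 2}
        products.append(((digit * a_signed) << i) & MASK64)
        prev = b1
    return products


def compress_3_2(x, y, z):
    """Word-wide 3-2 compressor: x + y + z == s + c (mod 2**64)."""
    s = x ^ y ^ z
    c = (((x & y) | (x & z) | (y & z)) << 1) & MASK64
    return s, c


def wallace_reduce(rows):
    """Apply layers of 3-2 compressors until at most two rows remain."""
    rows = list(rows)
    while len(rows) > 2:
        nxt = []
        for i in range(0, len(rows) - 2, 3):
            nxt.extend(compress_3_2(rows[i], rows[i + 1], rows[i + 2]))
        nxt.extend(rows[len(rows) - len(rows) % 3:])  # rows left ungrouped
        rows = nxt
    return rows


def multiply32(a, b):
    """Low 32 bits of a * b, as kept by the RISC-V mul instruction."""
    rows = wallace_reduce(booth_radix4_partial_products(a, b))
    product64 = sum(rows) & MASK64  # final carry-propagate add (the CLA's role)
    return product64 & MASK32
```

As a usage example with hypothetical operands (the paper does not state the values held in R1 and R2), multiply32(60, 0xFFFFFFEF) multiplies 60 by -17 and returns 0xFFFFFC04, the two's complement encoding of -1020.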
4. Simulation verification
Ripes is an open-source RISC-V CPU simulator that converts assembly language into the corresponding
machine code, runs instructions, and displays the data in registers and memory in real time. Vivado is
a widely used development and simulation tool that plays a key role in the CPU design process. It
compiles, synthesises, and implements Verilog hardware descriptions; during synthesis, Vivado maps the
design onto logic resources, including look-up tables, flip-flops, and RAM, and arranges them within the
FPGA so that the different functions of the CPU design are integrated into one circuit. First, Ripes is
used to convert the selected assembly instructions into the machine code required for the test. A test
file is then written in Vivado to generate the waveform diagram, and the design is synthesised to
produce the circuit and report.
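To make the assembly-to-machine-code step concrete, the following Python sketch packs the fields of an I-type instruction into a 32-bit word, following the standard RV32I I-type layout. It only illustrates the kind of conversion Ripes performs and is not how Ripes is implemented; the function names are hypothetical.

```python
def encode_i_type(opcode, rd, funct3, rs1, imm):
    """Pack I-type fields: imm[11:0] | rs1 | funct3 | rd | opcode."""
    if not -2048 <= imm <= 2047:
        raise ValueError("immediate does not fit in 12 signed bits")
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def encode_addi(rd, rs1, imm):
    """Encode addi rd, rs1, imm (opcode 0010011, funct3 000)."""
    return encode_i_type(0b0010011, rd, 0b000, rs1, imm)
```

For example, encode_addi(1, 0, 5) yields 0x00500093, the machine word for addi x1, x0, 5. Words of this kind can then be placed in the instruction memory used by the test file.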
As can be seen from Figure 5, the adder and shifter work correctly and write the correct values into
the corresponding registers. Finally, the multiplication is verified: multiplying the values in R1 and
R2 gives the correct result, 0xfffffc04.
The data in Table 2 are taken from the project report in Vivado. The barrel shifter uses fewer LUT
resources than the normal shifter (185 versus 204) owing to its structural design. The abbreviation
"LUT" stands for "Look-Up Table". In the context of computer science and digital logic, a Look-Up Table
is a data structure or hardware component that is used to execute logic functions or mathematical
operations in a way that is computationally efficient. It is an essential resource in an FPGA.
Table 2. Resource usage comparison.
Resource | Normal shifter | Barrel shifter
LUT | 204 | 185
IO | 71 | 71
5. Conclusion
The designed pipeline CPU structure meets expectations. It executes instructions accurately and
produces the expected results after ALU optimisation. The waveforms correspond to the register values
in Ripes, confirming proper CPU functionality. The CPU successfully performs basic operations such as
addition, shift, and multiplication. Moreover, each instruction passes through the five stages in the
expected cycles, which confirms the successful implementation of the five-stage pipeline and its benefit
to CPU throughput. The synthesis report shows that the optimised barrel shifter uses 19 fewer LUTs than
the conventional shifter (185 versus 204), a saving of about 9% of the shifter's LUT resources.
Looking ahead, there are many opportunities for further improvement and expansion. Additional
designs can be incorporated into decode and control blocks, enabling the CPU to support a wider range
of instructions, enhancing its versatility and capabilities. Exploring innovative approaches, such as
combining shifters and adders, can further optimise multiplier structures, helping to improve circuit
resource efficiency. The successful development of this pipelined CPU lays a solid foundation for future
endeavours. Building on this achievement, it is also possible to create a complete SoC based on this
CPU to meet the needs of various computer systems and microprocessors. This holistic approach
promises to deliver enhanced performance and simplified integration, making this CPU a compelling
choice for a wide variety of computing applications.
References
[1] Lu T. 2021. A Survey on RISC-V Security: Hardware and Architecture. arXiv.org. [Online].
Available at: https://arxiv.org/abs/2107.04175v1.
[2] Cui E, Li T, Wei Q. 2023. RISC-V Instruction Set Architecture Extensions: A Survey. IEEE
Access. 11: 24696-24711.
[3] Sharma M, Bhatnagar E, Puri K, Mitra A, Agarwal J. 2022. A survey of RISC-V CPU for IoT
applications. Available at SSRN 4033491.
[4] Waterman A, Lee Y, Avizienis R, Patterson DA, Asanovic K. 2015. The RISC-V instruction set
manual volume II: Privileged architecture version 1.7. EECS Department, University of
California, Berkeley, Tech. Rep. UCB/EECS-2015-49.
[5] Dennis DK, Priyam A, Virk SS, Agrawal S, Sharma T, Mondal A and Ray KC. 2017, December.
Single cycle RISC-V microarchitecture processor and its FPGA prototype. In: Proceedings of
the 2017 7th International Symposium on Embedded Computing and System Design (ISED),
Durgapur, India, pp. 1-5.
[6] Avinash NJ, Mishra I, Ghorpade TC, Lokesh M, Nishanth H, Kumar M. 2023, January. Design
and Implementation of 32-Bit MIPS RISC Processor with Flexible 5-Stage Pipelining and
Dynamic Thermal Control. In: Proceedings of the 2023 International Conference on Intelligent
and Innovative Technologies in Computing, Electrical and Electronics (IITCEE), Bengaluru,
India, pp. 846-849.
[7] Lai J-Y, Chen C-A, Chen S-L, Su C-Y. 2021. Implement 32-bit RISC-V Architecture Processor
using Verilog HDL. In: Proceedings of the 2021 International Symposium on Intelligent Signal
Processing and Communication Systems (ISPACS), Hualien City, Taiwan, pp. 1-2.
[8] Miao J, Li S. 2017. A novel implementation of a 4-bit carry look-ahead adder. In: Proceedings of
the 2017 International Conference on Electron Devices and Solid-State Circuits (EDSSC),
Hsinchu, Taiwan, pp. 1-2.
[9] Tiwari P, Kumar A. 2021. A 4-bit Barrel Shifter Design using Diverse Logic Styles. In:
Proceedings of the 2021 First International Conference on Advances in Computing and Future
Communication Technologies (ICACFCT), Meerut, India, pp. 131-134.
[10] Govekar D, Amonkar A. 2017. Design and implementation of high-speed modified Booth
multiplier using hybrid adder. In: Proceedings of the 2017 International Conference on
Computing Methodologies and Communication (ICCMC), Erode, India, pp. 138-143.