
Design and Implementation of Synthesizable 32-Bit Four Stage Pipelined RISC Processor in FPGA Using Verilog/VHDL

The document describes the design and implementation of a 32-bit four stage pipelined RISC processor in an FPGA using Verilog and VHDL. The processor uses a Harvard architecture with separate 32-bit instruction and data memories. It has a 32-bit datapath unit and control unit. The processor is divided into fetch, decode, execute, and writeback stages to implement the four stage pipeline. The RISC processor was tested successfully in both simulation and by implementing it on a Spartan 3E FPGA board.


Nepal Journal of Science and Technology Vol. 15, No. 1 (2014) 81-88

Design and Implementation of Synthesizable 32-bit Four Stage Pipelined RISC Processor in FPGA Using Verilog/VHDL

Bikash Poduel¹, Prasanna Kansakar¹, Sujit R. Chhetri¹ and Shashidhar Ram Joshi²
¹Department of Electronics and Communication, Thapathali Campus, Institute of Engineering, Kathmandu, Nepal
²Department of Electronics and Communication, Institute of Engineering, Pulchowk Campus, Lalitpur
e-mail: [email protected]

Abstract
This paper describes the design and implementation of a high-performance, synthesizable 32-bit pipelined Reduced Instruction Set Computer (RISC) core. The design of the Harvard-architecture-based 32-bit RISC core involves the design of a 32-bit data-path unit, a control unit, a 32-bit instruction memory, a 32-bit data memory, and a register file in which each register is 32 bits wide. The processor is divided into Fetch, Decode, Execute and Write Back blocks in order to implement a four-stage pipeline. A 2×16 LCD is connected to the processor I/O block to show the instruction execution sequence for demonstration on the FPGA. The RISC core is designed using Verilog HDL and VHDL and is tested in the ISIM simulator. The processor is implemented on a Spartan 3E Starter Board using Xilinx ISE 14.7. All of the instructions supported by the processor have been tested successfully, both in simulation and in hardware on the FPGA.

Key words: SoC, Harvard architecture, Xilinx ISE, IP Core, Register file.

Introduction

The Reduced Instruction Set Computer (RISC) is a design philosophy used in powerful microprocessors and microcontrollers (Stallings 2011). This design philosophy involves fixed-length instructions, single-clock-cycle execution for most instructions, memory access through load and store instructions only, pipelined execution, and a large register bank that serves as fast memory. The most common RISC microprocessors are ARM, DEC Alpha, PA-RISC, SPARC, MIPS, and IBM's PowerPC.

This project involves designing a high-performance 32-bit synthesizable RISC core in Verilog HDL and VHDL. The core is implemented on a Spartan 3E FPGA. FPGA (Kilts 2007) is shorthand for Field Programmable Gate Array, a device capable of implementing combinational, sequential, and FSM-based digital systems or components. Today's deep-submicron fabrication technologies enable design engineers to implement an impressive number of components, such as microprocessors, memories, and interfaces, in a single microchip called a Configurable System-on-Chip (Kim & Leibson 2005). With the advent of large, fast, cheap FPGAs, it is practical and cost-effective to skip the ASIC (Xilinx 2005) and ship volume embedded systems in a single FPGA, since the FPGA implements all of the system logic, including a processor core.

This project has a very promising future in the context of synthesizable embedded system design, verification, and implementation. To date, there is not a single company in Nepal that designs, manufactures, and verifies high-tech electronic products such as microcontrollers, phones ranging from simple low-end sets to high-end smartphones, processor cores, etc. Every year thousands of electronics engineers receive B.E. degrees from various universities in Nepal, but there are no electronics
jobs. This high-tech business has huge technical and economic upside potential and can thus create a myriad of job opportunities. This paper is our team's first step toward writing and contributing IP cores, which in our case involves designing and verifying a processor core. This paper, we believe, can serve as a reference for students who want to learn to design and verify a processor core, for teachers who want to teach a simple real-world implementation of a processor core in an FPGA, and for anyone who wants to contribute IP cores. This project gives good insight into what real high-tech work involves, namely designing and verifying cores such as processors, buses, and peripherals, thus bringing closer and making clearer a horizon that was hazy because nobody had done this here before.

Methodology
Basic building block of RISC core
The HDL design of the RISC core involves the design of the following components, which are the major building blocks of most processors: memory unit, data path unit, and control unit.

Memory: The predominant feature of a RISC core is the use of registers from a register file, which is a fast memory, and immediate values for all arithmetic and logical instructions. This leads to less frequent accesses to main memory. Since the RISC core is based on the Harvard architecture (Iannucci 1988), the data and instruction memories are separate. Memory access is limited to Load and Store instructions only.

There are sixteen 32-bit registers in the register file. The instruction memory is implemented as a single-port on-chip distributed ROM, while the data memory is designed as a single-port on-chip block RAM inside the FPGA. The data memory and the instruction memory are each 32 × 256 bits, which can be extended as required. Since the core is synthesizable, the code can be adjusted to change the attributes of the core.

Data path unit: The data path unit comprises the ALU, multiplexers, Instruction Decoder, Branch Logic Generator, Pipeline Registers, destination register selection logic, Program Counter, Link Register, Data Register, etc. The pipeline progresses through four stages when executing an instruction. The first stage is the Instruction Fetch stage, where the address of the instruction to be fetched is loaded in the Program Counter (PC). The address is also saved in the pipeline register for internal use. The instruction pointed to by the PC is fetched from the Instruction Memory into the Instruction Register (IR). The second stage is the Instruction Decode stage, where the Instruction Decoder decodes the fetched instruction. Here, the decoder extracts the source operands, the destination operand, and the operation to be performed from the 32-bit instruction. The third stage is the Execution stage, where the operation on the source operands is performed and the result is stored in the internal pipeline register. The corresponding flags (Negative, Carry, Zero and Overflow) are set according to the result of the arithmetic and logical operations performed. For load and store instructions, the memory location from which the data is to be loaded, or to which it is to be stored, is calculated. The final stage is the Write Back or Memory Read/Write stage. For arithmetic and logical operations, the result of the ALU operation is written to the destination register; for load/store instructions, the data memory is read or written.

Control unit: The pipeline is controlled by setting control values during each pipeline stage. Each control signal is active only during a single pipeline stage, and hence the control lines can be divided according to the four pipeline stages. These signals are forwarded to the adjacent stage through the pipeline registers.

Instruction pipeline design: Pipelining is the process of breaking an instruction (process) into a number of independent sub-processes that can execute concurrently, and executing those sub-processes simultaneously. In order to be commercially successful, a processor design must have a fast system clock and must be able to execute, on average, one or more instructions per clock cycle. Pipelining is essential to meet this requirement.

The instruction pipeline is designed by pipelining the basic fetch-execute operation sequence shown in Fig 1, which shows the four-stage pipeline architecture implemented in the proposed core. The interrupt path has been omitted to simplify the pipeline design. Table 1 and Table 2 demonstrate the performance improvement that can be achieved by pipelining.
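The cycle counts quoted for Table 1 and Table 2 can be checked with a small model. The sketch below is illustrative only and is not taken from the authors' HDL. It assumes an ideal pipeline that issues one instruction per clock with no stalls; the stage abbreviations IF, ID, EX and WB and all function names are hypothetical and chosen for this illustration. Under these assumptions, n instructions on a k-stage pipeline need k + n - 1 cycles, against n × k cycles when instructions do not overlap.

```python
from typing import List

# Assumed abbreviations for the four stages (Fetch, Decode, Execute, Write Back).
STAGE_NAMES = ("IF", "ID", "EX", "WB")


def cycles_unpipelined(n_instructions: int, n_stages: int = 4) -> int:
    """Each instruction passes through all stages before the next one starts."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions: int, n_stages: int = 4) -> int:
    """Ideal pipeline: the first instruction needs n_stages cycles,
    and every later instruction completes one cycle after its predecessor."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


def pipeline_schedule(n_instructions: int) -> List[List[str]]:
    """Row i holds the stage occupied by instruction i in each clock cycle
    (an empty string means the instruction is not in the pipeline)."""
    total = cycles_pipelined(n_instructions, len(STAGE_NAMES))
    rows = []
    for i in range(n_instructions):
        row = [""] * total
        for s, name in enumerate(STAGE_NAMES):
            row[i + s] = name  # instruction i occupies stage s in cycle i + s
        rows.append(row)
    return rows


if __name__ == "__main__":
    for i, row in enumerate(pipeline_schedule(4), start=1):
        print(f"I{i}: " + " ".join(f"{cell:>2}" for cell in row))
    print(cycles_unpipelined(4), cycles_pipelined(4))
```

For four instructions and four stages this gives 16 cycles without pipelining and 7 cycles with it, an ideal-case speedup of 16/7, or about 2.3.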

Bikash Poduel et al./Design and Implementation of.......

Table 1. Execution of four instructions takes 16 clock cycles without pipelining

Fig.1. A four stage pipeline showing the stages and the respective operations

Pipeline hazards: Conditions that may require the execution of an instruction in the pipeline to be delayed are referred to as pipeline hazards (Jiang 2006). There are three main types of pipeline hazards: structural hazards, data hazards, and control hazards.

A structural hazard refers to the situation in which two instructions must use a common resource, thereby resulting in a resource conflict. For example, a common problem of this form occurs when there is only one memory device for instructions and data and a load or store instruction is encountered. Since an instruction must be fetched during each clock cycle, if an instruction in the pipeline requires access to memory, a memory access conflict results.


Table 2. Four-instruction execution with four-stage pipelining takes 7 clock cycles (less than half of what it takes without pipelining)

Block diagram of proposed RISC core

Fig.2. Internal pipelined architecture of the proposed RISC core

In a data hazard, one instruction changes the value of a register, while a following instruction uses that same register value. If the following instruction reads the register value before the preceding instruction changes it, the later instruction may use an incorrect value, thereby resulting in an incorrect program result.
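A read-after-write dependency of this kind can be detected mechanically in an instruction sequence. The following is a minimal sketch, not part of the authors' design. It assumes that source registers are read in the Decode stage and results are written in the Write Back stage of a four-stage pipeline without forwarding, so a consumer must be at least three instruction slots behind its producer. That distance (`min_distance`), the `Instr` tuple and its operand layout are hypothetical, since the paper does not specify them.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction record: mnemonic, destination, source registers."""
    op: str
    dest: Optional[str]
    srcs: Tuple[str, ...]


def find_raw_hazards(program: List[Instr], min_distance: int = 3) -> List[Tuple[int, int, str]]:
    """Return (producer_index, consumer_index, register) for every pair in which
    the consumer reads a register written by a producer fewer than
    min_distance slots earlier. min_distance = 3 is an assumption (read in
    Decode, write in Write Back, no forwarding), not a figure from the paper."""
    hazards = []
    for j, consumer in enumerate(program):
        # Only producers closer than min_distance can still be in flight.
        for i in range(max(0, j - min_distance + 1), j):
            producer = program[i]
            if producer.dest is not None and producer.dest in consumer.srcs:
                hazards.append((i, j, producer.dest))
    return hazards


if __name__ == "__main__":
    program = [
        Instr("ADD", "R1", ("R2", "R3")),
        Instr("SUB", "R4", ("R1", "R5")),  # reads R1 one slot after it is written
        Instr("AND", "R6", ("R7", "R8")),
        Instr("ADD", "R9", ("R1", "R6")),  # R1 is safe by now, R6 is not
    ]
    for producer, consumer, reg in find_raw_hazards(program):
        print(f"instruction {consumer} reads {reg} too soon after instruction {producer}")
```

A programmer could use such a report to reorder independent instructions, or to insert filler instructions, until no pair is flagged.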


A control hazard occurs when an unconditional or conditional branch instruction is encountered. In the absence of branch instructions, all instructions can typically be fetched in strict sequential order. Then, if a fixed-length instruction format is used, a sequence of instructions can be pre-fetched before the first instruction of the sequence has completed execution. However, if the first instruction happens to be a branch instruction, then all subsequent instructions fetched in this manner will have been fetched incorrectly.

Pipeline hazard solution adopted: To remove the structural hazard, separate memories are used for data and instructions. Thus, there are two sets of address and data lines in the processor. No mechanism is used to remove the data hazard; it is left to the assembly programmer to schedule instructions in the assembly code so that the data hazard does not arise. For the control hazard, whenever a branch instruction is encountered and the branch is taken, the following instructions currently in the pipeline are marked, and subsequent instructions are fetched starting from the branch-target address. Whenever a marked instruction is encountered during the execution stage, the result of that instruction is nullified by not storing its execution result.

Instruction set: The RISC core has 32-bit instructions, most of which are register-based operations. Since registers are fast memory, the RISC core mostly uses the register-based addressing mode. However, there is support for immediate operands and a load/store mechanism for memory access. The instructions are divided into four categories: Data Processing Instructions, Load/Store Instructions, Interrupt Instructions, and Branch Instructions.
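The marking scheme used for taken branches can be illustrated with a small cycle-by-cycle model. This is a sketch under stated assumptions rather than the authors' implementation. It assumes that the branch is resolved in the Execute stage, so the two younger instructions already in the Fetch and Decode stages are the ones that get marked. It also uses two hypothetical instruction forms, `("SET", reg, value)` and `("B", target_index)`, purely for illustration; they are not the core's real encodings.

```python
from typing import Dict, List, Optional, Tuple

STAGES = 4  # slot 0 = Fetch, 1 = Decode, 2 = Execute, 3 = Write Back


def run_pipeline(program: List[tuple], max_cycles: int = 100) -> Tuple[Dict[str, int], List[int]]:
    """Simulate the marking of wrong-path instructions after a taken branch.

    Hypothetical instruction forms (not the core's real encoding):
      ("SET", reg, value)  write value to reg in the Write Back stage
      ("B", target)        unconditional branch to program index target
    Returns the final register contents and the indices of the instructions
    whose results were actually stored.
    """
    regs: Dict[str, int] = {}
    stored: List[int] = []
    pipe: List[Optional[dict]] = [None] * STAGES
    pc = 0
    for _ in range(max_cycles):
        # Advance every instruction by one stage; the oldest one leaves.
        pipe = [None] + pipe[:-1]
        # Fetch the next sequential instruction, if any.
        if pc < len(program):
            pipe[0] = {"index": pc, "instr": program[pc], "marked": False}
            pc += 1
        # Execute stage: an unmarked branch marks the younger instructions
        # and redirects fetching (assumed branch resolution point).
        ex = pipe[2]
        if ex and not ex["marked"] and ex["instr"][0] == "B":
            for younger in pipe[:2]:
                if younger:
                    younger["marked"] = True
            pc = ex["instr"][1]
        # Write Back stage: results of marked instructions are not stored.
        wb = pipe[3]
        if wb and not wb["marked"] and wb["instr"][0] == "SET":
            _, reg, value = wb["instr"]
            regs[reg] = value
            stored.append(wb["index"])
        # Stop once the program is exhausted and the pipeline has drained.
        if pc >= len(program) and all(slot is None for slot in pipe):
            break
    return regs, stored


if __name__ == "__main__":
    demo = [("SET", "R1", 1), ("B", 4), ("SET", "R2", 2), ("SET", "R3", 3), ("SET", "R4", 4)]
    print(run_pipeline(demo))
```

In the demo program the branch at index 1 jumps to index 4, so the two instructions fetched behind it (indices 2 and 3) travel through the pipeline marked and never update R2 or R3.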

Data Processing Instruction


Table 3. List of data processing instructions and their respective address fields


Load/store instruction
Table 4. List of memory access instructions and their respective address fields

Branch instruction
Table 5. List of branch instructions and their respective address fields

Interrupt instruction
Table 6. List of interrupt instructions and their respective address fields

Programmer model of the RISC core
The RISC core executes 32-bit instructions. The registers are all 32 bits wide, and the ALU is 32-bit. The programmer model for this RISC core is shown in Table 7. There are 16 visible registers (R0-R15), each 32 bits wide, and four status registers.

The Program Counter (PC) contains the address of the instruction to be fetched from memory. The Link Register (LR), used with subroutine calls, contains the address of the instruction following the subroutine call. Thus, when a return instruction is executed from within the subroutine, the Link Register points to the instruction to be executed after the return. The Stack Pointer (SP) points to the top of the stack, which holds data in a last-in-first-out manner. The condition flags are Carry, Zero, Negative, and Overflow, each 1 bit wide. These bits are set or reset during arithmetic and logical operations.

Table 7. Programmer's model for the RISC core

Implementation in HDL
This 32-bit RISC core is designed in both HDLs, Verilog and VHDL. The block diagram of the module
being implemented is shown in Fig 3. The ports are listed in Table 8. There are three input ports, five output ports, and one bi-directional port. The RISC core interacts with the instruction memory and the data memory for instruction fetching and data load/store, respectively. The instruction memory and data memory modules are shown in Fig 4 and Fig 5, respectively.
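The four condition flags of the programmer model can be illustrated for a 32-bit addition. The sketch below assumes the conventional two's-complement definitions of Negative, Zero, Carry and Overflow; the paper does not spell out the exact flag semantics of the core, and the function name `add32_with_flags` is hypothetical.

```python
from typing import Dict, Tuple

MASK32 = 0xFFFFFFFF  # the registers and the ALU are 32 bits wide


def add32_with_flags(a: int, b: int) -> Tuple[int, Dict[str, int]]:
    """Add two 32-bit values and derive N, Z, C, V using the conventional
    two's-complement definitions (an assumption about the core)."""
    a &= MASK32
    b &= MASK32
    full = a + b              # up to 33 bits
    result = full & MASK32    # value that fits in a 32-bit register
    flags = {
        "N": (result >> 31) & 1,  # sign bit of the result
        "Z": int(result == 0),    # result is zero
        "C": (full >> 32) & 1,    # carry out of bit 31
        # Overflow: operands share a sign and the result's sign differs.
        "V": ((~(a ^ b) & (a ^ result)) >> 31) & 1,
    }
    return result, flags


if __name__ == "__main__":
    print(add32_with_flags(0x7FFFFFFF, 1))
    print(add32_with_flags(0xFFFFFFFF, 1))
```

Adding 1 to 0x7FFFFFFF sets Negative and Overflow, because the signed result wraps; adding 1 to 0xFFFFFFFF sets Zero and Carry, because the unsigned result wraps.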

Fig.5. HDL module for data memory

Table 8. List of the port for RISC core

Fig.3. HDL module for proposed RISC core

Results and Discussion


The proposed RISC core was simulated in the Xilinx ISIM simulator, where the core was instantiated along with the data and instruction memories in the test bench. The instruction memory was loaded with ADD, SUB, AND, and ADD instructions, in that sequence, which were fetched, decoded, and executed by the processor. All of the instructions were simulated correctly, and the result is shown in Table 9. In addition, the final synthesis report of the RISC core generated by Xilinx ISE is given in Table 10.

Fig.4. HDL module of instruction memory


Table 9. Result of execution of the instructions

Table 10. Result of operation of the instructions


The simulation and results verify all of the instructions incorporated in the RISC core. The RISC core can be used to develop a 32-bit microcontroller by simply adding peripherals and a bus. Since this core is synthesizable and reconfigurable, it can be upgraded by increasing the memory of the processor, by moving to a 5-stage pipeline, by adding data forwarding to remove the data hazard, and by adding a branch prediction block to remove the control hazard.

Acknowledgements
We take this opportunity to express our profound gratitude and deep regards to Professor Dr. Shashidhar Ram Joshi for his exemplary guidance, monitoring and constant encouragement throughout the course of this research project. The support, help and guidance given by him from time to time shall carry us a long way in the journey of life on which we are about to embark. We would like to thank Mr. Sandesh Ghimire and Ballav Bhattarai for reading our paper and providing valuable advice.

References
Iannucci, R. A. 1988. Towards a Dataflow/von Neumann Hybrid Architecture. In: Proceedings of the 15th Annual International Symposium on Computer Architecture (May 1988), IEEE Computer Society Press, Los Alamitos, CA, USA, pp. 131-140.
Jiang, Hong. 2006. Pipeline: Hazards. cse.unl.edu/~jiang/cse430/Lecture%20Notes/Pipeline_Hazards.ppt.
Kilts, Steve. 2007. Advanced FPGA Design: Architecture, Implementation, Optimization. Wiley-IEEE Press, New Delhi, India. pp. 170-265.
Kim, James and Steve Leibson. 2005. Configurable processors: A new era in chip design. Computer 38:51-59.
Stallings, W. 2011. Computer organization and architecture, designing for performance. Dorling Kindersley India Pvt. Ltd., New Delhi, India. pp. 498-536.
Wikipedia. 2011. Semiconductor intellectual property core. http://en.wikipedia.org/wiki/Semiconductor_intellectual_property_core.
Xilinx. 2000. FPGA vs ASIC. http://www.xilinx.com/fpga/asic.htm
