ASIC Design of MIPS Based RISC Processor for High Performance

AGINETI ASHOK (1), V. RAVI (2)*
(1,2) School of Electronics Engineering, VIT University, Chennai, India
* Address for correspondence

Abstract
Objectives: The main aim of this paper is to implement a 32-bit MIPS (Microprocessor without Interlocked Pipeline Stages) RISC (Reduced Instruction Set Computer) processor using Verilog HDL (hardware description language).
Methods/Statistical analysis: The proposed approach analyzes the stages of instruction processing, namely the instruction fetch module, decoder module, and execution module, together with the design theory of the 32-bit MIPS RISC processor. In addition, it uses pipelining across the Instruction Fetch, Instruction Decode, Execution, Memory, and Write Back modules of the MIPS RISC processor, based on the 32-bit MIPS instruction set, in a single clock cycle.
Findings: A RISC processor is designed to perform a small set of operations in order to increase processor speed. In general, the processor executes a very large number of instructions every second, fetching data from memory. If the processor speed does not match the memory access speed, hardware interlocks occur. A related issue is stalls caused by instruction pipelining in the CPU design. The primary goal of this paper is to design and synthesize a MIPS processor that uses register files and an ALU forwarding unit to avoid stalls and hardware interlocks.
Application/Improvements: Based on the literature survey, the proposed method brings significant power efficiency improvements, with enhanced performance and reduced power dissipation, owing not only to technology scaling but also to substantial design effort.

Keywords: Interlocked pipeline stages; Hazards; Register files; ALU forwarding unit.

1. INTRODUCTION
The MIPS instruction set architecture is a RISC-based chip design. MIPS processors are mainly used in Cisco routers, digital cameras, Sony PlayStation game consoles, and Windows CE devices. MIPS uses a fixed-length instruction format that is straightforward to decode, restricts memory access to load and store instructions, and provides a large register file so that operations can complete within the processor's registers. The primary issue with a MIPS processor is data hazards: situations in which the pipelined processor produces an erroneous output because of data dependences between instructions. This happens when the input of one instruction is the output of a previous instruction, but that output is not yet available when the current instruction executes. When the execution stage is reached, the two instructions have a data dependence. This can be avoided by inserting NOPs (no-operation instructions) or by forwarding the ALU result until the output of the previous instruction has been stored in the register file.

A single-cycle MIPS processor handles only one instruction per clock cycle. It fetches the instruction, decodes it, executes it, accesses memory, and writes back the results, all in a single clock cycle. In every cycle, the register file supports two independent register reads and one register write. The register file is read at the requested addresses and outputs the data values held in those registers. The ALU operates on this data; its operation is selected by the control unit to compute a memory address, perform an arithmetic operation (addition or subtraction), or perform a comparison (for a branch). If the decoded instruction is arithmetic, the ALU result must be written back to a register. If the decoded instruction is a load or store, the ALU result is used as an address into the data memory. The last step writes the ALU result or the memory value back to the register file.

2. RISC AND CISC ARCHITECTURE
2.1 CISC Architecture
CISC chips are easy to program and make efficient use of memory. The architecture was developed to simplify compiler development. For instance, a CISC processor has a built-in ability to execute complex instructions, rather than requiring the compiler to generate long sequences of machine instructions. The Pentium is an example of a CISC processor. A given task is executed using fewer instructions, and slow memory is thereby used more efficiently.

978-1-5090-5913-3/17/$31.00 ©2017 IEEE

Authorized licensed use limited to: International Institute of Information Technology Bangalore. Downloaded on October 14,2023 at 04:11:51 UTC from IEEE Xplore. Restrictions apply.
The compiler is simpler because the microprogrammed instructions are written to match high-level constructs. Each new generation of a computer's instruction set contains the previous generation's instructions as a subset [1]. Consequently, the instruction set becomes more complex with each new generation. Because instructions vary in length, different instructions take different numbers of clock cycles to execute, which slows overall performance.

2.2 RISC Architecture
RISC uses a small, highly optimized set of instructions. Each instruction executes in a single clock cycle. RISC uses pipelining, which allows instructions to execute concurrently and makes the processor more efficient. To avoid a large number of interactions with memory, RISC uses a large register file to store intermediate results and data.

3. FIVE STAGE PIPELINING
The execution of an instruction in a processor is split into a number of stages; the MIPS processor designed here uses five-stage pipelining, as shown in Figure 1.

Figure 1. Five Stage Pipelining

The IF stage fetches the next instruction from memory, using the address held in the Program Counter (PC), and stores it in the instruction register (IR). The ID stage decodes the instruction held in the instruction register, evaluates the program counter, and reads any required operands from the register file [2].

The EXE stage executes the instruction; all arithmetic and logical operations, such as left and right shifts, subtraction, and addition, are performed in this stage. If the current instruction requires memory access, the Memory Access stage performs it. For a load instruction, for example, the Memory Access stage loads an operand from memory.

4. PROPOSED WORK
4.1 Hazard Unit
Hazards occur mainly because of instruction pipelining in the central processing unit. An incorrect result can occur when the next instruction cannot execute in its designated clock cycle. When an instruction is fetched, the control logic determines whether a hazard will occur. If a hazard is going to occur, the control unit inserts a no-operation.

4.2 ALU Forwarding Unit
Because the processor executes millions of instructions per second, interlocking arises between the pipeline stages, and the conventional solution is to stall the pipeline. The pipeline is divided into two parts, instruction fetch and instruction execute, as shown in Table 1.

Table 1. Division of pipeline stages.

IF Phase EXE Phase
IF ID EXE MA WB

To avoid stalling the pipeline, an ALU forwarding unit is used so that the ALU result can be consumed directly, without waiting for it to be written back to the register file. To achieve this, the ALU result is forwarded directly back to the ALU inputs. The pipeline register holds the output produced by the ALU, and this result is forwarded to subsequent instructions to prevent data hazards. The forwarding unit selects the correct ALU inputs for the execution stage. If a hazard occurs, the operands come from either the MEM/WB or the EX/MEM pipeline register, as shown in Figure 2.

264 2017 International Conference on Nextgen Electronic Technologies

If there are no hazards, the register file provides the operands for the ALU [3]. The ALU sources are selected by two multiplexers, controlled by the signals Forward_A and Forward_B shown in Figure 3. The forwarding unit thus removes the data hazards that involve arithmetic instructions. It simply compares the source registers of the current instruction with the destination register of the previous instruction.
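
As a rough illustration of the comparison described above, the following Python sketch models the forwarding selection behaviourally. It is not the authors' Verilog: the function and argument names, the two-bit select encoding (00 = register file, 10 = EX/MEM, 01 = MEM/WB), and the rule that register 0 is never forwarded are assumptions taken from the common textbook forwarding scheme.

```python
def forwarding_unit(idex_rs, idex_rt,
                    exmem_regwrite, exmem_rd,
                    memwb_regwrite, memwb_rd):
    """Return (forward_a, forward_b) mux selects for the two ALU inputs.

    Assumed encoding: 0b00 = register file, 0b10 = EX/MEM, 0b01 = MEM/WB.
    """
    def select(src):
        # The most recent result (EX/MEM) takes priority over MEM/WB.
        if exmem_regwrite and exmem_rd != 0 and exmem_rd == src:
            return 0b10
        if memwb_regwrite and memwb_rd != 0 and memwb_rd == src:
            return 0b01
        return 0b00  # no hazard: operand comes from the register file

    return select(idex_rs), select(idex_rt)


# add $1,$2,$3 followed by sub $4,$1,$5: Rs of sub matches Rd in EX/MEM.
forward_a, forward_b = forwarding_unit(1, 5, True, 1, False, 0)
print(forward_a, forward_b)
```

In this model Forward_A and Forward_B are computed independently, which mirrors the two multiplexers in front of the ALU.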

4.3 Instruction Divider


Because the MIPS processor is a 32-bit design, it uses 32-bit general-purpose registers. MIPS has three instruction formats:

1. I-type: This instruction format is used for load and store instructions.
2. R-type: This instruction format is used for arithmetic instructions.
3. J-type: This instruction format is used for jump instructions.
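
The three formats can be sketched as plain bit-field extraction from a 32-bit instruction word. The Python below is only an illustration using the standard MIPS32 field positions (6-bit opcode, 5-bit rs/rt/rd/sa, 6-bit funct, 16-bit immediate, 26-bit address); the `decode` function and its dictionary keys are hypothetical names, not part of the paper's design.

```python
def decode(instr):
    """Split a 32-bit MIPS instruction word into all candidate fields.

    Which fields are meaningful depends on the format:
    R-type uses rs, rt, rd, sa, funct; I-type uses rs, rt, imm;
    J-type uses address.
    """
    instr &= 0xFFFFFFFF
    return {
        "opcode": (instr >> 26) & 0x3F,   # bits 31..26
        "rs": (instr >> 21) & 0x1F,       # bits 25..21
        "rt": (instr >> 16) & 0x1F,       # bits 20..16
        "rd": (instr >> 11) & 0x1F,       # bits 15..11 (R-type)
        "sa": (instr >> 6) & 0x1F,        # bits 10..6  (R-type)
        "funct": instr & 0x3F,            # bits 5..0   (R-type)
        "imm": instr & 0xFFFF,            # bits 15..0  (I-type)
        "address": instr & 0x3FFFFFF,     # bits 25..0  (J-type)
    }


# 0x012A4020 encodes the R-type instruction add $8, $9, $10.
fields = decode(0x012A4020)
print(fields["opcode"], fields["rs"], fields["rt"], fields["rd"], fields["funct"])
```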

Figure 2. ALU forwarding unit

Figure 3. MIPS processor with forwarding unit

Figure 4. Different Fields of instruction format and its description

In Figure 4, all three instruction formats use a six-bit opcode to select a particular operation, such as addition or subtraction, depending on the opcode value. I-type instructions include the branch instructions.

Figure 5. Different types of instruction format

To calculate the branch destination address, a branch instruction adds an offset value to the current address held in the program counter. The source register (Rs) and destination register (Rd) fields are 5 bits each. J-type instructions use a 6-bit opcode and a 26-bit address field; the two least significant bits of the target are omitted, and the four most significant bits are omitted because they are the same as those of the current instruction address [4]. R-type instructions perform an ALU operation based on the opcode. The destination register stores the result, and the shift amount (sa) field is used by shift and rotate instructions to determine by how much the source operand is shifted. The function field is used because it holds the control codes that differentiate instructions sharing an opcode [5]. The different fields in the instruction divider and their descriptions are given in Figure 5.

5. SIMULATION RESULTS
5.1 Execution Unit
The MIPS execution unit contains the ALU, which performs operations based on the opcode. The branch address is obtained by using an adder to add the program counter value to the output of the sign-extension unit, left-shifted by two bits. The sign-extension unit widens the number by replicating its most significant bit, which preserves the sign of the binary number [6]. The control signals for the ALU are generated by the ALU controller, a circuit with two inputs and one output; the output is a two-bit value that tells the ALU which arithmetic or logical operation to perform on its two input operands [7]. The simulations were carried out using the Xilinx tool, and the resulting waveform for the execution unit is shown in Figure 6.
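
The branch address computation in Section 5.1 can be illustrated with a short Python sketch. It assumes a 16-bit immediate and a 32-bit address space; the names `sign_extend16` and `branch_target` are hypothetical. The paper only says that the program counter value is added, so the sketch takes whatever base value the caller passes in (many MIPS implementations pass PC + 4 here).

```python
def sign_extend16(value):
    """Widen a 16-bit field to a signed integer by replicating bit 15."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc, imm16):
    """Add the sign-extended offset, shifted left by two bits, to the PC."""
    offset = sign_extend16(imm16) << 2       # word offset to byte offset
    return (pc + offset) & 0xFFFFFFFF        # wrap to a 32-bit address


# A forward branch of 3 words and a backward branch of 2 words.
print(hex(branch_target(0x100, 0x0003)))
print(hex(branch_target(0x100, 0xFFFE)))
```

The left shift by two reflects that instructions are word aligned, so the two low address bits are always zero and need not be stored in the instruction.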

Figure 6. Execution unit output waveform

5.2 Hazard Unit

The hazard detection unit detects whether a stall is needed in the pipeline. Figure 7 shows the simulation result for the hazard unit. If the signal IDEX_MemRead is active high and IDEX_Register_Rt is equal to IFID_Register_Rs, or IDEX_Register_Rt is equal to IFID_Register_Rt, then stall is driven high; otherwise stall is driven low.
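
The stall condition stated above maps directly onto a boolean expression. The following Python sketch is a behavioural model only, not the authors' Verilog; the function name `hazard_stall` is hypothetical, and the arguments are lower-case versions of the signal names used in the text.

```python
def hazard_stall(idex_memread, idex_register_rt,
                 ifid_register_rs, ifid_register_rt):
    """Return True when the pipeline must stall for a load-use hazard.

    The instruction in ID/EX is a load (MemRead high) and its target
    register Rt is a source of the instruction currently in IF/ID.
    """
    return bool(idex_memread) and (
        idex_register_rt == ifid_register_rs
        or idex_register_rt == ifid_register_rt
    )


# lw $2, 0($1) followed by add $4, $2, $5: the add reads $2 too early.
print(hazard_stall(True, 2, 2, 5))
```

Forwarding alone cannot resolve this case, because the loaded value is only available after the Memory Access stage; this is why the stall signal is still needed alongside the forwarding unit.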

Figure 7. Hazard detection unit waveform

5.3 ALU Forwarding Unit

The forwarding unit consists of a comparator that compares the signals and produces the output. Figure 8 shows the simulation result for the ALU forwarding unit; the outputs are Forward_A and Forward_B.

Figure 8. ALU forwarding unit waveform

6. LAYOUT
The layout of the processor, shown in Figure 9, was done using the Cadence tool SoC Encounter. The inputs to the tool are the gate-level netlist, the standard cell library files, the timing constraints file, and the technology file [8]. The gate-level netlist consists of interconnected logic gates that define the logic. The standard cell library provides ready-made logic cells such as NOR gates and AND gates, and the netlist instantiates these standard cells. The technology file holds the design rules, such as metal widths and spacing; the Cadence tool takes a .lef file as the technology file. The timing constraint file is generated during synthesis and contains the timing constraints of the design, such as input and output delay constraints, false paths, and clock definitions [9]. The major step in layout is floorplanning, where the chip quality is determined. In this step, the space for standard cells and macros, the routing resources for power, and the size of the chip are defined. The aspect ratio defines the relationship between the height and width of the chip. The core of the chip is utilized up to 70% to place standard cells and macros [10]. The remaining 30% is used for routing and for placing buffers where needed. Core utilization is given by:

Core Utilization = (Standard cell area + macro cell area) / total core area

After floorplanning, the power rings and stripes were added, followed by routing and timing verification of the design. The final netlist is compared with the desired netlist.

Figure 9. Layout of the processor

Power is tabulated in terms of the cells used for each phase of an instruction; the leakage power, dynamic power, and total power are shown in Table 2.

Table 2: Power calculations

MODULES        CELLS   LEAKAGE POWER (nW)   DYNAMIC POWER (nW)   TOTAL POWER (nW)
Processor      4.231   3667.04              17421614.78          17425281.82
IF Unit        190     133.041              437604.073           437737.144
ID Unit        2788    2777.06              13636676.24          13639453.3
EXE Unit       1119    566.164              1097609.007          1098175.171
Data Memory    74      130.795              794861.522           794992.317
Control Unit   30      18.171               8060.935             8079.105
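
The core utilization formula given earlier in this section can be checked numerically with a few lines of Python. The function name and the area values below are hypothetical and chosen only to reproduce the 70% figure mentioned in the text; they are not measurements from the design.

```python
def core_utilization(standard_cell_area, macro_cell_area, total_core_area):
    """Fraction of the core occupied by standard cells and macros."""
    if total_core_area <= 0:
        raise ValueError("total_core_area must be positive")
    return (standard_cell_area + macro_cell_area) / total_core_area


# Hypothetical areas in arbitrary units: 600 + 100 out of 1000.
utilization = core_utilization(600.0, 100.0, 1000.0)
print(f"{utilization:.0%}")
```

With these numbers the remaining 30% of the core is left free for routing and buffers, matching the split described in the text.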

7. CONCLUSION
The MIPS processor is a strong candidate for removing the hazards of the original datapath with the help of the forwarding unit, which fetches results from the pipeline registers before they are written back to the register file. As a result, the processor does not enter a high-impedance or unknown state, which in turn improves performance.

REFERENCES
1. W. Hu et al., "Godson-3B1500: A 32 nm 1.35 GHz 40 W 172.8 GFLOPS 8-core processor," in IEEE ISSCC Dig. Tech. Papers, 2013, pp. 54–55.
2. W. Hu et al., "Godson-3B: A 1 GHz 40 W 8-core 128 GFLOPS processor in 65 nm CMOS," in IEEE ISSCC Dig. Tech. Papers, 2011, pp. 76–78.
3. N. Sinha and V. Ravi, "Implementation of health monitoring system using mixed environment," Indian Journal of Science and Technology, vol. 8, no. 20, p. 1, 2015.
4. B. Fan et al., "Physical implementation of the 1 GHz Godson-3 quad-core microprocessor," J. Comput. Sci. Techn., vol. 25, no. 2, pp. 192–199, 2010.
5. W. Hu and Y. Chen, "GS464V: A high-performance low-power XPU with 512-bit vector extension," in Hot Chips Symp., 2010.
6. J. Friedrich et al., "Design methodology for the IBM POWER7 microprocessor," IBM J. Research and Development, vol. 55, no. 3, pp. 9:1–9:14, 2011.
7. M. Dodiya Chandni M. and V. Ravi, "Built in self-test architecture using concurrent approach," Indian Journal of Science and Technology, vol. 9, no. 20, May 2016.
8. Q. Fan et al., "A synchronized variable frequency clock scheme in chip multiprocessors," in Proc. IEEE ISCAS, 2008.
9. S. Damaraju et al., "A 22 nm IA multi-CPU and GPU system-on-chip," in IEEE ISSCC Dig. Tech. Papers, 2012, pp. 56–57.
10. C. E. Gimarc and V. M. Milutinovic, "RISC Principles, Architecture, and Design," Computer Science Press Inc., 1989.
