
Implementation of 32 Bit Mips Processor With Cisc Multiplication Operation IJERTV4IS110636 2

Implementation of MIPS Processor with CISC Architecture

N. Hara Gopala Krishna
Dept. of E.C.E
Sathyabama Institute of Science and Technology

P. Naresh
Dept. of E.C.E
Sathyabama Institute of Science and Technology

Dr. A. Sahaya Anselin Nisha, M.E., Ph.D
Dept. of E.C.E
Sathyabama Institute of Science and Technology

Abstract—The MIPS architecture is one of the first commercially available RISC processor architectures. MIPS stands for 'Microprocessor without Interlocked Pipeline Stages'. In a normal MIPS RISC architecture, a 32-bit multiply operation can hold the processor for more than 32 clock cycles, which affects processor performance. To avoid this problem, we have implemented a 32-bit MIPS processor with one CISC operation for multiplication, which is realized using a Booth multiplier. The processor is tested on a Xilinx Nexys Spartan-3E board, using a 177 MHz clock frequency.

Keywords—MIPS, ISA, Pipeline, Booth multiplier

I. INTRODUCTION

The Arithmetic and Logic Unit (ALU) is the most important part of a processor; it executes all the arithmetic and logical operations. To make a processor with lighter hardware, the ALU should be simple. However, instructions like multiply and divide take many clock cycles on a normal ALU, so it is better to implement those instructions separately with dedicated hardware.

This paper implements a 32-bit, 5-stage RISC processor with one CISC operation, which is implemented using the Booth multiplication algorithm. This algorithm uses fewer cycles for execution than normal multipliers. A dedicated ALU is designed to realize the Booth multiplier. The normal ALU performs all other operations except multiplication.

The processor is designed in the Verilog (IEEE 1364) Hardware Description Language (HDL). The design is synthesized in Xilinx ISE Design Suite 12.4. To implement the processor, a Xilinx Nexys Spartan-3E board with the FG320 package has been used.

The paper is organized as follows: Section II gives an introduction to the MIPS processor; Section III explains the MIPS pipeline architecture and Instruction Set Architecture; Section IV presents the CISC operation (multiplication) algorithm; Section V deals with the FPGA implementation of the proposed design; Section VI gives simulation results; and Section VII gives the conclusion and future scope.

II. MIPS PROCESSOR

The MIPS processor, designed in 1984 by researchers at Stanford University, is typically a Reduced Instruction Set Computer (RISC) with Harvard architecture. Here we have implemented a 32-bit processor which follows the MIPS Instruction Set Architecture (ISA). Along with this 32-bit RISC processor, a CISC operation, multiplication, has been incorporated alongside the normal ALU operations.

RISC processors typically support fewer and much simpler instructions. A RISC architecture has simpler hardware than a CISC architecture. The CISC operation adds more hardware to the design. Although it increases hardware complexity, it reduces the number of cycles required to execute a multiplication operation.

The ISA has 32-bit instructions, with a 6-bit opcode and 5 bits for each register. The remaining bits hold the shift amount and function value in R-type instructions, 16-bit data in I-type instructions, and the jump address in J-type instructions.

The instruction execution in a processor can be split into a number of stages. As shown in Fig. 1, a MIPS processor has 5 stages:
1. The Instruction Fetch stage fetches the next instruction from memory using the address in the Program Counter (PC) register and stores this instruction in the Instruction Register (IR).
2. The Instruction Decode stage decodes the instruction in the IR, calculates the next PC, and reads any operands required from the register file.
3. The Execute stage executes the decoded instruction. The ALU units are located in the Execute stage.
4. The Memory Access stage performs any memory access required by the current instruction. For a load instruction, it loads an operand from memory. For a store instruction, it stores an operand into memory.
5. For instructions whose result has to be written into a register, the Write Back stage writes this result back to the register file.

Fig. 1. 5-stage MIPS Processor
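To make the overlap between the five stages concrete, the following Python sketch builds an idealised pipeline schedule. It is only an illustration under stated assumptions: no stalls or hazards are modelled, and the stage abbreviations (IF, ID, EX, MEM, WB) and the function names are illustrative choices, not names taken from the design.

```python
# Abbreviations for the five stages listed above (names assumed for illustration).
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_schedule(num_instructions):
    """Return {cycle: {stage: instruction index}} for an ideal pipeline with no stalls."""
    schedule = {}
    for instr in range(num_instructions):
        for offset, stage in enumerate(STAGES):
            cycle = instr + offset + 1  # instruction i enters IF in cycle i + 1
            schedule.setdefault(cycle, {})[stage] = instr
    return schedule


def total_cycles(num_instructions):
    """Cycles to finish all instructions: pipeline fill time plus one cycle per extra instruction."""
    if num_instructions == 0:
        return 0
    return len(STAGES) + num_instructions - 1


if __name__ == "__main__":
    sched = pipeline_schedule(4)
    for cycle in sorted(sched):
        row = ", ".join(f"{stage}=I{instr}" for stage, instr in sched[cycle].items())
        print(f"cycle {cycle}: {row}")
    print("total cycles:", total_cycles(4))
```

Under these assumptions, n instructions complete in n + 4 cycles instead of the 5n cycles a non-overlapped execution would need. A multi-cycle operation such as the multiplication discussed in this paper would add stall cycles that this sketch does not model.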
III. MIPS PIPELINE ARCHITECTURE AND INSTRUCTION SET ARCHITECTURE

Pipelining is a method used to improve processor performance. A pipeline reduces the number of processor cycles needed to execute a set of instructions. Pipelining is incorporated into the 5-stage MIPS processor architecture to improve its performance.

A. MIPS Pipelined Architecture

Instructions are first stored in the instruction memory. Based on the PC value, the processor selects the instruction from the memory and passes it on to the Decode Issue environment, which decodes the instruction into operation code, operand registers and destination register. Next, it reads the values from the corresponding registers and passes them to the ALU for execution.

There are 2 ALUs in our proposed processor:
• The first is a 'Dedicated ALU' for the multiplication operation. While this ALU is executing a multiplication, it holds all other instructions for 4 processor clock cycles. The multiplication is implemented using the Booth algorithm. This is the CISC operation.
• The second ALU handles all other instructions (ADD, SUB, AND, OR, NOR, NAND, XOR, Shift Left and Shift Right).

After this, the result is written back to the destination register given in the instruction.

For special instructions, the processor does the following:
• For J-type instructions, it takes the 26-bit address and shifts it left by 2 to make it 28 bits (this converts it into a word address format).
• For Load and Store instructions, the address is calculated by the ALU and is given to the PC.

Fig. 2. Implementation Approach

B. Instruction Set Architecture (ISA)

Here we have implemented R-type, I-type and J-type instructions.

1) R-type Instruction: Register-type instructions take their operands from registers and write the result back to a register. The format of the register instruction is shown in Fig. 3. In Fig. 3, Opcode stands for the operation code of the instruction. Rs and Rt are source registers and Rd is the destination register. Shamt is the shift amount and Functval stands for function value. The R-type instruction set is given in Table I.

2) I-type and J-type Instruction: I-type is the immediate instruction and J-type is the jump instruction. The format of the I-type instruction is shown in Fig. 4 and that of the J-type instruction in Fig. 5. The instruction sets are given in Tables II and III for I-type and J-type respectively.

Field:  Opcode  Rs     Rt     Rd     Shamt  Functval
Bits:   31-26   25-21  20-16  15-11  10-6   5-0
Fig. 3. R-type Instruction Format

Field:  Opcode  Rs     Rd     I-data
Bits:   31-26   25-21  20-16  15-0
Fig. 4. I-type Instruction Format

Field:  Opcode  J-address (26 bits)
Bits:   31-26   25-0
Fig. 5. J-type Instruction Format

TABLE I. R-TYPE INSTRUCTIONS
Sl.No.  Instruction          Action                   Opcode  Functval
1       ADD $s1,$s2,$s3      $s3 ← $s1 + $s2          000000  000001
2       SUB $s1,$s2,$s3      $s3 ← $s1 - $s2          000000  000010
3       MUL $s1,$s2,$s3      $s3 ← $s1 * $s2          000000  000011
4       AND $s1,$s2,$s3      $s3 ← $s1 (AND) $s2      000000  000100
5       OR $s1,$s2,$s3       $s3 ← $s1 (OR) $s2       000000  000101
6       NOR $s1,$s2,$s3      $s3 ← $s1 (NOR) $s2      000000  000110
7       NAND $s1,$s2,$s3     $s3 ← $s1 (NAND) $s2     000000  000111
8       XOR $s1,$s2,$s3      $s3 ← $s1 (XOR) $s2      000000  001000
9       DIV $s1,$s2,$s3      $s3 ← $s1 / $s2          000000  001001
10      SLT $s1,$s2,$s3      Set s3 if s1<s2          000000  001010
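The field layouts of Figs. 3 to 5 can be illustrated with a small Python sketch that packs and unpacks a 32-bit instruction word. The function names and the register numbers used in the example are hypothetical; only the bit positions and the ADD encoding (Opcode 000000, Functval 000001 from Table I) come from the paper.

```python
def encode_r_type(opcode, rs, rt, rd, shamt, functval):
    """Pack the R-type fields of Fig. 3 into one 32-bit word."""
    fields = ((opcode, 6), (rs, 5), (rt, 5), (rd, 5), (shamt, 5), (functval, 6))
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | functval


def decode(word):
    """Split a 32-bit word into every field used by the R-, I- and J-type formats."""
    return {
        "opcode": (word >> 26) & 0x3F,    # bits 31-26, common to all formats
        "rs": (word >> 21) & 0x1F,        # bits 25-21
        "rt": (word >> 16) & 0x1F,        # bits 20-16 (labelled Rd in the I-type format of Fig. 4)
        "rd": (word >> 11) & 0x1F,        # bits 15-11, R-type only
        "shamt": (word >> 6) & 0x1F,      # bits 10-6, R-type only
        "functval": word & 0x3F,          # bits 5-0, R-type only
        "i_data": word & 0xFFFF,          # bits 15-0, I-type only
        "j_address": word & 0x3FFFFFF,    # bits 25-0, J-type only
    }


if __name__ == "__main__":
    # ADD with source registers 1 and 0 and destination register 3 (register numbers chosen arbitrarily).
    word = encode_r_type(0b000000, 1, 0, 3, 0, 0b000001)
    print(f"{word:032b}")
    print(decode(word))
```

Which of the decoded fields is meaningful depends on the opcode, so a real decoder would select among them after inspecting bits 31-26; the sketch simply returns all of them.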


TABLE II. I-TYPE INSTRUCTIONS
Sl.No.  Instruction            Action                           Opcode
1.      ADDI $s1,$s2,100       $s2 ← $s1 + 100                  000001
2.      SUBI $s1,$s2,100       $s2 ← $s1 - 100                  000010
3.      MULI $s1,$s2,100       $s2 ← $s1 * 100                  000011
4.      ANDI $s1,$s2,100       $s2 ← $s1 (AND) 100              000100
5.      ORI $s1,$s2,100        $s2 ← $s1 (OR) 100               000101
6.      NORI $s1,$s2,100       $s2 ← $s1 (NOR) 100              000110
7.      NANDI $s1,$s2,100      $s2 ← $s1 (NAND) 100             000111
8.      XORI $s1,$s2,100       $s2 ← $s1 (XOR) 100              001000
9.      DIVI $s1,$s2,100       $s2 ← $s1 / 100                  001001
10.     SLTI $s1,$s2,100       Set s2 if s1<100                 001010
13.     BEQ $s1,$s2,25         Go to location $s2 if s1=25      001101
14.     BNE $s1,$s2,25         Go to location $s2 if s1!=25     001110
15.     BGT $s1,$s2,25         Go to location $s2 if s1>25      001111
16.     BLT $s1,$s2,25         Go to location $s2 if s1<25      010000
17.     SLL $s1,$s2,03         Shift Left Logical: $s2 ← $s1<<3   010001
18.     SRL $s1,$s2,03         Shift Right Logical: $s2 ← $s1>>3  010010

TABLE III. J-TYPE INSTRUCTIONS
Sl.No.  Instruction  Action                                                   Opcode
1.      J 2500       Jump to PC <= 2500                                       010011
2.      JAL 2591     Jump to PC <= 2591, $ra = $ra + 4,                       010100
                     SP[$ra] = Old PC value; {$ra is address of stack pointer}
3.      JR $ra       PC <= SP[$ra]; $ra = $ra - 4                             010101

IV. CISC OPERATION - MULTIPLICATION

Here the CISC operation, multiplication, is implemented using the Booth multiplication algorithm. Booth's multiplication algorithm implements 32-bit signed binary multiplication using the 2's complement method. The Booth multiplier architecture is shown in Fig. 6.

Booth's algorithm is implemented by repeatedly adding one of the values A and S to a product P and then performing a right shift operation. Let m1 and m2 be the multiplicand and multiplier, respectively, and let n1 and n2 represent the number of bits in m1 and m2.

Step 1: First determine the values of A, S and P. The length of each of them is n1 + n2 + 1.
A: Fill the most significant n1 bits with the value of m1 and the remaining (n2 + 1) bits with zeros.
S: Fill the most significant n1 bits with the 2's complement of m1 and the remaining (n2 + 1) bits with zeros.
P: Fill the most significant n1 bits with zeros, append the value of m2 to the right and fill the last bit with zero.

Step 2: The following operations are performed based on the two least significant bits of P:
01: Find P + A. Ignore any overflow.
10: Find P + S. Ignore any overflow.
00: Use P directly in the next step.
11: Use P directly in the next step.

Step 3: Arithmetically right shift, by one place, the value obtained in Step 2 and assign the result to P.

Step 4: Repeat Steps 2 and 3 n2 times.

Step 5: Drop the least significant bit from P to obtain the final product of m1 * m2.

Fig. 6. Multiplier Architecture

V. FPGA IMPLEMENTATION

The proposed 32-bit MIPS processor with one CISC operation is implemented in the Verilog language and finally emulated on a Xilinx Nexys Spartan-3E series FPGA. The steps involved in mapping the Verilog source code are Synthesis, Translate, Map, Place & Route and, finally, generation of the bit file. These steps are carried out using Xilinx ISE Design Suite 12.4. Finally, functional verification is done and the waveforms are obtained in Xilinx ISIM.

The processor is tested at a maximum clock frequency of 177.336 MHz (5.639 ns time period) and the ISA is verified. Fig. 7 shows how the proposed processor is connected to the input-output pins of the FPGA.

In the FPGA implementation, the output of the processor is written into a Block RAM. The output is then read from the Block RAM using an external clock and given to an on-board 7-segment display.

Fig. 7. Processor connected with Spartan-3E FPGA

VI. SIMULATION RESULTS

The proposed design is simulated in Xilinx ISE and ISIM. Design emulation is done after obtaining the bit file from Xilinx ISE and downloading it to the target FPGA. The outputs are connected to the on-board 7-segment display of the target FPGA (Spartan-3E).

Here we demonstrate the simulation result for a Fibonacci series test. The Fibonacci series is obtained by writing code using instructions in the ISA. The following instructions are placed in the instruction memory:
0: data = 32'd1;
4: data = {6'd0, `Reg1, `Reg0, `Reg3, 5'd0, 6'd1};
8: data = {6'd0, `Reg3, `Reg1, `Reg4, 5'd0, 6'd1};
12: data = {6'd0, `Reg3, `Reg4, `Reg5, 5'd0, 6'd1};
16: data = {6'd0, `Reg5, `Reg4, `Reg6, 5'd0, 6'd1};
20: data = {6'd0, `Reg5, `Reg6, `Reg7, 5'd0, 6'd1};
24: data = {6'd0, `Reg6, `Reg7, `Reg5, 5'd0, 6'd1};
28: data = {6'd0, `Reg5, `Reg7, `Reg1, 5'd0, 6'd1};
32: data = {6'd0, `Reg1, `Reg5, `Reg4, 5'd0, 6'd1};
36: data = {6'd0, `Reg1, `Reg4, `Reg5, 5'd0, 6'd1};

In the above code, 0, 4, 8, ..., 36 are PC values. The output obtained will be 1, 2, 3, 5, 8, 13, 21, 34, and 55. The initial values loaded into Reg0 and Reg1 are 0 and 1 respectively. Fig. 8 shows the simulation waveform obtained for the Fibonacci series test. Fig. 9 shows the output of a multiplication instruction, which was implemented using a Booth multiplier.

The 32-bit MIPS processor with one CISC operation is verified using appropriate test cases. For this we performed functional verification and then FPGA emulation for each test case.

The FPGA hardware utilized by the processor design is given in Table IV.

Fig. 8. Output waveform for Fibonacci series test

Fig. 9. Output waveform for a multiplication operation performed using a Booth multiplier

TABLE IV. FPGA HARDWARE UTILIZED BY THE DESIGN
Hardware units in FPGA  Number occupied  Total number of units  Percentage utilized
Slices                  403              4656                   8%
Slice Flip Flops        494              9312                   5%
4 input LUT             502              9312                   5%
Bonded IOBs             98               232                    42%

VII. CONCLUSION AND FUTURE SCOPE

In this paper we presented a 32-bit MIPS processor with a CISC operation, multiplication. Typically a MIPS processor follows the RISC architecture. Here we implemented the multiplication operation using a 32-bit Booth multiplier, which completes a multiplication in 4 processor cycles. This greatly enhances the processor speed whenever a multiply instruction has to be executed. The disadvantage of this CISC operation is that it increases the hardware complexity of the processor.

We implemented the processor design in the Verilog Hardware Description Language (HDL) and verified the results on a Xilinx Nexys Spartan-3E board. The design is verified at a maximum clock speed of 177 MHz.

Usually in a processor design, the scope for parallelism is limited by data and control hazards. The data hazards are Read after Write (RAW), Write after Read (WAR) and Write after Write (WAW). These data hazards can be resolved using a dynamic scheduling algorithm. It is therefore better to incorporate dynamic scheduling into a 32-bit processor, so that the overall speed of the processor is enhanced.

REFERENCES

[1] David A. Patterson, John L. Hennessy, Computer Architecture: A Quantitative Approach, 3rd edition, Morgan Kaufmann Publishers.
[2] Balaji Valli, A. Uday Kumar, B. Vijay Bhaskar, "FPGA implementation and functional verification of a pipelined MIPS processor", International Journal of Computational Engineering Research (ijceronline.com), Vol. 2, Issue 5.
[3] Paresh Kumar Pasayat, Manoranjan Pradhan, Bhupesh Kumar Pasayat, "FPGA based implementation of 8-bit ALU of a RISC processor using Booth", International Journal of Engineering Research & Technology (IJERT), Vol. 2, Issue 8, August 2013.
[4] Marri Mounika, Aleti Shankar, "Design & Implementation of 32-Bit RISC (MIPS) Processor", International Journal of Engineering Trends and Technology (IJETT), Volume 4, Issue 10, Oct 2013.
[5] MIPS® Architecture for Programmers Volume II-B: The microMIPS32™ Instruction Set, Revision 3.05, April 04, 2011.
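As a software illustration of the Booth procedure in Section IV (Steps 1 to 5), the following Python sketch follows the same A, S and P construction. It is a behavioural model only, not the authors' Verilog: the function name is hypothetical, the right shift in Step 3 is assumed to be an arithmetic (sign-preserving) shift, and the loop runs n2 times rather than modelling the 4-cycle hardware reported in the paper. The most negative multiplicand (-2^(n1-1)) is a known corner case of this formulation and is not handled.

```python
def booth_multiply(m1, m2, n1=32, n2=32):
    """Multiply signed m1 (n1 bits) by signed m2 (n2 bits) using the steps of Section IV."""
    total = n1 + n2 + 1
    mask = (1 << total) - 1

    # Step 1: build A, S and P, each n1 + n2 + 1 bits long.
    a = ((m1 & ((1 << n1) - 1)) << (n2 + 1)) & mask      # m1 in the top n1 bits
    s = (((-m1) & ((1 << n1) - 1)) << (n2 + 1)) & mask   # 2's complement of m1 in the top n1 bits
    p = ((m2 & ((1 << n2) - 1)) << 1) & mask             # zeros, then m2, then one extra zero bit

    for _ in range(n2):                                  # Step 4: repeat n2 times
        pair = p & 0b11                                  # Step 2: inspect the two least significant bits
        if pair == 0b01:
            p = (p + a) & mask                           # overflow beyond `total` bits is ignored
        elif pair == 0b10:
            p = (p + s) & mask
        # Step 3: arithmetic right shift by one place (the sign bit is replicated).
        sign = p >> (total - 1)
        p = (p >> 1) | (sign << (total - 1))

    p >>= 1                                              # Step 5: drop the least significant bit

    # Interpret the remaining n1 + n2 bits as a 2's complement number.
    width = n1 + n2
    if p & (1 << (width - 1)):
        p -= 1 << width
    return p


if __name__ == "__main__":
    print(booth_multiply(3, -4, n1=4, n2=4))
    print(booth_multiply(-12345, 6789))
```

With 4-bit operands the intermediate values of P can be traced by hand against Steps 2 and 3, which is a convenient way to check a hardware implementation's waveform against the algorithm.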
