Analysis of 16-Bit and 32-Bit RISC Processors
Abstract— The reduced instruction set computer, or RISC, is a microprocessor design that executes small, similar instructions, each of which takes roughly the same time to execute. The objective is to reduce the complexity of the instructions, which in turn reduces the cost, the cycle time and the operating power. Although the 16-bit RISC has been around since the 1970s, it has not kept up with requirements and has posed a significant number of technical barriers. This was the main reason for the development of 32-bit and 64-bit RISC processors and of the concept of pipelining. In this paper, our objective is to study the behavioral models of 16-bit and 32-bit RISC processors and their independent instruction sets. The 16-bit RISC processor is a non-pipelined, Harvard architecture-based CPU with separate data memory and instruction memory. The 32-bit RISC is a pipelined processor that borrows its implementation strategies from the MIPS architecture. The processors include GPRs (General Purpose Registers) and flag registers (Carry, Zero, etc.). The models simulate an optimized multiplier algorithm and attempt to optimize the data path, since arithmetic and logical operations consume considerable power and incur high execution delay. The paper aims to draw a comparative study between the models based on their instruction sets and on performance elements such as speedup and power dissipation. The individual models have been designed and simulated, and finally integrated in a top-level module using the Xilinx ISE Design Suite 14.7, along with power analysis.

Keywords—RISC (Reduced Instruction Set Computing), 16-Bit, 32-Bit, pipelining, Verilog, Xilinx, Instruction Set, delay, power analysis, operating frequency, system architecture

I. Introduction

As the name suggests, the RISC microprocessor has a limited number of instructions, which can be executed much faster because they are smaller and simpler. RISC processors require fewer transistors, which makes them cost-effective in terms of both design and production. The idea behind the implementation is that RISC executes simplified instructions rather than complex ones that would require a complex design. The RISC instruction set comprises simple, basic instructions, and a complex operation can be executed as a combination of these basic instructions. Since the instructions complete in a single cycle, the processor can be organized to handle multiple instructions at the same time. The instructions are register-based; hence, data transfers take place from one register to another. The instruction register was designed to accept a combination of binary fields, namely the opcode bits and the operand bits. There was an opcode for every type of instruction that could be implemented in the CPU, grouped under certain classifications. This made the instruction set of the RISC processor much easier to understand [1]. The 16-Bit RISC CPU reflects all of the above points and illustrates the simplicity of the instruction set architecture.

Unlike the 16-Bit RISC, the modern 32-Bit MIPS processor, which has been a commercial success, incorporates pipelining, which improves the processor on many fronts compared with its predecessor generations. Instructions are fetched and executed in an overlapped sequence, so several instructions are in different phases of execution at the same time. The pipeline is divided into several stages, and all these stages are connected together via pipe-like structures (latches). Each pipeline segment comprises two parts, i.e., input registers followed by combinational circuits. The data flowing through the pipeline is held in the registers, and the combinational logic circuits perform arithmetic and binary operations on it. The output of one stage is given as the input to the next one. With the introduction of pipelining, throughput increases because the effective CPI is reduced; ideally, one instruction completes in every clock cycle [2].

Our objective is to draw a comparative analysis between the two processors on the basis of their performance, their physical quantities and the complexity of the instructions that can be executed, which provides a detailed basis for choosing between the two processors according to the application requirements. In the upcoming sections II and III, the architectures of the two processors are discussed, followed by an experimental analysis in section IV and a conclusion based on the results obtained in section V.

II. System Architecture of a 16-Bit RISC Processor

The Harvard architecture-based design of the 16-Bit RISC processor presented here incorporates 8 general purpose registers, a basic Arithmetic Logic Unit (ALU) for basic operations such as addition and shift operations, data memories and an instruction set of 14 instructions. It supports a load-store architecture in which all operations are performed in the registers. Since it is a non-pipelined processor, it is much easier to understand and implement.
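To make the load-store, non-pipelined organization concrete, the following is a minimal Python sketch of a behavioral model with eight 16-bit registers and separate instruction and data memories. The mnemonics (LW, SW, ADD, SUB, SHL), the tuple instruction format and the 256-word data memory are hypothetical simplifications chosen for illustration; they are not the paper's 14-instruction encoding. Each call to `step` completes one whole instruction before the next one is fetched, which is what "non-pipelined" means here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

MASK16 = 0xFFFF  # registers and ALU results are 16 bits wide

# Hypothetical instruction format: (mnemonic, a, b, c)
Instr = Tuple[str, int, int, int]


@dataclass
class Harvard16:
    imem: List[Instr]  # instruction memory, kept separate from data memory (Harvard)
    dmem: List[int] = field(default_factory=lambda: [0] * 256)  # hypothetical size
    regs: List[int] = field(default_factory=lambda: [0] * 8)    # R0..R7
    pc: int = 0

    def step(self) -> None:
        """Fetch, decode and execute one complete instruction (no overlap)."""
        op, a, b, c = self.imem[self.pc]
        self.pc += 1
        r = self.regs
        if op == "LW":     # load:  Ra <- dmem[Rb + c]
            r[a] = self.dmem[(r[b] + c) % len(self.dmem)]
        elif op == "SW":   # store: dmem[Rb + c] <- Ra
            self.dmem[(r[b] + c) % len(self.dmem)] = r[a]
        elif op == "ADD":  # Ra <- Rb + Rc, wrapped to 16 bits
            r[a] = (r[b] + r[c]) & MASK16
        elif op == "SUB":  # Ra <- Rb - Rc, wrapped to 16 bits
            r[a] = (r[b] - r[c]) & MASK16
        elif op == "SHL":  # Ra <- Rb << c, wrapped to 16 bits
            r[a] = (r[b] << c) & MASK16
        else:
            raise ValueError(f"unknown mnemonic {op!r}")

    def run(self) -> None:
        while self.pc < len(self.imem):
            self.step()


# Usage: load two operands, add them in registers, store the sum.
cpu = Harvard16(imem=[("LW", 1, 0, 0), ("LW", 2, 0, 1), ("ADD", 3, 1, 2), ("SW", 3, 0, 2)])
cpu.dmem[0], cpu.dmem[1] = 7, 5
cpu.run()
print(cpu.dmem[2])  # 12
```

Arithmetic in this sketch only ever touches registers; memory is reached solely through LW and SW, which is the load-store property described above.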
1318
978-1-6654-0521-8/21/$31.00 ©2021 IEEE
Authorized licensed use limited to: BOURNEMOUTH UNIVERSITY. Downloaded on July 01,2021 at 22:27:26 UTC from IEEE Xplore. Restrictions apply.
2021 7th International Conference on Advanced Computing & Communication Systems (ICACCS)
ALU Control
ALU Op   Opcode (Hex)   ALU cnt   ALU Operation   Instruction
10       xxxx           000       LW/SW           Load/Store
Although the 16-Bit RISC processor is non-pipelined, it executes simple instructions effectively and overcomes the shortcomings of the traditional CISC processor, in which the number of instructions is minimized at the cost of CPI (cycles per instruction). RISC follows a different strategy, reducing the CPI at the cost of the number of instructions per program. This is depicted in Fig. 3. The next section discusses what the 32-Bit RISC processor brings in response to increasing technological requirements.

$$\frac{\text{Time}}{\text{Program}} = \frac{\text{Time}}{\text{Cycle}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Instructions}}{\text{Program}}$$

Fig. 3 Performance Equation of a Processor

III. System Architecture of a 32-Bit RISC Processor

The 32-Bit RISC processor is a 5-stage pipelined, Harvard architecture-based implementation of the MIPS processor, which has been a commercial success. A 32-bit machine works better than a 16-bit machine because it can not only address far more unique memory locations (2^32 rather than 2^16 when the address width matches the word width), but can also access data in memory in wider "chunks". The advantage is that, all other things being equal, a 32-bit computer is functionally going to be slightly less than twice as fast as its 16-bit counterpart.

With the data width increasing from 16 to 32 bits, new application areas such as graphics and the manipulation of large data structures opened up, along with improved scope for working in domains such as convolution. Complex computations, such as calculating the nth power of a number, take an enormous amount of effort and time, and hence a 16-bit processor is not fit for this purpose [7].

Fig. 4 describes how an operation is executed in the proposed model, from fetching the instruction from memory through memory access and write-back, yielding the final result of the operation [8].

A. Pipelining

As the name suggests, pipelining allows instructions to be fetched and executed in an overlapped sequence. A traditional CPU suffered from excessive delay because no other task could be executed while one was being processed, a scenario similar to an operating system sitting idle while an I/O device is busy. This problem is solved by pipelining, which allows parallel execution: while instruction 1 is being executed, instruction 2 can be fetched and decoded, so multiple instructions run at the same time [9]. The processor gains the following advantages from pipelining:

• The effective cycles per instruction (CPI) of the processor are reduced, while the speedup ideally increases by a factor equal to the number of pipeline stages.
• Pipelining also shortens the interval between completed instructions, i.e., it increases throughput. This is because multiple instructions are processed at every instant, so the average time taken per instruction is reduced.
• Pipelining allows more complex instructions to be incorporated in the ALU, since the design is much faster than the previous ones. This also widens the range of applications in which such a processor can be used, e.g., DSP (Digital Signal Processing) applications such as convolution.
• Pipelining enables CPUs to operate at a higher clock frequency, even higher than that of the RAM, which substantially increases the overall performance of the computer. This is because each stage contains only part of the combinational logic, which shortens the critical path and hence the clock period.
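The CPI and speedup claims above, together with the performance equation of Fig. 3, can be checked with a small calculation. The Python sketch below assumes an ideal k-stage pipeline with equal stage delays, no hazards or stalls and no latch overhead; under those assumptions a non-pipelined design spends k cycles per instruction, while the pipeline needs k + n - 1 cycles for n instructions. The function names and the 10 ns period are illustrative, not taken from the paper.

```python
def execution_time(instructions: int, cpi: float, cycle_time_ns: float) -> float:
    """Fig. 3 performance equation: instructions x (cycles/instruction) x (time/cycle)."""
    return instructions * cpi * cycle_time_ns


def pipelined_cycles(n: int, k: int) -> int:
    """Cycles for n instructions on an ideal k-stage pipeline: the first result
    appears after k cycles, then one instruction completes every cycle."""
    return k + n - 1 if n > 0 else 0


def ideal_speedup(n: int, k: int) -> float:
    """Non-pipelined cycles (k per instruction) divided by pipelined cycles (n > 0)."""
    return (n * k) / pipelined_cycles(n, k)


K = 5         # five stages, as in the 32-bit design
T_NS = 10.0   # hypothetical clock period, assumed equal for both designs

# Effective CPI tends to 1 and speedup tends to K as n grows.
for n in (1, 10, 1000):
    cpi = pipelined_cycles(n, K) / n
    print(f"n={n:<5} effective CPI={cpi:.3f} speedup={ideal_speedup(n, K):.3f}")

t_np = execution_time(1000, K, T_NS)                                # non-pipelined, CPI = k
t_p = execution_time(1000, pipelined_cycles(1000, K) / 1000, T_NS)  # pipelined
print(f"time ratio for 1000 instructions: {t_np / t_p:.3f}")
```

Table III later reports a speedup of 5 for the 5-stage design, which is the limit of this idealized model; real pipelines fall somewhat short of it because of hazards and pipeline-register overhead.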
5. Write Back, where the computed data is written back to the register whose location is specified in the instruction.

The objective of achieving maximum processing speed compared with a similar non-pipelined processor is accomplished by utilizing the hardware to the maximum extent, which improves the speed even when more complicated instructions are being executed [10].

B. Modified Instruction Set

Owing to the availability of more general-purpose registers and the increased opcode length, 6 bits are used to define the type of instruction, which means that up to 64 (2^6) instructions can be realized. The proposed MIPS 32-Bit architecture contains complex instructions such as multiplication, and comparison instructions such as testing whether the contents of a register are equal to zero.

As presented in Fig. 6, there are mainly 3 types of instruction encoding in this processor.

2. I-type instruction encoding, in which the ALU operation is performed on the data in one source register and on immediate data (specified in the instruction), and the result is stored in the destination register. The last 16 bits are designated for the immediate data on which the operation is performed, as specified by the user.

3. J-type instruction encoding, which is used for jump instructions that change the flow of execution in the processor and transfer the program sequence to the specified memory address [11]. Here the instruction is 32 bits long: the first 6 bits from the left contain the opcode of the operation to be performed, while the next 26 bits contain the immediate address to which the program sequence is to be transferred.

Table II. Instruction Set
Opcode    ALU Operation    Instruction
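The I-type and J-type layouts described above amount to simple bit-field packing. The Python sketch below assumes the I-type field order opcode(6) | rs(5) | rt(5) | immediate(16), inferred from the ADI encoding '001010 00000 00001 0000000000001010' quoted in the simulation section, where the second register field (R1) is the destination. The function names are hypothetical.

```python
def encode_i(opcode: int, rs: int, rt: int, imm: int) -> int:
    """I-type: opcode(6) | source rs(5) | destination rt(5) | immediate(16)."""
    if not (0 <= opcode < 64 and 0 <= rs < 32 and 0 <= rt < 32):
        raise ValueError("field out of range")
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def encode_j(opcode: int, target: int) -> int:
    """J-type: opcode(6) | jump target address(26)."""
    if not (0 <= opcode < 64 and 0 <= target < (1 << 26)):
        raise ValueError("field out of range")
    return (opcode << 26) | target


def decode(word: int) -> dict:
    """Split a 32-bit word into every candidate field; the opcode decides which apply."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "imm": word & 0xFFFF,
        "target": word & 0x3FFFFFF,
    }


ADI = 0b001010  # opcode bits taken from the ADI example in the text
word = encode_i(ADI, rs=0, rt=1, imm=10)  # R1 <- R0 + 10
print(format(word, "032b"))  # 00101000000000010000000000001010
print(decode(word)["imm"])   # 10
```

With a 6-bit opcode field, 2**6 = 64 distinct opcodes can be encoded, which bounds the size of the instruction set.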
Flag registers are available in the 16-Bit RISC processor as well, but their reach and usage are limited owing to its simpler computations and limited instruction set. When an instruction is executed, the ALU produces status by-products that are reflected as changes in the flag register [13]. Five of the 32 bits of the flag register are used. These flags are shown in Fig. 7.

1. Parity Flag (PF)
The 1st bit (bit 0) of the flag register is occupied by PF, which reflects the number of 1s in the result. If the number of 1s is even, PF is set (1); otherwise it remains reset (0).

2. Zero Flag (ZF)
The 2nd bit (bit 1) of the flag register is occupied by ZF, which indicates whether the result of the ALU or of an instruction (BEQZ, BNE, etc.) is zero. If the result is 0, ZF is set; otherwise it remains reset.

3. Sign Flag (SF)
The 3rd bit (bit 2) of the flag register is occupied by SF, which indicates the sign of a signed result. The MSB of the result is used for this purpose: if it is 1, the number is negative; if it is 0, the number is positive.

Bit:    4    3     2    1    0
Flag:   CF   ACF   SF   ZF   PF

The performance comparison is best achieved by simulating a similar program on both processors and then judging them on various fronts. For this purpose, a simple addition program was first implemented on the 16-Bit RISC processor and then on the 32-Bit RISC processor. The implementation was done in Xilinx ISE Design Suite 14.7.

For the 16-Bit processor, since register 3 (R3) can contain garbage values, the subtract command is used to clear the contents of R3 ('0000110110100001') and transfer them to register 2 (R2). After this, the number 7 is added ('1110100110000111') to the contents of R2 and the result is stored in R3. Similar operations adding 5 (0101) and 13 (1101) are conducted, and the result is finally stored in R2. Computation takes place at every positive edge of the clock (clk).

The limitation of this simulation, however, is that the computation can only use numbers of at most 8 bits, even though the result of the ALU can be up to 16 bits wide, as is evident in Fig. 8. This shortcoming is overcome by using the 32-Bit RISC, as shown in Fig. 9. For the 32-Bit processor, both the ADI and ADD commands are used. Since the registers are already cleared, register R1 is initialized with the value 10 using the ADI command ('001010 00000 00001 0000000000001010'). Similarly, R2 and R3 are initialized with 20 and 25, respectively.

Fig 8. Simulation Result for 16-Bit RISC
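The flag rules listed above and the 32-bit addition program can be replayed together in a short Python sketch. It applies PF (even number of 1s over the full 32-bit result), ZF and SF as described, and assumes that CF is the carry out of bit 31; ACF (bit 3) is left clear because its rule is not given in this section. The two ADD steps (R4 <- R1 + R2, R5 <- R4 + R3) are a hypothetical reconstruction chosen to match the register values R4 = 30 and R5 = 55 printed by the simulator console for the 32-bit processor.

```python
MASK32 = 0xFFFFFFFF
PF, ZF, SF, ACF, CF = 0, 1, 2, 3, 4  # bit positions from the flag-register layout


def alu_add(a: int, b: int) -> tuple:
    """32-bit addition returning (result, flags)."""
    full = (a & MASK32) + (b & MASK32)
    result = full & MASK32
    flags = 0
    if bin(result).count("1") % 2 == 0:  # even number of 1s -> PF set
        flags |= 1 << PF
    if result == 0:                      # zero result -> ZF set
        flags |= 1 << ZF
    if result >> 31:                     # MSB of the result copied into SF
        flags |= 1 << SF
    if full >> 32:                       # assumption: CF = carry out of bit 31
        flags |= 1 << CF
    return result, flags


def run(program, nregs: int = 8):
    """Execute (mnemonic, dst, src, operand) tuples; ADI treats operand as an immediate."""
    regs, flags = [0] * nregs, 0
    for op, dst, src, operand in program:
        if op == "ADI":
            rhs = operand
        elif op == "ADD":
            rhs = regs[operand]
        else:
            raise ValueError(f"unsupported mnemonic {op!r}")
        regs[dst], flags = alu_add(regs[src], rhs)
    return regs, flags


program = [
    ("ADI", 1, 0, 10),  # R1 <- R0 + 10 (registers start cleared)
    ("ADI", 2, 0, 20),  # R2 <- R0 + 20
    ("ADI", 3, 0, 25),  # R3 <- R0 + 25
    ("ADD", 4, 1, 2),   # hypothetical: R4 <- R1 + R2
    ("ADD", 5, 4, 3),   # hypothetical: R5 <- R4 + R3
]
regs, flags = run(program)
print(regs[:6])  # [0, 10, 20, 25, 30, 55]
```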
Simulation waveform of the 32-bit pipeline (traced signals: clk, PC and the IF/ID, ID/EX, EX/MEM and MEM/WB pipeline registers). Simulator console after "run 1000 ns": R0 = 0, R1 = 10, R2 = 20, R3 = 25, R4 = 30, R5 = 55.
Fig 10. Factorial computation on 32-Bit processor

On-chip power: total 0.106 W; device static 0.104 W (98%); dynamic 0.002 W (2%), of which clocks 14%, signals 6%, logic 5% and I/O 75%; junction temperature 26.2 °C; design power budget not specified.
Fig 12. Power Analysis of 32-Bit RISC Processor
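The power report attributes almost all on-chip power to static (leakage) power, while the conclusion links the 32-bit design's higher dynamic power to its higher clock frequency. The Python sketch below recomputes the static share from the reported watts and uses the standard first-order CMOS switching-power model P = alpha * C * Vdd^2 * f to isolate the effect of frequency. The activity factor, capacitance and supply voltage are hypothetical placeholders held equal for both designs, and the two frequencies are the maximum operating frequencies from Table III, not measured operating points.

```python
def switching_power(alpha: float, cap_farads: float, vdd: float, freq_hz: float) -> float:
    """First-order CMOS dynamic (switching) power: P = alpha * C * Vdd**2 * f, in watts."""
    return alpha * cap_farads * vdd ** 2 * freq_hz


def static_fraction(static_w: float, dynamic_w: float) -> float:
    """Share of total on-chip power that is static (leakage) power."""
    return static_w / (static_w + dynamic_w)


# Values read from the 32-bit power report (Fig. 12)
print(f"static share: {static_fraction(0.104, 0.002):.0%}")  # 98%

# Hypothetical activity factor, switched capacitance and supply voltage,
# held equal for both designs so that only the clock frequency differs.
ALPHA, CAP_F, VDD = 0.1, 50e-12, 1.0
p_16 = switching_power(ALPHA, CAP_F, VDD, 78.654e6)   # 16-bit f_max (Table III)
p_32 = switching_power(ALPHA, CAP_F, VDD, 139.438e6)  # 32-bit f_max (Table III)
print(f"dynamic power ratio: {p_32 / p_16:.3f}")  # 1.773
```

Because the two designs also differ in switched capacitance (the 32-bit datapath is wider), this sketch isolates only the frequency term of the comparison.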
Table III. Timing and Frequency Summary Comparison

                                  16-Bit Non-pipelined RISC   32-Bit Pipelined RISC
Speedup                           1                           5
Maximum Operating Frequency       78.654 MHz                  139.438 MHz
Maximum Combinational Delay       13.981 ns                   7.028 ns

While the power analysis highlights the power consumption of each processor, the timing analysis and maximum operating frequency summarized in Table III highlight the extent to which the minimum clock period and the combinational delay are affected [17]. Even though the word length has been doubled, the effect of pipelining is more prominent and reduces the maximum combinational delay from 13.981 ns to 7.028 ns.

The next section draws a detailed comparison between the two processors based on the experimental results and observations.

V. Conclusion

In general, a 16-Bit processor costs less than a 32-Bit processor because its internal data path is narrower, so fewer transistors are required to manufacture it and chip area is freed. This space can be used to accommodate additional features (on-chip memory, peripheral interfaces, etc.).

This point is evident from the observations made in the previous section: the total power consumption of the 32-Bit processor is about 60% more than that of the 16-Bit processor. Comparing the two power analysis figures, the dynamic power consumption of the 32-Bit processor is much higher than that of the 16-Bit processor. The reason for this phenomenon is the higher operating frequency: a higher clock frequency means more switching events per unit time, which increases dynamic power consumption. The 32-Bit processor is also considerably faster; its maximum operating frequency is about 77% higher than that of the 16-Bit processor (139.438 MHz versus 78.654 MHz in Table III). These observations are expected, since the 32-Bit processor can hold more computational values and its pipelined architecture reduces the length of each instruction cycle, thereby increasing the operating frequency and decreasing the combinational delay. For arithmetic calculations, a 16-Bit processor can offer decent processing speed in a small system at minimum cost, but for applications that require high efficiency, such as floating-point arithmetic, 32-Bit processors should be preferred. The 16-Bit processor loses a significant amount of time dealing with the stack element.

All the aforementioned points suggest that the 16-Bit RISC processor is now limited to situation-specific demands: although operating on 16-bit instructions increases code density compared with a fixed 32-bit format, the instruction-set performance and adaptability of the 32-Bit processor make it a much better option in general.

VI. References

[1] Sivarama P. Dandamudi, "Guide to RISC Processors for Programmers and Engineers," Springer.
[2] G. Mamun B., Shabiul I., and Sulaiman S., "A Single Clock Cycle MIPS RISC Processor Design using VHDL."
[3] Samiappa Sakthikumaran, S. Salivahanan, and V. S. Kanchana Bhaaskaran, "16-Bit RISC processor design for convolution application," 2011 International Conference on Recent Trends in Information Technology (ICRTIT), 2011.
[4] Chandran Venkatesan, M. Thabsera Sulthana, M. G. Sumithra, and M. Suriya, "Design of a 16-Bit Harvard Structure RISC Processor in Cadence 45nm Technology," 2019 5th International Conference on Advanced Computing & Communication Systems (ICACCS), 2019.
[5] "Processor Design," Springer Science and Business Media LLC, 2007.
[6] Guang-Ming Tang, Pei-Yao Qu, Xiao-Chun Ye, and Dong-Rui Fan, "Logic Design of a 16-bit Bit-Slice Arithmetic Logic Unit for 32-/64-bit RSFQ Microprocessors," IEEE Transactions on Applied Superconductivity, vol. 28, no. 4, 31 Jan. 2018.
[7] Chandran V., Ali K. S., and Gnanaprakash V., "Energy Efficient and High Speed Rounding-Based Approximate Multiplier."
[8] J. B. Dennis and G. R. Gao, "An efficient pipelined dataflow processor architecture," Supercomputing '88: Proceedings of the 1988 ACM/IEEE Conference on Supercomputing, vol. I, 6 Aug. 2002.
[9] Iro Pantazi-Mytarelli, "The history and use of pipelining computer architecture: MIPS pipelining implementation," 2013 IEEE Long Island Systems, Applications and Technology Conference (LISAT), 3 May 2013.
[10] Bai Zhongying, Computer Organization, Science Press, Nov. 2000.
[11] J. L. Hennessy, "VLSI Processor Architecture," IEEE Transactions on Computers, vol. C-33, no. 12, Dec. 1984.
[12] Rohit Sharma, Vivek Kumar Sehgal, Nitin Nitin, Pranav Bhasker, and Ishita Verma, "Design and Implementation of a 64-bit RISC Processor using VHDL," UKSim 2009: 11th International Conference on Computer Modelling and Simulation, IEEE, 2009, 978-0-7695-3593-7/09.
[13] S. P. Ritpurkar, M. N. Thakare, and G. D. Korde, "Design and simulation of 32-Bit RISC architecture based on MIPS using VHDL," 2015 International Conference on Advanced Computing and Communication Systems, IEEE, 12 Nov. 2015.
[14] N. Sureka, R. Porselvi, and K. Kumuthapriya, "An Efficient High Speed Wallace Tree Multiplier," 2013 International Conference on Information Communication and Embedded Systems (ICICES).
[15] Vivado Design Suite User Guide: Synthesis (UG901).
[16] Vivado Design Suite User Guide: Implementation (UG904).
[17] Sivarama P. Dandamudi, "Processor Design Issues," in Guide to RISC Processors, Springer, pp. 13-36.