8-Bit RISC Processor Design Using Verilog HDL on FPGA
A
Dissertation
Submitted in the fulfilment of the requirements
For the award of degree
Of
MASTER OF TECHNOLOGY
In
Embedded System Design
By
JIKKU JEEMON
(Roll No. 3146510)
KURUKSHETRA -136119
CERTIFICATE
This is to certify that the dissertation entitled "8-Bit RISC Processor
Design using Verilog HDL on FPGA" is the authentic record of work
done by Jikku Jeemon under my guidance and supervision. This
dissertation is being submitted to the National Institute of Technology,
Kurukshetra towards the fulfilment of the requirements for the award of
degree of Master of Technology in Embedded System Design.
I hereby declare that the work being presented in this dissertation entitled
"8-Bit RISC Processor Design using Verilog HDL on FPGA", submitted
towards the fulfilment of the requirements for the award of degree, Master
of Technology in Embedded System Design to the School of VLSI Design
and Embedded Systems, National Institute of Technology, Kurukshetra, is
an authentic record of my work carried out from July 2015 to June 2016,
under the guidance of Prof. A. K. Gupta, School of VLSI Design and
Embedded Systems, National Institute of Technology, Kurukshetra.
I have not submitted the matter embodied in the dissertation for the award
of any other degree.
Jikku Jeemon
Roll No. 3146510
School of VLSI Design and Embedded Systems
Date:
ACKNOWLEDGEMENT
I would like to express my deep gratitude and appreciation to all the people who have
helped and supported me in the process of this dissertation. Without their help and support, I
would not have been able to reach this level of satisfaction with what I have learnt and
accomplished during my Master's dissertation.
First and foremost, I would like to express my deep sense of respect and gratitude
towards my supervisor, Dr. A. K. Gupta, Professor, School of VLSI Design and
Embedded Systems, NIT Kurukshetra, for giving me the opportunity to do my master's
dissertation under his guidance. I am very thankful for his endless support, motivation,
patience, encouragement and guidance during the study. His professional knowledge
and faith in me were very important and gave me the strength to conclude this work.
I would also like to thank my friends for sharing their valuable thoughts and
knowledge, which motivated me to do better.
Finally, none of this would have been possible without the incredible support of my parents.
They were always supporting me and encouraging me with their best wishes.
Jikku Jeemon
Roll No. 3146510
School of VLSI Design and Embedded Systems
LIST OF FIGURES
Figure 2.1 Architecture of 8-bit RISC processor [17]
LIST OF TABLES
Table 3.1 Instruction set
Table 4.2 Device utilization of the Spartan-3E Starter kit FPGA board
LIST OF ABBREVIATIONS
Abbreviations Meaning
B Borrow flag
C Carry flag
DM Data Memory
IM Instruction Memory
IR Instruction Register
I/O Input/Output
P Parity flag
PC Program Counter
Z Zero flag
ABSTRACT
RISC is a design approach that reduces the area required, the complexity of the
instruction set, the instruction cycle and the cost of implementing a design. This
dissertation presents an 8-bit RISC processor designed in Verilog Hardware
Description Language (HDL) and implemented on an FPGA board. The proposed processor
uses the Harvard architecture, with separate instruction and data memories. Its salient
feature is pipelining, used to improve performance such that one instruction is
executed on every clock cycle. Another important feature is that the instruction set
contains only 34 instructions, which makes it simple, compact and easy to learn. The
proposed processor has an 8-bit ALU, two 8-bit I/O ports, serial-in and serial-out
ports, eight 8-bit general-purpose registers, a 4-bit flag register and three
priority-based interrupts. Dynamic power is reduced at the RTL level by clock gating:
specific modules are clocked only when the corresponding control signals are
enabled. The proposed processor is physically verified on a Xilinx Spartan-3E Starter
Board FPGA at a clock frequency of 25 MHz with a 2.5 V supply.
TABLE OF CONTENTS
CERTIFICATE
CANDIDATE'S DECLARATION
ACKNOWLEDGEMENT
LIST OF FIGURES
LIST OF TABLES
ABSTRACT
Chapter 1 INTRODUCTION
Chapter 5 CONCLUSION
REFERENCES
APPENDIX A
APPENDIX B
Chapter 1
INTRODUCTION
With the rapid development of silicon technology and the fall in the cost of
integrated circuits, the use of RISC processors is increasing in every field.
RISC stands for Reduced Instruction Set Computer, an architectural principle in which
only dedicated load and store operations access memory, while all other operations
are performed register to register. This makes the instruction set clearer and
simpler and allows instructions to execute at a rate of one instruction per cycle.
Simple and transparent addressing modes allow fast computation of operand
addresses. Thus, a RISC architecture reduces the area required, the complexity of
the instruction set, the instruction cycle and the cost of the
hardware ([1] and [2]). RISC processors are used in signal
processing, convolution applications, commercial data processing,
supercomputers such as the K computer, smartphones, tablets and real-time embedded
systems. Pipelining, a typical feature of RISC processors, is an implementation
technique in which multiple operations are performed at the same time. It is a form of
instruction-level parallelism within a single processor, which significantly improves
the performance of the processor. Pipelining increases instruction throughput but does
not reduce instruction latency, which is the time to complete a single instruction from
start to end [3].
The role of reconfigurable processors in embedded system design has increased greatly
over the past decades. With advances in Field Programmable Gate Arrays
(FPGAs), the architecture of a processor can now be modified by
programming [4]. Clock power is a major component of overall dynamic power
consumption and should be minimized in a design. One method of reducing clock
power is clock gating (ANDing) ([5] and [6]), which dynamically shuts off the
clock signal to unused modules of the hardware. This avoids the unnecessary power
dissipation caused by charging and discharging the clock signal at unused gates.
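As a minimal sketch of AND-based clock gating (the module and signal names below are illustrative assumptions, not taken from the RTL of this work), the enable can first be captured in a latch that is transparent only while the clock is low, so that the gated clock cannot glitch if the enable changes during the high phase:

```verilog
// Hypothetical latch-based clock gate; all names are illustrative.
module clock_gate_sketch (
    input  wire clk,     // free-running clock
    input  wire enable,  // high when the downstream module must be clocked
    output wire gclk     // gated clock delivered to the module
);
    reg enable_latched;

    // Level-sensitive latch: follows enable only while clk is low.
    always @(clk or enable)
        if (!clk)
            enable_latched <= enable;

    // The ANDing step: clock pulses pass only while the latched enable is high.
    assign gclk = clk & enable_latched;
endmodule
```

When `enable` stays low, `gclk` stays low and the registers behind it do not toggle, which is where the dynamic power saving comes from.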
Asynchronous serial communication is highly reliable, simple because it does not
require the two communicating sides to be synchronized, and cheap because it
requires little hardware; it is therefore widely used for communication between
computers and peripherals. The Universal Asynchronous Receiver Transmitter (UART)
is a serial communication protocol mostly used for short-distance, low-speed,
low-cost data exchange between computers and peripherals. Asynchronous serial
communication is usually implemented by a UART, which allows full-duplex
communication over a serial link ([7] and [8]).
Modern electronic devices such as desktops, laptops, notebooks and tablets use 32-bit
and 64-bit processors. 16-bit processors can be found in larger systems such as
traffic lights, systems controlling power plants and factory controllers. 16-bit embedded
processors are also used in consumer electronics including video game consoles, DVD
players, digital cameras, scanners and printers. Several household appliances, including
microwave ovens and washing machines, also use 16-bit embedded processors.
However, 8-bit computers are used extensively as controllers for simple computational
tasks. Following the divide and conquer principle, a common personal computer is
divided into smaller ones (commonly 8-bit) which share information with the main
computer (32 or 64-bit). An 8-bit processor generally handles the drivers for almost
every component card inside a computer. 8-bit processors are extensively used in
home appliances and industry-specific systems [9]. 8-bit or 16-bit processors are better
suited than 32-bit processors to systems on a chip and microcontrollers that require
extremely low power to operate.
The objective of this dissertation is to design an 8-bit RISC processor and implement it
on the Spartan-3E Starter Kit FPGA using Verilog Hardware Description Language (HDL).
The processor uses the Harvard architecture, with separate instruction and
data memories. Its most important feature is a very simple instruction set of only
34 instructions, which is easy to learn. Another important feature is
pipelining, used to improve performance such that one instruction is executed on
every clock cycle. The planned 8-bit RISC processor has the
following main features.
• Harvard Architecture
• 256 K × 16 Instruction Memory
• 4 K × 8 Data Memory
• 8-bit system data bus
• 3 Interrupts
• Eight 8-bit General Purpose Registers
• Clock frequency = 25 MHz
• Two 8-bit I/O Ports
• Serial-in and serial-out ports
• 2.5 V voltage supply
• Clock gating for power reduction
The dissertation is organized in five chapters. A brief outline of each chapter is
described below:
Chapter 2 presents previous work done related to the proposed processor as available
in literature.
Chapter 4 describes simulation results. The simulation of the proposed 8-bit RISC
processor is carried out in Xilinx's simulation tool ISim.
This chapter presents the background for this work. Section 2.1 of this chapter presents
brief history of 8-bit processor. In Section 2.2, the related works are described.
Intel's 8008, brought to market in April 1972, was the first 8-bit monolithic
microprocessor. It operated at up to 0.8 MHz and had 3,500 transistors in PMOS
technology with a 10-micron line width. It had 48 instructions. The address space
was 16 kilobytes, but direct addressing was not supported [10].
Intel's 8080 was a great improvement over the earlier 8008, incorporating on chip many
features that had required external hardware with the 8008. The 8080,
a superset of the 8008, was an NMOS design with 8-bit words and a 16-bit address bus,
released in 1974. It required ±5 volts and +12 volts and had six general-purpose
registers and an accumulator. It used 6,000 transistors, had a maximum rating of
0.8 watts, had 48 instructions and operated at 2 MHz [11].
Intel's 8085, introduced in 1976, was an advanced version of the 8080 with simplified
hardware that needed only a single +5 V supply and included clock-generator and
bus-controller circuits on the chip. It was binary compatible with the 8080,
but required less supporting hardware, allowing simpler and less expensive
microcomputer systems. The 8085 used a multiplexed data/address bus to reduce chip
pin-out. This required external de-multiplexing of the 16-bit address and the 8-bit data.
It had 48 basic instructions and a maximum rating of 1.5 watts, operated at up to 6 MHz,
featured a serial in/out port and supported four vectored interrupts [12].
Reduced instruction set computer (RISC) is a CPU design approach based on the
insight that a simplified instruction set yields higher performance when combined with
a microprocessor architecture that can execute those instructions using fewer clock
cycles per instruction. The term RISC was coined by David Patterson of the Berkeley
RISC project, although similar concepts had appeared before [13].
The CDC 6600 designed by Seymour Cray in 1964 used a load/store architecture with
only two addressing modes (register-register, and register-immediate constant) and 74
opcodes, with the basic clock cycle/instruction issue rate being 10 times faster than the
memory access time [14].
The IBM 801 is the first recognized RISC system; the project was started in 1975 by
John Cocke and completed in 1980. The 801 was eventually produced in single-chip form as
the ROMP (Research Office Products Micro Processor) in 1981. It was designed for
small tasks and was used in the IBM RT-PC in 1986, which turned out to be a
commercial failure. However, the 801 inspired several research projects, including new
ones at IBM that would eventually lead to the IBM POWER instruction set architecture
[15].
The most public RISC designs, however, were the results of university research
programs run with funding from the DARPA VLSI Program. The Berkeley
RISC project started in 1980 under the direction of David Patterson and Carlo H.
Sequin. Berkeley RISC was based on gaining performance by pipelining and an
aggressive use of a technique known as register windowing. The Berkeley RISC project
delivered the RISC-I processor in 1982, consisting of only 44,420 transistors (compared
with averages of about 100,000 in newer CISC designs of the era). RISC-I had only 32
instructions, and yet completely outperformed any other single-chip design. They
followed this up with the 40,760-transistor, 39-instruction RISC-II in 1983, which ran
over three times as fast as RISC-I [16].
[Figure 2.1: Architecture of 8-bit RISC processor [17], showing the Control Unit, Instruction Register, Instruction Decoder, Universal Shift Register, ALU and Accumulator]
Another related work [18], describes an 8-bit RISC processor that consists of arithmetic
logic unit, control unit, shifter and rotator. The architecture of this processor is shown in
Figure 2.2. The processor is designed with load/store (Von Neumann) architecture, one
shared memory for instructions (program) and data with one data bus and one address
bus between processor and memory. Instruction and data are fetched in sequential order
so that the latency incurred between the machine cycles can be reduced. In this design,
most instructions are of uniform length and similar structure, arithmetic operations are
restricted to CPU registers and only load and store instructions access memory. Three
stages of pipelining have been incorporated in the design, which increases the speed of
operation. This processor can be used as a systolic core to perform mathematical
computations like solving polynomial and differential equations.
[Figure 2.2: Architecture of the 8-bit RISC processor described in [18], showing the Control Unit, Instruction Register, Instruction Decoder, Universal Shift Register, ALU and Accumulator]
The design of a 16-bit processor has been reported [19] with customised instruction set
for soft-core RISC processor. Instruction Set Architecture (ISA) contains 35 basic
instructions. Among all soft-core processors, RISC design is widely adopted for its
single clock cycle instructions and less resource requirement compared to CISC
approach. The computer architecture of 16-bit RISC processor is shown in Figure 2.3. It
describes the custom simulation of a RISC soft-core processor's instruction set that is
based on Microchip PIC16C5X architecture. Memory address remapping algorithm is
introduced to remap the memory address to correct physical memory address due to
memory banking scheme being applied. Simulation process is done on a highly
customizable Java based computer architecture simulator. It provides features to insert
customized instruction in an assembly language and has the ability to perform
simulation at microcode level to variety of CPU architectures.
[Figure 2.3: Architecture of the 16-bit RISC processor described in [19]]
This chapter presents the design of an 8-bit RISC processor. Section 3.1 presents the
architecture of the proposed processor, which is shown in Figure 3.1. Section 3.2
describes the functional modules. Section 3.3
describes the instruction set architecture.
The 8-bit RISC processor is designed using the Harvard architecture, with separate
instruction memory and data memory. Figure 3.1 shows the complete architecture
of the proposed system. The proposed processor has an 8-bit ALU, two 8-bit I/O ports,
serial-in and serial-out ports, eight 8-bit general-purpose registers, 3 interrupts and a
4-bit flag register holding the zero flag (Z), carry flag (C), borrow flag (B) and parity
flag (P). The proposed RISC processor runs at 25 MHz on a 2.5 V supply. The 8-bit
SYSTEM_DATABUS, used for transferring data between different modules, and the 8-bit
Accumulator, used for arithmetic and logical operations, are also integral parts of the
proposed processor. The Instruction Memory (IM) and Data Memory (DM) have
separate address and data buses for communicating with the other modules. The
interrupt module contains three priority-based interrupts. The proposed
processor's most important feature is its very simple instruction set of only
34 instructions, which is easy to learn.
Another important feature is pipelining, which improves performance by reducing
the average execution time per instruction. The reduction can appear as
a decrease in the number of clock cycles per instruction (CPI), as a decrease in the clock
cycle time, or as a combination of both. The pipelining scheme used for the
proposed processor is shown in Figure 3.2. The proposed processor requires only two
clock cycles to execute an instruction, i.e. one fetch cycle (TF1) and one execution
cycle (TX1), which are mutually exclusive; the jump instruction is an exception and
also uses TX2. With pipelining, the next instruction is fetched while the current
instruction is executing, such that one instruction is executed on every clock cycle.
For efficient reduction of power, clock gating is used for specific modules, which are
clocked only when required. The Data Memory and the Register set are the modules where
clock gating is used. All loading of the registers takes place at the falling edge of the clock and all control
[Figure 3.1: Architecture of the proposed 8-bit RISC processor, showing the PCU (PC, PCR, PCS, STACKPC), IM, DM, the SYSTEM_DATABUS, the IM and DM address and data buses, the Flag Register, the Register Module (R0 to R7) and ports P0 and P1]
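[Figure 3.2: Pipelining in the proposed processor: the fetch cycle TF1 of each instruction overlaps the execution cycle TX1 of the previous one; branching instructions also use TX2]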
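The effect of this overlap on the cycle count can be sketched as follows, under the simplifying assumption (made here for illustration only) of a straight-line sequence of $n$ instructions with no branches, waits or interrupts, each needing one fetch cycle (TF1) and one execution cycle (TX1):

```latex
\begin{align*}
T_{\text{sequential}} &= 2n \quad \text{cycles (fetch and execute never overlap)} \\
T_{\text{pipelined}}  &= n + 1 \quad \text{cycles (one initial fetch, then one instruction completes per cycle)} \\
\text{Speed-up} &= \frac{T_{\text{sequential}}}{T_{\text{pipelined}}} = \frac{2n}{n+1} \longrightarrow 2 \quad \text{as } n \to \infty
\end{align*}
```

For example, $n = 10$ gives 20 cycles without overlap and 11 cycles with it. Each instruction still takes two cycles from fetch to completion, so latency is unchanged; only throughput improves.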
In this work, the RISC processor consists of the following blocks: Program Counter Unit
(PCU), Instruction Memory, Data Memory, Control Unit, Register set, Arithmetic &
Logical Unit (ALU), I/O module, Interrupt module and Serial Module.
[Figure: block diagram of the Program Counter Unit (PCU), showing PC, PCR, PCS and STACKPC with control signals INCPC, OPC, LPC, LPCS, OPCS, RESET and CLK, the 18-bit output to IM_ADDRBUS and the input from IM_DATABUS]
SIGNALS     FUNCTIONS
CLK         System clock
OPC         To output PC data on IM_ADDRBUS
LPC         To Load PC with PCR content
INCPC       To Increment PC value by one
OPCS        To Load PC with PCS content
LPCS        To Load PCS with PC content
RESET       To Reset PC
OSTACKPC    To Load PC with STACKPC content
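The signal table above can be read as a small register-transfer description. The following is a minimal sketch of that reading, not the RTL of this work: the module name, the treatment of PCR and STACKPC as plain inputs, the synchronous reset and the priority order between control signals are all assumptions made for illustration.

```verilog
// Hypothetical PCU sketch; priority between control signals is assumed.
module pcu_sketch (
    input  wire        CLK,
    input  wire        RESET,     // PC <- 0
    input  wire        INCPC,     // PC <- PC + 1
    input  wire        LPC,       // PC <- PCR
    input  wire        OPCS,      // PC <- PCS
    input  wire        LPCS,      // PCS <- PC
    input  wire        OSTACKPC,  // PC <- STACKPC
    input  wire        OPC,       // drive PC onto IM_ADDRBUS
    input  wire [17:0] PCR,       // assumed to be supplied from outside
    input  wire [17:0] STACKPC,   // assumed to be supplied from outside
    output wire [17:0] IM_ADDRBUS
);
    reg [17:0] PC  = 18'b0;
    reg [17:0] PCS = 18'b0;

    // Loads are placed on the falling clock edge, as the text states
    // for register loading.
    always @(negedge CLK) begin
        if (RESET)         PC <= 18'b0;
        else if (LPC)      PC <= PCR;
        else if (OPCS)     PC <= PCS;
        else if (OSTACKPC) PC <= STACKPC;
        else if (INCPC)    PC <= PC + 18'd1;

        if (LPCS) PCS <= PC;   // save the current PC
    end

    // The address bus is released (high impedance) unless OPC is asserted.
    assign IM_ADDRBUS = OPC ? PC : 18'bz;
endmodule
```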
3.2.2 Instruction Memory (IM)
Instruction Memory is 16 bits wide and has 262,144 address locations, so that any
practical real-time program can fit into it. In the fetch cycle, when the corresponding
control signals are enabled, the 16-bit data bus IM_DATABUS receives the
Instruction Memory (IM) content at the valid address location provided
by IM_ADDRBUS.
[Figure: block diagram of the Instruction Memory (IM), 256K x 16, with the 18-bit input from IM_ADDRBUS, the RDIM control signal and the 16-bit output to IM_DATABUS]
SIGNAL      FUNCTION
RDIM        To output IM content to IM_DATABUS
            corresponding to the address provided by the
            IM_ADDRBUS
The Control Unit receives inputs from the Interrupt Module about the states of its three
interrupts through the IF0, IF1 and TMF0 flags, which are taken into account in the Tstate
counter module. The Control Unit also receives inputs from the Flag register regarding the
states of its four flags, which are used to generate the control signals for branching
instructions. The Control Unit receives inputs from the Serial Module regarding
transmission or reception of data through the TXF and RXF flags, which are set whenever a
transmission or reception is completed.
The Low Power Unit uses clock gating to reduce power dissipation.
The Data Memory and the Register set are the main memory elements that consume
power, so a gated clock is provided to these modules. The corresponding module is
clocked only when a memory location or general-purpose register is being loaded. Thus,
power consumption in the proposed processor is reduced by the use of clock gating.
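[Figure: block diagram of the Control Unit, showing IR, IRX, the Tstate counter, the Instruction Decoder and the Low Power Unit (LPU) with its output clocks; inputs include the TXF, IF1, TMF0 and ZCBP flags and the 16-bit IM_DATABUS, with an 8-bit connection to SYSTEM_DATABUS]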
All loading to memory/register takes place at falling edge of clock when the
corresponding control signals are high.
SIGNALS     FUNCTIONS
LIR         To output IM_DATABUS content to IR and PC is
            incremented by one
OIRSYS      To output IR content to SYSTEM_DATABUS
OIRDM       To output IR content to DM_ADDRBUS
3.2.4 Data Memory (DM)
Data Memory is 8 bits wide and has 4096 address locations. Data Memory receives the
required address over the 12-bit address bus DM_ADDRBUS from the control unit.
Data Memory provides read and write control and is accessed over the 8-bit data bus
SYSTEM_DATABUS. The clock signal for this module is provided by the control
unit and is active only during loading operations, to reduce power. All loading of
the memory occurs at the falling edge of the clock, when the corresponding control
signals are high.
[Figure: block diagram of the Data Memory (DM), 4K x 8, with control signals RDDM and WRDM, the 12-bit input from DM_ADDRBUS and the clock DM_CLK]
SIGNALS     FUNCTIONS
DM_CLK      Data Memory Clock
RDDM        To output DM content to SYSTEM_DATABUS
            corresponding to the address provided by the
            DM_ADDRBUS
WRDM        To input SYSTEM_DATABUS content to DM
            corresponding to the address provided by the
            DM_ADDRBUS
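A minimal sketch of a memory with this interface is shown below. It is an illustration under stated assumptions rather than the module used in this work: the module name is hypothetical and the read path is assumed to be asynchronous, driving the shared bus only while RDDM is high.

```verilog
// Hypothetical 4K x 8 data memory sketch on a shared tri-state bus.
module data_memory_sketch (
    input  wire        DM_CLK,         // gated clock from the control unit
    input  wire        RDDM,           // read enable
    input  wire        WRDM,           // write enable
    input  wire [11:0] DM_ADDRBUS,     // 12 bits address 4096 locations
    inout  wire [7:0]  SYSTEM_DATABUS
);
    reg [7:0] mem [0:4095];

    // Writes occur on the falling edge of the (gated) memory clock.
    always @(negedge DM_CLK)
        if (WRDM)
            mem[DM_ADDRBUS] <= SYSTEM_DATABUS;

    // Assumed asynchronous read; the bus is released when RDDM is low.
    assign SYSTEM_DATABUS = RDDM ? mem[DM_ADDRBUS] : 8'bz;
endmodule
```

Because DM_CLK is gated, the write process sees no clock edges at all unless the control unit enables the clock for a load.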
3.2.5 Accumulator (A)
The Accumulator is an 8-bit register, shown in Figure 3.7. The Accumulator is connected
to SYSTEM_DATABUS by a bidirectional data bus, which is used for data transfer
instructions. The Accumulator is also connected to the ALU_DATABUS_A and
ALU_RESULT data buses, which are used for arithmetic and logical instructions.
The Accumulator is connected to two 8-bit data buses for communicating with the I/O module.
The increment, decrement, rotate right, rotate left and complement operations are
performed on the accumulator data. The Accumulator is connected to the TBUFF register in
the Serial Module for sending the data to be transmitted through the serial-out port 'txout',
and to the RBUFF register in the Serial Module for storing the data received
through the serial-in port 'rxin'.
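[Figure 3.7: Block diagram of the Accumulator, with control signals INCA, DECA, CMA, RL, RR, LIN, OOUT, LSIN, OSOUT and CLK, the 8-bit output to ALU_DATABUS_A and the 8-bit input from ALU_RESULT]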
SIGNALS     FUNCTIONS
CLK         System clock
RESET       To Reset Accumulator
LA          To Load Accumulator from SYSTEM_DATABUS
OA          To Output Accumulator to SYSTEM_DATABUS
INCA        To Increment Accumulator
DECA        To Decrement Accumulator
CMA         To Complement all the bits of Accumulator
LALU        To Load Accumulator from ALU_RESULT
OALU        To Output Accumulator to ALU_DATABUS_A
RR          To Rotate the bits of Accumulator in right direction
RL          To Rotate the bits of Accumulator in left direction
LIN         To Load Accumulator from 8-bit input port P0
OOUT        To Send Accumulator content to 8-bit output port P1
LSIN        To Load Accumulator from 8-bit register RBUFF
            in Serial Module
OSOUT       To Send Accumulator content to 8-bit register
            TBUFF in Serial Module
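The core of these operations can be sketched in a few lines. This is only an illustration: the module name, the synchronous reset, the priority order and the choice of rotates that do not pass through the carry flag are assumptions, and the port and serial signals (LIN, OOUT, LSIN, OSOUT) and the ALU output path (OALU) are left out for brevity.

```verilog
// Hypothetical accumulator sketch covering load, increment, decrement,
// complement and rotate; priority between signals is assumed.
module accumulator_sketch (
    input  wire       CLK,
    input  wire       RESET,
    input  wire       LA, LALU,          // load from system bus / ALU result
    input  wire       INCA, DECA, CMA,   // increment, decrement, complement
    input  wire       RL, RR,            // rotate left / right by one bit
    input  wire       OA,                // drive A onto SYSTEM_DATABUS
    input  wire [7:0] ALU_RESULT,
    inout  wire [7:0] SYSTEM_DATABUS,
    output reg  [7:0] A = 8'b0
);
    always @(negedge CLK) begin
        if (RESET)     A <= 8'b0;
        else if (LA)   A <= SYSTEM_DATABUS;
        else if (LALU) A <= ALU_RESULT;
        else if (INCA) A <= A + 8'd1;
        else if (DECA) A <= A - 8'd1;
        else if (CMA)  A <= ~A;
        else if (RL)   A <= {A[6:0], A[7]};   // MSB wraps around to bit 0
        else if (RR)   A <= {A[0], A[7:1]};   // LSB wraps around to bit 7
    end

    assign SYSTEM_DATABUS = OA ? A : 8'bz;
endmodule
```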
3.2.6 Register Set
The Register set contains eight 8-bit registers R0, R1, R2, R3, R4, R5, R6 and R7, which can
be used for storing data that are frequently used. The Register set is connected to the ALU
unit by ALU_DATABUS_R, which is a unidirectional data bus for performing arithmetic
and logic operations. It is also connected to SYSTEM_DATABUS by a bidirectional
data bus for loading and storing data. All loading of the registers occurs at the falling
edge of the clock, when the corresponding control signals are high. The clock input to the
Register set is a gated clock, which is active only during loading of any one of the
registers. In Figure 3.8, X in the control signals LRX, OERX and OERALX ranges from 0 to 7,
denoting registers R0, R1, R2, R3, R4, R5, R6 and R7 respectively.
[Figure 3.8: Block diagram of the Register set, showing registers R0 to R7 connected to SYSTEM_DATABUS and ALU_DATABUS_R]
SIGNALS     FUNCTIONS
R_CLK       Register set clock
LRX         To Load RX from SYSTEM_DATABUS where X =
            0, 1, 2, 3, 4, 5, 6 or 7.
OERX        To Output RX to SYSTEM_DATABUS
OERALX      To Output RX to ALU_DATABUS_R
RESET       To Reset all eight registers
3.2.8 ALU unit
The ALU is connected to the Accumulator and the general-purpose registers by its 8-bit buses
ALU_DATABUS_A and ALU_DATABUS_R. The ALU unit supports the AND, OR, XOR,
ADD and SUB operations. RX in Figure 3.9 can be any of the registers R0, R1, R2,
R3, R4, R5, R6 or R7. The result of an operation is stored in the Accumulator over the bus
labelled ALU_RESULT. The zero flag (Z), carry flag (C), borrow flag (B) and parity
flag (P) are updated according to the ALU operation and stored in the 4-bit Flag
Register. The parity flag is set only when the result of the ALU operation contains an odd
number of ones.
Figure 3.9 Block Diagram of ALU unit
[Figures: AND, OR and XOR operation blocks of the ALU, enabled by EAND, EOR and EXOR respectively, each operating on the Accumulator (A) and RX and clocked by CLK]
If OALU=1 and OERALX=1, data from the accumulator register and the RX register are
transferred to the ALU data buses, and if EXOR=1 then the XOR operation will take
place in the XOR ALU block. The Parity flag (P) and Zero flag (Z) are
updated. When LALU=1, the result is transferred to the accumulator during the falling
edge of the clock.
Where X = 0, 1, 2, 3, 4, 5, 6 or 7.
[Figure: block diagram of the Flag Register holding the Z, C, B and P flags, with CLK and SETCLRF inputs and a 4-bit input from SYSTEM_DATABUS]
SIGNALS     FUNCTIONS
CLK         System clock
SETCLRF     To Set/Clear flag bits by SYSTEM_DATABUS
            content
EAND        A and RX, Parity (P) and Zero (Z) flags are updated
EOR         A or RX, Parity (P) and Zero (Z) flags are updated
EXOR        A xor RX, Parity (P) and Zero (Z) flags are updated
EADD        A + RX, Carry (C), Parity (P) and Zero (Z) flags are
            updated
ESUB        A - RX, Borrow (B), Parity (P) and Zero (Z) flags are
            updated
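The flag rules in this table can be illustrated with a purely combinational sketch. The module name and the 9-bit intermediate are assumptions made here; the sketch also simply clears C and B for operations that do not produce them, whereas in the proposed processor the flags are held in the clocked 4-bit Flag Register.

```verilog
// Hypothetical combinational ALU sketch showing how Z, C, B and P
// could be derived from the result.
module alu_flags_sketch (
    input  wire [7:0] A,        // operand from ALU_DATABUS_A
    input  wire [7:0] RX,       // operand from ALU_DATABUS_R
    input  wire       EAND, EOR, EXOR, EADD, ESUB,
    output reg  [7:0] ALU_RESULT,
    output reg        Z, C, B, P
);
    reg [8:0] wide;  // one extra bit captures the carry or borrow

    always @* begin
        wide = 9'b0;
        C = 1'b0;
        B = 1'b0;
        if (EAND)      wide = {1'b0, A & RX};
        else if (EOR)  wide = {1'b0, A | RX};
        else if (EXOR) wide = {1'b0, A ^ RX};
        else if (EADD) begin
            wide = {1'b0, A} + {1'b0, RX};
            C = wide[8];               // carry out of bit 7
        end else if (ESUB) begin
            wide = {1'b0, A} - {1'b0, RX};
            B = wide[8];               // set when A < RX
        end
        ALU_RESULT = wide[7:0];
        Z = (ALU_RESULT == 8'b0);      // zero flag
        P = ^ALU_RESULT;               // 1 for an odd number of ones
    end
endmodule
```

The reduction XOR `^ALU_RESULT` matches the stated rule that the parity flag is set only when the result contains an odd number of ones.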
3.2.9 I/O Module
The I/O module has one 8-bit input port and one 8-bit output port for communicating
with the external environment, which can be a sensor, an actuator or even another
microprocessor. The input port P0 and the output port P1 are directly connected to the
Accumulator, which controls the data flow through the ports when the corresponding
control signals are enabled.
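[Figure: block diagram of the I/O module, showing the 8-bit path from input port P0 to the Accumulator]
[Figure: block diagram of the Interrupt Module, showing the interrupt inputs I0 and I1 and the INTCON register]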
SIGNALS     FUNCTIONS
CLK         System clock
LINTCON     To enable/disable interrupts and mask the I1
            interrupt by SYSTEM_DATABUS content.
CLRTMRF     Clear timer flag TMF0
3.2.11 Serial Module
The Serial Module contains rxin as the serial-in port and txout as the serial-out port. The
serial communication is based on the UART protocol shown in Figure 3.20. The baud rate used
for this Serial Module is 115200 baud. Data transmission starts with a start bit
of 0, followed by the individual data bits of the word with the Least Significant Bit
(LSB) sent first, and ends with a stop bit of 1. The Serial Module consists of two 8-bit
registers, TBUFF and RBUFF, for storing data during transmission and reception. The
data stored in the TBUFF register is shifted out during serial data transmission, and the
process is reversed for the RBUFF register. The baud rate clock required for serial
communication is provided by the control unit. The Serial Module provides the TXF and RXF
flags to the Control Unit, which are set whenever a
transmission or reception respectively is completed.
[Figure: UART frame format: idle state, start bit, data bits from LSB to MSB, stop bit, idle state, timed by BAUDCLK]
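The frame format described above (start bit 0, eight data bits LSB first, stop bit 1) can be sketched as a small transmitter. This is an illustrative assumption, not the Serial Module of this work: the module name, the `load` strobe, the `data` input and the `bit_cnt` counter are hypothetical, and BAUDCLK is assumed to tick once per bit.

```verilog
// Hypothetical UART transmitter sketch: 1 start bit, 8 data bits
// (LSB first), 1 stop bit.
module uart_tx_sketch (
    input  wire       BAUDCLK,       // assumed: one rising edge per bit
    input  wire       load,          // hypothetical strobe to start a frame
    input  wire [7:0] data,          // byte to send (e.g. from Accumulator)
    output reg        txout = 1'b1,  // line idles high
    output reg        TXF   = 1'b0   // set once the stop bit is driven
);
    reg [7:0] TBUFF   = 8'b0;
    reg [3:0] bit_cnt = 4'd0;  // 0 idle, 1 start, 2..9 data, 10 stop

    always @(posedge BAUDCLK) begin
        case (bit_cnt)
            4'd0: if (load) begin            // idle: wait for a request
                      TBUFF   <= data;
                      TXF     <= 1'b0;
                      bit_cnt <= 4'd1;
                  end
            4'd1: begin                      // start bit
                      txout   <= 1'b0;
                      bit_cnt <= 4'd2;
                  end
            4'd10: begin                     // stop bit, frame complete
                      txout   <= 1'b1;
                      TXF     <= 1'b1;
                      bit_cnt <= 4'd0;
                  end
            default: begin                   // data bits, LSB first
                      txout   <= TBUFF[0];
                      TBUFF   <= {1'b0, TBUFF[7:1]};
                      bit_cnt <= bit_cnt + 4'd1;
                  end
        endcase
    end
endmodule
```

A receiver would mirror this by sampling rxin and shifting the bits into RBUFF, setting RXF after the stop bit.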
The Control Unit will generate 59 control signals and 4 output clocks for the proper
working of all modules.
Instruction Memory:
Control signals: RDIM
Input bus: IM_ADDRBUS [17:0]
Output bus: IM_DATABUS [15:0]
Control Unit:
Control signals: OIRSYS, LIR and OIRDM
Input bus: IM_DATABUS [15:0]
Output bus: DM_ADDRBUS [11:0], SYSTEM_DATABUS [7:0]
Flag inputs: Z, C, B, P, RXF, TXF, IF0, IF1 and TMF0
Data Memory:
Control signals: RDDM and WRDM
Input bus: DM_ADDRBUS [11:0]
Output bus: SYSTEM_DATABUS [7:0]
Accumulator:
Control signals: LA, OA, LALU, OALU, INCA, DECA, CMA, RR, RL, LIN, OOUT,
LSIN, OSOUT and RESET
Input bus: P0 [7:0], ALU_RESULT [7:0]
Output bus: P1 [7:0], ALU_DATABUS_A [7:0]
Input/Output bus: SYSTEM_DATABUS [7:0]
Register set:
Control signals: LR0, LR1, LR2, LR3, LR4, LR5, LR6, LR7, OER0, OER1, OER2,
OER3, OER4, OER5, OER6 and OER7
Input/Output bus: SYSTEM_DATABUS [7:0]
Output bus: ALU_DATABUS_R [7:0]
ALU Unit:
Control signals: OERAL0, OERAL1, OERAL2, OERAL3, OERAL4, OERAL5,
OERAL6, OERAL7, EAND, EXOR, EOR, EADD and ESUB
Input bus: ALU_DATABUS_A [7:0], ALU_DATABUS_R [7:0]
Output bus: ALU_RESULT [7:0]
Flag Register:
Control signals: SETCLRF
Flag output: Z, C, B and P
Interrupt Module:
Control signals: LINTCON and CLRTMRF
Flag output: IF0, IF1, TMF0
Serial Module:
Flag output: TXF and RXF
The main advantage of this instruction set architecture is the SAV PC instruction,
which saves the value of PC incremented by two in PCS. SAV PC can be used before a jump
instruction so that, after jumping to another location, RES PC (which loads the PCS
content into PC) returns to the instruction following the jump. The combined use
of SAV PC and a jump instruction thus acts like a CALL instruction, reducing the
number of instructions required without sacrificing functionality. A jump
instruction can also be used alone to jump to a specific location. Another special
instruction is RETI, which restores the PC value after the execution of an interrupt
service routine. SETCLRF is a special instruction that can be used to clear or set every
flag in the flag register. EDINTER is another special instruction, which can be used to
enable/disable interrupts and mask the I1 interrupt when needed.
Serial data transmission is performed with SOUT TBUFF and WAIT TXF.
First, SOUT TBUFF sends the accumulator content to TBUFF. Then WAIT
TXF waits until the data has been completely transmitted serially from the TBUFF register.
For serial data reception, WAIT RXF first waits until the data has been completely
received by the RBUFF register, and then SIN RBUFF stores the serial data received
in the RBUFF register in the accumulator. IN P0 takes 8-bit parallel input data from
Port P0, which is connected to external hardware. OUT P1 sends 8-bit parallel
data to Port P1, which is connected to external hardware. Thus, by using rxin, txout, Port
P0 and Port P1, the proposed processor can communicate with different types of
external hardware.
Table 3.1 shows the instruction set of the proposed RISC processor. In the Opcode
column of Table 3.1, rrr is the register select code from 000 to 111, x is don't care, a is
address and d is data. In the Mnemonic column, RX is a register representing any one of
R0, R1, R2, R3, R4, R5, R6 and R7, A is the accumulator, M is a Data Memory address and
add 18 is an Instruction Memory address. In the Operation column, update flags means
update the flags corresponding to that operation.
module ins_decoder(
    input inclk,
    input [7:0] PORT0,
    output [7:0] PORT1,
    input I0,I1,
    output reg clk=1'b0,
    inout IF0,IF1,TMF0,
    inout [3:0] ZCBP,
    output LR0,LR1,LR2,LR3,LR4,LR5,LR6,LR7,OER0,OER1,OER2,OER3,OER4,OER5,OER6,OER7,
    output OERAL0,OERAL1,OERAL2,OERAL3,OERAL4,OERAL5,OERAL6,OERAL7,
    output RESET,
    output OIRDM,RDDM,WRDM,
    output LA,OA,LALU,OALU,INCA,DECA,CMA,RR,RL,
    output EAND,EXOR,EOR,EADD,ESUB,
    output SETCLRF,CLRTMRF,
    output LINTCON,
    output LIN,OOUT,
    output R_CLK,DM_CLK,
    output reg baudclk=1'b0,
    output reg c=1'b0,
    input rxin,
    output LSIN,OSOUT,
    output txout,
    inout RXF,TXF,
    inout [15:0] IM_DATABUS,
    inout [17:0] PCS_DATABUS,
    inout [7:0] SYSTEM_DATABUS,
    output [11:0] DM_ADDRBUS,
    output [17:0] IM_ADDRBUS,
    output [7:0] acc);
reg [15:0] IRX;
reg [17:0] PCR;
reg [17:0] PC=18'b0;
reg [17:0] PCS;
reg [17:0] STACKPC;
reg [15:0] IM [0:262143];
reg [15:0] IR;
// system clock: inclk divided by two
always@(posedge inclk)
begin
    clk=~clk;
end
reg [7:0] cb=8'b0;
always@(posedge inclk)
begin
    if(cb==217)
    begin
        baudclk=~baudclk; //baudclock generation for serial module
        cb=0;
        c=1'b1;
    end
    else
    begin
        cb=cb+1;
        c=1'b0;
    end
end
wire OPC,LPC,INCPC,OPCS,LPCS,OSTACKPC,OIRSYS,RDIM,LPCR,LIR;
wire DT,ALU,BR,MIO;
// I/O buses
assign SYSTEM_DATABUS = OIRSYS?IRX[7:0]:'bz;
assign DM_ADDRBUS = OIRDM?IRX[11:0]:'bz;
assign PCS_DATABUS = (LPCS|OPCS)?((LPCS?PC:18'b0)|(OPCS?PCS:18'b0)):'bz;
assign IM_ADDRBUS = OPC?PC:'bz;
assign IM_DATABUS = RDIM?IM[IM_ADDRBUS]:'bz;
endmodule
/* Tstate counter */
module tstate_counter(
input clk,IF0,IF1,TMF0,TXF,RXF, //input clock 25MHz
input [7:2] IR8,
output reg TF1=0,TX1=0,TX2=0);
reg [3:0] counter=5'b00;
reg [7:0] c=8'b0;
always@(posedge clk)
begin
case(counter)
5'b0: begin
  TF1=1'b1; TX1=1'b0; TX2=1'b0; counter=5'b00001;
end
5'b00001: begin
  if(IF0) begin //check for interrupt flag IF0
    TF1=1'b0; TX1=1'b0; counter=5'b00011;
  end
  else if(IF1) begin //check for interrupt flag IF1
    TF1=1'b0; TX1=1'b0; counter=5'b00011;
  end
  else if(TMF0) begin //check for timer interrupt flag TMF0
    TF1=1'b0; TX1=1'b0; counter=5'b00011;
  end
  else if(IR8[7:2]==6'b110101) begin //check for wait TXF instruction
    TF1=1'b0; TX1=1'b0; counter=5'b01000;
  end
  else if(IR8[7:2]==6'b110110) begin //check for wait RXF instruction
    TF1=1'b0; TX1=1'b0; counter=5'b01001;
  end
  else if(IR8[7:6]==2'b10) begin //check for branching instruction
    TF1=1'b0; TX1=1'b1; counter=5'b00010;
  end
  else if(IR8[7:2]==6'b110011) begin //check for RESTORE PC instruction
    TF1=1'b0; TX1=1'b1; counter=5'b0;
  end
  else if(IR8[7:2]==6'b110100) begin //check for return interrupt instruction RETI
    TF1=1'b0; TX1=1'b1; counter=5'b0;
  end
  else if(IR8[7:2]==6'b110000) begin //check for halt instruction
    TF1=1'b0; TX1=1'b0;
  end
  else begin
    TF1=1'b1; TX1=1'b1; counter=5'b00001;
  end
end
5'b00010: begin
  TF1=1'b0; TX1=1'b0; TX2=1'b1; counter=5'b0;
end
5'b00011: begin
  TF1=1'b1; TX1=1'b0; TX2=1'b0; counter=5'b00100;
end
5'b00100: begin
  if(IR8[7:6]==2'b10) begin
    TF1=1'b0; TX1=1'b1; counter=5'b00010;
  end
  else if(IR8[7:2]==6'b110011) begin
    TF1=1'b0; TX1=1'b1; counter=5'b0;
  end
  else if(IR8[7:2]==6'b110100) begin
    TF1=1'b0; TX1=1'b1; counter=5'b0;
  end
  else begin
    TF1=1'b1; TX1=1'b1; counter=5'b00101;
  end
end
5'b00101: begin
  if(IR8[7:6]==2'b10) begin
    TF1=1'b0; TX1=1'b1; counter=5'b00010;
  end
  else if(IR8[7:2]==6'b110011) begin
    TF1=1'b0; TX1=1'b1; counter=5'b0;
  end
  else if(IR8[7:2]==6'b110100) begin
    TF1=1'b0; TX1=1'b1; counter=5'b0;
  end
  else begin
    TF1=1'b1; TX1=1'b1; counter=5'b00110;
  end
end
5'b00110: begin
  if(IR8[7:6]==2'b10) begin
    TF1=1'b0; TX1=1'b1; counter=5'b00010;
  end
  else if(IR8[7:2]==6'b110011) begin
    TF1=1'b0; TX1=1'b1; counter=5'b0;
  end
  else if(IR8[7:2]==6'b110100) begin
    TF1=1'b0; TX1=1'b1; counter=5'b0;
  end
  else begin
    TF1=1'b1; TX1=1'b1; counter=5'b00111;
  end
end
5'b00111: begin
  if(IR8[7:6]==2'b10) begin
    TF1=1'b0; TX1=1'b1; counter=5'b00010;
  end
  else if(IR8[7:2]==6'b110011) begin
    TF1=1'b0; TX1=1'b1; counter=5'b0;
  end
  else if(IR8[7:2]==6'b110100) begin
    TF1=1'b0; TX1=1'b1; counter=5'b0;
  end
  else begin
    TF1=1'b1; TX1=1'b1; counter=5'b0;
  end
end
5'b01000: begin //wait TXF state
  if(IF0) begin
    TF1=1'b0; TX1=1'b0; counter=5'b00011;
  end
  else if(IF1) begin
    TF1=1'b0; TX1=1'b0; counter=5'b00011;
  end
  else if(TMF0) begin
    TF1=1'b0; TX1=1'b0; counter=5'b00011;
  end
  else if(TXF && c==217) begin
    TF1=1'b0; TX1=1'b0; counter=5'b0;
  end
  else begin
    TF1=1'b0; TX1=1'b0; counter=5'b01000;
  end
end
5'b01001: begin //wait RXF state
  if(IF0) begin
    TF1=1'b0; TX1=1'b0; counter=5'b00011;
  end
  else if(IF1) begin
    TF1=1'b0; TX1=1'b0; counter=5'b00011;
  end
  else if(TMF0) begin
    TF1=1'b0; TX1=1'b0; counter=5'b00011;
  end
  else if(RXF && c==217) begin
    TF1=1'b0; TX1=1'b0; counter=5'b0;
  end
  else begin
    TF1=1'b0; TX1=1'b0; counter=5'b01001;
  end
end
endcase
end
always@(posedge clk)
begin
  if(c==217)
    c=0;
  else
    c=c+1;
end
endmodule
/* Data Memory */
module data_memory(
input clk, //gated clock for data memory
input RDDM,WRDM,
input [11:0] DM_ADDRBUS,
inout [7:0] SYSTEM_DATABUS);
reg [7:0] DM [0:4095];
//read: drive the bus with the addressed location while RDDM is high
assign SYSTEM_DATABUS = RDDM?DM[DM_ADDRBUS]:'bz;
//write: store the bus content on the falling clock edge while WRDM is high
always@(negedge clk)
  if(WRDM)
    DM[DM_ADDRBUS]=SYSTEM_DATABUS;
endmodule
//accumulator instantiation
accumulator
aca1(.clk(clk),.OA(OA),.LA(LA),.OALU(OALU),.LALU(LALU),.SYSTEM_DATABUS(SYSTEM_DATABUS),
.ALU_DATABUS_A(ALU_DATABUS_A),.result(result),.Acc(acc),.INCA(INCA),.DECA(DECA),.CMA(CMA),
.RR(RR),.RL(RL),.RESET(RESET),.LIN(LIN),.OOUT(OOUT),.PORT0(PORT0),.PORT1(PORT1),
.baudclk(baudclk),.c(c),.rxin(rxin),.txout(txout),.LSIN(LSIN),.OSOUT(OSOUT),.RXF(RXF),.TXF(TXF));
//ALU operations
always@(negedge clk)
begin
  if(EAND)
  begin
    flagreg[3]=Z;
    flagreg[0]=P;
  end
  else if(EXOR)
  begin
    flagreg[3]=Z;
    flagreg[0]=P;
  end
  else if(EOR)
  begin
    flagreg[3]=Z;
    flagreg[0]=P;
  end
  else if(EADD)
  begin
    flagreg[3]=Z;
    flagreg[2]=C;
    flagreg[0]=P;
  end
  else if(ESUB)
  begin
    flagreg[3]=Z;
    flagreg[1]=B;
    flagreg[0]=P;
  end
  else if(SETCLRF)
  begin
    flagreg=SYSTEM_DATABUS[3:0];
  end
end
endmodule
module accumulator(
input clk, //input clock 25MHz
input RESET,
input [7:0] PORT0,
output reg [7:0] PORT1,
input LIN,OOUT,
input LA,OA,OALU,LALU,INCA,DECA,CMA,RR,RL,
inout [7:0] SYSTEM_DATABUS,
inout [8:0] result,
output [7:0] ALU_DATABUS_A,
input baudclk, //baud clock 115200 per second
input c,
input rxin,
input LSIN,OSOUT,
output reg txout=1'b1,
output reg RXF=1'b0,TXF=1'b0,
output reg [7:0] Acc=8'b0);
reg [7:0] data;
reg [7:0] RBUFF;
reg [7:0] TBUFF;
reg [3:0] countrx=4'b1001;
reg [3:0] counttx=4'b1001;
reg txb=1'b1;
//accumulator operations
assign SYSTEM_DATABUS = OA?Acc:'bz;
assign ALU_DATABUS_A = OALU?Acc:'bz;
wire temp;
assign temp = (RR|RL)?((RR?Acc[0]:1'b0)|(RL?Acc[7]:1'b0)):'bz;
always@(negedge clk)
begin
  if(LA)
    Acc=SYSTEM_DATABUS;
  else if(LALU)
    Acc = result[7:0];
  else if(INCA)
    Acc = Acc+1;
  else if(DECA)
    Acc = Acc-1;
  else if(CMA)
    Acc = ~Acc;
  else if(RR)
  begin
    Acc = Acc>>1;
    Acc[7] = temp;
  end
  else if(RL)
  begin
    Acc = Acc<<1;
    Acc[0] = temp;
  end
  else if(RESET)
    Acc=8'b0;
  else if(LIN)
    Acc=PORT0; //input port PORT0
  else if(OOUT)
    PORT1=Acc; //output port PORT1
  else if(LSIN)
    Acc=RBUFF;
  else if(OSOUT)
  begin
    TBUFF=Acc; //TBUFF register in serial module
    txb=1'b0;
  end
  else if(TXF & c)
    txb=1'b1;
end
/* Register set */
module register_set(
input clk, //gated clock for register set
input RESET,
input LR0,LR1,LR2,LR3,LR4,LR5,LR6,LR7,
input OER0,OER1,OER2,OER3,OER4,OER5,OER6,OER7,
input OERAL0,OERAL1,OERAL2,OERAL3,OERAL4,OERAL5,OERAL6,OERAL7,
inout [7:0] SYSTEM_DATABUS,
output [7:0] ALU_DATABUS_B);
reg [7:0] R0,R1,R2,R3,R4,R5,R6,R7;
//selected register drives the system bus
assign SYSTEM_DATABUS =
(OER0|OER1|OER2|OER3|OER4|OER5|OER6|OER7)?((OER0?R0:8'b0)|(OER1?R1:8'b0)|(OER2?R2:8'b0)
|(OER3?R3:8'b0)|(OER4?R4:8'b0)|(OER5?R5:8'b0)|(OER6?R6:8'b0)|(OER7?R7:8'b0)):'bz;
//selected register drives ALU input B
assign ALU_DATABUS_B =
(OERAL0|OERAL1|OERAL2|OERAL3|OERAL4|OERAL5|OERAL6|OERAL7)?((OERAL0?R0:8'b0)|(OERAL1?R1:8'b0)
|(OERAL2?R2:8'b0)|(OERAL3?R3:8'b0)|(OERAL4?R4:8'b0)|(OERAL5?R5:8'b0)|(OERAL6?R6:8'b0)
|(OERAL7?R7:8'b0)):'bz;
always@(negedge clk)
begin
  if(LR0)
    R0=SYSTEM_DATABUS;
  else if(LR1)
    R1=SYSTEM_DATABUS;
  else if(LR2)
    R2=SYSTEM_DATABUS;
  else if(LR3)
    R3=SYSTEM_DATABUS;
  else if(LR4)
    R4=SYSTEM_DATABUS;
  else if(LR5)
    R5=SYSTEM_DATABUS;
  else if(LR6)
    R6=SYSTEM_DATABUS;
  else if(LR7)
    R7=SYSTEM_DATABUS;
  else if(RESET)
  begin
    R0=8'b0;
    R1=8'b0;
    R2=8'b0;
    R3=8'b0;
    R4=8'b0;
    R5=8'b0;
    R6=8'b0;
    R7=8'b0;
  end
end
endmodule
/* Interrupt Module */
module interrupt_module(
input clk, //input clock 25MHz
input I0,I1,
output reg IF0,IF1,
output reg TMF0=1'b0,
inout [7:0] SYSTEM_DATABUS,
input LINTCON,CLRTMRF);
reg [2:0] INTCON=3'b0;
reg [9:0] timer=10'b0;
Figure 4.2 represents the simulation of the Instruction Memory. When RDIM is high,
IM_DATABUS gets the Instruction Memory content corresponding to the address
provided by IM_ADDRBUS. For example, when RDIM is high and the IM_ADDRBUS
content is 00 0000 0000 0000 0001, IM_DATABUS gets 0000 1000 0000 0110, which
is the content of address location 00 0000 0000 0000 0001 of the Instruction Memory.
Figure 4.3 represents the simulation of the Data Memory. It shows that whenever RDDM
is high, SYSTEM_DATABUS gets the Data Memory content corresponding to the
address provided by DM_ADDRBUS. When WRDM is high, the Data Memory location
addressed by DM_ADDRBUS is loaded with the SYSTEM_DATABUS content.
For example, when WRDM is high, the SYSTEM_DATABUS content is 10101010 and the
DM_ADDRBUS content is 0000 0000 0001, the Data Memory location 0000 0000 0001
is loaded with 10101010 at the falling edge of the clock.
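The write example above can be replayed with a small self-checking testbench. The sketch below is only an illustration under stated assumptions: dm_model is a hypothetical, cut-down copy of the data memory behaviour described in the text (tri-state read while RDDM is high, write on the falling clock edge while WRDM is high), and the testbench name, the signal drive and the timing are invented for this example.

```verilog
`timescale 1ns/1ps
// Hypothetical cut-down model of the data memory behaviour described in the text.
module dm_model(
  input clk, RDDM, WRDM,
  input [11:0] DM_ADDRBUS,
  inout [7:0] SYSTEM_DATABUS);
  reg [7:0] DM [0:4095];
  // Drive the shared bus only while a read is requested.
  assign SYSTEM_DATABUS = RDDM ? DM[DM_ADDRBUS] : 8'bz;
  // Store the bus content on the falling clock edge while WRDM is high.
  always @(negedge clk)
    if (WRDM) DM[DM_ADDRBUS] <= SYSTEM_DATABUS;
endmodule

// Illustrative testbench (names and timing are assumptions).
module dm_model_tb;
  reg clk = 1'b0, RDDM = 1'b0, WRDM = 1'b0;
  reg [11:0] addr = 12'b0000_0000_0001;
  reg [7:0] drive = 8'b1010_1010;   // value placed on the bus by the testbench
  wire [7:0] bus;
  // The testbench drives the bus only during the write phase.
  assign bus = WRDM ? drive : 8'bz;
  dm_model dut(.clk(clk), .RDDM(RDDM), .WRDM(WRDM),
               .DM_ADDRBUS(addr), .SYSTEM_DATABUS(bus));
  always #20 clk = ~clk;            // 40 ns period, i.e. 25 MHz
  initial begin
    WRDM = 1'b1;                    // write 10101010 to address 1
    @(negedge clk); #1;
    WRDM = 1'b0; RDDM = 1'b1;       // read the same location back
    #1;
    if (bus === 8'b1010_1010) $display("PASS: DM[1] = %b", bus);
    else                      $display("FAIL: bus = %b", bus);
    $finish;
  end
endmodule
```

The testbench releases the bus before asserting RDDM, so the memory and the testbench never drive SYSTEM_DATABUS at the same time.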
Figure 4.5 Simulation of Register set
Figure 4.6 represents the simulation of the ALU. When EXOR is high, the ALU performs
the XOR operation on the 8-bit inputs ALU_DATABUS_A and ALU_DATABUS_B, and the
result is obtained on ALU_RESULT. The 4-bit flag register, labelled flagreg, updates its
content according to the ALU operation. For example, when EXOR is high,
ALU_DATABUS_A is 00101101 and ALU_DATABUS_B is 01010001, ALU_RESULT
becomes 001111100. In this case, the zero flag (Z) and parity flag (P) are updated,
while the carry flag (C) and borrow flag (B) remain unchanged.
Figure 4.6 Simulation of ALU unit
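As a hedged illustration of the XOR path and the two flags it affects, the following combinational sketch computes a 9-bit result together with a zero flag and a parity flag. The module and port names are hypothetical, and the parity polarity (P = 1 when the result holds an even number of ones) is an assumption, since the text does not state which convention the processor uses.

```verilog
// Hypothetical sketch: XOR datapath with zero and parity flag generation.
module xor_flags_sketch(
  input  [7:0] a,          // operand from ALU_DATABUS_A
  input  [7:0] b,          // operand from ALU_DATABUS_B
  output [8:0] result,     // 9-bit result; bit 8 is always 0 for XOR
  output       z,          // zero flag: 1 when the 8-bit result is 0
  output       p);         // parity flag (assumed: 1 for an even number of ones)
  assign result = {1'b0, a ^ b};
  assign z = (result[7:0] == 8'b0);
  assign p = ~^result[7:0]; // reduction XNOR gives even parity
endmodule
```

For instance, a = 00001111 and b = 00000101 give result = 000001010, z = 0 and, under the assumed convention, p = 1.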
Figure 4.7 represents the simulation of the Interrupt module, part 1. When LINTCON is
high, the INTCON register is loaded at the falling edge of the clock with 110, which is
the SYSTEM_DATABUS content. When the MSB of the INTCON register (bit 2) is 1, the
timer interrupt is enabled. When bit 1 of the INTCON register is 1, both external
interrupts are enabled. When the LSB (bit 0) of the INTCON register is 0, interrupt I1 is
not masked; thus all three interrupts are enabled. In Figure 4.7, when interrupt I0 is
high, the IF0 flag is set. When I1 is high, the IF1 flag is set. When both I0 and I1 are
high, only IF0 is set because I0 has the highest priority. The figure also shows that the
timer flag TMF0 is low and that the timer register increments in each clock cycle, as the
timer interrupt is enabled.
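The enable, mask and priority rules described above can be summarised in a few lines of combinational logic. This is a minimal sketch, not the thesis implementation: the module name and the set_IF0/set_IF1 outputs are hypothetical, and it assumes that the requests are simply gated by the INTCON bits as the paragraph describes.

```verilog
// Hypothetical sketch of the external-interrupt flag logic described in the text.
module int_priority_sketch(
  input  [2:0] INTCON,   // bit 2: timer enable, bit 1: external enable, bit 0: mask I1
  input        I0, I1,   // external interrupt request lines
  output       set_IF0,  // request to set flag IF0
  output       set_IF1); // request to set flag IF1
  wire ext_en  = INTCON[1];
  wire i1_mask = INTCON[0];
  // I0 has the highest priority and only needs the external enable bit.
  assign set_IF0 = ext_en & I0;
  // I1 is honoured only when it is not masked and I0 is not requesting.
  assign set_IF1 = ext_en & ~i1_mask & I1 & ~I0;
endmodule
```

With INTCON = 110, a request on I0 alone sets IF0, a request on I1 alone sets IF1, and simultaneous requests set only IF0, which matches the behaviour described for Figure 4.7.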
Figure 4.8 represents the simulation of the Interrupt module, part 2. In Figure 4.8, when
the timer register reaches its maximum value of 1023, the timer flag TMF0 is set. When
CLRTMRF is high, TMF0 is cleared at the falling edge of the clock.
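A hedged sketch of this timer behaviour follows. The module name and the timer_en input are hypothetical (timer_en stands for bit 2 of INTCON), and counting on the falling clock edge is an assumption made to match the other registers; the text only fixes the 10-bit width, the maximum count of 1023 and the clearing by CLRTMRF.

```verilog
// Hypothetical sketch of the 10-bit timer and its flag TMF0.
module timer_flag_sketch(
  input      clk,
  input      timer_en,   // assumed to be INTCON[2]
  input      CLRTMRF,    // clear request from the control unit
  output reg TMF0 = 1'b0);
  reg [9:0] timer = 10'b0;
  // Assumption: the timer counts on the falling edge, like the other registers.
  always @(negedge clk) begin
    if (CLRTMRF)
      TMF0 <= 1'b0;                 // flag cleared on the falling edge
    else if (timer_en && timer == 10'd1023)
      TMF0 <= 1'b1;                 // maximum count reached
    if (timer_en)
      timer <= timer + 10'd1;       // wraps from 1023 back to 0
  end
endmodule
```

With the 25 MHz clock (40 ns period), 1024 counts correspond to 40.96 us between two timer flags.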
Figure 4.9 represents the simulation of the Serial Module. In Figure 4.9, rxin is the
serial input port and txout is the serial output port. Both ports are 1 in the idle state.
clk is the system clock at 25 MHz and baudclk is the baud clock required for serial
communication, which runs at 115200 per second. When rxin becomes 0, the start bit
defined by the UART protocol has arrived, and it is followed by 8 data bits. The RXF
flag is set when reception is complete and stays high for one baudclk period. The
received 8-bit data is stored in the RBUFF register. When LSIN is high, the accumulator
is loaded at the falling edge of the clock with 11011100, which is the RBUFF content.
When OSOUT is high, TBUFF is loaded at the falling edge of the clock with 11011100,
which is the accumulator content. From the next rising edge of baudclk, txout transmits
the 8-bit data in TBUFF (11011100) serially according to the UART protocol. After
transmission is complete, the TXF flag is set and stays high for one baudclk period.
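The relation between the clock frequency and the baud clock can be made explicit with a parameterised divider. The sketch below is not the thesis code: the module and parameter names are hypothetical, and it assumes that the divider runs from a 50 MHz input clock, i.e. twice the 25 MHz system clock, which is an inference and is not stated in the text. The number of input-clock cycles per half period is CLK_HZ / (2 * BAUD) = 50 000 000 / 230 400, which truncates to 217.

```verilog
// Hypothetical parameterised baud-clock divider (not the thesis code).
module baud_div_sketch #(
  parameter CLK_HZ = 50_000_000,  // assumed input clock feeding the divider
  parameter BAUD   = 115_200)     // target baud rate
 (input      clk_in,
  output reg baudclk = 1'b0);
  // Input-clock cycles per half period of baudclk: 50e6 / (2*115200) = 217.
  localparam HALF = CLK_HZ / (2 * BAUD);
  reg [15:0] cnt = 16'd0;
  always @(posedge clk_in) begin
    if (cnt == HALF - 1) begin
      cnt     <= 16'd0;
      baudclk <= ~baudclk;        // toggle once per half period
    end else begin
      cnt <= cnt + 16'd1;
    end
  end
endmodule
```

Under these assumptions baudclk runs at 50 000 000 / (2 * 217), about 115 207 Hz, which is within 0.01 % of the 115200 target.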
Figure 4.10 represents the simulation of the control unit. In Figure 4.10, the Instruction
Memory IM holds data in 9 address locations (0-8). By referring to Appendix A, the
Opcode corresponding to each address location can be found. The instructions are
MOVI A, 00000110 (IM [0]), MOV R6, A (IM [1]), MOVI A, 00000101 (IM [2]), ADD A, R6
(IM [3]), JUMP 00 0000000000000111 (IM [4] and IM [5]), DEC A (IM [6]), INC A
(IM [7]) and HALT (IM [8]). By referring to Appendix B, the control signals
corresponding to each instruction, and the tstate in which each occurs, can be found. In
Figure 4.10, TF1 represents the fetch cycle, TX1 represents execution cycle 1 and TX2
represents execution cycle 2. The instruction in IM [0] has OPC, RDIM and LIR in TF1,
and OIRSYS and LA in the TX1 cycle. The instruction in IM [1] has OPC, RDIM and LIR
in TF1, and LR6 and OA in the TX1 cycle. As the control signals in the fetch cycle are
common to all instructions, from the next instruction onwards only the control signals
in the execution cycles (TX1 and TX2) are considered. The instruction in IM [3] has
OALU, OERAL6, EADD and LALU in the TX1 cycle. The instruction in IM [4] and IM [5]
(JUMP) has OPC, RDIM, OIRPC and LPCR in the TX1 cycle, and LPC and OPCR in the
TX2 cycle. The instruction in IM [6] has DECA in the TX1 cycle and the instruction in
IM [7] has INCA in the TX1 cycle. IM [8] is
the HALT instruction: it stops the fetching of further instructions, and an interrupt is
required to exit from the halt state. The control unit fetches and executes from IM [0]
until it reaches the halt state.
For example, consider the instruction ADD A, R6, which is at IM [3]. In the fetch cycle
(TF1) the PC content is 3; as OPC is high, IM_ADDRBUS gets the PC content (3). RDIM
is also high, thus IM_DATABUS gets the 16-bit Opcode 0100011110000000 (ADD A,
R6) corresponding to the address provided by IM_ADDRBUS. LIR is high in the fetch
cycle, hence at the falling edge of the clock the Instruction Register (IR) is loaded with
the 16-bit Opcode and PC is incremented by one. Thus, by the end of the TF1 cycle, PC
becomes 4 and is ready for fetching the next instruction during the execution of the
current instruction. At the rising edge of the next clock cycle, the IR content is loaded
into the IRX register, which is used for decoding the instruction and generating the
control signals corresponding to the Opcode. The IR content is transferred to IRX
because IR changes at every falling edge of the clock, whereas the pipelined
architecture needs the Opcode to be stable for one full clock cycle for proper execution
of the instruction. During the execution cycle (TX1), OALU, OERAL6, EADD and LALU
are high. As OALU is high, ALU_DATABUS_A gets the accumulator content
(00000101), and as OERAL6 is high, ALU_DATABUS_B gets the register R6 content
(00000110). As EADD is high, the result of the addition (00001011) is on ALU_RESULT.
When LALU is high, the accumulator (acc) is loaded at the falling edge of the clock with
00001011, which is the ALU_RESULT content. In the flag register ZCBP, the zero flag
(Z), carry flag (C) and parity flag (P) are updated. In Figure 4.10, TX1 and TF1 are high
at the same time for most of the execution, which shows the pipelined architecture;
TX2 is high only during the execution of the JUMP instruction (a branching
instruction), which is an exception to the pipelined architecture. In Figure 4.10, DT,
ALU, BR and MIO represent the type of instruction: data transfer, arithmetic and
logical, branching, and machine control and I/O instructions respectively.
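The hand-over from IR to IRX described above can be sketched with two registers clocked on opposite edges. This is a simplified illustration with hypothetical module and port names: it copies IR into IRX on every rising edge, whereas Appendix B states that the transfer happens at the positive edge of every TX1 cycle, and it leaves out the PC increment.

```verilog
// Hypothetical sketch of the IR -> IRX hand-over between fetch and execute.
module ir_irx_sketch(
  input             clk,
  input             LIR,          // load IR during the fetch cycle TF1
  input      [15:0] IM_DATABUS,   // Opcode read from Instruction Memory
  output reg [15:0] IR  = 16'b0,  // changes on falling edges
  output reg [15:0] IRX = 16'b0); // stable for one full clock cycle
  // Fetch: capture the Opcode on the falling edge while LIR is high.
  always @(negedge clk)
    if (LIR) IR <= IM_DATABUS;
  // Hand-over: copy IR on the next rising edge so the decoder sees a
  // value that stays constant until the following rising edge.
  always @(posedge clk)
    IRX <= IR;
endmodule
```

Because IRX only changes on rising edges, the decoder outputs derived from it are stable around the falling edge on which the destination registers are loaded, while IR is already free to receive the next Opcode.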
Table 4.1 presents a simple twenty-line assembly language program used for simulation.
In Table 4.1, IM addr represents the instruction memory address and ISR is the Interrupt
Service Routine. Table 4.2 shows the device utilization of the Spartan-3E Starter Kit
FPGA board. Figures 4.11, 4.12 and 4.13 show the simulation results of the proposed
8-bit RISC processor with pipeline architecture for the program in Table 4.1.
The simulation results in Figure 4.11 show the control signals and clock signals
corresponding to the Register set and Data Memory. R_CLK and DM_CLK are the
gated-clock inputs of the Register set and Data Memory respectively, which are
activated only when an instruction that loads a general-purpose register or a memory
location is fetched.
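The text does not show how R_CLK and DM_CLK are derived, so the following is only a generic illustration of one common clock-gating technique, a latch-based gate. The module and signal names are hypothetical, and nothing here should be read as the circuit used in the proposed processor.

```verilog
// Hypothetical latch-based clock gate; the text does not show its gating circuit.
module clock_gate_sketch(
  input  clk,      // free-running system clock
  input  enable,   // high when the gated block must be clocked
  output gclk);    // gated clock, e.g. for the register set or data memory
  reg en_latched = 1'b0;
  // Transparent while clk is low, so enable changes cannot create
  // glitches on gclk while clk is high.
  always @(clk or enable)
    if (!clk) en_latched = enable;
  assign gclk = clk & en_latched;
endmodule
```

Holding the enable in a latch that is transparent only during the low phase keeps every gclk pulse at full width, which a plain AND of clk and enable would not guarantee.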
In Figure 4.12, the accumulator is represented as acc, with initial value 01010010, and
the flag register is represented as ZCBP. The figure also shows the interrupt flags and
the relevant buses used by the processor for executing the given program. It shows
that STACKPC stores the current PC value whenever an interrupt flag is raised, PCS
stores the address of the next instruction after the jump instruction (11) and PCR
stores the address of the next instruction to be executed (19) while a branching
instruction is being executed.
In Figure 4.13, DT, ALU, BR and MIO represent the type of instruction: data transfer,
arithmetic and logical, branching, and machine control and I/O instructions
respectively. The tstates TF1, TX1 and TX2 are also shown; they behave differently
while a branching instruction is being executed or an interrupt flag is raised.
The Control Unit generates 59 control signals and 4 clock signals while simulating the
program in Table 4.1; for convenience, Figures 4.11, 4.12 and 4.13 show only the major
part of the total simulation.
[5] Jagrit Kathuria, M.Ayoub khan, Arti noor, "A review of clock gating techniques",
International Journal of Electronics and Communication Engineering, Vol. 1, No. 2
pp.l06-114,Aug2011.
[8] HU Hua, BAI Feng-e. Design and Simulation of UART Serial Communication
Module Based on Verilog HDL [J]. J ISUANJ IYU XIANDAIHUA 2008 Vol. 8
[9] Antonio Hernandez Zavala, Oscar Camacho Nieto, Jorge A. Huerta Ruelas, Arodi
R. Carvallo Dominguez, "Design of a General Purpose 8-bit RISC Processor for
Computer Architecture Learning" Computacion y Sistemas, Vol. 19, No. 2, 2015, pp.
371-385.
[10] Patrick Stakem, "4- and 8-bit Microprocessors, Architecture and History", 2 June
2013.
[13] https://en.wikipedia.org/wiki/Reduced_instruction_set_coinputing
[14] Ralph Grishman, "Assembly Language Programming for the Control Data 6000
Series", Algorithmics Press, 1974.
[16] Carlo Sequin, David Patterson, "Design and bnplementation of RISC I",
Proceedings of the Advanced Course on VLSI Architecture, University of Bristol, July
1982.
[17] Ramandeep Kaur, Anuj, "8 Bit RISC Processor Using Verilog HDL", International
Journal of Engineering Research and Applications (IJERA), Vol. 4, Issue 3, March
2014, pp. 417-422.
[18] R Uma, "Design and performance analysis of 8 bit RISC processor using Xilinx
tools". International Journal of Engineering Research and Applications (IJERA),
March-April 2012, pp. 053-058
[19] Ahmad Jamal Salim, Sani Irwan Md Salim, Nur Raihana Samsudin, Yewguan Soo,
"Customized Instruction Set Simulation for Soft-core RISC Processor," IEEE
Transactions on Control and System Graduate Research Colloquium (ICSGRC 2012)",
2012, pp. 38-42.
APPENDIX A
INSTRUCTION SET ARCHITECTURE
DATA TRANSFER INSTRUCTIONS
3. MOV A, M ; M => A
Group code   Mem/Reg   S/D   Data Memory address
0 0          1         0     @ @ @ @ @ @ @ @ @ @ @ @

A = Accumulator
M = Data Memory address
RX = Register, can be any one of R0, R1, R2, R3, R4, R5, R6 or R7
Group code = 00 represents data transfer operation
Mem = Memory => presence denoted as 1
Reg = Register => presence denoted as 0
S = Source => presence denoted as 0
D = Destination => presence denoted as 1
Im data = Immediate data => presence denoted as 1
Register code = ranges from 000 to 111
X = don't care bits
@ = address bits
# = data bits

ARITHMETIC AND LOGICAL INSTRUCTIONS
6. CMA ; A <= ~A
Group code   ALU/ACC/CLR/INTER op   Acc code   Don't care
0 1          0 1                    0 0 0      X X X X X X X X X

ALU/ACC/CLR/INTER op
ALU = ALU operation => denoted as 00
ACC = Accumulator operation => denoted as 01
CLR = Clear operation => denoted as 10
INTER = Interrupt operation => denoted as 11
Group code = 01 represents Arithmetic and logic operation

8. DEC A ; A <= A-1
Group code   ALU/ACC/CLR/INTER op   Acc code   Don't care
0 1          0 1                    0 1 0      X X X X X X X X X

BRANCHING INSTRUCTIONS
MC/IO
MC = Machine control instruction => presence denoted as 0
IO = Input/output instruction => presence denoted as 1
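The group code can be decoded into the instruction-type signals DT, ALU, BR and MIO with a few comparisons. The sketch below rests on assumptions: the appendix states only that 00 denotes data transfer and 01 arithmetic and logic, the value 10 for branching is taken from the comment in the tstate counter listing, and 11 for machine control and I/O is inferred as the remaining code. The module name and the placement of the group code in bits 15:14 of IRX are likewise assumptions.

```verilog
// Hypothetical decode of the 2-bit group code into instruction-type strobes.
module group_decode_sketch(
  input  [15:0] IRX,   // Opcode held for the execution cycle
  output        DT,    // data transfer
  output        ALU,   // arithmetic and logical
  output        BR,    // branching
  output        MIO);  // machine control and I/O
  wire [1:0] grp = IRX[15:14];   // assumed position of the group code
  assign DT  = (grp == 2'b00);
  assign ALU = (grp == 2'b01);
  assign BR  = (grp == 2'b10);
  assign MIO = (grp == 2'b11);   // assumed: the only code left
endmodule
```

For the Opcode 0100011110000000 (ADD A, R6) the two leading bits are 01, so only the ALU output is asserted.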
APPENDIX B
MICRO OPERATIONS
DATA TRANSFER INSTRUCTIONS
1. MOV A, RX
TF1                 TX1
PC -> IM            RX -> A
IMdata -> IR
PC+1 -> PC
OPC                 OERX
RDIM                LA
LIR

2. MOV RX, A
TF1                 TX1
PC -> IM
IMdata -> IR
PC+1 -> PC
OPC                 LRX
RDIM                OA
LIR

3. MOV A, M
TF1                 TX1
PC -> IM            IRX[11:0] -> DM
IMdata -> IR        DMdata -> A
PC+1 -> PC
OPC                 RDDM
RDIM                OIRDM
LIR                 LA

4. MOV M, A
TF1                 TX1
PC -> IM            IRX[11:0] -> DM
IMdata -> IR        A -> DMdata
PC+1 -> PC
OPC                 WRDM
RDIM                OIRDM
LIR                 OA

5. MOVI A, data
TF1                 TX1
PC -> IM            IRX[7:0] -> A
IMdata -> IR
PC+1 -> PC
OPC                 OIRSYS
RDIM                LA
LIR

At the positive edge of every TX1 cycle of all instructions, the IR content is moved to
IRX for execution of the instruction.
ARITHMETIC AND LOGICAL INSTRUCTIONS
1. AND A, RX
TF1                 TX1
PC -> IM
IMdata -> IR
PC+1 -> PC
OPC                 OALU
RDIM                OERALX
LIR                 EAND
                    LALU
                    update flags

2. XOR A, RX
TF1                 TX1
PC -> IM
IMdata -> IR
PC+1 -> PC
OPC                 OALU
RDIM                OERALX
LIR                 EXOR
                    LALU
                    update flags

3. OR A, RX
TF1                 TX1
PC -> IM            A <- A or RX
IMdata -> IR
PC+1 -> PC
OPC                 OALU
RDIM                OERALX
LIR                 EOR
                    LALU
                    update flags

4. ADD A, RX
TF1                 TX1
PC -> IM
IMdata -> IR
PC+1 -> PC
OPC                 OALU
RDIM                OERALX
LIR                 EADD
                    LALU
                    update flags

5. SUB A, RX
TF1                 TX1
PC -> IM
IMdata -> IR
PC+1 -> PC
OPC                 OALU
RDIM                OERALX
LIR                 ESUB
                    LALU
                    update flags

6. CMA
TF1                 TX1
PC -> IM
IMdata -> IR
PC+1 -> PC
OPC
RDIM                CMA
LIR

7. INC A
TF1                 TX1
PC -> IM
IMdata -> IR
PC+1 -> PC
OPC
RDIM                INCA
LIR

8. DEC A
TF1                 TX1
PC -> IM
IMdata -> IR
PC+1 -> PC
OPC
RDIM                DECA
LIR

9. RRA
TF1                 TX1
PC -> IM            A[7] <- A[0]
IMdata -> IR
PC+1 -> PC
OPC
RDIM                RR
LIR

10. RLA
TF1                 TX1
PC -> IM
IMdata -> IR
PC+1 -> PC
OPC
RDIM                RL
LIR

TF1                 TX1
PC -> IM
IMdata -> IR
PC+1 -> PC
OPC                 OIRSYS
RDIM                SETCLRF
LIR

12. CLRTMRF
TF1                 TX1
PC -> IM            TMF0 => 0
IMdata -> IR
PC+1 -> PC
OPC
RDIM                CLRTMRF
LIR

13. EDINTER 3-bit data
TF1                 TX1
PC -> IM
IMdata -> IR
PC+1 -> PC
OPC                 OIRSYS
RDIM                LINTCON
LIR
BRANCHING INSTRUCTIONS
1. JUMP add18
TF1             TX1                        TX2
PC -> IM        PC -> IM
IMdata -> IR
PC+1 -> PC
OPC             OPC                        LPC
RDIM            RDIM                       OPCR
LIR             OIRPC
                LPCR

2. JZ add18
TF1             TX1                        TX2
PC -> IM        PC -> IM                   PCR -> PC
IMdata -> IR    IMdata -> PCR[15:0]
PC+1 -> PC      IRX[10:9] -> PCR[17:16]
OPC             OPC                        OPCR
RDIM            RDIM                       check zero flag
LIR             OIRPC                      if Z=1 LPC is set
                LPCR                       else if Z=0 INCPC is set

3. JC add18
TF1             TX1                        TX2
PC -> IM        PC -> IM
IMdata -> IR
PC+1 -> PC
OPC             OPC                        OPCR
RDIM            RDIM                       check carry flag
LIR             OIRPC                      if C=1 LPC is set
                LPCR                       else if C=0 INCPC is set

4. JB add18
TF1             TX1                        TX2
PC -> IM        PC -> IM
IMdata -> IR
PC+1 -> PC
OPC             OPC                        OPCR
RDIM            RDIM                       check borrow flag
LIR             OIRPC                      if BF=1 LPC is set
                LPCR                       else if BF=0 INCPC is set

5. JP add18
TF1             TX1                        TX2
PC -> IM        PC -> IM
IMdata -> IR
PC+1 -> PC
OPC             OPC                        OPCR
RDIM            RDIM                       check parity flag
LIR             OIRPC                      if P=1 LPC is set
                LPCR                       else if P=0 INCPC is set
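The TX2 decision shared by the four conditional jumps can be sketched as follows. The module name and the one-hot inputs is_jz, is_jc, is_jb and is_jp are hypothetical stand-ins for whatever the instruction decoder provides; the flag positions follow the ZCBP order used in the ALU listing (bit 3 = Z, bit 2 = C, bit 1 = B, bit 0 = P).

```verilog
// Hypothetical sketch of the TX2 decision for conditional jumps.
module jump_select_sketch(
  input  [3:0] ZCBP,          // flag register: {Z, C, B, P}
  input        is_jz, is_jc,  // one-hot jump type, assumed to come from
  input        is_jb, is_jp,  // the instruction decoder
  input        TX2,           // second execution cycle
  output       LPC,           // load PC with the branch target held in PCR
  output       INCPC);        // increment PC instead of branching
  wire taken   = (is_jz & ZCBP[3]) | (is_jc & ZCBP[2]) |
                 (is_jb & ZCBP[1]) | (is_jp & ZCBP[0]);
  wire is_cond = is_jz | is_jc | is_jb | is_jp;
  assign LPC   = TX2 & is_cond &  taken;
  assign INCPC = TX2 & is_cond & ~taken;
endmodule
```

Exactly one of LPC and INCPC is asserted in TX2 of a conditional jump, mirroring the "if flag = 1 LPC is set, else INCPC is set" rows of the tables above. The unconditional JUMP, which always asserts LPC, is left out of the sketch.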
MACHINE CONTROL AND IO INSTRUCTIONS
1. HLT
TF1                 TX1
PC -> IM            Stops operations. An interrupt is
IMdata -> IR        necessary to exit from the halt state.
PC+1 -> PC
OPC
RDIM                -
LIR

2. RESET
TF1                 TX1
PC -> IM
IMdata -> IR
PC+1 -> PC
OPC
RDIM                RESET
LIR

3. SAV PC
TF1                 TX1
PC -> IM
IMdata -> IR
PC+1 -> PC
OPC
RDIM                LPCS
LIR

4. RES PC
TF1                 TX1
PC -> IM
IMdata -> IR
PC+1 -> PC
OPC
RDIM                OPCS
LIR

5. RETI
TF1                 TX1
PC -> IM
IMdata -> IR
PC+1 -> PC
OPC
RDIM                OSTACKPC
LIR

6. WAIT TXF
TF1                 TX1
PC -> IM
IMdata -> IR
PC+1 -> PC
OPC
RDIM                -
LIR

7. WAIT RXF
TF1                 TX1
PC -> IM
IMdata -> IR
PC+1 -> PC
OPC
RDIM                -
LIR

8. IN P0
TF1                 TX1
PC -> IM
IMdata -> IR
PC+1 -> PC
OPC
RDIM                LIN
LIR

9. OUT P1
TF1                 TX1
PC -> IM            A -> P1
IMdata -> IR
PC+1 -> PC
OPC
RDIM                OOUT
LIR

10. SIN RBUFF
TF1                 TX1
PC -> IM            RBUFF -> A
IMdata -> IR
PC+1 -> PC
OPC
RDIM                LSIN
LIR

11. SOUT TBUFF
TF1                 TX1
PC -> IM
IMdata -> IR
PC+1 -> PC
OPC
RDIM                OSOUT
LIR