12/26/2017 Simple CPU

Simple CPU :)
Modern CPUs are complex beasts, highly optimised and tricky to understand, which makes it very difficult to see why they were constructed in the way they were. Part of the
problem is the requirement for backwards compatibility i.e. a new processor has to be able to run code from the previous generations. This incremental development
can result in a very confusing instruction set and a 'cluttered' hardware architecture. Also, when you do start to dig down into the literature a lot of things are not
fully disclosed i.e. to protect industrial secrets. However, at their hearts all processors are simple machines and in some respects have not changed that much
since the 1940s i.e. they still use instructions and perform a Fetch-Decode-Execute cycle. Therefore, to prove that processors are actually very simple to build and
understand I developed a simple 8bit CPU architecture that can be implemented in an FPGA. To be honest it wasn't really designed, it evolved, so the
hardware could be optimised quite a bit. However, the aim was to break the processor down into its fundamental building blocks i.e. Boolean logic gates, and then
combine these to form more complex components e.g. adders, multiplexers, flip-flops, counters etc, which are at the heart of any computer. The basic block
diagram of this computer is shown in figure 1: a very simple machine, made from registers, multiplexers and an adder. The operation of this machine and its
components was discussed in Lectures from a top level view point. To give a different point of view I'm now going to explain its operation from the bottom up. This
processor will be implemented in a Spartan 3 FPGA, its hardware defined in schematics. Each schematic can be downloaded and simulated using the Xilinx ISE
ISim tool.

Figure 1 : Simple CPU

A computer is made up of four basic building blocks:

Logic: every block within the computer can be considered to be made from Boolean logic gates, however, this category refers to specific, larger logic blocks e.g.
adders, address decoders, instruction decoders etc.
Multiplexers: from one point of view a computer just moves information from one point to another. Controlling the path taken by this information are
multiplexers, switching junctions, allowing information to be passed between functional blocks.
Registers: fast, short term memory. As part of the Fetch-Decode-Execute cycle a computer needs to remember its state, the instruction to be processed and any
results generated.
Memory: this computer uses a classic Von Neumann architecture i.e. one memory, storing both the program (instructions) to be executed and the data to be
processed in the same memory device.

This processor has three multiplexers (MUX) controlling the data and address buses. Multiplexers are switches allowing the processor to select information from
multiple data sources and route it to a single destination. To select which data source should be used a multiplexer has one or more control lines, as shown in figure 2.
This 2:1 MUX has two data inputs (A,B), one output (Z) and an input (SEL), selecting which one of the two inputs should be connected to its output. This
hardware's operation is defined by its truth table, also shown in figure 2. When SEL=0 input A is connected to the output Z. When SEL=1 input B is connected to the
output Z.

Figure 2 : 2:1 bit multiplexer, circuit diagram (top), truth table (bottom)
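
The behaviour of this 2:1 multiplexer can also be sketched in a few lines of Python. This is only an illustration: the gate arrangement used here, Z = (A AND NOT SEL) OR (B AND SEL), is the textbook form and is assumed rather than copied from figure 2, and the function names are made up for this example.

```python
def mux2_1(a: int, b: int, sel: int) -> int:
    """1-bit 2:1 multiplexer built from NOT, AND and OR operations."""
    not_sel = sel ^ 1                  # NOT gate on the select line
    return (a & not_sel) | (b & sel)   # SEL=0 passes A, SEL=1 passes B


def truth_table():
    """Return (SEL, A, B, Z) rows for every input combination."""
    return [(sel, a, b, mux2_1(a, b, sel))
            for sel in (0, 1) for a in (0, 1) for b in (0, 1)]


if __name__ == "__main__":
    for row in truth_table():
        print(row)
```

Each row of the generated table should agree with the rule given above: Z follows A when SEL=0 and follows B when SEL=1.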

https://www-users.cs.york.ac.uk/~mjf/simple_cpu/index.html 1/19
This multiplexer can select between two bits, however, within the processor we need to select between 8bit buses. To achieve this the multiplexer is replicated eight
times i.e. one per bit in the bus, as shown in figure 3. Note, each mux2_1_1 rectangle (symbol) contains the circuit shown in figure 2. To save space I have only
shown the first three multiplexers, a full circuit diagram is available here: (Link ). The circuit symbol for this 8bit multiplexer is shown in figure 4, its interface has
three 8bit buses (thick lines) and one signal (thin line):

A(7:0) - input bus, eight bit values, bits labelled A7,A6,A5,A4,A3,A2,A1,A0
B(7:0) - input bus, eight bit values, bits labelled B7,B6,B5,B4,B3,B2,B1,B0
Z(7:0) - output bus, eight bit values, bits labelled Z7,Z6,Z5,Z4,Z3,Z2,Z1,Z0
SEL - input signal, Boolean, selects which input is connected to the output

Figure 3 : 2:1 byte multiplexer circuit diagram (first three stages only)

Figure 4 : 2:1 byte multiplexer symbol

In addition to the 2:1 multiplexers shown in figure 1 the ALU also needs a 4:1 multiplexer (discussed later) i.e. a multiplexer that has four inputs and one output. This
can be constructed from three 2:1 byte multiplexers, as shown in figure 5. Note, each mux2_1_8_v1 rectangle (symbol) contains the circuit shown in figure 3. The
Xilinx schematics and symbols for these multiplexers can be downloaded here: (Link ).

A(7:0) - input bus, eight bit values, bits labelled A7,A6,A5,A4,A3,A2,A1,A0
B(7:0) - input bus, eight bit values, bits labelled B7,B6,B5,B4,B3,B2,B1,B0
C(7:0) - input bus, eight bit values, bits labelled C7,C6,C5,C4,C3,C2,C1,C0
D(7:0) - input bus, eight bit values, bits labelled D7,D6,D5,D4,D3,D2,D1,D0
Z(7:0) - output bus, eight bit values, bits labelled Z7,Z6,Z5,Z4,Z3,Z2,Z1,Z0
SEL0 - input signal, Boolean, first stage selection
SEL1 - input signal, Boolean, second stage selection

Figure 5 : 4:1 byte multiplexer, circuit diagram and truth table
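
As a rough Python sketch of this construction, three 2:1 byte multiplexers can be chained into a 4:1 byte multiplexer. The wiring is an assumption based on the signal descriptions above (SEL0 drives both first stage multiplexers, SEL1 the second stage), and the function names are illustrative only.

```python
def mux2_1_8(a: int, b: int, sel: int) -> int:
    """2:1 byte multiplexer: SEL=0 passes A, SEL=1 passes B."""
    return (b if sel else a) & 0xFF


def mux4_1_8(a: int, b: int, c: int, d: int, sel0: int, sel1: int) -> int:
    """4:1 byte multiplexer made from three 2:1 byte multiplexers."""
    first_ab = mux2_1_8(a, b, sel0)            # first stage: A or B
    first_cd = mux2_1_8(c, d, sel0)            # first stage: C or D
    return mux2_1_8(first_ab, first_cd, sel1)  # second stage picks a pair


if __name__ == "__main__":
    for sel1 in (0, 1):
        for sel0 in (0, 1):
            print(sel1, sel0, hex(mux4_1_8(0xA, 0xB, 0xC, 0xD, sel0, sel1)))
```

With this wiring SEL1,SEL0 = 00, 01, 10, 11 select A, B, C and D respectively.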

All number crunching is performed in the Arithmetic and Logic Unit (ALU), which implements the processor's main arithmetic functions. To implement an ADD function, one of the
core instructions of any computer, there are a number of different hardware solutions, each having advantages and disadvantages. However, to avoid an in-depth
discussion of their relative merits I'm going to stick with the basic half adder, as shown in figure 6. This circuit adds together two bits A and B, producing a Sum
and Carry. Working through the truth table you can see the addition process 0+0=0, 0+1=1, 1+0=1. As this is a binary (base 2) machine, the maximum value any
digit can store is 1, therefore, when A=B=1 the Sum output cannot represent the result of 2, so a Carry is generated. This Carry would then be added to the next digit in
the number.

Figure 6 : Half adder, circuit diagram (top), truth table (bottom)
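
A half adder is small enough to write out directly. The Python sketch below uses the standard gate equations (Sum = A XOR B, Carry = A AND B); the function name is chosen just for this illustration.

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Add two bits, returning (sum, carry)."""
    return a ^ b, a & b   # XOR gives the Sum bit, AND gives the Carry bit


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            s, c = half_adder(a, b)
            print(f"{a}+{b} -> carry={c} sum={s}")
```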

On its own a half adder is not that useful as we need to add together 8bit numbers, but it can be used as the building block for a full adder, as shown in figure 7.
Note, each half_adder rectangle (symbol) contains the circuit shown in figure 6. This circuit can add together three bits: A,B and Cin. Another way to think about
this circuit is that it counts the number of 1's:

0+0+0=00 : all zeros, result 00
0+0+1=01 : one 1, result 01
0+1+1=10 : two 1s, result 10, or decimal 2
1+1+1=11 : three 1s, result 11, or decimal 3

Figure 7 : Full adder, circuit diagram (top), truth table (bottom)
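
One common way to build a full adder from two half adders is sketched below in Python. The extra OR gate that merges the two carries is the usual textbook arrangement and is an assumption here, since the schematic in figure 7 is not reproduced in the text.

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Add two bits, returning (sum, carry)."""
    return a ^ b, a & b


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Add three bits, returning (sum, cout)."""
    s1, c1 = half_adder(a, b)      # first half adder: A + B
    s, c2 = half_adder(s1, cin)    # second half adder: partial sum + Cin
    return s, c1 | c2              # at most one of c1, c2 can be 1


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            for cin in (0, 1):
                s, cout = full_adder(a, b, cin)
                print(f"{a}+{b}+{cin}={cout}{s}")
```

The two output bits, read as Cout then Sum, are simply the count of 1s on the inputs, as described above.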

Again, at first glance the ability to add three bits together may not seem that useful, however, if we replicate this hardware eight times, connecting the Carry out
(Cout) of each stage to the Carry in (Cin) of the next, we can build a Ripple adder, as shown in figure 8. Note, again each rectangle (symbol) contains a
sub-circuit, in this case full_adder as shown in figure 7. To save space I have only shown the first three full adders, a full circuit diagram is available here: (Link ). An
alternative way of considering this hardware is the pseudo code shown in figure 9. Each full adder is a modulo 2 adder, conceptually the addition process starts with
the least significant digit (LSD) and 'ripples' through the hardware to the most significant digit (MSD) i.e. bits X0 and Y0 are added together to produce a Sum Z0
and a Carry C1, this Carry is then added to the next significant digits X1 and Y1 etc. This sequential behaviour does limit the hardware's performance, but, don't
forget that the hardware associated with each digit's addition is all working in parallel. Using a decimal example, best case performance: 987+12=999, no carries would be generated, the
additions of 9+0, 8+1 and 7+2 would all be performed in parallel and complete in one unit of time. However, worst case performance: 999+1=1000, this would take
four units of time owing to the carries having to ripple through the hardware from the LSD to the MSD. Note, it's silly to say, but important to remember that
hardware is not software. When analysing hardware some elements may have sequential behaviour, but ALL logic gates will be working in parallel.

Figure 8 : Ripple adder circuit (first three stages only)
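
The carry chain can be modelled in software with a simple loop, one pass per full adder. This Python sketch is only a behavioural illustration (real hardware evaluates all stages in parallel, as noted above); the names X, Y, Z follow the text, while the function names and the `width` parameter are assumptions made for this example.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Add three bits, returning (sum, cout)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_adder(x: int, y: int, cin: int = 0, width: int = 8) -> tuple[int, int]:
    """Add two `width`-bit numbers bit by bit, returning (Z, Cout)."""
    z = 0
    carry = cin
    for i in range(width):                       # bit 0 (LSD) first
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        z |= s << i                              # place Sum bit Zi
    return z, carry                              # carry out of the last stage


if __name__ == "__main__":
    print(ripple_adder(123, 45))
    print(ripple_adder(255, 1))
```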

Figure 9 : Ripple adder architecture

Figure 10 : Ripple adder symbol

The top level circuit symbol for the ripple adder is shown in figure 10 (containing the circuit shown in figure 8), this symbol's interface has three 8bit buses (thick
lines) and two signals (thin lines):

A(7:0) - input bus, eight bit values, bits labelled A7,A6,A5,A4,A3,A2,A1,A0
B(7:0) - input bus, eight bit values, bits labelled B7,B6,B5,B4,B3,B2,B1,B0
Z(7:0) - output bus, eight bit values, bits labelled Z7,Z6,Z5,Z4,Z3,Z2,Z1,Z0
Cin - input signal, Boolean, Carry input from previous calculations
Cout - output signal, Boolean, Carry out from ripple adder

The complete ALU circuit diagram is shown in figure 11. With a few small modifications the ripple adder can be used to implement a subtraction function. The
subtraction hardware can be implemented using 2s complement i.e. subtraction by the addition of negative numbers. To generate a negative number each bit is
inverted and then 1 is added to the result, as shown in the example below.

  123         45 = 000101101      123 = 01111011        001111011
-  45        inv = 111010010                          + 111010011
-----         +1 = 111010011                          -----------
   78                                                   001001110 = 78
-----                                                 -----------
                                                       11111  11    (carries)

To invert each bit an array of XOR logic gates is used, bitwise_inv_v1, as shown in figure 12. This circuit has an 8bit input bus (A), each bit of which is XORed with the
signal EN. XORing a bit with 0 returns the same value. XORing a bit with 1 returns the inverse of that value. To add 1 the Carry-In (Cin) signal to the ripple adder
is set to 1, incrementing the final result. In the ALU this is controlled using signals S2 and S3. The ADD and SUB functions can be simulated using the Xilinx ISim
software tools, as shown in figure 13. In this waveform diagram the calculations 123+45 and 123-45 are performed. Remember when you use a 2s complement
representation in a calculation the final Carry-out is ignored. The Xilinx schematics, symbols and VHDL testbench for this ALU can be downloaded here: (Link ).

A B Z
0 0 0 A XOR 0 = A
0 1 1 A XOR 1 = NOT A
1 0 1
1 1 0
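
The invert-and-add-one trick can be checked with a short Python sketch. Here `bitwise_inv` mimics the XOR array described above and `add8` stands in for the 8bit ripple adder; both names, and the use of Python integer arithmetic in place of a gate-level adder, are assumptions made for illustration.

```python
def bitwise_inv(a: int, en: int) -> int:
    """XOR every bit of an 8-bit value with EN: EN=0 passes A, EN=1 inverts A."""
    return a ^ (0xFF if en else 0x00)


def add8(a: int, b: int, cin: int) -> tuple[int, int]:
    """8-bit adder with carry in, returning (Z, Cout)."""
    total = a + b + cin
    return total & 0xFF, total >> 8


def sub8(a: int, b: int) -> int:
    """A - B as A + (NOT B) + 1, ignoring the final Carry-out."""
    z, _cout = add8(a, bitwise_inv(b, 1), 1)
    return z


if __name__ == "__main__":
    print(sub8(123, 45))
```

Running the example from the text, `sub8(123, 45)` gives 78. When B is larger than A the result is the 2s complement form of the negative answer.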

Figure 11 : ALU circuit diagram

Figure 12 : Bitwise XOR

Figure 13 : ALU simulation, ADD and SUB

The Carry-in input can also be used to perform an increment function i.e. Z=A+0+1. This is quite a common requirement within a program e.g. incrementing a
counter, or the processor's program counter. To achieve this function the B input of the adder needs to be set to zero. To do this the replicate_v1 and bitwise_and_v1
circuits shown in figures 14 and 15 are used. The replicate_v1 component uses buffers to drive the same signal onto each bit of its output bus Z. These signals are
then ANDed with the data on the B input of the ALU. If they are ANDed with 1, the value on the B input of the adder is unaffected. If they are ANDed with 0, the
value on the B input of the adder is set to zero. In the ALU this is controlled using signals S2 and S4. The INC function can be simulated using the Xilinx ISim
software tools, as shown in figure 16. In this waveform diagram the calculations 123+1 and 45+1 are performed.

Figure 14 : replicate

Figure 15 : Bitwise AND

Figure 16 : ALU simulation, INC

The bitwise_and_v1 component is also used in the ALU to perform the bitwise logical AND function i.e. Z=A AND B, as shown in the example below.

A B Z                          10101010
0 0 0    A AND 0 = 0         & 11110000
0 1 0    A AND 1 = A         ----------
1 0 0                          10100000
1 1 1                        ----------

To select the ALU's function the final 4:1 multiplexer is used, control lines S0 and S1 selecting either the adder's output, bitwise AND output, input A or input B.
The full set of control signals is shown below.

S4 S3 S2 S1 S0   Z
0  0  0  0  0    ADD (A+B)
0  0  0  0  1    BITWISE AND (A&B)
0  0  0  1  0    INPUT A
0  0  0  1  1    INPUT B
0  1  1  0  0    SUBTRACT (A-B)
1  0  1  0  0    INCREMENT (A+1)
1  0  0  0  0    INPUT A
0  0  1  0  0    ADD (A+B)+1
0  1  0  0  0    SUBTRACT (A-B)-1

Figure 17 : Arithmetic and Logic Unit (ALU) symbol
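
Putting the pieces together, the control table can be reproduced with a small behavioural model in Python. This is a sketch under assumptions: S4 is taken to zero the B input, S3 to invert it, S2 to drive Cin and S1,S0 to select the output, which matches every row of the table, but the order of the zeroing and inverting stages inside the real ALU is not stated in the text.

```python
def alu(a: int, b: int, s4: int, s3: int, s2: int, s1: int, s0: int) -> tuple[int, int]:
    """Behavioural model of the ALU, returning (Z, Cout)."""
    b_gated = 0 if s4 else b                       # S4=1 forces adder input B to zero
    b_adder = b_gated ^ (0xFF if s3 else 0x00)     # S3=1 inverts adder input B
    total = a + b_adder + s2                       # S2 is the adder's Carry-in
    adder_out, cout = total & 0xFF, total >> 8
    outputs = (adder_out, a & b, a, b)             # 4:1 multiplexer inputs
    return outputs[(s1 << 1) | s0], cout           # S1,S0 select the output


if __name__ == "__main__":
    print(alu(123, 45, 0, 0, 0, 0, 0)[0])   # ADD
    print(alu(123, 45, 0, 1, 1, 0, 0)[0])   # SUBTRACT
    print(alu(123, 45, 1, 0, 1, 0, 0)[0])   # INCREMENT
```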

Computers execute instructions using the Fetch-Decode-Execute cycle, therefore, the processor must remember what phase it is in so that it can progress to the next.
This temporary memory is implemented using flip-flops, each storing 1 bit of data, as defined by the state table shown in figure 18. A flip-flop has an input pin D
and an output pin Q, the value on D is written to Q when there is a change from a logic 0 to a logic 1 on the CLK pin. Another way to think about the CLK pin is
that it is the write, or update, control signal i.e. the CLK line is pulsed to store a value. Owing to electronic reasons which I will quickly skip over, all CLK lines must
be connected to the same system clock i.e. a square wave signal that determines the operating speed of the processor. This would mean that every flip-flop would
update its output every clock cycle, which would not be very useful. Therefore, to control when different flip-flops update their outputs we use the clock enable
input pin CE. If CE=0 the CLK pin is ignored and Q holds its value. If CE=1 the CLK pin is used.

Figure 18 : D-type flip-flop
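
The edge-triggered behaviour and the clock enable can be mimicked with a small Python class. This is a simplified software model rather than a description of the FPGA primitive: the class and method names are invented for this example, and only the D, CLK, CE and Q pins described above are modelled.

```python
class DFlipFlop:
    """D-type flip-flop with clock enable, updated on a 0 to 1 change of CLK."""

    def __init__(self) -> None:
        self.q = 0            # stored bit, visible on output Q
        self._prev_clk = 0    # previous CLK level, used to spot a rising edge

    def update(self, d: int, clk: int, ce: int = 1) -> int:
        rising_edge = self._prev_clk == 0 and clk == 1
        self._prev_clk = clk
        if rising_edge and ce:    # CE=0 means the clock edge is ignored
            self.q = d
        return self.q


if __name__ == "__main__":
    ff = DFlipFlop()
    for d, clk, ce in [(1, 0, 1), (1, 1, 1), (0, 0, 0), (0, 1, 0), (0, 0, 1), (0, 1, 1)]:
        print(d, clk, ce, "->", ff.update(d, clk, ce))
```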

Within the processor temporary values are stored in registers, within a register each bit is stored in a flip-flop, as shown in figures 19, 20 and 21. To make larger
registers multiple smaller registers are grouped together. Note, the rectangular components register_4 and register_8 contain the circuit diagrams shown in figures 19
and 20 respectively. The operation of the 8bit register can be simulated using the Xilinx ISim software tools, as shown in figure 22. In this waveform diagram the
values 123 and 45 are stored in the register, using the CLK, CLR and CE lines to control when these values are updated. The Xilinx schematics, symbols and VHDL
testbench for these registers can be downloaded here: (Link ).

Figure 19 : Four bit register, symbol (left), circuit (right)

Figure 20 : Eight bit register, symbol (left), circuit (right)

Figure 21 : Sixteen bit register, symbol (left), circuit (right)

Figure 22 : Eight bit register simulation

Flip-flops are also used to generate the sequence of control signals needed to perform the functions defined by each instruction. These are contained within the
decoder block (figure 1), the circuit diagram symbol for this component is shown in figure 23. Inside this component are the instruction decoding logic and
sequence generators needed to control the processor, as shown in figure 24. A high resolution diagram can be downloaded here: (Link ).

MUXA : output, ALU A input MUX control
MUXB : output, ALU B input MUX control
MUXC : output, address MUX control, selecting PC or IR
EN_DA : output, accumulator (ACC) register update control
EN_PC : output, program counter (PC) register update control
EN_IR : output, instruction register (IR) update control
RAM_WE : output, memory write enable control
ALU_S0 : output, ALU control line
ALU_S1 : output, ALU control line
ALU_S2 : output, ALU control line
ALU_S3 : output, ALU control line
ALU_S4 : output, ALU control line
IR : input bus, 8bits, high byte of instruction register, contains opcode
ZERO : input, driven by 8bit NOR gate connected to ALU output, if 1 indicates result is zero
CARRY : input, driven by carry out (Cout) of ALU
CLK : input, system clock
CE : input, clock enable, normally set to 1, if set to 0 processor will HALT
CLR : input, system reset, if pulsed high system will be reset

Figure 23 : Decoder symbol

Figure 24 : Decoder circuit diagram

This processor has a slightly modified Fetch-Decode-Execute cycle: to save an adder I added an additional phase, so it now has a Fetch-Decode-Execute-Increment
cycle, the final phase incrementing the PC to the address of the next instruction (when needed). To identify which phase the processor is in a sequence generator is
used, as shown in figures 25 and 26. This is a simple ring counter, using a one-hot encoded value to indicate the processor's state, as shown in figure 27. Initially the
value 1000 is loaded into the counter (fetch code), on each clock pulse the one-hot bit is then moved along the flip-flop chain, looping back to the start after four
clock cycles. To determine the processor's state you simply identify which bit position is set to a logic 1.

1000 : Fetch
0100 : Decode
0010 : Execute
0001 : Increment
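
A ring counter like this is easy to model as a 4-bit rotate. The Python sketch below is an illustration only; the function name and the dictionary of phase names are assumptions introduced here, while the one-hot codes are the ones listed above.

```python
PHASES = {0b1000: "Fetch", 0b0100: "Decode", 0b0010: "Execute", 0b0001: "Increment"}


def ring_counter(cycles: int, state: int = 0b1000) -> list[str]:
    """Return the phase name seen on each of `cycles` clock pulses."""
    seen = []
    for _ in range(cycles):
        seen.append(PHASES[state])
        # Move the one-hot bit one place along the chain, wrapping at the end.
        state = (state >> 1) | ((state & 1) << 3)
    return seen


if __name__ == "__main__":
    print(ring_counter(8))
```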

Figure 25 : Sequence generator symbol

Figure 26 : Sequence generator

Figure 27 : Sequence generator simulation

During the Fetch phase the current instruction, pointed to by the program counter (PC) is loaded into the instruction register (IR). Then in the decode phase the high
byte of the 16bit instruction is decoded by the instruction decoder shown in figure 28. A high resolution diagram can be downloaded here: (Link ). As discussed in
Lectures this processor only has a very limited instruction set:

Load ACC kk : 0000 XXXX KKKKKKKK
Add ACC kk : 0100 XXXX KKKKKKKK
And ACC kk : 0001 XXXX KKKKKKKK
Sub ACC kk : 0110 XXXX KKKKKKKK
Input ACC pp : 1010 XXXX PPPPPPPP
Output ACC pp : 1110 XXXX PPPPPPPP
Jump U aa : 1000 XXXX AAAAAAAA
Jump Z aa : 1001 00XX AAAAAAAA
Jump C aa : 1001 10XX AAAAAAAA
Jump NZ aa : 1001 01XX AAAAAAAA
Jump NC aa : 1001 11XX AAAAAAAA

The top 4-6 bits define the opcode, where X=Not used, K=Constant, A=Instruction Address, P=Data Address. For those of you who know your machine code you
will recognise that these instructions are based on the original PicoBlaze machine code (Link), as this is the next processor architecture we will look at in Lectures. The
instruction decoder converts the opcode in the high byte of the instruction into a one-hot value, which is then used during the Decode and Execute phases to control the processor's
hardware. To ensure these signals are not active during the Fetch and Increment phases they are ANDed with the logical OR of the Decode and Execute signals from
the sequence generator, as shown in figure 29.

Figure 28 : Instruction decoder
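
The bit patterns in the instruction list can be turned into a small software decoder. This Python sketch matches the high byte against mask/value pairs derived from the table above (X bits are masked out); the table layout, mnemonic strings and function name are assumptions for illustration and say nothing about how the hardware decoder is actually wired.

```python
# (mask, value, mnemonic) for the high byte of the 16-bit instruction.
OPCODES = [
    (0xF0, 0x00, "LOAD"),    (0xF0, 0x40, "ADD"),
    (0xF0, 0x10, "AND"),     (0xF0, 0x60, "SUB"),
    (0xF0, 0xA0, "INPUT"),   (0xF0, 0xE0, "OUTPUT"),
    (0xF0, 0x80, "JUMP U"),
    (0xFC, 0x90, "JUMP Z"),  (0xFC, 0x98, "JUMP C"),
    (0xFC, 0x94, "JUMP NZ"), (0xFC, 0x9C, "JUMP NC"),
]


def decode(word: int) -> tuple[str, int]:
    """Split a 16-bit instruction into (mnemonic, low-byte operand)."""
    high, low = (word >> 8) & 0xFF, word & 0xFF
    for mask, value, mnemonic in OPCODES:
        if (high & mask) == value:
            return mnemonic, low
    raise ValueError(f"unknown opcode in {word:016b}")


if __name__ == "__main__":
    print(decode(0b1010000011110000))
    print(decode(0b1001110000000101))
```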

Figure 29 : Complete instruction decoder

The processor supports unconditional and conditional JUMP instructions. The conditional JUMP instructions are based on the result of the last ALU operation i.e.
the zero and carry bits for the ADD, SUB and AND instructions. These are stored in a 2bit register, as shown in figure 30. Note, the zero bit is generated by an 8bit
NOR gate connected to the ALU's output bus i.e. a NOR only produces a 1 when all of its inputs are 0.

Figure 30 : Status register
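
The zero bit can be sketched in Python as an 8-input NOR, i.e. OR all the output bits together and invert the result. The function name is made up for this example.

```python
def zero_flag(z: int) -> int:
    """8-input NOR of the ALU output bus: 1 only when every bit is 0."""
    any_bit_set = 0
    for bit in range(8):
        any_bit_set |= (z >> bit) & 1   # OR of all eight bits
    return any_bit_set ^ 1              # invert to get NOR


if __name__ == "__main__":
    print(zero_flag(0x00), zero_flag(0x01), zero_flag(0x80))
```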

The sequence_generator, instruction_decoder and status_register form the core elements of the processor's control unit (decoder block in figure 1). The signals from
these units are used to generate the control signals for the system MUXs, ALU and REGs, as shown in figure 31. This table defines the state of each control signal,
for each phase, to implement that instruction's function. This table also defines the control signals needed for the Fetch and Increment phases. The logic that drives
these control signals (right hand side of figure 24) is derived from this table.

Figure 31 : Control signals

Most of this control logic is quite intuitive, a slightly more complex bit is the Jump logic shown in figure 32. If the processor is in the Execute phase, the instruction
decoder and status signals determine if the program counter (PC) should be updated i.e. should the jump address be loaded into the PC. If a JUMP instruction is
taken, then the system does not need to increment the PC, as it already contains the address of the next instruction. Therefore, when the processor is in the Increment
phase it checks to see if a jump has been taken; if one has, the PC is not enabled i.e. the result PC+1 is not stored in the program counter.

Figure 32 : Jump logic
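
The decision described above can be summarised in a few lines of Python. This is a behavioural sketch, not the gate-level logic of figure 32: the mnemonic strings, function names and the 8-bit wrap of the PC are assumptions made for this example.

```python
def jump_taken(mnemonic: str, zero: int, carry: int) -> bool:
    """Decide whether a JUMP instruction loads its address into the PC."""
    conditions = {
        "JUMP U": True,
        "JUMP Z": zero == 1,
        "JUMP NZ": zero == 0,
        "JUMP C": carry == 1,
        "JUMP NC": carry == 0,
    }
    return conditions.get(mnemonic, False)   # non-jump instructions never jump


def update_pc(pc: int, mnemonic: str, target: int, zero: int, carry: int) -> int:
    """Return the PC value after the Execute and Increment phases."""
    if jump_taken(mnemonic, zero, carry):
        return target               # Execute phase: jump address stored in the PC
    return (pc + 1) & 0xFF          # Increment phase: PC+1 stored, no jump taken


if __name__ == "__main__":
    print(update_pc(0x06, "JUMP NC", 0x05, zero=0, carry=0))
    print(update_pc(0x06, "JUMP NC", 0x05, zero=0, carry=1))
```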

The memory used in the system (figure 33) stores both instructions and data i.e. a Von Neumann architecture. This could be constructed from the built in memory
components used in the FPGA, but they are a real pain to configure i.e. initialise with the machine code and data values needed. Therefore, I decided to cheat and use
a bit of VHDL. This is a Hardware Description Language (HDL) representation of what the memory should do, abstracting away from the low level logic gates it's
actually made from. This allows me to simply type in the binary values as shown in figure 34. Note, only the first few values are shown. This description is then
synthesised (converted) by the Xilinx tools into the required hardware components. The complete computer system is shown in figure 35. A high resolution diagram
can be downloaded here: (Link ).

Figure 33 : RAM

Figure 34 : VHDL (asynchronous, later modified to be synchronous)

The test program shown in figure 34 (machine code, comments on the right in green) loads the value stored in memory location 10 and adds 10 to it. If this does not
generate an overflow i.e. a value larger than 255, the result is written back to memory location 10. If it does generate an overflow i.e. 250+10, the value is saturated
to the maximum value i.e. 255. Therefore, the program counts from 0 to 250 in steps of 10 and then stops, as shown in the simulation in figure 36; look at the bottom row, which
shows the contents of memory location 10 as a hexadecimal value. The Xilinx schematics, symbols and VHDL testbench for the complete system can be
downloaded here: (Link ). Have a play, modify the data values or write your own program, then re-run the simulation (top_level_v1_tb).

Figure 35 : Complete system

Figure 36 : Test program simulation

A requirement when writing your first program on any new processor is to print "HELLO WORLD" to the screen. Normally this is a simple task, however, for a processor
with no screen and only seven basic instructions it is a little more of a challenge. The simplest way to add a screen is to use a serial terminal (Link)(Link), then bit-bang
out serial data packets (Link). However, you first need an output port, as shown in figure 37. This is simply a flip-flop whose CE pin is only enabled for a specific
address, in this example address 0xFF (255). When data is written to this address, RAM is updated as normal, but data bit 0 is also stored in this flip-flop, its Q
output being connected to the TX line of a serial bus.

Figure 37 : Serial Output port

Each character in the "HELLO WORLD" message string is stored in memory, locations 0xF0 to 0xFD, as ASCII values (Link). To simplify the program they are
actually stored in their inverted forms e.g. H = 0x48 (01001000), inverted = 0xB7 (10110111). The program reads each character's bits and outputs these on the
serial port. The serial data packet format for the character 'K' (0x4B = 01001011) is shown in figure 38 (Link). Each bit is allocated a time slice on the serial port's
line. The default speed for the serial port is 9600 bits per second i.e. each bit is valid for 104 us (1/9600), packets start with a start bit (1) and finish with a stop bit
(0). These packets are received by a terminal program running on a remote computer and displayed on its screen. The serial packets and terminal display are shown
in figures 39 - 41.

Figure 38 : Serial packet format (wiki link above)
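
The packet for one character can be sketched in Python as a list of bit values, using the conventions stated above: characters stored inverted, a start bit of 1 and a stop bit of 0. Sending the data bits least significant bit first is an assumption based on the program masking bit 0 first, and the names `frame`, `BAUD` and `BIT_TIME_US` are made up for this example.

```python
BAUD = 9600
BIT_TIME_US = 1_000_000 / BAUD   # time slice per bit, about 104 us


def frame(char: str) -> list[int]:
    """Return the bit values written to the serial port for one character."""
    stored = ord(char) ^ 0xFF                          # inverted ASCII, as held in memory
    data = [(stored >> i) & 1 for i in range(8)]       # bit 0 is sent first
    return [1] + data + [0]                            # start bit, data bits, stop bit


if __name__ == "__main__":
    print(f"bit time: {BIT_TIME_US:.1f} us")
    print("H ->", frame("H"))
```

Note that the real program holds the stop level for longer than one bit time, so this list describes the order of the values rather than exact timing.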

Figure 39 : HELLO WORLD serial packet (captured on scope)

Figure 40 : First three characters "HEL" (captured on scope)

Figure 41 : Complete message displayed in terminal

The program to send the message string "HELLO WORLD" to the serial terminal is shown below. Hopefully most of the code is self explanatory :). The next
character to be transmitted is read from memory location 0xE0, a bit mask is applied using a bitwise AND to select the desired bit. Based on this result a conditional
jump then selects a 0 or a 1, which is stored to memory location 0xFF i.e. the serial port. This is repeated seven more times until all eight character bits have been
output. The program below could have been substantially reduced if a SHIFT instruction had been included. The final twist is how the program scans through the
message string. As this processor does not support indirect addressing modes, self modifying code is used to access the next character in the string i.e. instructions
from 0x4F to 0x58. The program reads an INPUT instruction from memory, it then adds 1 to this instruction. As the INPUT instruction's address field is in the lower
byte this changes the address to the address of the next character in the string. This modified instruction is then written back to memory and executed by the
program. It goes without saying that self modifying code i.e. a program that rewrites itself, is not a good idea, however, it's very useful in this case. The program
then fetches the next character, storing it to memory location 0xE0, if this is a NULL the program finishes, otherwise the program jumps to memory location 0x02
and repeats the TX code. Note, from a software structure point of view it would have been nice if the processor had supported subroutines, perhaps for version 2.
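
The self modifying step can be illustrated with a short Python sketch that bumps the address field of an INPUT instruction held in memory. This is a simplified model: it assumes only the lower byte (the address field) changes and the opcode byte is preserved, and the dictionary-based memory and function name are invented for this example.

```python
def bump_address_field(mem: dict[int, int], instr_addr: int) -> int:
    """Add 1 to the low byte of the instruction at `instr_addr`, in place."""
    word = mem[instr_addr]
    opcode_byte = word & 0xFF00            # high byte: INPUT opcode, left untouched
    address_field = (word + 1) & 0xFF      # low byte: address of the next character
    mem[instr_addr] = opcode_byte | address_field
    return mem[instr_addr]


if __name__ == "__main__":
    memory = {0x52: 0b1010000011110000}    # INPUT ACC 0xF0
    new_word = bump_address_field(memory, 0x52)
    print(f"{new_word:016b}")              # address field now points at 0xF1
```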

DATA_RAM_WORD'("1010000011110000"), --00 INPUT ACC 0xF0 - load first char


DATA_RAM_WORD'("1110000011100000"), --01 OUTPUT ACC 0xE0 - update tx memory
DATA_RAM_WORD'("0000000000000001"), --02 LOAD ACC 1 - Start Bit
DATA_RAM_WORD'("1110000011111111"), --03 OUTPUT ACC 0xFF - Set serial port high
DATA_RAM_WORD'("0000000011001110"), --04 LOAD ACC 0xCE - wait loop (104us)
DATA_RAM_WORD'("0100000000000001"), --05 ADD ACC 1 - Increment count
DATA_RAM_WORD'("1001110000000101"), --06 JUMP NC 0x05 - repeat 256 times
DATA_RAM_WORD'("1010000011100000"), --07 INPUT ACC 0xE0 - read character
DATA_RAM_WORD'("0001000000000001"), --08 AND ACC 0X01 - mask bit0
DATA_RAM_WORD'("1001000000001011"), --09 JUMP Z 0x0B - if zero output
DATA_RAM_WORD'("0000000000000001"), --0A LOAD ACC 1 - else set 1
DATA_RAM_WORD'("1110000011111111"), --0B OUTPUT ACC 0xFF - Set serial port bit
DATA_RAM_WORD'("0000000011001110"), --04 LOAD ACC 0xCE - wait loop (104us)
DATA_RAM_WORD'("0100000000000001"), --0D ADD ACC 1 - Increment count
DATA_RAM_WORD'("1001110000001101"), --0E JUMP NC 0x0D - repeat 256 times
DATA_RAM_WORD'("1010000011100000"), --0F INPUT ACC 0xE0 - read character

DATA_RAM_WORD'("0001000000000010"), --10 AND ACC 0X02 - mask bit1


DATA_RAM_WORD'("1001000000010011"), --11 JUMP Z 0x13 - if zero output
DATA_RAM_WORD'("0000000000000001"), --12 LOAD ACC 1 - else set 1
DATA_RAM_WORD'("1110000011111111"), --13 OUTPUT ACC 0xFF - Set serial port bit
DATA_RAM_WORD'("0000000011001110"), --04 LOAD ACC 0xCE - wait loop (104us)
DATA_RAM_WORD'("0100000000000001"), --15 ADD ACC 1 - Increment count
DATA_RAM_WORD'("1001110000010101"), --16 JUMP NC 0x15 - repeat 256 times
DATA_RAM_WORD'("1010000011100000"), --17 INPUT ACC 0xE0 - read character
DATA_RAM_WORD'("0001000000000100"), --18 AND ACC 0X04 - mask bit2
DATA_RAM_WORD'("1001000000011011"), --19 JUMP Z 0x1B - if zero output
DATA_RAM_WORD'("0000000000000001"), --1A LOAD ACC 1 - else set 1
DATA_RAM_WORD'("1110000011111111"), --1B OUTPUT ACC 0xFF - Set serial port bit
DATA_RAM_WORD'("0000000011001110"), --04 LOAD ACC 0xCE - wait loop (104us)
DATA_RAM_WORD'("0100000000000001"), --1D ADD ACC 1 - Increment count
DATA_RAM_WORD'("1001110000011101"), --1E JUMP NC 0x1D - repeat 256 times
DATA_RAM_WORD'("1010000011100000"), --1F INPUT ACC 0xE0 - read character

DATA_RAM_WORD'("0001000000001000"), --20 AND ACC 0X08 - mask bit3


DATA_RAM_WORD'("1001000000100011"), --21 JUMP Z 0x23 - if zero output
DATA_RAM_WORD'("0000000000000001"), --22 LOAD ACC 1 - else set 1
DATA_RAM_WORD'("1110000011111111"), --23 OUTPUT ACC 0xFF - Set serial port bit
DATA_RAM_WORD'("0000000011001110"), --04 LOAD ACC 0xCE - wait loop (104us)
DATA_RAM_WORD'("0100000000000001"), --25 ADD ACC 1 - Increment count
DATA_RAM_WORD'("1001110000100101"), --26 JUMP NC 0x25 - repeat 256 times
DATA_RAM_WORD'("1010000011100000"), --27 INPUT ACC 0xE0 - read character
DATA_RAM_WORD'("0001000000010000"), --28 AND ACC 0X10 - mask bit4
DATA_RAM_WORD'("1001000000101011"), --29 JUMP Z 0x2B - if zero output
DATA_RAM_WORD'("0000000000000001"), --2A LOAD ACC 1 - else set 1
DATA_RAM_WORD'("1110000011111111"), --2B OUTPUT ACC 0xFF - Set serial port bit
DATA_RAM_WORD'("0000000011001110"), --04 LOAD ACC 0xCE - wait loop (104us)
DATA_RAM_WORD'("0100000000000001"), --2D ADD ACC 1 - Increment count
DATA_RAM_WORD'("1001110000101101"), --2E JUMP NC 0x2D - repeat 256 times
DATA_RAM_WORD'("1010000011100000"), --2F INPUT ACC 0xE0 - read character

DATA_RAM_WORD'("0001000000100000"), --30 AND ACC 0X20 - mask bit5


DATA_RAM_WORD'("1001000000110011"), --31 JUMP Z 0x33 - if zero output
DATA_RAM_WORD'("0000000000000001"), --32 LOAD ACC 1 - else set 1
DATA_RAM_WORD'("1110000011111111"), --33 OUTPUT ACC 0xFF - Set serial port bit
DATA_RAM_WORD'("0000000011001110"), --04 LOAD ACC 0xCE - wait loop (104us)
DATA_RAM_WORD'("0100000000000001"), --35 ADD ACC 1 - Increment count
DATA_RAM_WORD'("1001110000110101"), --36 JUMP NC 0x35 - repeat 256 times
DATA_RAM_WORD'("1010000011100000"), --37 INPUT ACC 0xE0 - read character
DATA_RAM_WORD'("0001000001000000"), --38 AND ACC 0X40 - mask bit6
DATA_RAM_WORD'("1001000000111011"), --39 JUMP Z 0x3B - if zero output
DATA_RAM_WORD'("0000000000000001"), --3A LOAD ACC 1 - else set 1
DATA_RAM_WORD'("1110000011111111"), --3B OUTPUT ACC 0xFF - Set serial port bit
DATA_RAM_WORD'("0000000011001110"), --3C LOAD ACC 0xCE - wait loop (104us)
DATA_RAM_WORD'("0100000000000001"), --3D ADD ACC 1 - Increment count
DATA_RAM_WORD'("1001110000111101"), --3E JUMP NC 0x3D - repeat 256 times
DATA_RAM_WORD'("1010000011100000"), --3F INPUT ACC 0xE0 - read character
DATA_RAM_WORD'("0001000010000000"), --40 AND ACC 0X80 - mask bit7
DATA_RAM_WORD'("1001000001000011"), --41 JUMP Z 0x43 - if zero output
DATA_RAM_WORD'("0000000000000001"), --42 LOAD ACC 1 - else set 1
DATA_RAM_WORD'("1110000011111111"), --43 OUTPUT ACC 0xFF - Set serial port bit
DATA_RAM_WORD'("0000000011001110"), --44 LOAD ACC 0xCE - wait loop (104us)
DATA_RAM_WORD'("0100000000000001"), --45 ADD ACC 1 - Increment count
DATA_RAM_WORD'("1001110001000101"), --46 JUMP NC 0x45 - repeat 256 times
DATA_RAM_WORD'("0000000000000000"), --47 LOAD ACC 0 - stop bit
DATA_RAM_WORD'("1110000011111111"), --48 OUTPUT ACC 0xFF - Set serial port bit
DATA_RAM_WORD'("0100000000000001"), --49 ADD ACC 1 - Increment count
DATA_RAM_WORD'("1001110001001001"), --4A JUMP NC 0x49 - repeat 256 times
DATA_RAM_WORD'("0000000000000000"), --4B LOAD ACC 0 - stop bit
DATA_RAM_WORD'("1110000011111111"), --4C OUTPUT ACC 0xFF - Set serial port bit
DATA_RAM_WORD'("0100000000000001"), --4D ADD ACC 1 - Increment count
DATA_RAM_WORD'("1001110001001101"), --4E JUMP NC 0x4D - repeat 256 times
DATA_RAM_WORD'("1010000001010010"), --4F INPUT ACC 0x52 - read instruction
DATA_RAM_WORD'("0100000000000001"), --50 ADD ACC 1 - increment address field
DATA_RAM_WORD'("1110000001010010"), --51 OUTPUT ACC 0x52 - update instruction
DATA_RAM_WORD'("1010000011110000"), --52 INPUT ACC 0xF0 - execute instruction
DATA_RAM_WORD'("1110000011100000"), --53 OUTPUT ACC 0xE0 - update tx memory
DATA_RAM_WORD'("0001000011111111"), --54 AND ACC 0xFF - is char NULL?
DATA_RAM_WORD'("1001010000000010"), --55 JUMP NZ 0x02 - no, TX
DATA_RAM_WORD'("0000000011110000"), --56 LOAD ACC 0xF0 - restore original instruction
DATA_RAM_WORD'("1110000001010010"), --57 OUTPUT ACC 0x52
DATA_RAM_WORD'("1000000001010110"), --58 JUMP 0x56 - yes, halt

DATA_RAM_WORD'("0000000010110111"), --F0 H = 0x48 inv 10110111
DATA_RAM_WORD'("0000000010111010"), --F1 E = 0x45 inv 10111010
DATA_RAM_WORD'("0000000010110011"), --F2 L = 0x4C inv 10110011
DATA_RAM_WORD'("0000000010110011"), --F3 L = 0x4C inv 10110011
DATA_RAM_WORD'("0000000010110000"), --F4 O = 0x4F inv 10110000
DATA_RAM_WORD'("0000000011011111"), --F5 'SP' = 0x20 inv 11011111
DATA_RAM_WORD'("0000000010101000"), --F6 W = 0x57 inv 10101000
DATA_RAM_WORD'("0000000010110000"), --F7 O = 0x4F inv 10110000
DATA_RAM_WORD'("0000000010101101"), --F8 R = 0x52 inv 10101101
DATA_RAM_WORD'("0000000010110011"), --F9 L = 0x4C inv 10110011
DATA_RAM_WORD'("0000000010111011"), --FA D = 0x44 inv 10111011
DATA_RAM_WORD'("0000000011110010"), --FB 'CR' = 0x0D inv 11110010
DATA_RAM_WORD'("0000000011110101"), --FC 'LF' = 0x0A inv 11110101
DATA_RAM_WORD'("0000000000000000"), --FD 'NULL' = 0x00
DATA_RAM_WORD'("0000000000000000"), --FE
DATA_RAM_WORD'("0000000000000000") --FF
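
The character table above stores each ASCII code bit-inverted in the low byte of a 16-bit word, with a plain zero word as the terminator. As a rough sketch of how such a table could be generated rather than typed by hand, the Python below rebuilds the F0 to FD entries. The names tx_word, vhdl_line and tx_table are made up for this illustration and are not part of the original project.

```python
def tx_word(ch: str) -> str:
    """Return the 16-bit word for one character: upper byte zero, low byte inverted ASCII."""
    inverted = ~ord(ch) & 0xFF           # invert the 8-bit ASCII code
    return format(inverted, "016b")      # zero-padded to 16 binary digits


def vhdl_line(word: str, address: int) -> str:
    """Format one table entry in the style of the listing (hypothetical helper)."""
    return "DATA_RAM_WORD'(\"{}\"), --{:02X}".format(word, address)


def tx_table(message: str, base: int = 0xF0) -> list:
    """Build the table lines for a message, followed by a zero terminator word."""
    lines = [vhdl_line(tx_word(ch), base + i) for i, ch in enumerate(message)]
    # The terminator is stored as plain zero (not inverted), as in the listing.
    lines.append(vhdl_line(format(0, "016b"), base + len(message)))
    return lines


for line in tx_table("HELLO WORLD\r\n"):
    print(line)
```

The 13 characters of the message plus the terminator occupy F0 to FD, which matches the addresses in the listing.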

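The program words in the listing can also be cross-checked against their hand-written comments by decoding them. The sketch below assumes an encoding inferred only from the words and comments in this listing: the top four bits select the operation, bits 11 and 10 select the condition of a conditional jump, and the low eight bits hold the operand. The decode function and both lookup tables are illustrative names, and any opcode or condition code that does not appear in the listing is deliberately left undecoded.

```python
# Opcode values inferred from the listing's own words and comments (assumption).
OPCODES = {
    0b0000: "LOAD ACC",
    0b0001: "AND ACC",
    0b0100: "ADD ACC",
    0b1000: "JUMP",
    0b1010: "INPUT ACC",
    0b1110: "OUTPUT ACC",
}

# Condition field (bits 11..10) of opcode 1001; only the codes seen in the listing.
CONDITIONS = {0b00: "Z", 0b01: "NZ", 0b11: "NC"}


def decode(word: str) -> str:
    """Decode one 16-bit binary string into a mnemonic with a hex operand."""
    value = int(word, 2)
    opcode = value >> 12
    operand = value & 0xFF
    if opcode == 0b1001:                       # conditional jump
        condition = CONDITIONS.get((value >> 10) & 0b11, "?")
        return "JUMP {} 0x{:02X}".format(condition, operand)
    name = OPCODES.get(opcode, "???")          # unknown opcodes are not guessed
    return "{} 0x{:02X}".format(name, operand)


print(decode("1001110000100101"))
print(decode("0000000011001110"))
```

Running every word of the listing through a decoder like this makes a mismatch between a binary word and its comment easy to spot.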
I had to make a few small changes to the schematics and VHDL files to minimise the hardware size. For example, in its present form the memory is implemented by the
software tools as a 256:1 16-bit multiplexer, which takes up quite a bit of space. Adding a clock allows the memory to be mapped to a BlockRAM, i.e. the default
RAM on the FPGA. The final project that prints "HELLO WORLD" can be downloaded here: (Link).
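
To illustrate the difference the clock makes, here is a small behavioural model in Python. It is not hardware code and is not taken from the project: the class names, and the choice to apply a write before latching the output on a clock edge, are assumptions made for the sketch. The multiplexer-style memory returns a word as soon as the address is presented, while the clocked memory only updates its output register on a clock edge.

```python
class MuxMemory:
    """Combinational read: the output follows the address immediately."""

    def __init__(self, words):
        self.words = list(words)

    def read(self, addr):
        return self.words[addr]


class ClockedMemory:
    """Registered read: the output changes only when clock() is called."""

    def __init__(self, words):
        self.words = list(words)
        self.dout = 0                      # output register, zero until the first edge

    def clock(self, addr, write_enable=False, din=0):
        # Model one rising edge: optional write, then latch the addressed word.
        if write_enable:
            self.words[addr] = din
        self.dout = self.words[addr]
        return self.dout


program = [0x00CE, 0x4001, 0x9C25]         # three words taken from the listing
mux = MuxMemory(program)
ram = ClockedMemory(program)

print(hex(mux.read(1)))                    # available straight away
print(hex(ram.dout))                       # still the reset value
ram.clock(1)
print(hex(ram.dout))                       # valid after the clock edge
```

In this model the rest of the design has to wait for a clock edge before using a word read from the clocked memory.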

What next for this computer? I want to see if I can get it to fit into a Xilinx 9572 CPLD, which we use for teaching hardware design. It would need to use external
RAM/ROM; however, the main limitation is that this programmable device only has a very small amount of hardware, e.g. 72 flip-flops.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Contact details: email - [email protected], telephone - 01904 32(5473)
