Memory Controller For A 6502 CPU in VHDL: Michel Wilson, 1047981
May 2006
The 6502 soft core implemented in VHDL on an FPGA development board as used by the
Embedded Software Lab of the Delft Technical University currently only uses on-chip RAM. This
leads to a shortage of memory in certain situations. To solve this problem, a memory controller
for the soft core which enables access to the external RAM chips available on the development
board has to be developed. This report describes the design and development of such a
controller, and of an accompanying program that tests the controller and the memory.
Contents
1 Introduction
    1.1 About the project
    1.2 About this document
2 Action plan
    2.1 Introduction
        2.1.1 Motivation
        2.1.2 Approval and adjustment
    2.2 Project description
        2.2.1 Project goals
        2.2.2 Milestones and deliverables
        2.2.3 Requirements and constraints
    2.3 Project phases
        2.3.1 Requirements analysis
        2.3.2 Design decisions
        2.3.3 Development
        2.3.4 Testing
        2.3.5 Final product
4 Code documentation
    4.1 VHDL code
        4.1.1 Toplevel
        4.1.2 Memory management unit
        4.1.3 Memory controller
        4.1.4 I/O controller
    4.2 Memory test program
A Working with the FPGA under Linux
    A.1 USB to Serial Converter
    A.2 Xilinx software
Bibliography
Chapter 1
Introduction
Chapter 2
Action plan
2.1 Introduction
2.1.1 Motivation
Currently, the Embedded Software lab uses a 6502 soft core, implemented in VHDL
on a Digilent Spartan-3 Development Board using the Xilinx ISE WebPACK design software ([3]).
The soft core is used for research, and for the course in4073 (Embedded Real-Time Systems). All
programs using this core are severely constrained by the limited amount of available memory in
the FPGA chip on the Development Board. The primary goal of the project is therefore to increase
the amount of memory available to the soft core, using the external memory available on the
development board.
The other deliverable is the C source code of the test program, also including documentation,
explaining how to use the program, and explaining why the test procedure is deemed to be an
accurate indication of the working or failure of the MMU and/or the memory.
2.3.3 Development
The actual implementation takes place in this phase. If any design decisions from the previous
phase are found to be infeasible, an alternative decision is to be sought, in consultation with the
client. The documentation of the previous phase should be updated in this case.
2.3.4 Testing
All implemented systems are tested to confirm that they perform according to the
specification. Any inconsistencies are to be corrected.
Chapter 3
3.1 Introduction
In this chapter, the proposed architecture of the memory management unit extension for the
6502 soft core will be described, along with the design decisions which were made to reach this
proposal. Also, the choice of the testing algorithms will be discussed. The VHDL code was
developed using the Xilinx ISE WebPACK design software, version 7.1i, under Linux ([3]).
As there is a clear distinction between interfacing with the external memory, and interfacing
with the rest of the processor, the memory controller will be divided into a part which accesses
the external memory, and a part which translates the memory address. These components will
be discussed in detail, below. Lastly, the test program will be discussed.
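[Figure: block diagram of the existing system. The CPU's address, data, read flag and write flag signals connect to the MMU, which uses address bits 15..12 to select among the RAM blocks, I/O and ROM; address bits 11..0 go to the RAM blocks.]
[Figure: block diagram of the new system. As above, with a memory controller added that connects the MMU to the external memory.]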
Address   Existing situation   New situation
0000h     internal RAM         internal RAM
1000h     internal RAM         internal RAM
2000h     internal RAM         internal RAM
3000h     internal RAM         internal RAM
4000h     internal RAM         internal RAM
5000h     unused               external RAM
6000h     unused               external RAM
7000h     unused               external RAM
8000h     unused               external RAM
9000h     unused               external RAM
A000h     unused               external RAM
B000h     unused               external RAM
C000h     unused               external RAM
D000h     unused               unused
E000h     I/O                  I/O
F000h     ROM                  ROM
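The table can also be read as a lookup on the upper four address bits. The following is a minimal sketch in C of that lookup for the new situation; the names region_t and classify_address are hypothetical and do not appear in the project code.

```c
#include <assert.h>
#include <stdint.h>

/* Regions of the 6502 address space in the new situation (hypothetical names). */
typedef enum {
    REGION_INTERNAL_RAM,
    REGION_EXTERNAL_RAM,
    REGION_UNUSED,
    REGION_IO,
    REGION_ROM
} region_t;

/* Classify a 16-bit CPU address by its upper nibble (bits 15..12). */
region_t classify_address(uint16_t addr)
{
    unsigned nibble = addr >> 12;   /* one 4K block per nibble value */

    assert(nibble <= 0xF);
    if (nibble <= 0x4) return REGION_INTERNAL_RAM;  /* 0000h-4FFFh */
    if (nibble <= 0xC) return REGION_EXTERNAL_RAM;  /* 5000h-CFFFh */
    if (nibble == 0xD) return REGION_UNUSED;        /* D000h-DFFFh */
    if (nibble == 0xE) return REGION_IO;            /* E000h-EFFFh */
    return REGION_ROM;                              /* F000h-FFFFh */
}
```

For example, classify_address(0x5000) and classify_address(0xCFFF) both yield REGION_EXTERNAL_RAM, which makes the external window eight 4K blocks, or 32 KiB, wide.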
Signal                      Function
CLK                         Central 50 MHz clock
halt                        Enable/disable the controller, see also the timing section
read_enable                 Read the selected memory address
write_enable                Write the selected memory address
address (19 downto 0)       Memory address
data_in (7 downto 0)        Data to memory
data_out (7 downto 0)       Data from memory

Signal                      Function
mem_enable_1                Chip enable signal, chip 1
mem_enable_2                Chip enable signal, chip 2
mem_upper_1                 Enabling/disabling of the upper output byte of memory chip 1
mem_upper_2                 Enabling/disabling of the upper output byte of memory chip 2
mem_lower_1                 Enabling/disabling of the lower output byte of memory chip 1
mem_lower_2                 Enabling/disabling of the lower output byte of memory chip 2
mem_out_enable              Output enable
mem_wr_enable               Write enable
mem_address (17 downto 0)   Memory address
mem_data_1 (15 downto 0)    Data lines, chip 1
mem_data_2 (15 downto 0)    Data lines, chip 2
3.4.1 Address space distribution
As there are two different memory chips, the addresses can be distributed in different ways. The
two choices are to distribute the data linearly over the two chips, or to stripe the data across the
two chips with some block size. The striping approach can be used to increase performance when
accessing the memory sequentially. In this case however, the memory speed is high enough to
achieve what we want, so striping is not necessary. Therefore, the data is distributed linearly over
the two chips, using the MSB to select the chip to use.
As the chips have a 16-bit-wide data path, the LSB is used to drive the signals which select
the upper/lower byte, and to select the required data signals. The MSB is used to select one of the
two memory chips. A 4-way multiplexer is thus needed, using the MSB and the LSB to select one
out of four 8-bit bytes from the data lines.
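The decoding described above can be modelled in a few lines of C, which may be convenient for checking the scheme on a host machine. This is a sketch only: the names decoded_t, decode_address and select_byte are hypothetical, and it assumes that the "lower" byte is bits 7..0 and the "upper" byte bits 15..8 of each 16-bit data word.

```c
#include <assert.h>
#include <stdint.h>

/* Result of decoding a 20-bit byte address (hypothetical type). */
typedef struct {
    unsigned chip;       /* 0 or 1: which of the two memory chips      */
    unsigned upper;      /* 0 = lower byte, 1 = upper byte of the word */
    uint32_t word_addr;  /* 18-bit word address sent to both chips     */
} decoded_t;

/* Split a 20-bit address: MSB selects the chip, LSB selects the byte. */
decoded_t decode_address(uint32_t addr)
{
    decoded_t d;

    assert(addr < (1UL << 20));
    d.chip = (addr >> 19) & 1;
    d.upper = addr & 1;
    d.word_addr = (addr >> 1) & 0x3FFFF;
    return d;
}

/* 4-way multiplexer: pick one byte out of the two 16-bit data words.
 * Assumes lower byte = bits 7..0 and upper byte = bits 15..8. */
uint8_t select_byte(uint16_t data_1, uint16_t data_2, decoded_t d)
{
    uint16_t word = d.chip ? data_2 : data_1;

    return d.upper ? (uint8_t)(word >> 8) : (uint8_t)(word & 0xFF);
}
```

With this linear distribution, addresses 00000h to 7FFFFh land on the first chip and 80000h to FFFFFh on the second.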
3.4.3 Timing
In the existing system, the CPU and the memory system are enabled every other clock tick, using
the halt signal. For a memory read, this means we have one clock cycle of 20 ns to do all the
work. For a write, we can also use the clock cycle in which the halt signal is high.
The memory has an access time of 10 ns, making the above possible. The address is loaded at
the start of the clock cycle, and the multiplexer controlling the output is set. After 10 ns, the data
becomes available. The multiplexer output should thus be connected directly to the data output,
with no latches or registers in between.
Writing is slightly more complicated. The data has to be set up 8 ns before the falling edge
of the signals controlling the write (i.e. the lower/upper signals, the enable signal, and the write
enable signal). The data is written on the falling edge of one of the write controlling signals. This
means that two clock cycles are needed to write a byte of data. When halt is high, before the
write, the setup is done: the lower/upper signals and the enable signal are directly connected to
the address bits driving them. The memory address is directly connected to the other address
bits. When halt goes low, the write signal is asserted, and the data inputs are connected to the
right output. The write is committed at the end of the cycle, when halt goes high again, by
de-asserting the write signal. An example write waveform can be seen in figure 3.3.
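The read budget follows from simple arithmetic on the clock frequency and the access time. A small sketch in C, with hypothetical function names, makes the calculation explicit; it deliberately ignores routing and pad delays inside the FPGA, which eat into the remaining slack in practice.

```c
#include <assert.h>

/* Clock period in nanoseconds for a clock frequency given in MHz. */
double clock_period_ns(double freq_mhz)
{
    assert(freq_mhz > 0.0);
    return 1000.0 / freq_mhz;
}

/* Time left in one clock cycle after the memory access time has elapsed. */
double read_slack_ns(double freq_mhz, double access_ns)
{
    return clock_period_ns(freq_mhz) - access_ns;
}
```

For the 50 MHz clock and the 10 ns access time mentioned above, clock_period_ns(50.0) gives 20 ns and read_slack_ns(50.0, 10.0) gives 10 ns.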
[Figure 3.3: example write waveform, showing the CLK, halt, write enable, address and data in signals.]
Five address bits remain; these are connected to a register which can be used to switch memory
banks. This register, in turn, is memory mapped to an available address using the (existing) I/O
mechanism. See also figure 3.4.
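[Figure 3.4: layout of the 20-bit external memory address. Bits 19..15 hold the bank number; bits 14..0 hold the address within the bank.]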
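As an illustration of the banking scheme, the sketch below combines a 5-bit bank number with a CPU address inside the external window into a 20-bit external address. The function name and constants are hypothetical. How the lower 15 bits are derived from the CPU address is not described here; subtracting the window base (5000h) is an assumption made for this example.

```c
#include <assert.h>
#include <stdint.h>

#define WINDOW_BASE 0x5000u   /* first CPU address of the external window */
#define WINDOW_SIZE 0x8000u   /* 32 KiB: eight 4K apertures               */

/* Form the 20-bit external address: bank in bits 19..15, offset in 14..0.
 * Assumption: the offset is the CPU address minus the window base. */
uint32_t external_address(uint8_t bank, uint16_t cpu_addr)
{
    uint32_t offset;

    assert(bank <= 0x1F);
    assert(cpu_addr >= WINDOW_BASE && cpu_addr < WINDOW_BASE + WINDOW_SIZE);
    offset = (uint32_t)(cpu_addr - WINDOW_BASE);
    return ((uint32_t)bank << 15) | offset;
}
```

With 32 banks of 32 KiB each, the whole 1 MiB of external address space is reachable: bank 1Fh at CPU address CFFFh maps to FFFFFh.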
Chapter 4
Code documentation
Entity description
The signals that connect to the external memory chips are added to the entity description of the
MMU:
    SRAM_ADDR : out std_logic_vector(17 downto 0);
    SRAM1_CE  : out std_logic;
    SRAM1_LB  : out std_logic;
    SRAM1_UB  : out std_logic;
    SRAM1_IO  : inout std_logic_vector(15 downto 0);
    SRAM2_CE  : out std_logic;
    SRAM2_LB  : out std_logic;
    SRAM2_UB  : out std_logic;
    SRAM2_IO  : inout std_logic_vector(15 downto 0);
    SRAM_OE   : out std_logic;
    SRAM_WE   : out std_logic;
Internal signals
Internal signals are added to connect the external memory to the data bus, and to the control
signals:
    signal data_mem_to_cpu_ext : std_logic_vector(7 downto 0);
    signal read_flag_ext       : std_logic;
    signal write_flag_ext      : std_logic;
And an internal signal is added to connect the page select register in the I/O controller to the
memory controller:
    signal extram_page : std_logic_vector(4 downto 0) := (others => '0');
Read/write flags, databus
For each 4K aperture in which the external memory is mapped, the read and write flags need to
be set:
    with address_bus(15 downto 12) select
        read_flag_ext <= read_flag when "0101", -- 5000h
                         read_flag when "0110",
                         read_flag when "0111",
                         read_flag when "1000",
                         read_flag when "1001",
                         read_flag when "1010",
                         read_flag when "1011",
                         read_flag when "1100", -- C000h
                         '0' when others;

    with address_bus(15 downto 12) select
        write_flag_ext <= write_flag when "0101", -- 5000h
                          write_flag when "0110",
                          write_flag when "0111",
                          write_flag when "1000",
                          write_flag when "1001",
                          write_flag when "1010",
                          write_flag when "1011",
                          write_flag when "1100", -- C000h
                          '0' when others;
Also, the data output of the external memory has to be connected to the central data bus when
one of the apertures in which the external memory is mapped is selected:
    with address_bus(15 downto 12) select
        data_mem_to_cpu <= data_mem_to_cpu_0xxx when "0000",
                           data_mem_to_cpu_1xxx when "0001",
                           data_mem_to_cpu_2xxx when "0010",
                           data_mem_to_cpu_3xxx when "0011",
                           data_mem_to_cpu_4xxx when "0100",
                           data_mem_to_cpu_ext when "0101", -- 5000h
                           data_mem_to_cpu_ext when "0110",
                           data_mem_to_cpu_ext when "0111",
                           data_mem_to_cpu_ext when "1000",
                           data_mem_to_cpu_ext when "1001",
                           data_mem_to_cpu_ext when "1010",
                           data_mem_to_cpu_ext when "1011",
                           data_mem_to_cpu_ext when "1100", -- C000h
3 use ieee.std_logic_1164.all;
Entity description
All the signals for connecting the controller are defined. The CLK signal must be connected to the
central clock.
The halt signal must be connected to the central halt signal of the MMU. When this signal is
low, data is read from the memory, or the signals for a write are set up. When the signal is high, a
write is committed to memory. Note that a memory write thus needs two clock cycles, one during
which the halt signal is low, and one during which the halt signal is high.
5 entity memctrl is
6 port(CLK : in std_logic;
7 halt : in std_logic;
All signals prefixed with mem_ are used to interface with the external memory on the board.
9 mem_enable_1 : out std_logic; -- enable memory chip 1
10 mem_lower_1 : out std_logic; -- enable lower byte mem chip 1
11 mem_upper_1 : out std_logic; -- enable upper byte mem chip 1
12 mem_data_1 : inout std_logic_vector(15 downto 0);
13 mem_enable_2 : out std_logic; -- enable memory chip 2
14 mem_lower_2 : out std_logic; -- enable lower byte mem chip 2
15 mem_upper_2 : out std_logic; -- enable upper byte mem chip 2
16 mem_data_2 : inout std_logic_vector(15 downto 0);
17 mem_out_enable : out std_logic; -- output enable
18 mem_wr_enable : out std_logic; -- write enable
19 mem_address : out std_logic_vector(17 downto 0);
The last block of signals is used to interface with the controller.
21 read_enable : in std_logic; -- read memory location
22 write_enable : in std_logic; -- write memory location
23 address : in std_logic_vector(19 downto 0);
24 data_in : in std_logic_vector(7 downto 0);
25 data_out : out std_logic_vector(7 downto 0)
26 );
27 end entity memctrl;
Internal signals
The chipselect and byteselect signals are used to select the correct chip, and either the lower
or the upper byte of that chip, for a certain memory address.
29 architecture arch of memctrl is
30 signal chipselect, byteselect : std_logic;
35 mem_lower_1 <= '0' when chipselect = '0' and byteselect = '0' else '1';
36 mem_upper_1 <= '0' when chipselect = '0' and byteselect = '1' else '1';
37 mem_lower_2 <= '0' when chipselect = '1' and byteselect = '0' else '1';
38 mem_upper_2 <= '0' when chipselect = '1' and byteselect = '1' else '1';
39 mem_enable_1 <= '0' when chipselect = '0' else '1';
40 mem_enable_2 <= '0' when chipselect = '1' else '1';
The lowest bit of the address is used to select either the upper or the lower byte (line 44). The
highest bit is used to select either the first or the second memory chip (line 43). The remaining
address bits drive the address lines of the memory chips (line 42).
42 mem_address <= address(18 downto 1);
43 chipselect <= address(19);
44 byteselect <= address(0);
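Because all six select outputs are active low, it is easy to misread the truth table. The following C model, with the hypothetical names strobes_t and drive_strobes, mirrors the concurrent assignments on lines 35 to 40 so that the table can be checked on a host machine. In this model, false stands for a driven '0' (asserted) and true for '1'.

```c
#include <assert.h>
#include <stdbool.h>

/* Active-low select outputs: false = '0' (asserted), true = '1'. */
typedef struct {
    bool enable_1, lower_1, upper_1;
    bool enable_2, lower_2, upper_2;
} strobes_t;

/* Mirror of the VHDL assignments for the chip and byte select outputs. */
strobes_t drive_strobes(bool chipselect, bool byteselect)
{
    strobes_t s;

    s.lower_1  = !(!chipselect && !byteselect);
    s.upper_1  = !(!chipselect &&  byteselect);
    s.lower_2  = !( chipselect && !byteselect);
    s.upper_2  = !( chipselect &&  byteselect);
    s.enable_1 = chipselect;    /* low when chip 1 is selected */
    s.enable_2 = !chipselect;   /* low when chip 2 is selected */

    /* Exactly one of the four byte strobes is asserted at any time. */
    assert((!s.lower_1) + (!s.upper_1) + (!s.lower_2) + (!s.upper_2) == 1);
    return s;
}
```

For any input combination, exactly one chip enable and exactly one byte strobe are low, so only a single byte lane of a single chip is ever selected.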
Write signal
The write signal is inverted and passed through to the memory chips when the controller is not
in the halt cycle. When the controller is in the halt cycle, the external write signal is always high.
This makes sure that there is always a rising edge at the end of a write on one of the control
signals, triggering the memory chip to write the data. See also figure 3.3, and the
accompanying discussion on timing in section 3.4.3.
45 process (CLK)
46 begin
47 if rising_edge(CLK) then
48 if halt = '0' then
49 mem_wr_enable <= not write_enable;
50 else
51 mem_wr_enable <= '1';
52 end if;
53 end if;
54 end process;
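The behaviour of this clocked process can be traced with a small C model, which may help when reasoning about the two-cycle write. The names wr_reg_t and clock_edge are hypothetical; each call to clock_edge stands for one rising edge of CLK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* State of the registered mem_wr_enable output (active low). */
typedef struct {
    bool mem_wr_enable;
} wr_reg_t;

/* Apply one rising clock edge, as in the VHDL process. */
void clock_edge(wr_reg_t *r, bool halt, bool write_enable)
{
    assert(r != NULL);
    if (!halt) {
        r->mem_wr_enable = !write_enable;  /* pass the inverted write signal */
    } else {
        r->mem_wr_enable = true;           /* halt cycle: always de-asserted */
    }
}
```

Starting from a high output, an edge with halt low and write_enable high drives mem_wr_enable low; the next edge with halt high drives it high again, producing the rising edge that ends the write.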
86 end block;
87
Entity description
One signal is added, to set the page of the external memory:
    extram_page : out std_logic_vector(4 downto 0)
#include "fpga.h"
Three macros are defined, to execute a memory test function on each memory page. The macros
are used to keep the code more readable. Initially, functions were used, with a function pointer
as argument. Unfortunately, for reasons that remain unclear, this did not work on the 6502;
buggy behaviour was encountered.
The first macro is used to call the address test functions. These need the current page as
argument. A message is passed to the macro to display status information. The message must
contain a %02x format specifier, to display the current page.
#define run_addr_check(function, message) { \
    for(page = 0; page <= 0x1f; page++) { \
        POKE(0xe500, page);          /* select the external memory page */ \
        sprintf(s, message, page); \
        fpga_puts(s); \
        function(page); \
    } \
}
The second macro is used to call the functions for the moving inversion test algorithm. These
functions need the current pattern as an argument. The status message contains two %d format
specifiers for displaying the number of the current pattern and the total number of patterns, and
a %02x format specifier to display the current page.
#define run_inv(function, pattern, message) { \
    for(page = 0; page <= 0x1f; page++) { \
        POKE(0xe500, page); \
        sprintf(s, message, pattern+1, N_PATTERNS, page); \
        fpga_puts(s); \
        function(patterns[pattern]); \
    } \
}
Lastly, a macro is defined to call a moving inversion test function while accessing the pages in
reverse order. This is needed for the second inverting pass of the algorithm. The arguments of
the macro are the same as those of the run_inv macro.
/* page is an unsigned char: decrementing it past 0 wraps around to 0xff,
   which fails the page <= 0x1f test and ends the loop. */
#define run_inv_reverse(function, pattern, message) { \
    for(page = 0x1f; page <= 0x1f; page--) { \
        POKE(0xe500, page); \
        sprintf(s, message, pattern+1, N_PATTERNS, page); \
        fpga_puts(s); \
        function(patterns[pattern]); \
    } \
}
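The loop condition page <= 0x1f in the reverse macro looks odd for a descending loop, but it is what terminates the loop when the page counter is an unsigned char: decrementing 0 wraps around to 0xff. The stand-alone sketch below demonstrates this on a host machine, assuming 8-bit chars; the function name visit_pages_descending is hypothetical.

```c
#include <assert.h>

/* Visit pages last_page, last_page - 1, ..., 0 and record the order.
 * Returns the number of pages visited; order[] must have room for
 * last_page + 1 entries. Assumes unsigned char is 8 bits wide. */
unsigned visit_pages_descending(unsigned char last_page, unsigned char *order)
{
    unsigned char page;
    unsigned n = 0;

    /* With last_page == 0xff the wrapped value would still pass the test. */
    assert(last_page < 0xff);
    for (page = last_page; page <= last_page; page--) {
        order[n++] = page;   /* after page 0, page-- wraps to 0xff */
    }
    return n;
}
```

Called with last_page set to 0x1f, the function visits 32 pages, from 0x1f down to 0. Writing the condition as page >= 0 would never terminate, since an unsigned value is always non-negative.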
Function declarations are included for all test functions in the program.
void addr_write_1(unsigned char page);
void addr_check_1(unsigned char page);
void addr_write_2(unsigned char page);
void addr_check_2(unsigned char page);
void addr_write_3(unsigned char page);
void addr_check_3(unsigned char page);
void mov_inv_write(unsigned char pattern);
void mov_inv1(unsigned char pattern);
void mov_inv2(unsigned char pattern);
void mov_inv3(unsigned char pattern);
The variable s serves as a global buffer for storing formatted output messages, which can then
be output to the serial port using fpga_puts.
char s[100];
The array patterns contains the patterns used in the moving inversion test. It includes the zero
pattern, eight patterns with a single bit set to one in each of the eight possible positions, and two
patterns with alternating ones and zeroes. N_PATTERNS is set to the total number of patterns.
#define N_PATTERNS 11
char patterns[] = { 0x00, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0xaa, 0x55 };
The main function calls each of the tests in sequence. First, all the address check tests are run.
Then, for each available pattern, the moving inversion tests are run.
int main() {
    unsigned char page;
    int i;
        run_inv_reverse(mov_inv2, i, "Moving inversion, %d of %d, inversion #2, page 0x%02x of 0x1f.\r\n");
        run_inv(mov_inv3, i, "Moving inversion, %d of %d, inversion #3, page 0x%02x of 0x1f.\r\n");
    }
    exit(0);
}
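For reference, one round of a moving inversion test on a plain buffer might look as follows. This is a host-side sketch under assumptions, not the code of the test program: the function name is hypothetical, and since the bodies of mov_inv1 and mov_inv2 are not shown here, the pass structure (ascending check-and-invert, descending check-and-restore, final check) is the conventional form of the algorithm rather than a transcription.

```c
#include <assert.h>
#include <stddef.h>

/* Run one moving inversion round on mem[0..len-1] with the given pattern.
 * Returns the number of mismatches found. */
unsigned moving_inversions(volatile unsigned char *mem, size_t len,
                           unsigned char pattern)
{
    unsigned errors = 0;
    unsigned char inverse = (unsigned char)~pattern;
    size_t i;

    assert(mem != NULL && len > 0);

    /* Initial write: fill the memory with the pattern. */
    for (i = 0; i < len; i++) mem[i] = pattern;

    /* Inversion 1, ascending: check the pattern, write its inverse. */
    for (i = 0; i < len; i++) {
        if (mem[i] != pattern) errors++;
        mem[i] = inverse;
    }

    /* Inversion 2, descending: check the inverse, restore the pattern. */
    for (i = len; i-- > 0; ) {
        if (mem[i] != inverse) errors++;
        mem[i] = pattern;
    }

    /* Final pass: every location should hold the pattern again. */
    for (i = 0; i < len; i++) {
        if (mem[i] != pattern) errors++;
    }
    return errors;
}
```

On a healthy buffer the function returns 0 and leaves every byte equal to the pattern. Running the second pass in the opposite direction is what allows the test to catch faults where writing one cell disturbs a neighbour at a lower address.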
The first address write and check functions write, respectively check, the lower eight bits of
the memory address, for each location in the page.
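/* The page is selected by the caller (see run_addr_check) before this
   function is called. */
void addr_write_1(unsigned char page) {
    unsigned int mem;
    for(mem = 0x5000; mem < 0xd000; mem++) {
        POKE(mem, mem);  /* only the lower eight bits of mem are stored */
    }
}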
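To illustrate what such an address test can and cannot detect, the sketch below runs the same write-then-check scheme against a small simulated memory in which address lines can be forced low. Everything in it (sim_mem_t, sim_poke, sim_peek, address_test, the 1 KiB size) is hypothetical and exists only for this host-side experiment.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_SIZE 1024u

/* Simulated memory whose address lines can be forced low (stuck-at-0). */
typedef struct {
    uint8_t cells[SIM_SIZE];
    uint16_t stuck_low_mask;   /* address bits forced to 0; 0 = healthy */
} sim_mem_t;

/* Address actually reached after applying the stuck address lines. */
static uint16_t effective(const sim_mem_t *m, uint16_t addr)
{
    return (uint16_t)(addr & ~m->stuck_low_mask) % SIM_SIZE;
}

void sim_poke(sim_mem_t *m, uint16_t addr, uint8_t val)
{
    m->cells[effective(m, addr)] = val;
}

uint8_t sim_peek(const sim_mem_t *m, uint16_t addr)
{
    return m->cells[effective(m, addr)];
}

/* Write the lower eight bits of every address, then read them back.
 * Returns the number of mismatches. */
unsigned address_test(sim_mem_t *m)
{
    unsigned errors = 0;
    uint16_t a;

    assert(m != 0);
    for (a = 0; a < SIM_SIZE; a++) sim_poke(m, a, (uint8_t)(a & 0xFF));
    for (a = 0; a < SIM_SIZE; a++) {
        if (sim_peek(m, a) != (uint8_t)(a & 0xFF)) errors++;
    }
    return errors;
}
```

With address bit 3 stuck low, every address whose bit 3 is clear reads back the value written through its alias, so the test reports 512 mismatches. With bit 9 stuck low the test reports none, because aliased addresses share the same lower eight bits; a fault on a higher address line needs a test that writes other address bits.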
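void mov_inv3(unsigned char pattern) {
    unsigned int mem;
    unsigned char val;
    /* Check that every location in the window holds the pattern. */
    for(mem = 0x5000; mem < 0xd000; mem++) {
        val = PEEK(mem);
        if(val != pattern) {
            /* The format string has three specifiers, so the address is passed too. */
            sprintf(s, "Mismatch at addr 0x%04x: expecting 0x%02x, got 0x%02x.\r\n",
                    mem, pattern, val);
            fpga_puts(s);  /* send the message to the serial port */
        }
    }
}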
Chapter 5
The memory controller which was developed fulfils the requirements set for the project. It gives
a significant increase in available memory, using the external RAM on the development board.
The controller is developed as a separate unit, and is reusable in a similar microcontroller. The test
program also fulfils its requirements; during development several errors in the implementation
of the memory controller were discovered and corrected using the program.
The speed of the memory controller is such that no extra wait cycles have to be introduced.
Theoretically, the speed could be improved slightly, so that writes and reads both take only
one clock tick. In practice this might prove to be difficult, and in this case, it was not needed.
The reusability of the controller is somewhat impeded by the interface dictated by the existing
CPU. There is no real solution for this problem, and it is questionable whether reusability should be an
important goal when developing this kind of component.
The structure of the development process has some room for improvement: there was not
much direction from the client in terms of the development process. The cause of this
seems to be the conflicting interests of the client and the coordinator; the client was focused mainly
on the end product, whereas the main interest of the coordinator was in a proper development
process.
Appendix A
As I developed and tested the FPGA software under Linux, I have added some notes on how to
get things working. Hopefully, they may be of use to someone.
When using udev for device management, the device node for the windrvr6 driver is not
automatically created. Add mknod /dev/windrvr6 c 253 0 to your startup scripts to take care
of this. And, of course, the permissions on the device nodes should be correct for the whole thing to
work.
Appendix B
The two timing diagrams below correspond to the read and write cycles used to access the external
memory. They are copied from [1].
[Figure: read cycle timing diagram (READ1.eps), showing the ADDRESS signal with the timing parameters tRC, tAA and tOHA.]
[Figure: write cycle timing diagram (UB_CEWR4.eps), showing the ADDRESS, OE, CE, WE, UB/LB, DOUT and DIN signals with the timing parameters tWC, tSA, tHA, tPBW, tHZWE, tLZWE, tSD and tHD.]
Bibliography