FPGA Memory
I will not talk about the HPS side here, only the FPGA side.
Memory systems include:
M10K/MLAB blocks on the Cyclone V.
The following Verilog templates for memory require two cycles to read and one cycle
to write.
They are less reliable at very high clock rates, according to the HDL style guide (see
below).
My code uses these templates for M10K and MLAB blocks at 100 MHz with no
problems.
An example that includes state machines to run these memories is below, in the
section titled "Memory block Example -- Qsys sram, M10K block, and MLAB".
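module M10K_256_32(
    output reg [31:0] q,
    input [31:0] d,
    input [7:0] write_address, read_address,
    input we, clk
);
    // force M10K ram style
    reg [31:0] mem [255:0] /* synthesis ramstyle = "no_rw_check, M10K" */;

    always @ (posedge clk) begin
        if (we) begin
            mem[write_address] <= d;
        end
        q <= mem[read_address]; // q does not get d in this clock cycle
    end
endmodule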
//============================================================
// MLAB module for testing
//============================================================
module MLAB_20_32(
    output reg signed [31:0] q,
    input [31:0] data,
    input [7:0] readaddr, writeaddr,
    input wren, clock
);
    // force MLAB ram style
    reg signed [31:0] mem [19:0] /* synthesis ramstyle = "no_rw_check, MLAB" */;

    always @ (posedge clock) begin
        if (wren) begin
            mem[writeaddr] <= data;
        end
        q <= mem[readaddr]; // registered read: q holds the old contents on a same-cycle write
    end
endmodule
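The two-cycle read and one-cycle write stated above can be checked in simulation. The following is a minimal sketch, not the course code: `ram_template` is a stand-in copy of the registered-output template (without the synthesis directive), and `tb_read_latency` is a hypothetical testbench that changes its control signals only on clock edges, the way a state machine would.

```verilog
`timescale 1ns/1ns
// Stand-in copy of the registered-output template, so the sketch is self-contained.
module ram_template(
    output reg [31:0] q,
    input [31:0] d,
    input [7:0] write_address, read_address,
    input we, clk
);
    reg [31:0] mem [255:0];
    always @ (posedge clk) begin
        if (we) mem[write_address] <= d;
        q <= mem[read_address];
    end
endmodule

// Hypothetical testbench: nonblocking assignments after each edge mimic
// the registered outputs of a state machine.
module tb_read_latency;
    reg clk = 0;
    reg we = 0;
    reg [7:0] waddr = 0, raddr = 0;
    reg [31:0] d = 0;
    wire [31:0] q;

    ram_template dut(.q(q), .d(d), .write_address(waddr),
                     .read_address(raddr), .we(we), .clk(clk));

    always #5 clk = ~clk;

    initial begin
        // Edge 1: set up the write (address, data, write enable)
        @(posedge clk); we <= 1; waddr <= 8'd5; d <= 32'd1234;
        // Edge 2: the memory stores the word (one cycle to write);
        //         drop write enable and set up the read address
        @(posedge clk); we <= 0; raddr <= 8'd5;
        // Edge 3: the memory samples read_address and loads q
        @(posedge clk);
        // Edge 4: q is stable and can be captured (two cycles to read)
        @(posedge clk);
        if (q === 32'd1234) $display("PASS: q=%0d", q);
        else                $display("FAIL: q=%0d", q);
        $finish;
    end
endmodule
```

The sketch should run under any Verilog-2001 simulator with `tb_read_latency` as the top module. The point to notice is that a state machine needs one idle state between setting the read address and using `q`.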
The HDL style guide suggests the following code to infer M10K or MLAB blocks, but
it is slower.
Note that this template enables the input pipeline registers, so a read takes three
cycles and a write takes two cycles.
Timing diagrams:
You can set up M10K blocks in at least 3 ways:
The state machine in Verilog can read/write to the same block as the HPS, which thus
acts as a communication channel. The program running on the HPS writes floating
point values into the sram. The sram state machine reads the memory location in sram,
then writes the value back to another address, which is read by the HPS program and
printed. There is a separate state machine which reads and writes an MLAB block.
The fitter ram summary (line 8) shows that the following code inferred an MLAB
block. Note that I forced it with the synthesis directive. The MLAB timing is the same
as M10K blocks. A read takes two cycles.
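//============================================================
// MLAB module for testing
//============================================================
module MLAB_20_32(
    output reg signed [31:0] q,
    input [31:0] data,
    input [7:0] readaddr, writeaddr,
    input wren, clock
);
    // force MLAB ram style
    reg signed [31:0] mem [19:0] /* synthesis ramstyle = "no_rw_check, MLAB" */;

    always @ (posedge clock) begin
        if (wren) begin
            mem[writeaddr] <= data;
        end
        q <= mem[readaddr];
    end
endmodule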
MLAB_20_32 mlab1(
.q(mlab_readdata),
.data(mlab_writedata),
.readaddr(mlab_address),
.writeaddr(mlab_address),
.wren(mlab_write),
.clock(CLOCK_50)
);
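Because the instance above ties `readaddr` and `writeaddr` to the same `mlab_address`, a state machine can read a location, modify it, and write it back without changing the address. The following is a hedged sketch of such a counting sequencer, not the state machine from the project: the module name `mlab_counter_sketch`, the `reset` input, the state encoding, and the choice of address 1 are all assumptions. Only the `mlab_*` signal names come from the instantiation.

```verilog
// Hypothetical read-increment-write sequencer for one MLAB location.
// Connect the mlab_* ports to the matching wires of the MLAB_20_32 instance.
module mlab_counter_sketch(
    input clock, reset,
    input signed [31:0] mlab_readdata,   // q output of the MLAB
    output reg [31:0] mlab_writedata,
    output reg [7:0] mlab_address,       // shared by read and write ports
    output reg mlab_write
);
    reg [1:0] state;

    always @ (posedge clock) begin
        if (reset) begin
            state <= 2'd0;
            mlab_write <= 1'b0;
            mlab_address <= 8'd0;
            mlab_writedata <= 32'd0;
        end
        else begin
            case (state)
                2'd0: begin // set up the read address, write enable off
                    mlab_address <= 8'd1;   // assumed location
                    mlab_write <= 1'b0;
                    state <= 2'd1;
                end
                2'd1: begin // the memory samples the address on this edge
                    state <= 2'd2;
                end
                2'd2: begin // q is valid: increment it and set up the write
                    mlab_writedata <= mlab_readdata + 32'd1;
                    mlab_write <= 1'b1;
                    state <= 2'd3;
                end
                2'd3: begin // the memory stores the word on this edge
                    mlab_write <= 1'b0;
                    state <= 2'd0;
                end
            endcase
        end
    end
endmodule
```

The wait in state 1 reflects the two-cycle read. The sketch does not initialize the memory, so in simulation the count stays X unless the location is preloaded.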
This Verilog modification of the project above reads two numbers from the Qsys sram
(connected to HPS and the FPGA fabric) and computes the floating point sum of the
contents of sram address=1 and address=2, when the data flag in Qsys sram address=0
is set to one. The sum is copied into an M10K block, then back into the Qsys sram,
address=3. This roundabout scheme exercises read/write in M10K blocks. An MLAB
block is still counting, as above.
The RAM synthesis summary shows that two blocks were created, and that the clever
compiler figured out that I only used 4 locations of the M10K block.
The HPS program is the same as above and produces this console output.
The 1 1 0 on the command line writes a 1 to location zero to trigger the addition and
write-back.
Location zero always reads zero because it is zeroed by the FPGA state machine
before the write statement executes.
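The flag handshake described above (poll address 0, read addresses 1 and 2, write the sum to address 3, clear the flag) can be sketched as a small state machine. This is an illustration under stated assumptions, not the project code: the module and port names are hypothetical, the sram port is assumed to have the same two-cycle read timing as the M10K template, an integer add stands in for the floating point adder, and the round trip through the M10K block is left out.

```verilog
// Hypothetical flag-triggered add on a shared memory port.
// Integer addition is a stand-in for the floating point adder.
module sram_add_sketch(
    input clock, reset,
    input [31:0] sram_readdata,
    output reg [31:0] sram_writedata,
    output reg [7:0] sram_address,
    output reg sram_write
);
    reg [3:0] state;
    reg [31:0] a;   // holds the operand from address 1

    always @ (posedge clock) begin
        if (reset) begin
            state <= 4'd0;
            sram_write <= 1'b0;
            sram_address <= 8'd0;
            sram_writedata <= 32'd0;
            a <= 32'd0;
        end
        else begin
            case (state)
                4'd0: begin sram_address <= 8'd0; sram_write <= 1'b0; state <= 4'd1; end
                4'd1: state <= 4'd2;                  // wait for the flag read
                4'd2: begin                           // flag is valid
                    if (sram_readdata == 32'd1) begin
                        sram_address <= 8'd1;
                        state <= 4'd3;
                    end
                    else state <= 4'd0;               // keep polling
                end
                4'd3: state <= 4'd4;                  // wait for operand 1
                4'd4: begin a <= sram_readdata; sram_address <= 8'd2; state <= 4'd5; end
                4'd5: state <= 4'd6;                  // wait for operand 2
                4'd6: begin                           // set up the write of the sum
                    sram_writedata <= a + sram_readdata;
                    sram_address <= 8'd3;
                    sram_write <= 1'b1;
                    state <= 4'd7;
                end
                4'd7: begin                           // sum is stored on this edge;
                    sram_writedata <= 32'd0;          // set up the flag clear
                    sram_address <= 8'd0;
                    state <= 4'd8;
                end
                4'd8: begin                           // flag is zeroed on this edge
                    sram_write <= 1'b0;
                    state <= 4'd0;
                end
                default: state <= 4'd0;
            endcase
        end
    end
endmodule
```

In this sketch the flag is cleared only after the sum has been written, so a reader that sees zero at address 0 can assume address 3 already holds the result. The whole sequence takes a few clock cycles, which is consistent with the HPS program always reading zero at location zero.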
FIFO between HPS and FPGA
There is a Qsys FIFO module (chapter 14) available that could make a good interface
between the HPS and FPGA.
See the FIFO page.