
FPGA Memory

The document describes the different memory systems available on Altera Cyclone 5 FPGAs, including M10K blocks, MLAB blocks, logic element registers, and Qsys-attached static RAM and external SDRAM. It provides details on the size and capabilities of each type of memory block. It also provides examples of Verilog code for instantiating and accessing M10K and MLAB blocks, and an example design that uses a Qsys-attached M10K block shared between the FPGA and HPS, along with an MLAB block to implement a counter.

The memory systems of Altera Cyclone5 FPGAs have various features and limitations.
I will not talk about the HPS side here, only the FPGA side.
Memory systems include:

 M10K blocks on Cyclone5 SE A5
There are about 390 blocks (~3900 Kbits), each capable of holding:
1-bit x 8K, 2-bit x 4K, 4-bit x 2K, 5-bit x 2K, 8-bit x 1K, 10-bit x 1K, 16-bit x
512, 20-bit x 512, 32-bit x 256, 40-bit x 256.
If you instantiate bigger memories in Verilog, blocks will be automatically
concatenated to build the bigger memory.
There is a data-out pipeline register, which delays the read by one cycle, for a
total of two cycles.
There are optional input pipeline registers on data, address, and write-enable, so an
M10K block read takes three cycles if the input registers are enabled, but reads can be
pipelined.
Dual-port read/write is supported.
 MLAB blocks
Up to about 480 blocks, each holding 32 words of 16-, 18-, or 20-bit data (640 bits).
MLAB does not support true dual-port RAM.
MLAB supports continuous reads. For example, when you write data at the
write clock rising edge, then after the write operation is complete
you see the written data at the output port without the need for a read clock
rising edge.
 Logic Element Registers
Up to 128,000 bits of memory, but this uses general logic elements very
quickly.
 Qsys-attached static RAM (M10K blocks)
Easy to use, bus-attached memory, which can be accessed from FPGA and HPS.
Size is configured in Qsys and uses the pool of available M10K blocks.
 Qsys-attached external SDRAM
Easy to use, bus attached memory, which can be accessed from FPGA and
HPS.
There is ONE actual external SDRAM available on this board.
Configured as 32 Mwords of 16-bit memory (64 MB)
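
As a hedged illustration of the block concatenation mentioned under M10K blocks above: a 1024-word by 32-bit memory holds 32768 bits, which is more than one 10-Kbit block, so the tools have to combine at least four M10K blocks (for example, four 1K x 8-bit slices). The module name M10K_1024_32 is made up for this sketch, and the block count actually used should be checked in the fitter RAM summary rather than taken from here.

```verilog
// Hypothetical module (name and size are assumptions for illustration).
// 1024 words x 32 bits = 32768 bits, so more than one M10K block is needed.
module M10K_1024_32(
    output reg [31:0] q,
    input [31:0] d,
    input [9:0] write_address, read_address, // 10 bits address 1024 words
    input we, clk
);
    // request M10K implementation; the tools concatenate blocks as needed
    reg [31:0] mem [1023:0] /* synthesis ramstyle = "no_rw_check, M10K" */;

    always @ (posedge clk) begin
        if (we) begin
            mem[write_address] <= d;
        end
        q <= mem[read_address]; // registered read, same style as a single block
    end
endmodule
```

From the outside the memory behaves like one block with a wider address, so nothing else in the design needs to know how many blocks were used.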

These are explained in several documents.


I have tried here to show specific examples of memory use in realistic state machine
schemes. Refer to:
 Altera Recommended HDL Coding Styles section on Inferring Memory
Functions from HDL Code.
 Recommended Design Practices for FPGAs: Latches, clocks, metastability.
 Advanced Synthesis Cookbook: All kinds of cool mathematical and logical
structures.
 CycloneV Memory Blocks: M10K memory
 Internal Memory (RAM and ROM) Users Guide
 On-Chip Memory Implementations Using Cyclone Memory Blocks

M10k/MLAB on Cyclone 5.

The following Verilog templates for memory require two cycles to read and one cycle
to write.
They are less reliable at very high clock rates, according to the HDL style guide (see
below).
My code uses these templates for M10K and MLAB blocks at 100 MHz with no
problems.
An example which includes state machines to run these memories is below in the
section titled: Memory block Example -- Qsys sram, M10K block, and MLAB

Verilog templates for Cyclone 5 Memory:


//============================================================
// M10K module for testing
//============================================================

module M10K_256_32(
    output reg [31:0] q,
    input [31:0] d,
    input [7:0] write_address, read_address,
    input we, clk
);
    // force M10K ram style
    reg [31:0] mem [255:0] /* synthesis ramstyle = "no_rw_check, M10K" */;

    always @ (posedge clk) begin
        if (we) begin
            mem[write_address] <= d;
        end
        q <= mem[read_address]; // q doesn't get d in this clock cycle
    end
endmodule

//============================================================
// MLAB module for testing
//============================================================
module MLAB_20_32(
    output reg signed [31:0] q,
    input [31:0] data,
    input [7:0] readaddr, writeaddr,
    input wren, clock
);
    // force MLAB ram style
    reg signed [31:0] mem [19:0] /* synthesis ramstyle = "no_rw_check, MLAB" */;

    always @ (posedge clock) begin
        if (wren) begin
            mem[writeaddr] <= data;
        end
        q <= mem[readaddr];
    end
endmodule
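
To make the read and write timing of these templates concrete, here is a minimal simulation sketch. It assumes the M10K_256_32 module above is compiled together with it. The testbench name, the test value, and the address are arbitrary choices for illustration. Inputs change on falling clock edges so that each rising edge sees stable values.

```verilog
`timescale 1ns/1ns
// Hypothetical testbench; assumes M10K_256_32 (above) is compiled with it.
module M10K_256_32_tb;
    reg clk = 1'b0;
    reg we = 1'b0;
    reg [31:0] d = 32'd0;
    reg [7:0] write_address = 8'd0, read_address = 8'd0;
    wire [31:0] q;

    M10K_256_32 dut(
        .q(q),
        .d(d),
        .write_address(write_address),
        .read_address(read_address),
        .we(we),
        .clk(clk)
    );

    always #5 clk = ~clk; // 10 ns period, i.e. 100 MHz

    initial begin
        // set up the write; it is stored at the next rising edge
        @(negedge clk);
        d = 32'hDEADBEEF;
        write_address = 8'd5;
        we = 1'b1;
        // write is done; now set up the read of the same address
        @(negedge clk);
        we = 1'b0;
        read_address = 8'd5;
        // q was loaded at the rising edge in between, so check it now
        @(negedge clk);
        if (q !== 32'hDEADBEEF) $display("FAIL: q = %h", q);
        else $display("PASS: q = %h", q);
        $finish;
    end
endmodule
```

In a state machine this corresponds to one state that sets the address, one wait state, and one state that uses q, which is the two-cycle read described above.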

The HDL style guide suggests the following code to infer M10K or MLAB blocks, but
it is slower.
NOTE that this template enables the input pipeline registers, so a read takes three
cycles and a write takes two cycles.
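
The style guide's own template is not reproduced here. As a rough illustration only, the sketch below adds a register stage on data, addresses, and write-enable in front of the M10K_256_32 template, which gives the three-cycle read and two-cycle write described. The module name and register names are made up, and this is not the style guide's code. Whether the fitter packs these registers into the memory block or builds them from logic elements depends on the tools, so check the fitter report.

```verilog
// Hypothetical module; NOT the style guide's template.
// Adds one input register stage in front of the memory access.
module M10K_256_32_inreg(
    output reg [31:0] q,
    input [31:0] d,
    input [7:0] write_address, read_address,
    input we, clk
);
    reg [31:0] mem [255:0] /* synthesis ramstyle = "no_rw_check, M10K" */;

    // input pipeline registers (names are assumptions)
    reg [31:0] d_reg;
    reg [7:0] write_address_reg, read_address_reg;
    reg we_reg;

    always @ (posedge clk) begin
        // stage 1: capture the inputs
        d_reg <= d;
        write_address_reg <= write_address;
        read_address_reg <= read_address;
        we_reg <= we;
        // stage 2: memory access using the registered inputs
        if (we_reg) begin
            mem[write_address_reg] <= d_reg;
        end
        q <= mem[read_address_reg];
    end
endmodule
```

Because every stage is registered, a new address can still be presented on each clock edge. Only the latency grows by one cycle.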

Timing diagrams:
You can set up M10K blocks in at least 3 ways:

 Infer the block memory directly from Verilog
o See HDL style Guide
o Mohammad Dohadwala wrote the following to construct a 512 word 18-bit
memory in one M10K block.
This code simulates correctly in Modelsim because he wrote a single-cycle
delay in the read enable signal.
For synthesis you would use the version titled M10K_256_32 at the
beginning of this section, not this code.
module RAM_512_18(
    output reg signed [17:0] q,
    input signed [17:0] data,
    input [8:0] wraddress, rdaddress,
    input wren, rden, clock
);

    reg [8:0] read_address_reg;
    reg signed [17:0] mem [511:0];
    reg rden_reg;

    always @ (posedge clock)
    begin
        if (wren)
            mem[wraddress] <= data;
    end
    always @ (posedge clock) begin
        if (rden_reg)
            q <= mem[read_address_reg];
        read_address_reg <= rdaddress;
        rden_reg <= rden;
    end

endmodule

 Use synthesis comments in Verilog to force memory allocation


o Synthesis attributes
o RAM synthesis attribute
 Use the Altera IP library Memory -- users guide. There is some evidence that
the simulation code
generated by the IP handler does not correctly handle the one-cycle read delay
in M10K blocks.
o RAM
o 2-port RAM -- config1, 2, 3, 5, 6, 7, 10
o ROM,
o shift registers,
o FIFO

Qsys sram and MLAB


This example simultaneously tests the floating point routines, shares an M10K between
HPS and FPGA, and uses an MLAB block to increment a counter. The M10K block is
instantiated in Qsys as dual port memory with two clocks: the system clock and
a 50 MHz clock. One slave port is hooked to the HPS bus and the other port is
exported to the FPGA fabric. These exported signals appear in the computer system
template as:
// SRAM shared block with HPS
.onchip_sram_s1_address (sram_address),
.onchip_sram_s1_clken (sram_clken),
.onchip_sram_s1_chipselect (sram_chipselect),
.onchip_sram_s1_write (sram_write),
.onchip_sram_s1_readdata (sram_readdata),
.onchip_sram_s1_writedata (sram_writedata),
.onchip_sram_s1_byteenable (4'b1111),

The state machine in Verilog can read/write to the same block as the HPS, which thus
acts as a communication channel. The program running on the HPS writes floating
point values into the sram. The sram state machine reads the memory location in sram,
then writes the value back to another address, which is read by the HPS program and
printed. There is a separate state machine which reads/writes an MLAB block.
The fitter ram summary (line 8) shows that the following code inferred an MLAB
block. Note that I forced it with the synthesis directive. The MLAB timing is the same
as for M10K blocks: a read takes two cycles.
//============================================================
// MLAB module for testing
//============================================================
module MLAB_20_32(
    output reg signed [31:0] q,
    input [31:0] data,
    input [7:0] readaddr, writeaddr,
    input wren, clock
);

    reg [7:0] read_address_reg; // not used below

    // force MLAB ram style
    reg signed [31:0] mem [19:0] /* synthesis ramstyle = "no_rw_check, MLAB" */;

    always @ (posedge clock) begin
        if (wren) begin
            mem[writeaddr] <= data;
        end
        q <= mem[readaddr];
    end
endmodule
The MLAB state machine reads an address, then writes back the (read_value)+1, and
copies the count to the red LEDs.
(HPS program, top-level, ZIP)
//=======================================================
// MLAB state machine
//=======================================================
wire [31:0] mlab_readdata ;
reg [31:0] mlab_writedata, mlab_data_buffer ;
reg [7:0] mlab_address ;
reg mlab_write ;
reg [3:0] mlab_state ;

MLAB_20_32 mlab1(
    .q(mlab_readdata),
    .data(mlab_writedata),
    .readaddr(mlab_address),
    .writeaddr(mlab_address),
    .wren(mlab_write),
    .clock(CLOCK_50)
);

// readout for memory state
assign LEDR = mlab_data_buffer[31:23] ;

// memory based counter
// reads/writes counter for display
always @(posedge CLOCK_50) begin
    // set up read
    if (mlab_state == 4'd0) begin
        mlab_address <= 8'd0 ;
        mlab_write <= 1'b0 ;
        mlab_state <= 4'd1 ;
    end
    // wait -- required for read
    if (mlab_state == 4'd1) begin
        mlab_state <= 4'd2 ;
    end
    // do the read
    if (mlab_state == 4'd2) begin
        mlab_data_buffer <= mlab_readdata ;
        mlab_write <= 1'b0 ;
        mlab_state <= 4'd3 ;
    end
    // set up write
    if (mlab_state == 4'd3) begin
        mlab_address <= 8'd0 ;
        mlab_writedata <= mlab_data_buffer + 32'd1 ;
        mlab_write <= 1'b1 ;
        mlab_state <= 4'd0 ;
    end
end
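
The sram state machine described earlier in this section (read one location of the shared Qsys sram, write the value back to another address) can be sketched in the same style as the MLAB machine. This is not the code from the project: the module name, the 8-bit address width, and the two addresses (0 and 1) are assumptions for illustration. The port names follow the exported signals listed above, and the memory is assumed to have the same two-cycle read timing as the M10K template.

```verilog
// Hypothetical module; name, address width, and addresses are assumptions.
module sram_copy_fsm(
    input clk,
    input [31:0] sram_readdata,
    output reg [31:0] sram_writedata,
    output reg [7:0] sram_address,
    output reg sram_write,
    output sram_clken, sram_chipselect
);
    reg [31:0] data_buffer ;
    reg [3:0] state = 4'd0 ;

    // keep the memory port always enabled
    assign sram_clken = 1'b1 ;
    assign sram_chipselect = 1'b1 ;

    always @(posedge clk) begin
        case (state)
            // set up read of the source address (also ends the previous write)
            4'd0: begin
                sram_address <= 8'd0 ;
                sram_write <= 1'b0 ;
                state <= 4'd1 ;
            end
            // wait -- required for read
            4'd1: state <= 4'd2 ;
            // capture the read data
            4'd2: begin
                data_buffer <= sram_readdata ;
                state <= 4'd3 ;
            end
            // write the value back to another address
            4'd3: begin
                sram_address <= 8'd1 ;
                sram_writedata <= data_buffer ;
                sram_write <= 1'b1 ;
                state <= 4'd0 ;
            end
            default: state <= 4'd0 ;
        endcase
    end
endmodule
```

The write strobe is high for exactly one clock: it is set in state 3 and cleared again in state 0, while the write address is still on the bus.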
Memory block Example -- Qsys sram, M10K block, and MLAB

This Verilog modification of the project above reads two numbers from the Qsys sram
(connected to HPS and the FPGA fabric) and computes the floating point sum of the
contents of sram address=1 and address=2, when the data flag in Qsys sram address=0
is set to one. The sum is copied into an M10K block, then back into the Qsys sram,
address=3. This roundabout scheme exercises read/write in M10K blocks. An MLAB
block is still counting, as above.

The RAM synthesis summary shows that two blocks were created, and that the clever
compiler figured out that I only used 4 locations of the M10K block.

The HPS program is the same as above and produces this console.
The 1 1 0 on the command line writes a 1 to location zero to trigger the addition and
write-back.
Location zero always reads zero because the FPGA state machine zeroes it
before the write statement executes.
FIFO between HPS and FPGA

There is a Qsys FIFO module (chapter 14) available that could make a good interface
between the HPS and FPGA.
See the FIFO page.
