Xilinx ddr2 Memory Interfaces PDF
March 2006
Memory Interfaces Solution Guide
Overcoming Memory Interface Bottlenecks
INSIDE
ARTICLES
Implementing High-Performance Memory Interfaces with Virtex-4 FPGAs
APPLICATION NOTES
Dr. Howard Johnson
The world’s foremost
authority on
signal integrity
Supporting 667 Mbps DDR2 SDRAM interfaces, Virtex-4 FPGAs achieve the highest bandwidth
benchmark in the industry. Based on our unique ChipSync™ technology—built into every I/O—the
Virtex-4 family provides adaptive centering of the clock to the data valid window. By providing reliable
data capture (critical to high-performance memory interfacing), and 75 ps resolution for maximum
design margins, your memory design can now adapt to changing system conditions.
Visit www.xilinx.com/virtex4/memory today, and start your next memory interface design for
Virtex-4 FPGAs with the easy-to-use Memory Interface Generator software.
©2006 Xilinx, Inc. All rights reserved. XILINX, the Xilinx logo, and other designated brands included herein are trademarks of Xilinx, Inc. All other trademarks are the property of their respective owners.
Faster, But More Challenging
PUBLISHER Forrest Couch, 408-879-5270
EDITOR Charmaine Cooper Hussain
ART DIRECTOR Scott Blair
ADVERTISING SALES Dan Teie, 1-800-493-5551

Welcome to the Memory Interfaces Solution Guide, an educational journal of memory interface
design and implementation solutions from Xilinx. Engineers in the semiconductor and electronics
design community tasked to create high-performance system-level designs know well the growing
challenge of overcoming memory interface bottlenecks. This guide seeks to bring light to current
memory interface issues, challenges, and solutions, especially as they relate to extracting maximum
performance in FPGA designs.
Toward the latter half of the 1990s, memory interfaces evolved from single-data-rate SDRAMs to
double-data-rate (DDR) SDRAMs, the fastest of which is currently the DDR2 SDRAM, running
at 667 Mbps/pin. Present trends indicate that these rates are likely to double every four years,
potentially reaching 1.6 Gbps/pin by 2010. These trends present a serious problem to designers in
that the time period during which you can reliably obtain read data – the data valid window –
is shrinking faster than the data period itself.
This erosion of the data valid window introduces a new set of design challenges that require a more
effective means of establishing and maintaining reliable memory interface performance.
Along with the performance issues that attend the new breed of high-performance memories,
designers face a new set of memory controller design issues as well. The complexities and intricacies
of creating memory controllers for these devices pose a wide assortment of challenges that suggest a
need for a new level of integration support from the tools accompanying the FPGA.
In this guide, we offer a foundational set of articles covering the broad selection of resources and
solutions Xilinx offers, including the latest silicon features in the Virtex™-4 FPGA family that
address the shrinking data valid window, the availability of hardware reference designs to accelerate
your design efforts, and two application notes that discuss the latest technology advances enabling
the design of DDR2 SDRAM interfaces running at 667 Mbps/pin.
Enjoy!

Xilinx, Inc.
2100 Logic Drive
San Jose, CA 95124-3400
Phone: 408-559-7778
FAX: 408-879-4780
by Adrian Cosoroaba
Marketing Manager
Xilinx, Inc.
PC100  PC133  DDR-200  DDR-266  DDR-333  DDR-400  DDR2-400  DDR2-533  DDR2-667  DDR2-800
0.8    1.1    1.6      2.1      2.7      3.2      3.2       4.266     5.33      6.4
Table 1 – The progression from SDR to DDR and DDR2 has allowed today’s systems to maintain their
upward growth path. Speed grades and bit rates are shown for each memory interface.
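The second row of Table 1 carries no unit in the extracted text. The values look consistent with peak bandwidth in GB/s for a 64-bit data bus, but that reading is an assumption rather than something the table states. A minimal Python sketch of that check, with the bus width and the function name chosen only for illustration:

```python
# Assumption: a 64-bit (8-byte) data bus; the table itself does not state the width.
BUS_BYTES = 8

# Transfer rate in millions of transfers per second for each speed grade in Table 1.
SPEED_GRADES = {
    "PC100": 100, "PC133": 133,
    "DDR-200": 200, "DDR-266": 266, "DDR-333": 333, "DDR-400": 400,
    "DDR2-400": 400, "DDR2-533": 533, "DDR2-667": 667, "DDR2-800": 800,
}


def peak_bandwidth_gbytes(mega_transfers: float, bus_bytes: int = BUS_BYTES) -> float:
    """Peak bandwidth in GB/s: transfers per second times bytes per transfer."""
    return mega_transfers * 1e6 * bus_bytes / 1e9


for name, rate in SPEED_GRADES.items():
    print(f"{name:9s} {peak_bandwidth_gbytes(rate):.2f} GB/s")
```

Under this assumption the computed figures agree with the table to within rounding of the nominal transfer rates.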
On-Die Termination
The addition of on-die termination (ODT) has provided an extra knob with which to dial in and improve signal integrity on the DDR2 interface. ODT is a dynamic termination built into the SDRAM chip and memory controller. It can be enabled or disabled depending on addressing conditions and whether a read or write operation is being performed, as shown in Figure 1. In addition to being able to turn termination off or on, ODT also offers the flexibility of different termination values, allowing you to choose an optimal solution for your specific design.
It is important to investigate the effects of ODT on your received signals, and you can easily do this by using a signal integrity software tool like Mentor Graphics’ HyperLynx product. Consider the example design shown in Figure 2, which shows a DDR2-533 interface (266 MHz) with two unbuffered DIMM modules and ODT settings of 150 Ohms at each DIMM. You can simulate the effects of using different ODT settings and determine which settings would work best for this DDR2 design before committing to a specific board layout or creating a prototype.
Figure 2 – The HyperLynx free-form schematic editor shows a pre-layout topology of an unbuffered 2 DIMM module system. Transmission line lengths on the DIMM are from the JEDEC DDR2 unbuffered DIMM specification.
With the 150 Ohm ODT settings, Figure 3 shows significant signal degrada-
[Figure 6 waveform labels, top to bottom: VDDQ, VIH(AC) min, VIH(DC) min, VREF(DC), VIL(DC) max, VIL(AC) max, VSS. The nominal slew rate lines span the VREF to AC regions.]
Figure 6 – The waveform illustrates how a nominal slew rate is defined for a signal when performing a derating in a setup condition. The waveform is taken from the DDR2 JEDEC specification (JESD79-2B).
Figure 7 – The HyperLynx oscilloscope shows an automated measurement of the nominal slew rate for every edge in an eye diagram with the DDR2 slew rate derating feature. The measurement provides the minimum and maximum slew rates that can then be used in the DDR2 derating tables in the JEDEC specification.
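As a rough illustration of the setup-condition measurement in Figures 6 and 7, the nominal slew rate of a rising edge can be taken as the voltage difference between VREF(DC) and VIH(AC) min divided by the time the edge spends in that region. The sketch below is not the HyperLynx implementation; the function names and the sample voltages and times are hypothetical, and the exact definition should be checked against JESD79-2B.

```python
def nominal_slew_rate_rising(vih_ac_min: float, vref_dc: float, delta_t_ns: float) -> float:
    """Nominal setup slew rate (V/ns) of a rising edge.

    delta_t_ns is the time from the crossing of VREF(DC) to the
    crossing of VIH(AC) min on that edge.
    """
    if delta_t_ns <= 0:
        raise ValueError("delta_t_ns must be positive")
    return (vih_ac_min - vref_dc) / delta_t_ns


def slew_rate_extremes(vih_ac_min: float, vref_dc: float, edge_times_ns: list[float]) -> tuple[float, float]:
    """Minimum and maximum nominal slew rate over all measured edges."""
    rates = [nominal_slew_rate_rising(vih_ac_min, vref_dc, t) for t in edge_times_ns]
    return min(rates), max(rates)


# Hypothetical thresholds and edge times, for illustration only.
slowest, fastest = slew_rate_extremes(1.15, 0.90, [0.20, 0.25, 0.50])
print(f"min {slowest:.2f} V/ns, max {fastest:.2f} V/ns")
```

The minimum and maximum returned here play the role of the two values looked up in the derating tables.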
Figure 9 – The HyperLynx oscilloscope shows how the tangent line is automatically determined for you in the DDR2 slew rate derating feature. The slew rate lines in the display indicate that they are tangent lines because they no longer intersect with the received signal and Vref intersection. The oscilloscope determines the slew rate of these new tangent lines for you and reports the minimum and maximum slew rates to be used in the derating tables.
Figure 10 – The oscilloscope shows how a derating for a hold condition is being performed on the received signal. The DC thresholds are used in place of the AC switching thresholds, which are noted in the DDR2 derating dialog.
by Olivier Despaux
Senior Applications Engineer
Xilinx Inc.

Wider parallel data buses, increasing data rates, and multiple loads are challenges for high-end memory interface designers. The demand for higher bandwidth and throughput is driving the requirement for even faster clock frequencies. As valid signal windows shrink, signal integrity (SI) becomes a dominant factor in ensuring that memory interfaces perform flawlessly.
Chip and PCB-level design techniques can improve simultaneous switching output (SSO) characteristics, making it easier to achieve the signal integrity required in wider memory interfaces. EDA vendors are making a wide range of tools available to designers for optimizing the signal integrity quality of memory interfaces. Features that are integrated on the FPGA silicon die, such as digitally controlled impedance (DCI), simplify the PCB layout design and enhance performance.
This article discusses these design techniques and hardware experiment results, illustrating the effect of design parameters on signal integrity.

Optimizing Timing in DDR SDRAM Interfaces
Shrinking data periods and significant memory timing uncertainties are making timing closure a real challenge in today’s higher performance electronic systems. Several design practices help in preserving the opening of the valid data window. For example, in the case of interfaces with DDR2 SDRAM devices, the JEDEC standard allows the memory device suppliers to have a substantial amount of skew on the data transmitted to the memory controller. There are several components to this skew factor, including output access time, package and routing skew, and data-to-strobe skew. In the case where the memory controller is using a fixed phase-shift to register data across the entire interface, the sum of the skew uncertainties must be accounted for in the timing budget. If the worst-case sum of skew uncertainties is high, it reduces the data valid window and thereby limits the guaranteed performance for the interface. Table 1 shows an example of timing parameters for a 267 MHz memory interface.
Assuming that data capture is based on the timing of the DQS signals, leveraging the source-synchronous nature of the interface, it is possible to compute the memory valid data window across the entire data bus as follows:

T_MEMORY_VDW = T_DATA_PERIOD - T_DQSCK - T_DQSQ - T_QHS
             = 1687 ps - 900 ps - 300 ps - 400 ps = 87 ps
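The valid data window budget above is simple subtraction, and a small helper makes it easy to rerun with other parameter sets. This is a minimal sketch: the function name is illustrative, and the numbers are the ones quoted in the text for the 267 MHz example.

```python
def memory_valid_data_window(t_data_period_ps: int, t_dqsck_ps: int,
                             t_dqsq_ps: int, t_qhs_ps: int) -> int:
    """Valid data window left after subtracting the memory-side uncertainties (ps)."""
    return t_data_period_ps - t_dqsck_ps - t_dqsq_ps - t_qhs_ps


# Values from the 267 MHz example in the text.
window_ps = memory_valid_data_window(1687, 900, 300, 400)
print(window_ps)  # 87
```

A negative result would mean that a fixed phase-shift capture scheme cannot be guaranteed to work at that data rate with those parameters.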
Parameter  Values (ps)      Description
TQHS       400   0    400   Hold skew factor specified by memory vendor
TDQSCK     900   450  450   DQS output access time from CK/CK#

Table 1 – Memory parameters valid data window uncertainty summary

In high data rate systems that are subject to variations in voltage and temperature, dynamic calibration is required. In leading edge interfaces, performing the calibration sequence periodically makes this scheme independent of voltage and temperature variations at all times.
by Larry French
FAE Manager
Micron Semiconductor Products, Inc.

As a designer, you probably spend a significant amount of time simulating boards and building and testing prototypes. It is critical that the kinds of tests performed on these prototypes are effective in detecting problems that can occur in production or in the field.
DRAM or other memory combined in an FPGA system may require different test methodologies than an FPGA alone. Proper selection of memory design, test, and verification tools reduces engineering time and increases the probability of detecting potential problems. In this article, we’ll discuss the best practices for thoroughly debugging a Xilinx® FPGA design that uses memory.

Memory Design, Testing, and Verification Tools
You can use many tools to simulate or debug a design. Table 1 lists the five essential tools for memory design. Note that this is not a complete list as it does not include thermal simulation tools; instead, it focuses only on those tools that you can use to validate the functionality and robustness of a design. Table 2 shows when these tools can be used most effectively.
This article focuses on the five phases of product development, as shown in Table 2:
• Phase 1 – Design (no hardware, only simulation)
• Phase 2 – Alpha (or Early) Prototype (design and hardware changes likely to occur before production)
• Phase 3 – Beta Prototype (nearly “production-ready” system)
• Phase 4 – Production
• Phase 5 – Post-Production (in the form of memory upgrades or field replacements)

Tool                    Description
Electrical Simulations  SPICE or IBIS
Behavioral Simulations  Verilog or VHDL
Signal Integrity        Oscilloscope and probes; possibly mixed-mode to allow for more accurate signal capture
Margin Testing          Guardband testing and four-corner testing by variation of voltage and temperature
Compatibility Testing   Functional software testing or system reboot test

Table 1 – Memory design, test, and verification tools

Tool                     Phase 1      Phase 2        Phase 3        Phase 4      Phase 5
Simulation – Electrical  Essential    Very Valuable  Limited Value  Rarely Used  No Value
Simulation – Behavioral  Essential    Very Valuable  Limited Value  Rarely Used  No Value
Signal Integrity         Unavailable  Critical       Limited Value  Rarely Used  No Value
Margin Testing           Unavailable  Essential      Essential      Essential    Essential
Compatibility            Unavailable  Valuable       Essential      Essential    Essential

Table 2 – Tools for verifying memory functionality versus design phase

The Value of SI Testing
SI is not a panacea and should be used judiciously. SI should not be overused, although it frequently is. For very early or alpha prototypes, SI is a key tool for ensuring that your system is free of a number of memory problems, including:
• Ringing and overshoot/undershoot
• Timing violations, such as:
– Setup and hold time
– Slew rate (weakly driven or strongly driven signals)
– Setup/hold time (data, clock, and controls)
– Clock duty cycle and differential clock crossing (CK/CK#)
– Bus contention
By contrast, SI is not useful in the beta prototype phase unless there are changes to the board signals. (After all, each signal net is validated in the alpha prototype.) However, if a signal does change, you can use SI to ensure that no SI problems exist with the changed net(s). Rarely – if ever – is there a need for SI testing in production.
SI is commonly overused for testing because electrical engineers are comfortable looking at an oscilloscope and using the captures or photographs as documentation to show that a system was tested (Figure 1). Yet extensive experience at Micron Technology shows that much more effective tools exist for catching failures. In fact, our experience shows that SI cannot detect all types of system failures.

Figure 1 – Typical signal integrity shot from an oscilloscope

Limitations of SI Testing
SI testing has a number of fundamental limitations. First and foremost is the memory industry migration to fine-pitch ball-grid array (FBGA) packages. Without taking up valuable board real estate for probe pins, SI is difficult or impossible because there is no way to probe under the package.
Micron has taken several hundred thousand scope shots in our SI lab during memory qualification testing. Based on this extensive data, we concluded that system problems are most easily found with margin and compatibility testing. Although SI is useful in the alpha prototype phase, it should be replaced by these other tests during beta prototype and production.
Here are some other results of our SI testing:
• SI did not find a single issue that was not identified by memory or system-level diagnostics. In other words, SI found the same failures as the other tests, thus duplicating the capabilities of margin testing and software testing.
• SI is time-consuming. Probing 64-bit or 72-bit data buses and taking scope shots requires a great deal of time.
• SI uses costly equipment. To gather accurate scope shots, you need high-cost oscilloscopes and probes.
• SI takes up valuable engineering resources. High-level engineering analysis is required to evaluate scope shots.
• SI does not find all errors. Margin and compatibility testing find errors that are not detectable by SI.
The best tests for finding FPGA/memory issues are margin and compatibility testing.

Margin Testing
Margin testing is used to evaluate how systems work under extreme temperatures and voltages. Many system parameters change with temperature/voltage, including slew rate, drive strength, and access time. Validation of a system at room temperature is not enough. Micron found that another benefit of margin testing is that it detects system problems that SI will not.
Four-corner testing is a best industry practice for margin testing. If a failure is

How Does the Logic Analyzer (or Mixed-Mode Analysis) Fit In?
You may have noticed that Table 1 does not include logic analyzers. Although it is rare to find a debug lab that does not include this tool as an integral part of its design and debug process, we will not discuss logic analyzers in this article. Because of the cost and time involved, they are rarely the first tool used to detect a failure or problem in a system. Logic analyzers are, however, invaluable in linking a problem, after it has been identified, to its root cause. Like signal integrity (SI), logic analyzers should be used after a problem has been detected.
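Four-corner testing, as described under Margin Testing and in Table 1, exercises the system at every combination of minimum and maximum voltage and temperature. The sketch below only enumerates those corners; the function name and the sample limits are hypothetical and would come from your own system specification.

```python
from itertools import product


def four_corners(v_min: float, v_max: float, t_min: float, t_max: float) -> list[tuple[float, float]]:
    """Return the four (voltage, temperature) corners for margin testing."""
    return list(product((v_min, v_max), (t_min, t_max)))


# Hypothetical limits: a 1.8 V supply with 0.1 V of tolerance, 0 to 85 degrees C.
for volts, temp_c in four_corners(1.7, 1.9, 0.0, 85.0):
    print(f"run memory test at {volts:.1f} V, {temp_c:.0f} C")
```

Each corner would then be paired with the same functional or compatibility test that is run at nominal conditions.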
by Veena Kondapalli
Applications Engineer Staff
Cypress Semiconductor Corp.

The growing demand for higher performance communications, networking, and DSP necessitates higher performance memory devices to support such applications. Memory manufacturers like Cypress have developed specialized memory products such as quad data rate II (QDR II) SRAM devices to optimize memory bandwidth for a specific system architecture. In this article, I’ll provide a general outline of a QDR II SRAM interface implemented in a Xilinx® Virtex™-4 XC4VP25 FF6688-11 device.
Figure 1 shows a block diagram of the QDR II SRAM design interface, with the physical interface to the actual memory device on the controller.
The reference design uses the phase-shifted outputs of the DCM to clock the interface on the transmit side. This configuration gives the best jitter and skew characteristics.
QDR II devices include the following features:
• Maximum frequency of operations - 250 MHz - tested up to 278 MHz
• Available in QDR II architecture with burst of 2 or 4
• Supports simultaneous reads/writes and back-to-back transactions without bus contention issues
• Supports multiple QDR II SRAM devices on the same bus to:
– Increase the density of the memory resource
– Divide the speed of the interface by using multiple devices to achieve a given bandwidth
• Read: valid window worst-case 440 ps
• Write: valid window worst-case 460 ps
• Address and control signal timing analysis: command window worst-case 2360 ps

Conclusion
For more information about QDR II and Virtex-4 devices, see Xilinx application note XAPP703, “QDR II SRAM Interface for Virtex-4 Devices,” at www.xilinx.com/bvdocs/appnotes/xapp703.pdf, as well as Cypress application note “Interfacing QDR-II SRAM with Virtex-4 Devices” at www.cypress.com.
two data reads per clock cycle. It uses one port for writing data and one port for reading data. These unidirectional ports support simultaneous reads and writes and allow back-to-back transactions without the bus contention that may occur with a single bidirectional data bus.

[Figure 1: block diagram of the QDR II SRAM interface. User-side signals: USER_CLK0, USER_CLK270, USER_RESET, CLK_DIV4, USER_W_n, USER_BW_n (4), USER_AD_WR (SDR, 18), USER_DWL (SDR, 36), USER_DWH (SDR, 36), USER_WR_FULL, USER_R_n, USER_AD_RD (SDR, 18), USER_RD_FULL. Memory-side signals to the QDR II SRAM device: QDR_K (K), QDR_K_n (K), QDR_SA (SA, 18), QDR_W_n (W), QDR_BW_n (BW, 4), QDR_D (D, 36, DDR), QDR_CQ (CQ), CQ (NC), QDR_R_n (R), QDR_Q (Q, 36, DDR), DOFF.]

Clocking Scheme
The FPGA generates all of the clock and control signals for reads and writes to memory. The memory clocks are typically gener-
SIGNAL INTEGRITY
Xilinx Low-Noise FPGAs Meet SI Challenges
Unique chip package and I/Os accelerate system development

The good news is plentiful. Advances in silicon technology are enabling higher system performance to satisfy the requirements of networking, wireless, video, and other demanding applications. At the same time, an abundance of I/O pins with faster data edge-rates enables higher interface speeds.
However, every advance creates new design challenges. Wide buses with hundreds of simultaneously switching outputs (SSOs) create crosstalk. The undesirable effects of this electrical noise include jitter that threatens the stability of high-bandwidth interfaces. Sharp edge-rates further exacerbate noise problems.

Essential noise control
To address these issues, careful printed circuit-board (PCB) design and layout are critical for controlling system-level noise and crosstalk. Another important consideration is the electrical characteristics of the components mounted on the PCB. With its Virtex™-4 FPGAs, Xilinx, Inc., uses special chip I/O and packaging technologies that significantly improve signal-integrity not only at the chip level, but also at the system-level.
"Xilinx preempts signal-integrity issues at the chip level," says Xilinx Senior Director of Systems and Applications Engineering Andy DeBaets. "This reduces board development and debug effort, and may even make the difference between a successful and scrapped board design."
In high-speed systems, a significant source of crosstalk is inductive coupling within the PCB via field under the BGA package. In systems with sub-optimal design, noise from simultaneously switching outputs can reach levels that severely degrade performance. In extreme cases, this noise can even lead to system failure. A properly designed BGA package minimizes inductive coupling between I/Os by placing a power/ground pin pair next to every signal pin.
“Xilinx’s innovative SparseChevron™ package design minimizes crosstalk problems that can degrade system performance. This is particularly important for wide, high-speed DDR2 SDRAM or QDR II SRAM memory designs,” DeBaets says.

[Figure: Design Example: 1.5 volt LVCMOS 4mA, I/O, 100 aggressors shown]

Seven times less crosstalk
Analysis by independent signal-integrity expert Dr. Howard Johnson verifies the ability of SparseChevron technology to control SSO noise/crosstalk at the chip level. “High signal integrity demands a low-noise chip,” states Johnson. “Compared to competing 90nm FPGAs, the Virtex-4 FPGAs in SparseChevron packaging demonstrate seven times less SSO noise,” Johnson stresses.
Xilinx Virtex-4 devices include several other features to help designers improve system-level signal integrity prior to board layout. Programmable edge rates and drive strength minimize noise while meeting other design objectives. Xilinx Digitally Controlled Impedance (DCI) technology enables designers to implement on-chip line termination for single-ended and differential I/Os. By eliminating the need for external termination resistors, DCI technology enables designers to minimize system component count and significantly simplify board layout and manufacturing.
To learn more about how Virtex-4 FPGAs can help control noise in your system, visit www.xilinx.com/virtex4.

“High signal integrity demands a low-noise chip.”
Howard Johnson, the world’s foremost authority on signal integrity
Reference Designs
Memory interfaces are source-synchronous interfaces in which the clock/strobe and data being transmitted from a memory device are edge-aligned. Most memory interface and controller vendors leave the read data capture implementation as an exercise for the user. In fact, the read data capture implementation in FPGAs is the most challenging portion of the design. Xilinx provides multiple read data capture techniques for different memory technologies and performance requirements. All of these techniques are implemented and verified in Xilinx® FPGAs.
The following sections provide a brief overview of prevalent memory technologies.
• Data available both on the positive and negative edges of the strobe
• Bi-directional, non-free-running, differential strobes that are output edge-aligned with read data and must be input center-

[Table fragment: DDR 2 SDRAM (Components) and Registered DIMM; SSTL-1.8V Class II; Virtex-4; 267 MHz; 16 bits and 144-bit. Application notes: XAPP702, DDR 2 SDRAM Controller Using Virtex-4 Devices; XAPP701, Memory Interfaces Data Capture Using Direct Clocking Technique. Read data delayed such that FPGA clock is centered in data window. Memory read strobe used to determine amount of read data delay.]
Conclusion
For application notes on various memory technologies and performance requirements, visit www.xilinx.com/memory. The summaries in Table 1 and Table 2 can help you determine which application note is relevant for a particular design.
Summary
This application note describes a DDR2 SDRAM memory interface implementation in a Spartan-3 device, interfacing with a Micron DDR2 SDRAM device. This document provides a brief overview of the DDR2 SDRAM device features, followed by a detailed explanation of the DDR2 SDRAM memory interface implementation.

DDR2 SDRAM Device Overview
DDR2 SDRAM devices are the next generation DDR SDRAM devices. The DDR2 SDRAM memory interface is source-synchronous and supports double-data rate like DDR SDRAM memory. DDR2 SDRAM devices use the SSTL 1.8V I/O standard.
DDR2 SDRAM devices use a DDR SDRAM architecture to achieve high-speed operation. The
memory operates using a differential clock provided by the controller. (The reference design on
the web does not support differential strobes. Support for this is planned to be added later.)
Commands are registered at every positive edge of the clock. A bi-directional data strobe
(DQS) is transmitted along with the data for use in data capture at the receiver. DQS is a strobe
transmitted by the DDR2 SDRAM device during reads, and by the controller during writes. DQS
is edge-aligned with data for reads, and center-aligned with data for writes.
Read and write accesses to the DDR2 SDRAM device are burst oriented. Accesses begin with
the registration of an active command and are then followed by a read or a write command. The
address bits registered coincident with the active command are used to select the bank and
row to be accessed. The address bits registered with the read or write command are used to
select the bank and starting column location for the burst access.
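The two-step access described above (an active command carrying bank and row, then a read or write command carrying bank and starting column) can be sketched as a short command list. This is an illustrative model only: the `Command` type and `burst_access` function are hypothetical and are not part of the reference design.

```python
from typing import NamedTuple


class Command(NamedTuple):
    name: str      # "ACTIVE", "READ", or "WRITE"
    bank: int      # bank address registered with the command
    address: int   # row for ACTIVE, starting column for READ/WRITE


def burst_access(bank: int, row: int, column: int, write: bool = False) -> list[Command]:
    """Command sequence for one burst: open the row, then read or write it."""
    return [
        Command("ACTIVE", bank, row),
        Command("WRITE" if write else "READ", bank, column),
    ]


for cmd in burst_access(bank=2, row=0x1A3, column=0x40):
    print(cmd)
```

Timing between the two commands, precharge, and refresh are deliberately left out of this sketch.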
Interface Model
The DDR2 SDRAM memory interface is layered to simplify the design and make the design modular. Figure 1 shows the layered memory interface. The three layers consist of an application layer, an implementation layer, and a physical layer.
© 2004 Xilinx, Inc. All rights reserved. All Xilinx trademarks, registered trademarks, patents, and further disclaimers are as listed at http://www.xilinx.com/legal.htm. All other
trademarks and registered trademarks are the property of their respective owners. All specifications are subject to change without notice.
NOTICE OF DISCLAIMER: Xilinx is providing this design, code, or information "as is." By providing the design, code, or information as one possible implementation of this
feature, application, or standard, Xilinx makes no representation that this implementation is free from any claims of infringement. You are responsible for obtaining any rights you
may require for your implementation. Xilinx expressly disclaims any warranty whatsoever with respect to the adequacy of the implementation, including but not limited to any
warranties or representations that this implementation is free from claims of infringement and any implied warranties of merchantability or fitness for a particular purpose.
[Figure 1: layered memory interface, showing the User Interface, Implementation Layer, and Physical Layer]
DDR2 SDRAM Controller Modules
Figure 2 is a block diagram of the Spartan-3 DDR2 SDRAM memory interface. All four blocks shown in this figure are sub-blocks of the ddr2_top module. The function of each block is explained in the following sections.
[Figure 2: block diagram of the Controller, Data Path, Infrastructure, and IOBS blocks; labeled signals include user_clk, user_data, and DDR2_IF]
Controller
The controller’s design is based on the design shown in XAPP253, Synthesizable 400 Mb/s
DDR SDRAM Controller, but is modified to incorporate changes for the DDR2 SDRAM
memory. It supports a burst length of four, and CAS latencies of three and four. The design is
modified to implement the write latency feature of the DDR2 SDRAM memory. The controller
initializes the EMR(2) and EMR(3) registers during the Load Mode command and also
generates differential data strobes.
The controller accepts user commands, decodes these user commands, and generates read,
write, and refresh commands to the DDR2 SDRAM memory. The controller also generates
signals for other modules. Refer to XAPP253 for detailed design and timing analyses of the
controller module.
Data Path
The data path module is responsible for transmitting data to and receiving data from the
memories. Major functions include:
• Writing data to the memory
• Reading data from the memory
• Transferring the read data from the memory clock domain to the FPGA clock domain
For a description of data write and data read capture techniques, see XAPP768c, Interfacing
Spartan-3 Devices With 166 MHz or 333 Mb/s DDR SDRAM Memories. The write data and
strobe are clocked out of the FPGA. The strobe is center-aligned with respect to the data during
writes. For DDR2 SDRAM memories, the strobe is non-free running. To meet the requirements
specified above, the write data is clocked out using a clock that is shifted 90° and 270° from the
primary clock going to the memory. The data strobes are generated out of primary clocks going
to the memory.
Memory read data is edge-aligned with a source-synchronous clock. The DDR2 SDRAM clock
is a non-free running strobe. The data is received using the non-free running strobe and
transferred to the FPGA clock domain. The input side of the data uses resources similar to the
input side of the strobe. This ensures matched delays on data and strobe signals until the
strobe is delayed in the strobe delay circuit.
Infrastructure
The Infrastructure module generates the FPGA clocks and reset signals. A Digital Clock
Manager (DCM) is used to generate the clock and its inverted version. A delay calibration
circuit is also implemented in this module.
The delay calibration circuit is used to select the number of delay elements used to delay the
strobe lines with respect to read data. The delay calibration circuit calculates the delay of a
circuit that is identical in all respects to the strobe delay circuit. All aspects of the delay are
considered for calibration, including all the component and route delays. The calibration circuit
selects the number of delay elements for any given time. After the calibration is done, it asserts
the select lines for the delay circuit. Refer to XAPP768c for details about delay calibration.
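One way to picture the selection step is as choosing the tap count whose total delay is closest to a target strobe delay, given the per-element delay the calibration circuit measured. The sketch below is an assumption about the arithmetic, not the XAPP768c circuit; the function name, the target delay, and the tap limit are all hypothetical.

```python
def select_delay_taps(target_delay_ps: float, tap_delay_ps: float, max_taps: int) -> int:
    """Pick the number of delay elements closest to the target, within the tap limit."""
    if tap_delay_ps <= 0:
        raise ValueError("tap_delay_ps must be positive")
    taps = round(target_delay_ps / tap_delay_ps)
    return max(0, min(max_taps, taps))


# Hypothetical numbers: 1500 ps target, elements measured at 400 ps each, 6 available.
print(select_delay_taps(1500, 400, 6))
```

Because element delay drifts with voltage and temperature, the same calculation would be repeated whenever the calibration circuit produces a new measurement.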
IOBS
All FPGA input and output signals are implemented in the IOBS module. All address and
control signals are registered going into and coming out from the IOBS module.
User Interface Signals
Table 1 shows user interface signal descriptions; all signal directions are with respect to the DDR2 SDRAM controller.

Signal Name                Direction  Description
dip1                       Input      Clock enable signal for DDR2 SDRAM (active low)
rst_dqs_div_in             Input      This signal enables the dqs_div flop during DDR2 SDRAM memory read.
user_input_data[(2n-1):0]  Input      Write Data for DDR2 SDRAM, where 'n' is the width of the memory interface
user_data_valid            Output     This active Low signal indicates that read data from DDR2 SDRAM memory is valid.
sys_rst90                  Input      90 degrees phase-shifted Reset generated with system reset input.
sys_rst180                 Input      180 degrees phase-shifted Reset generated with system reset input.
sys_rst270                 Input      270 degrees phase-shifted Reset generated with system reset input.

Notes:
1. All signal directions are with respect to DDR2 SDRAM controller.
Signal Descriptions
user_input_data[(2n-1):0]
This is the write data to DDR2 SDRAM from the user interface. The data is valid on a DDR2
SDRAM write command, where n is the width of the DDR2 SDRAM memory. The DDR2
SDRAM controller converts single data rate to double data rate on the physical layer side.
user_input_address[addwidth:0]
This is the sum of row and column address for DDR2 SDRAM writes and reads. Depending on
address width variable selection, user_input_address is divided into row and column address bits.
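As an illustration of dividing user_input_address into row and column bits, the sketch below assumes the row occupies the upper bits and the column the lower bits. That ordering, the function name, and the 10-bit column width are assumptions; the reference design source should be checked for the actual mapping.

```python
def split_user_address(user_input_address: int, column_bits: int) -> tuple[int, int]:
    """Split a combined address into (row, column), assuming row bits are the upper bits."""
    column_mask = (1 << column_bits) - 1
    row = user_input_address >> column_bits
    column = user_input_address & column_mask
    return row, column


# Hypothetical 10-bit column width.
row, column = split_user_address((0x12 << 10) | 0x3F, column_bits=10)
print(hex(row), hex(column))  # 0x12 0x3f
```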
user_bank_address[bankaddwidth:0]
Bank address for DDR2 SDRAM. There is a variable through which the bank address is selectable.
user_config_reg1[14:0]
Configuration data for DDR2 SDRAM memory initialization. The contents of this register are
loaded into the mode register during a Load Mode command. The format for user_config_reg1
is as follows:
Bits:   14   13:11  10   9:7   6:4          3    2:0
Field:  PD   WR     TM   Res   Cas_latency  BT   Burst_length
Burst_length[2:0]
The controller supports only a burst length of four.
BT
This bit selects the burst type. The controller supports only sequential bursts. This bit is always
set to zero in the controller.
Cas_latency [6:4]
Bits 6:4 select the CAS latency. The DDR2 SDRAM controller supports CAS latencies of 3 and 4.
Res [9:7]
Bits 9:7 are reserved for future implementation.
TM
This bit is loaded into the TM bit of the Load Mode Register.
WR [13:11]
These three bits are written to WR (write recovery) bits of the Load Mode register.
PD
This bit is written to PD (Power Down Mode) bit of the Load Mode register.
Refer to the Micron DDR2 SDRAM data sheets for details on the Load Mode register.
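The field layout of user_config_reg1 can be captured in a small packing helper. The following is a minimal sketch, not part of the reference design: the function names are hypothetical, the bit positions of BT, TM, and PD (bits 3, 10, and 14) are inferred from the bit-number row above, and the raw field values in the usage example are illustrative only (the actual encodings come from the memory data sheet).

```python
# Field layout of user_config_reg1[14:0] as described in the text.
# Each entry is (field name, least-significant bit, width in bits).
# BT, TM and PD positions are inferred from the bit-number row (assumption).
FIELDS = [
    ("burst_length", 0, 3),
    ("bt", 3, 1),
    ("cas_latency", 4, 3),
    ("res", 7, 3),
    ("tm", 10, 1),
    ("wr", 11, 3),
    ("pd", 14, 1),
]


def pack_config_reg1(**values):
    """Pack named raw field values into a 15-bit register word."""
    known = {name for name, _, _ in FIELDS}
    unknown = set(values) - known
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    word = 0
    for name, lsb, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << lsb
    return word


def unpack_config_reg1(word):
    """Split a 15-bit register word back into its named fields."""
    return {name: (word >> lsb) & ((1 << width) - 1)
            for name, lsb, width in FIELDS}


# Illustrative raw values only; they are not taken from the application note.
example = pack_config_reg1(burst_length=0b010, cas_latency=0b011, wr=0b010)
```

Keeping the layout in one table makes it easy to adjust if the bit positions in the original diagram differ from the inferred ones.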
user_config_reg2[12:0]
DDR2 SDRAM configuration data for the Extended Mode Register. The format of
user_config_reg2 is as follows.
12 11 10 9 7 6 4 3 2 1 0
Refer to the Micron DDR2 SDRAM data sheets for details on the Extended Mode register.
user_command_reg[3:0]
This is the user command register. Various commands are passed to the DDR2 SDRAM
module through this register. Table 2 illustrates the various supported commands.
burst_done
Users should assert this signal for two clock periods at the end of the data transfer. The
DDR2 SDRAM controller supports write burst or read burst for a single row. Users must
terminate on a column boundary and reinitialize on a column boundary for the next row of
transactions. The controller terminates a write burst or read burst by issuing a pre-charge
command to DDR2 SDRAM memory.
user_output_data[(2n-1):0]
This is the read data from DDR2 SDRAM memory. The DDR2 SDRAM controller converts DDR
SDRAM data from DDR2 SDRAM memory to SDR data. As the DDR SDRAM data is converted
to SDR data, the width of this bus is 2n, where n is data width of DDR2 SDRAM memory.
user_data_valid
The user_output_data[(2n-1):0] signal is valid on assertion of this signal.
user_cmd_ack
This is the acknowledgement signal for a user read or write command. It is asserted by the
DDR2 SDRAM controller during a read or write to DDR2 SDRAM. No new command should be
given to the controller until this signal is deasserted.
init_val
The DDR2 SDRAM controller asserts this signal after completing DDR2 SDRAM initialization.
ar_done
The DDR2 SDRAM controller asserts this signal for one clock cycle after the auto-refresh
command is given to DDR2 SDRAM.
Note: The output clock and reset signals can be used for data synchronization.
Table 3 shows memory interface signals.
Initializing DDR2 SDRAM Memory
Before issuing the memory read and write commands, the DDR2 SDRAM memory must be
initialized using the memory initialization command. The data written in the Mode Register and
in the Extended Mode Register should be placed on user_config_reg1[14:0] and
user_config_reg2[12:0] until DDR2 SDRAM initialization is completed. Once the DDR2
SDRAM is initialized, the init_val signal is asserted by the DDR2 SDRAM controller. Figure 3
shows a timing diagram of the memory initialization command.
[Figure 3: Memory initialization timing diagram. Signals recoverable from the figure: sys_clk, sys_clkb, init_val.]
1. Two clocks prior to placing the initialization command on command_reg [2:0], the user
places valid configuration data on user_config_reg1[14:0] and user_config_reg2[12:0].
2. The user places the initialization command on command_reg [2:0]. This starts the
initialization sequence.
3. Data on user_config_reg1[14:0] and user_config_reg2[12:0] should not be changed for any
subsequent memory operations.
4. The controller indicates that the configuration is complete by asserting the init_val signal.
DDR2 SDRAM Memory Write
Figure 4 shows a DDR2 SDRAM memory write timing diagram for a burst length of four. The
waveform shows two successive bursts. Memory write is preceded by a write command to the
DDR2 SDRAM controller. In response to the write command, the DDR2 SDRAM controller
acknowledges with a user_cmd_ack signal on the rising edge of SYS_CLKb. Users should wait
for the user_cmd_ack signal before proceeding to the next step.
Two and a half clock cycles after the user_cmd_ack signal assertion, the memory burst
address is placed on user_input_address[addwidth:0] lines. The user_input_address should be
asserted on the rising edge of SYS_CLK. The data to be written into memory should be
asserted with clk90_int_val and should be given to the controller before placing the memory
address on user_input_address. The user data width is twice that of the memory data width.
The controller converts it into double data rate before it is passed to memory.
For a burst length of four, two user_input_data[(2n-1):0] pieces of data are given to the DDR2
SDRAM controller with each user address. To terminate the write burst, burst_done is asserted
on the rising edge of SYS_CLK for two clocks. The burst_done signal should be asserted for
two clocks with the last memory address. Any further commands to the DDR2 SDRAM
controller should be given only after the user_cmd_ack signal is deasserted.
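[Figure 4: DDR2 SDRAM memory write timing for a burst length of four (two successive bursts). Signals: sys_clk, sys_clkb, clk90_int_val, user_command_reg[3:0] (Write Command), user_cmd_ack, user_input_address[21:0] (Addr 1, Addr 2), user_input_data[(2n-1):0] (Data 1 to Data 4), burst_done. The address is placed 2.5 clocks after user_cmd_ack is asserted. Callouts 1 to 6 in the figure correspond to the numbered steps that follow.]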
1. The user initiates a memory write by issuing a write command to the DDR2 SDRAM
controller. The write command must be asserted on the rising edge of the SYS_CLK.
2. The DDR2 SDRAM controller acknowledges the write command by asserting the
user_cmd_ack signal on the rising edge of the SYS_CLKb.
3. The user should place the data to be written into the memory onto the user_input_data pins
before placing the memory address on the user_input_address. The input data is asserted
with the clk90_int_val signal.
4. Two and a half clocks after the user_cmd_ack signal assertion, the user should place the
memory address on user_input_address[21:0]. The user_input_address signal should be
asserted on the rising edge of the SYS_CLK.
5. To terminate write burst, the user should assert the burst_done signal for two clocks with
the last user_input_address.
6. Any further commands to the DDR2 SDRAM controller should be given only after the
user_cmd_ack signal is de-asserted.
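The width relationship between the user bus and the memory bus can be illustrated with a short sketch. It splits each 2n-bit user word into two n-bit beats, which is why a burst length of four needs two user words per address. The function names are hypothetical, and the assumption that the lower n bits go out first is not stated in the application note.

```python
def sdr_to_ddr_beats(user_words, n):
    """Split 2n-bit SDR user words into n-bit DDR beats.

    Assumption (not stated in the source): the lower n bits of each
    user word are transferred first, the upper n bits second.
    """
    mask = (1 << n) - 1
    beats = []
    for word in user_words:
        if word < 0 or word >> (2 * n):
            raise ValueError("user word does not fit in 2n bits")
        beats.append(word & mask)          # first beat (assumed ordering)
        beats.append((word >> n) & mask)   # second beat
    return beats


def ddr_beats_to_sdr(beats, n):
    """Inverse operation: pair up n-bit beats into 2n-bit user words."""
    if len(beats) % 2:
        raise ValueError("need an even number of beats")
    return [beats[i] | (beats[i + 1] << n) for i in range(0, len(beats), 2)]


# Burst length of four on an 8-bit memory: two 16-bit user words.
burst = sdr_to_ddr_beats([0xB1A0, 0xD3C2], n=8)
```

The same pairing, run in the opposite direction, models how read data arrives on user_output_data as 2n-bit words.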
DDR2 SDRAM Memory Read
Figure 5 shows a memory read timing diagram for two successive bursts with a burst length of
four. The user initiates a memory read by sending a read command to the DDR2 SDRAM
controller.
The read command flow is similar to the write command. A read command is asserted on the
rising edge of SYS_CLK. The DDR2 SDRAM controller asserts the user_cmd_ack signal in
response to the read command on the rising edge of SYS_CLKb. Two and a half clock
cycles after user_cmd_ack is asserted, the memory burst read address is placed on
user_input_address[addwidth:0]. The user_input_address signal is asserted on the rising edge
of SYS_CLK.
The data read from the DDR2 SDRAM memory is available on user_output_data, which is
asserted with clk90_int_val. The data on user_output_data is valid only when user_data_valid
signal is asserted. As the DDR SDRAM data is converted to SDR data, the width of this bus is
2n, where n is the data width of the DDR2 SDRAM memory. For a read burst length of four, the
DDR2 SDRAM controller outputs two data words with each user address, each 2n bits wide.
To terminate the read burst, the burst_done signal is asserted for two
clock cycles on the rising edge of SYS_CLK. The burst_done signal should be asserted after
the last memory address. Any further commands to the DDR2 SDRAM controller should be
given after user_cmd_ack signal deassertion.
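[Figure 5: DDR2 SDRAM memory read timing for two successive bursts with a burst length of four. Signals: sys_clk, sys_clkb, clk90_int_val, user_command_reg[3:0] (Read Command), user_cmd_ack, burst_done, user_data_valid (labeled user_valid_data in the figure), user_output_data[(2n-1):0] (Data 1 to Data 4). The address is placed 2.5 clocks after user_cmd_ack is asserted. Callouts 1 to 7 in the figure mark the steps of the read sequence.]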
5. The data read from the DDR2 SDRAM memory is available on user_output_data. The
user_output_data is asserted with clk90_int_val. Since the DDR SDRAM data is converted
to SDR data, the width of this bus is 2n, where n is the data width of the DDR2 SDRAM
memories. For a read burst length of four, with each user address the DDR2 SDRAM
controller outputs only two data words.
6. To terminate the read burst, the burst_done signal is asserted for two clocks on the rising
edge of SYS_CLK. The burst_done signal should be asserted with the last memory
address.
7. Any further commands to the DDR2 SDRAM controller should be given after the
user_cmd_ack signal is de-asserted.
DDR2 SDRAM Memory Auto_Refresh
The DDR2 SDRAM controller does not support memory refresh on its own and must
periodically be provided with an auto_refresh command. The auto_refresh command is
asserted with SYS_CLK. The ar_done signal is asserted by the DDR2 SDRAM controller upon
completion of the auto_refresh command. The ar_done signal is asserted with SYS_CLKb.
Physical Layer and Delay Calibration
The physical layer for DDR2 SDRAM is similar to the DDR SDRAM physical layer described in
application note XAPP768c. The delay calibration technique described in XAPP768c is also
used in the DDR2 SDRAM interface.
Timing Calculations
Write Timing
Table 4: Write Data
Parameter | Value (ps) | Leading Edge Uncertainties | Trailing Edge Uncertainties | Meaning
Tclock_skew | 50 | 50 | 50 | Minimal skew, since the right/left sides are used and the bits are close together
Tphase_offset_error | 140 | 140 | 140 | Offset error between different clocks from the same DCM
Read Timing
Table 5: Read Data
Parameter | Value (ps) | Leading Edge Uncertainties | Trailing Edge Uncertainties | Meaning
Tclock | 6000 | | | Clock period
Tclock_phase | 3000 | | | Clock phase
Tclock_duty_cycle_dist | 300 | 0 | 0 | Duty cycle distortion of clock to memory
Tdata_period | 2700 | | | Total data period, Tclock_phase - Tdcd
Tdqsq | 350 | 350 | 0 | Strobe to data distortion from memory data sheet
Tpackage_skew | 90 | 90 | 90 | Worst-case package skew
Tds | 452 | 452 | 0 | Setup time from Spartan-3 -5 data sheet
Tdh | -35 | 0 | -35 | Hold time from Spartan-3 -5 data sheet
Tjitter | 100 | 0 | 0 | Data and strobe jitter together, since they are generated off of the same clock
Tlocal_clock_line | 20 | 20 | 20 | Worst-case local clock line skew
Tpcb_layout_skew | 50 | 50 | 50 | Skew between data lines and strobes on the board
Tqhs | 450 | 0 | 450 | Hold skew factor for DQ from memory data sheet
Total uncertainties | | 962 | 575 | Worst-case for leading and trailing can never happen simultaneously
Window for DQS position for normal case | 1163 | 962 | 2125 | Worst-case window of 1163 ps
Notes:
1. Reference for Tdqsq and Tqhs are from Micron data sheet for MT47H64M4FT-37E, Rev C, 05/04 EN.
2. Reference for Spartan-3 timing is –5 devices, Speeds file version 1.33.
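The totals in Table 5 can be reproduced with a few lines of arithmetic. The sketch below copies the leading-edge and trailing-edge columns from the table and recomputes the data period, the two uncertainty totals, and the resulting DQS window. The dictionary and function names are illustrative and do not come from the application note.

```python
# (leading-edge, trailing-edge) uncertainties in ps, copied from Table 5.
UNCERTAINTIES = {
    "Tclock_duty_cycle_dist": (0, 0),
    "Tdqsq": (350, 0),
    "Tpackage_skew": (90, 90),
    "Tds": (452, 0),
    "Tdh": (0, -35),
    "Tjitter": (0, 0),
    "Tlocal_clock_line": (20, 20),
    "Tpcb_layout_skew": (50, 50),
    "Tqhs": (0, 450),
}

T_CLOCK_PHASE_PS = 3000  # half of the 6000 ps clock period
T_DCD_PS = 300           # duty cycle distortion of the clock to memory


def read_timing_budget():
    """Recompute the summary rows of the read timing table."""
    data_period = T_CLOCK_PHASE_PS - T_DCD_PS
    leading = sum(lead for lead, _ in UNCERTAINTIES.values())
    trailing = sum(trail for _, trail in UNCERTAINTIES.values())
    return {
        "data_period": data_period,
        "leading": leading,
        "trailing": trailing,
        "window": data_period - leading - trailing,
        "earliest_dqs": leading,                  # start of the valid window
        "latest_dqs": data_period - trailing,     # end of the valid window
    }


budget = read_timing_budget()
```

The window is what remains of the data period after both edge uncertainties are removed, so DQS can be placed anywhere between the leading total and the data period minus the trailing total.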
Conclusion
It is possible to implement a high-performance DDR2 SDRAM memory interface for Spartan-3
FPGAs. This design has been simulated, synthesized (with Synplicity), and taken through the
Xilinx Project Navigator flow.
Revision History
The following table shows the revision history for this document.
Date Version Revision
12/06/04 1.0 Initial Xilinx release.
Summary
DDR2 SDRAM devices offer new features that go beyond the DDR SDRAM specification and
enable the DDR2 device to operate at data rates of 666 Mb/s. High data rates require higher
performance from the controller and the I/Os in the FPGA. To achieve the desired bandwidth,
it is essential for the controller to operate synchronously with the operating speed of the
memory.
Introduction
This application note describes a 267 MHz and above DDR2 controller implementation in a
Virtex™-4 device interfacing to a Micron DDR2 SDRAM device. For performance levels of
267 MHz and above, the controller design outlined in this application note should be used
along with the read data capture technique explained in a separate application note entitled
XAPP721, High-Performance DDR2 SDRAM Interface Data Capture Using ISERDES and
OSERDES.
This application note provides a brief overview of DDR2 SDRAM device features followed by a
detailed explanation of the controller operation when interfacing to high-speed DDR2
memories. It also explains the backend user interface to the controller. A reference design in
Verilog is available for download from the Xilinx website:
http://www.xilinx.com/bvdocs/appnotes/xapp721.zip.
DDR2 SDRAM Overview
DDR2 SDRAM devices are the next generation devices in the DDR SDRAM family. DDR2
SDRAM devices use the SSTL 1.8V I/O standard. The following section explains the features
available in the DDR2 SDRAM devices and the key differences between DDR SDRAM and
DDR2 SDRAM devices.
DDR2 SDRAM devices use a DDR architecture to achieve high-speed operation. The memory
operates using a differential clock provided by the controller. Commands are registered at
every positive edge of the clock. A bidirectional data strobe (DQS) is transmitted along with the
data for use in data capture at the receiver. DQS is a strobe transmitted by the DDR2 SDRAM
device during Reads and by the controller during Writes. DQS is edge-aligned with data for
Reads and center-aligned with data for Writes.
Read and write accesses to the DDR2 SDRAM device are burst oriented; accesses begin with
the registration of an Active command, which is then followed by a Read or Write command.
The address bits registered with the Active command are used to select the bank and row to be
accessed. The address bits registered with the Read or Write command are used to select the
bank and the starting column location for the burst access.
The DDR2 controller reference design includes a user backend interface to generate the Write
address, Write data, and Read addresses. This information is stored in three backend FIFOs
for address and data synchronization between the backend and controller modules. Based on
the availability of addresses in the address FIFO, the controller issues the correct commands to
the memory, taking into account the timing requirements of the memory. The implementation
details of the logic blocks are explained in the following sections.
Notes:
1. Address signal A10 is held High during Precharge All Banks and is held Low during single bank
precharge.
[Figure x723_01_091505: Mode register field encodings]
A2 A1 A0   Burst Length
0  1  0    4
0  1  1    8
Others     Reserved

A6 A5 A4   CAS Latency
0  1  0    2
0  1  1    3
1  0  0    4
1  0  1    5
Others     Reserved

A11 A10 A9   Write Recovery
0   0   1    2
0   1   0    3
0   1   1    4
1   0   0    5
1   0   1    6
Others       Reserved
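The mode register encodings above lend themselves to simple lookup tables. The sketch below is one possible reading of the flattened figure, in which the CAS latency and write recovery rows were interleaved by the extraction; the helper name is hypothetical and only the three fields listed in the figure are encoded.

```python
# Encodings as read from the mode register figure (the CAS latency and
# write recovery rows are interleaved in the extracted text; this split
# is an interpretation, not a quotation).
BURST_LENGTH = {4: 0b010, 8: 0b011}                                # A2:A0
CAS_LATENCY = {2: 0b010, 3: 0b011, 4: 0b100, 5: 0b101}             # A6:A4
WRITE_RECOVERY = {2: 0b001, 3: 0b010, 4: 0b011, 5: 0b100, 6: 0b101}  # A11:A9


def encode_mode_register(burst_length, cas_latency, write_recovery):
    """Return the address-bus value for the three fields shown in the figure.

    All other mode register bits are left at zero in this sketch.
    """
    try:
        bl = BURST_LENGTH[burst_length]
        cl = CAS_LATENCY[cas_latency]
        wr = WRITE_RECOVERY[write_recovery]
    except KeyError as exc:
        raise ValueError(f"reserved or unsupported setting: {exc}") from None
    return bl | (cl << 4) | (wr << 9)


mr_value = encode_mode_register(burst_length=4, cas_latency=4, write_recovery=4)
```

Unsupported combinations raise an error rather than silently producing a reserved encoding.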
Bank Addresses BA1 and BA0 select the Mode registers. Table 2 shows the Bank Address bit
configuration.
Table 2: Bank Address Bit Configuration
BA1 BA0 Mode Register
0 0 Mode Register (MR)
0 1 EMR1
1 0 EMR2
1 1 EMR3
Initialization Sequence
The initialization sequence used in the controller state machine follows the DDR2 SDRAM
specifications. The voltage requirements of the memory need to be met by the interface. The
following is the sequence of commands issued for initialization.
1. After stable power and clock, a NOP or Deselect command is applied for 200 µs.
2. CKE is asserted.
3. Precharge All command after 400 ns.
4. EMR (2) command. BA0 is held Low, and BA1 is held High.
5. EMR (3) command. BA0 and BA1 are both held High.
6. EMR command to enable the memory DLL. BA1 and A0 are held Low, and BA0 is held
High.
7. Mode Register Set command for DLL reset. To lock the DLL, 200 clock cycles are required.
8. Precharge All command.
9. Two Auto Refresh commands.
10. Mode Register Set command with Low to A8, to initialize device operation.
11. EMR command to enable OCD default by setting bits E7, E8, and E9 to 1.
12. EMR command to enable OCD exit by setting bits E7, E8 and E9 to 0.
After the initialization sequence is complete, the controller issues a dummy write followed by
dummy reads to the DDR2 SDRAM memory for the datapath module to select the right number
of taps in the Virtex-4 input delay block. The datapath module determines the right number of
delay taps required and then asserts the dp_dly_slct_done signal to the controller. The
controller then moves into the IDLE state.
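The twelve initialization steps can be written down as a data table, which is a convenient starting point for a test bench or a state machine. This is a sketch only: the command labels are hypothetical names, the bank address values follow Table 2, and steps 11 and 12 are assumed to target EMR1 because the text only says "EMR".

```python
# (BA1, BA0) register select values, per Table 2.
REGISTER_SELECT = {"MR": (0, 0), "EMR1": (0, 1), "EMR2": (1, 0), "EMR3": (1, 1)}

# (step, command label, target register or None, note).
# Command labels are illustrative; steps 11 and 12 targeting EMR1 is an assumption.
INIT_SEQUENCE = [
    (1, "NOP", None, "hold NOP or Deselect for 200 us after stable power and clock"),
    (2, "CKE_ASSERT", None, "assert CKE"),
    (3, "PRECHARGE_ALL", None, "issue after 400 ns"),
    (4, "LOAD_MODE", "EMR2", "BA1 High, BA0 Low"),
    (5, "LOAD_MODE", "EMR3", "BA1 and BA0 High"),
    (6, "LOAD_MODE", "EMR1", "enable DLL, A0 Low"),
    (7, "LOAD_MODE", "MR", "DLL reset, 200 clock cycles to lock"),
    (8, "PRECHARGE_ALL", None, ""),
    (9, "AUTO_REFRESH", None, "issued twice"),
    (10, "LOAD_MODE", "MR", "A8 Low to initialize device operation"),
    (11, "LOAD_MODE", "EMR1", "OCD default: E7, E8, E9 = 1"),
    (12, "LOAD_MODE", "EMR1", "OCD exit: E7, E8, E9 = 0"),
]


def bank_bits_for_step(step):
    """Return (BA1, BA0) for a Load Mode step, or None for other commands."""
    _, _, register, _ = INIT_SEQUENCE[step - 1]
    return None if register is None else REGISTER_SELECT[register]
```

For example, `bank_bits_for_step(4)` yields the BA1 High, BA0 Low combination that step 4 describes.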
Precharge Command
The Precharge command is used to deactivate the open row in a particular bank. The bank is
available for a subsequent row activation a specified time (tRP) after the Precharge command
is issued. Input A10 determines whether one or all banks are to be precharged.
Active Command
Before any Read or Write commands can be issued to a bank within the DDR2 SDRAM
memory, a row in the bank must be activated using an Active command. After a row is opened,
Read or Write commands can be issued to the row subject to the tRCD specification. DDR2
SDRAM devices also support posted CAS additive latencies; these allow a Read or Write
command to be issued prior to the tRCD specification by delaying the actual registration of the
Read or Write command to the internal device using additive latency clock cycles.
When the controller detects a conflict, it issues a Precharge command to deactivate the open
row and then issues another Active command to the new row. A conflict occurs when an
incoming address refers to a row in a bank other than the currently opened row.
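The conflict rule can be sketched as a small open-row tracker. The class below is a behavioral illustration, not the reference design's logic: the class and method names are hypothetical, and tracking one open row per bank is an assumption about how the controller keeps its state.

```python
class OpenRowTracker:
    """Tracks the open row per bank and reports the commands needed
    before a Read or Write can be issued to (bank, row)."""

    def __init__(self, num_banks=4):
        # None means the bank has no open row (it is precharged).
        self.open_rows = [None] * num_banks

    def commands_for(self, bank, row):
        open_row = self.open_rows[bank]
        if open_row == row:
            return []                      # row hit: no extra commands
        commands = []
        if open_row is not None:
            # Conflict: a different row is open in this bank.
            commands.append(("PRECHARGE", bank))
        commands.append(("ACTIVE", bank, row))
        self.open_rows[bank] = row
        return commands


tracker = OpenRowTracker()
first = tracker.commands_for(0, 5)     # bank idle: Active only
hit = tracker.commands_for(0, 5)       # same row: nothing to do
conflict = tracker.commands_for(0, 7)  # different row: Precharge, then Active
```

The Precharge and Active commands returned for a conflict remain subject to the tRP and tRCD timing mentioned above, which this sketch does not model.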
Read Command
The Read command is used to initiate a burst read access to an active row. The values on BA0
and BA1 select the bank address, and the address inputs provided on A0 - Ai select the starting
column location. After the read burst is over, the row is still available for subsequent access
until it is precharged.
Figure 2 shows an example of a Read command with an additive latency of zero. Hence, in this
example, the Read latency is three, the same as the CAS latency.
[Figure 2: Read command timing with RL = 3 (AL = 0, CL = 3). Time points T0, T1, T2, T3, T3n, T4, T4n, T5. Signals: CK and its complement, Command (READ followed by NOPs), Address (Bank a, Col n), DQS and its complement, DQ (DOn).]
Write Command
The Write command is used to initiate a burst access to an active row. The value on BA0 and
BA1 select the bank address while the value on address inputs A0 - Ai select the starting
column location in the active row. DDR2 SDRAMs use a write latency equal to read latency
minus one clock cycle.
Write Latency = Read Latency – 1 = (Additive Latency + CAS Latency) – 1
Figure 3 shows the case of a Write burst with a Write latency of 2. The time between the Write
command and the first rising edge of the DQS signal is determined by the WL.
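The latency relationship above is simple enough to check numerically. The following sketch restates the formula with hypothetical function names; all values are in memory clock cycles.

```python
def read_latency(additive_latency, cas_latency):
    """Read latency in clock cycles: RL = AL + CL."""
    return additive_latency + cas_latency


def write_latency(additive_latency, cas_latency):
    """Write latency in clock cycles: WL = RL - 1."""
    return read_latency(additive_latency, cas_latency) - 1


# With AL = 0 and CL = 3, the read latency is 3 and the write latency is 2.
rl = read_latency(0, 3)
wl = write_latency(0, 3)
```

These values match the read example with a latency of three and the write example with a latency of two.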
[Figure 3: Write burst timing with a Write latency of 2. Time points T0, T1, T2, T2n, T3, T3n, T4, T5. Signals: CK and its complement, Command (Write followed by NOPs), Address (Bank a, Col b), DM.]
DDR2 SDRAM Interface Design
The user interface to the DDR2 controller (Figure 4) and the datapath are clocked at half the
frequency of the interface, resulting in improved design margin at frequencies above 267 MHz.
The operation of the controller at half the frequency does not affect the throughput or latency.
DDR2 SDRAM devices support a minimum burst size of 4, only requiring a command every
other clock. For a burst of 4, the controller issues a command every controller clock (the slow
clock). For a burst of 8, the controller issues a command every other controller clock (the slow
clock). All the FIFOs in the user interface are asynchronous FIFOs, allowing the user’s backend
to operate at any frequency. The I/Os toggle at the target frequency.
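The command spacing follows directly from the burst length. A burst of BL beats occupies BL/2 fast clock cycles on a DDR bus, and the controller clock runs at half the fast clock rate. The sketch below works this out; the function name is hypothetical.

```python
def controller_clocks_per_burst(burst_length):
    """Number of controller (slow) clock cycles one burst occupies."""
    if burst_length not in (4, 8):
        raise ValueError("DDR2 SDRAM bursts are 4 or 8 beats long")
    fast_clocks = burst_length // 2   # DDR: two beats per fast clock cycle
    return fast_clocks // 2           # controller runs at half the fast clock


# Burst of 4: one command per controller clock.
# Burst of 8: one command every other controller clock.
spacing = {bl: controller_clocks_per_burst(bl) for bl in (4, 8)}
```

This is why running the controller at half frequency does not reduce throughput: even the shortest burst keeps the data bus busy for a full controller clock cycle.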
[Figure 4: User interface to the DDR2 controller inside the Virtex-4 FPGA.]
User Backend
The backend is designed to provide address and data patterns to test all the design aspects of
a DDR2 controller. The backend includes the following blocks: backend state machine, read
data comparator, and a data generator module. The data generation module generates the
various address and data patterns that are written to the memory. The address locations are
pre-stored in a block RAM, being used here as a ROM. The address values stored have been
selected to test accesses to different rows and banks in the DDR2 SDRAM device. The data
pattern generator includes a state machine issuing patterns of data. The backend state
machine emulates a user backend. This state machine issues the write or read enable signals
to determine the specific FIFO that will be accessed by the data generator module.
User Interface
The backend user interface has three FIFOs: the Address FIFO, the Write Data FIFO, and the
Read Data FIFO. The first two FIFOs are accessed by the user backend modules, while the
Read Data FIFO is accessed by the Datapath module used to store the captured Read data.
User-to-Controller Interface
Table 4 lists the signals between the user interface and the controller.
Table 4: Signals Between User Interface and Controller
Port Name | Port Width | Port Description | Notes
Af_addr | 36 | Output of the Address FIFO in the user interface. Mapping of these address bits: Memory Address (CS, Bank, Row, Column) - [31:0]; Reserved - [32]; Dynamic Command Request - [35:33] | Monitor FIFO-full status flag to write address into the address FIFO.
Af_empty | 1 | The user interface Address FIFO empty status flag output. The controller processes the address on the output of the FIFO when this signal is deasserted. | FIFO16 Empty Flag.
ctrl_Waf_RdEn | 1 | Read Enable input to address FIFO in the user interface. | This signal is asserted for one clock cycle when the controller state is Write, Read, Load Mode Register, Precharge All, Auto Refresh, or Active resulting from dynamic command requests.
ctrl_Wdf_RdEn | 1 | Read Enable input to Write Data FIFO in the user interface. | The controller asserts this signal for one clock cycle after the first write state. This signal is asserted for two clock cycles for a burst length of 8. Sufficient data must be available in Write Data FIFO associated with a write address for the required burst length before issuing a write command. For example, for a 64-bit data bus and a burst length of 4, the user should input two 128-bit data words in the Write Data FIFO for every write address before issuing the write command.
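The Af_addr bit mapping in Table 4 can be decoded with a few shifts and masks. The sketch below is illustrative: the function names are hypothetical, and the Write Data FIFO word count generalizes the single example in the table (two 128-bit words for a 64-bit bus and a burst length of 4), which is an assumption for other configurations.

```python
def decode_af_addr(word):
    """Split a 36-bit Address FIFO word into the fields listed in Table 4."""
    if not 0 <= word < (1 << 36):
        raise ValueError("Af_addr is 36 bits wide")
    return {
        "memory_address": word & 0xFFFF_FFFF,       # bits [31:0]
        "reserved": (word >> 32) & 0x1,             # bit [32]
        "dynamic_command": (word >> 33) & 0x7,      # bits [35:33]
    }


def write_fifo_words(burst_length):
    """Write Data FIFO words needed per write address.

    Assumes each FIFO word is twice the memory data width, as in the
    64-bit bus / 128-bit word example.
    """
    return burst_length // 2


fields = decode_af_addr((0b101 << 33) | 0x1234_5678)
```

Decoding the dynamic command request separately from the memory address mirrors how the controller uses the upper bits to select the operation.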
The memory address (Af_addr) includes the column address, row address, bank address, and
chip-select width for deep memory interfaces (Table 5).
Table 5: Af_addr Memory Address
Address Description
Column Address col_ap_width – 1:0
Figure 5 describes four consecutive Writes followed by four consecutive Reads with a burst
length of 8. Table 7 lists the state signal values for Figure 5.
[Figure 5: Four consecutive Writes followed by four consecutive Reads with a burst length of 8. Signals: CLKdiv_0, State (0C 0E 0D 0E 0D 0E 0D 0E 16 09 0B 0A 0B 0A 0B 0A 0B), Ctrl_Waf_Rden, Ctrl_Wdf_Rden, Ctrl_Waf_Empty.]
[Figure 6 waveform. Signals: CLKdiv_0, State (0C 0E 0D 0E 0D 0E 0D 0E 16 09 0B 0A 0B 0A 0B 0A 0B), Ctrl_Wr_En, Ctrl_Wren_Dis, Ctrl_Rden, with Cas_latency and Additive_latency annotations (the values 5 and 4 appear in the figure).]
Figure 6: Timing Waveform for Control Signals from the Controller to the Physical Layer
Controller Implementation
The controller is clocked at half the frequency of the interface. Therefore, the address,
bank address, and command signals (RAS, CAS, and WE) are asserted for two clock cycles of
the fast memory interface clock. The control signals (CS, CKE, and ODT) run at twice the rate
(DDR) of the half-frequency clock, ensuring that the control signals are asserted for just one
clock cycle of the fast memory interface clock.
The controller state machine manages issuing the commands in the correct sequencing order
while determining the timing requirements of the memory.
Along with Figure 7, the following sections explain in detail the various stages of the controller
state machine.
[Figure 7: Controller state machine. State and transition labels recoverable from the diagram include: Rst, Initialization, Init_done, IDLE, Precharge, RP_cnt=0, Refresh, Refresh_done, Auto Refresh, Active, Active Wait, First Write, Burst Write, Write Wait, Write-Read, First Read, Burst Read, Read Wait, Read_write Wait, and the transition conditions WR, RD, Conflict, and Autorefresh.]
Design Hierarchy
Figure 8 shows the design hierarchy beginning with a top-level module called
mem_interface_top.
[Figure 8: Design hierarchy. Recoverable module names: mem_interface_top, top, test_bench, RAM_D.]
Resource Utilization
The resource utilization for a 64-bit DDR2 SDRAM interface including the synthesizable test
bench is listed in Table 9.
Table 9: Resource Utilization
Resources | Utilization | Notes
Slices | 3198 | Includes the controller, synthesizable test bench, and the user interface.
BUFGs | 6 | Includes one BUFG for the 200 MHz IDELAY block reference clock.
BUFIOs | 8 | Equals the number of strobes in the interface.
DCMs | 1 |
PMCDs | 2 |
ISERDES | 64 | Equals the number of data bits in the interface.
OSERDES | 90 | Equals the sum of the data bits, strobes, and data mask signals.
The reference design for the 64-bit DDR2 SDRAM interface using the data capture technique is
available for download on the Xilinx website at:
http://www.xilinx.com/bvdocs/appnotes/xapp721.zip.
Conclusion
The DDR2 controller described in this application note, along with the data capture method
from XAPP721, provides a good solution for high-performance memory interfaces. This design
provides high margin because all the logic in the FPGA fabric is clocked at half the frequency
of the interface, eliminating critical paths. This design was verified in hardware.
Revision History
The following table shows the revision history for this document.
Date Version Revision
12/15/05 1.0 Initial Xilinx release.
12/16/05 1.1 Updated Table 8 and Table 9.
02/02/06 1.2 Updated Figure 4.
02/08/06 1.3 Updated Figure 4.
Summary
This application note describes a data capture technique for a high-performance DDR2
SDRAM interface. This technique uses the Input Serializer/Deserializer (ISERDES) and Output
Serializer/Deserializer (OSERDES) features available in every Virtex™-4 I/O. This technique
can be used for memory interfaces with frequencies of 267 MHz (533 Mb/s) and above.
Introduction
A DDR2 SDRAM interface is source-synchronous where the read data and read strobe are
transmitted edge-aligned. To capture this transmitted data using Virtex-4 FPGAs, either the
strobe or the data can be delayed. In this design, the read data is captured in the delayed
strobe domain and recaptured in the FPGA clock domain in the ISERDES. The received serial,
double data rate (DDR) read data is converted to 4-bit parallel single data rate (SDR) data at
half the frequency of the interface using the ISERDES. The differential strobe is placed on a
clock-capable I/O pair in order to access the BUFIO clock resource. The BUFIO clocking
resource routes the delayed read DQS to its associated data ISERDES clock inputs. The write
data and strobe transmitted by the FPGA use the OSERDES. The OSERDES converts 4-bit
parallel data at half the frequency of the interface to DDR data at the interface frequency. The
controller, datapath, user interface, and all other FPGA slice logic are clocked at half the
frequency of the interface, resulting in improved design margin at frequencies of 267 MHz and
above.
Clocking Scheme
The clocking scheme for this design includes one digital clock manager (DCM) and two
phase-matched clock dividers (PMCDs) as shown in Figure 1. The controller is clocked at half the
frequency of the interface using CLKdiv_0. Therefore, the address, bank address, and
command signals (RAS_L, CAS_L, and WE_L) are asserted for two clock cycles of the fast
memory interface clock (known as "2T" timing). The control signals (CS_L, CKE, and ODT) are
twice the rate (DDR) of the half frequency clock CLKdiv_0, ensuring that the control signals are
asserted for just one clock cycle of the fast memory interface clock. The clock is forwarded to
the external memory device using the Output Dual Data Rate (ODDR) flip-flops in the Virtex-4
I/O. This forwarded clock is 180 degrees out of phase with CLKfast_0. Figure 2 shows the
command and control timing diagram.
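[Figure 1: Clocking scheme. The CLKfast input drives a DCM (ports CLKIN, CLKFB, RST, CLK0, CLK90). CLK0 feeds PMCD#1, which outputs CLKfast_0 (CLKA1) and CLKdiv_0 (CLKA1D2). CLK90 feeds PMCD#2, which outputs CLKfast_90 (CLKA1) and CLKdiv_90 (CLKA1D2). The system reset drives the RST inputs.]
[Figure 2: Command and control timing. Signals: CLKdiv_0, CLKfast_0, memory device clock, control (CS_L).]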
Write Datapath
The write datapath uses the built-in OSERDES available in every Virtex-4 I/O. The OSERDES
transmits the data (DQ) and strobe (DQS) signals. The memory specification requires DQS to
be transmitted center-aligned with DQ. The strobe (DQS) forwarded to the memory is
180 degrees out of phase with CLKfast_0. Therefore, the write data transmitted using
OSERDES must be clocked by CLKfast_90 and CLKdiv_90 as shown in Figure 3. The timing
diagram for write DQS and DQ is shown in Figure 4.
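[Figure 3: Write data transmission using OSERDES. Write data words 0-3 enter the OSERDES on inputs D1 to D4; CLKDIV is driven by CLKdiv_90 and CLK by CLKfast_90; the output drives DQ.]
[Figure 4 waveform. Signals: CLKfast_0, CLKfast_90, clock forwarded to memory device, control (CS_L), strobe (DQS).]
Figure 4: Write Strobe (DQS) and Data (DQ) Timing for a Write Latency of Four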
Uncertainty Parameters | Value | Uncertainties before DQS | Uncertainties after DQS | Meaning
Notes:
1. Skew between output flip-flops and output buffers in the same bank is considered to be minimal over voltage and temperature.
[Figure X721_05_080205 waveform. Signals: CLKdiv_0, clock forwarded to memory device, CLKdiv_90, CLKfast_90, control (CS_L), ctrl_WrEn, ctrl_wr_disable, strobe (DQS).]
[Figure 6 waveform. Signals: CLKdiv_0, CLKfast_0, clock forwarded to memory device, CLKdiv_180, control (CS_L), ctrl_WrEn, ctrl_wr_disable.]
Figure 6: Write DQS Generation for a Write Latency of 4 and a Burst Length of 4
Read Datapath
The read datapath comprises the read data capture and recapture stages. Both stages are
implemented in the built-in ISERDES available in every Virtex-4 I/O. The ISERDES has three
clock inputs: CLK, OCLK, and CLKDIV. The read data is captured in the CLK (DQS) domain,
recaptured in the OCLK (FPGA fast clock) domain, and finally transferred to the CLKDIV
(FPGA divided clock) domain to provide parallel data.
• CLK: The read DQS routed using BUFIO provides the CLK input of the ISERDES as
shown in Figure 7.
• OCLK: The OCLK input of ISERDES is connected to the CLK input of OSERDES in
hardware. In this design, the CLKfast_90 clock is provided to the ISERDES OCLK input
and the OSERDES CLK input. The clock phase used for OCLK is dictated by the phase
required for write data.
• CLKDIV: It is imperative for OCLK and CLKDIV clock inputs to be phase-aligned for
correct functionality. Therefore, the CLKDIV input is provided with CLKdiv_90 that is
phase-aligned to CLKfast_90.
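The end result of the three clock domains is a 1:4 deserialization of each DQ bit. The sketch below models only that data regrouping, not the clocking. The function name is hypothetical, and the output mapping (Q4 carries the first received word, Q1 the last) is read from the Figure 7 labels, which pair Q1 to Q4 with read data words 3 to 0.

```python
def deserialize_1_to_4(serial_words):
    """Group a serial DDR stream into 4-wide parallel outputs.

    Mapping assumed from the Figure 7 labels: Q4 = word 0 (first
    received), Q3 = word 1, Q2 = word 2, Q1 = word 3 (last received).
    """
    if len(serial_words) % 4:
        raise ValueError("stream length must be a multiple of 4")
    parallel = []
    for i in range(0, len(serial_words), 4):
        w0, w1, w2, w3 = serial_words[i:i + 4]
        parallel.append({"Q1": w3, "Q2": w2, "Q3": w1, "Q4": w0})
    return parallel


# One burst of four on a single DQ pin yields one CLKDIV-domain word.
outputs = deserialize_1_to_4(["D0", "D1", "D2", "D3"])
```

Each dictionary corresponds to one CLKDIV cycle, so a burst of four produces one parallel word and a burst of eight produces two.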
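[Figure 7: Read data capture using ISERDES. Inside the IOB, DQ passes through a delay element (read data delayed to align with strobe and FPGA clock) into the ISERDES. Outputs Q1, Q2, Q3, and Q4 carry read data words 3, 2, 1, and 0 to the user interface FIFOs. DQS drives CLK through BUFIO; OCLK is driven by CLKfast_90 and CLKDIV by CLKdiv_90.]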
Table 3 shows the read timing analysis at 333 MHz required to determine the delay required on
DQ bits for centering DQS in the data valid window.
Table 3: Read Timing Analysis at 333 MHz
Figure 8 shows the timing waveform for read data and strobe delay determination. The
waveforms on the left show a case where the DQS is delayed due to BUFIO and clocking
resource, and the ISERDES outputs do not match the expected data pattern. The waveforms
on the right show a case where the DQS and DQ are delayed until the ISERDES outputs match
the expected data pattern. The lower end of the frequency range useful in this design is limited
by the number of available taps in the IDELAY block, the PCB trace delay, and the CAS latency
of the memory device.
[Figure 8: Read data and strobe delay determination. Signals: CLKdiv_0, CLKfast_0, CLKfast_90, CLKdiv_90, DQ at FPGA (D0 D1 D2 D3), DQ captured in the DQS domain, and the inputs to the Q1 to Q4 registers in the CLKfast_90 domain. Left panel: no match, incorrect data sequence. Right panel: correct data sequence.]
[Figure X721_09_113005 waveform. Signals: CLKdiv_0, CLKfast_0, CLKdiv_90, CLKfast_90, Ctrl_RdEn (write enable to FIFOs, aligned with ISERDES data output).]
The ctrl_RdEn signal is required to validate read data because the DDR2 SDRAM devices do
not provide a read valid or read-enable signal along with read data. The controller generates
this read-enable signal based on the CAS latency and the burst length. This read-enable signal
is input to an SRL16 (LUT-based shift register). The number of register stages required to align
the read-enable signal to the ISERDES read data output is determined during calibration. One
read-enable signal is generated for each data byte. Figure 10 shows the read-enable logic
block diagram.
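The read-enable alignment can be modeled behaviorally as a shift register whose depth is chosen during calibration. The sketch below is an illustration only: the function names are hypothetical, the model ignores clock domains, and the brute-force search stands in for the calibration procedure, which the application note does not detail here.

```python
from collections import deque


def srl16_delay(samples, stages):
    """Delay a sampled signal by `stages` register stages (1 to 16)."""
    if not 1 <= stages <= 16:
        raise ValueError("an SRL16 provides 1 to 16 stages")
    shift = deque([0] * stages, maxlen=stages)
    delayed = []
    for sample in samples:
        delayed.append(shift[-1])   # oldest stored value leaves the register
        shift.appendleft(sample)    # new value enters; maxlen drops the oldest
    return delayed


def find_stages(read_enable, data_valid):
    """Return the stage count that aligns read_enable with data_valid,
    or None if no depth from 1 to 16 matches (illustrative search)."""
    for stages in range(1, 17):
        if srl16_delay(read_enable, stages) == data_valid:
            return stages
    return None


enable = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
valid = [0, 0, 0, 0, 0, 1, 1, 0, 0, 0]   # read data appears five cycles later
depth = find_stages(enable, valid)
```

In hardware the depth is selected per data byte, which corresponds to running this search once for each byte lane.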
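[Figure 10: Read-enable logic block diagram. ctrl_RdEn_div0 feeds an SRL16; the SRL16 output (srl_out) is registered by an FD flip-flop to produce ctrl_RdEn.]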
Reference Design
Figure 11 shows the hierarchy of the reference design. The mem_interface_top is the top-level
module. This reference design is available on the Xilinx website at:
http://www.xilinx.com/bvdocs/appnotes/xapp721.zip.
[Figure 11: Reference design hierarchy. Modules shown: mem_interface_top, top, test_bench, RAM_D.]
Table 5 lists the resource utilization for a 64-bit interface, including the physical layer, the
controller, the user interface, and a synthesizable test bench.
Conclusion
The ISERDES-based data capture technique explained in this application note provides good
timing margin for high-performance memory interfaces. This margin is achievable because all
the logic in the FPGA fabric is clocked at half the frequency of the interface, eliminating critical
paths.
Revision History
The following table shows the revision history for this document.
Date       Version   Revision
12/15/05   1.0       Initial Xilinx release.
12/20/05   1.1       Updated Table 1.
01/04/06   1.2       Updated link to reference design file.
02/02/06   1.3       Updated Table 4.
Day 2
• Physical PCB Structure
• On-Chip Termination
• SDRAM Design
• Mentor Lab 6
• Managing an Entire Design
© 2005 Xilinx, Inc. All rights reserved. All Xilinx trademarks, registered trademarks, patents, and disclaimers are as listed at www.xilinx.com/legal.htm.
All other trademarks and registered trademarks are the property of their respective owners. All specifications are subject to change without notice.
www.xilinx.com/xcell/
Two speed grades faster with PlanAhead software and Virtex-4
With our unique PlanAhead software tool, and our industry-leading Virtex-4
FPGAs, designers can now achieve a new level of performance. For complex,
high-utilization, multi-clock designs, no other competing FPGA comes close
to the Virtex-4 PlanAhead advantage:
• 30% better logic performance on average = 2 speed grade advantage
• Over 50% better logic performance for complex multi-clock designs
[Chart comparing Xilinx ISE with PlanAhead, Xilinx ISE, and the nearest competitor. Axis labels: 1 Speed Grade, 2 Speed Grades. Based on benchmark data from a suite of 15 real-world customer designs targeting Xilinx and competing FPGA solutions.]
Meet Your Timing Budgets . . . Beat Your Competition To Market
Meeting timing budgets is the most critical issue facing FPGA designers*. Inferior
tools can hit a performance barrier, impacting your timing goals, while costing
you project delays and expensive higher speed grades. To maximize the Virtex-4
performance advantage, the new PlanAhead software tool allows you to quickly
analyze, floorplan, and improve placement and timing of even the most complex
designs. Now, with ISE and PlanAhead you can meet your timing budgets and
reduce design iterations, all within an easy-to-use design environment.
Memory Interfaces
Solution Guide
www.xilinx.com/xcell/memory1/
Published by
© 2006 Xilinx Inc. All rights reserved. The Xilinx name is a registered trademark; CoolRunner, Virtex, Spartan, Virtex-II Pro, RocketIO, System ACE, WebPACK, HDL Bencher, ChipScope, LogiCORE, AllianceCORE, MicroBlaze, and PicoBlaze are trademarks;
and The Programmable Logic Company is a service mark of Xilinx Inc. PowerPC is a trademark of International Business Machines Corporation in the United States, or other countries, or both. All other trademarks are the property of their owners.