
Issue 1

March 2006

Memory Interfaces
Solution Guide

Overcoming
Memory Interface
Bottlenecks

INSIDE
ARTICLES

Implementing
High-Performance
Memory Interfaces
with Virtex-4 FPGAs

Meeting Signal Integrity
Requirements in FPGAs

How to Detect Potential
Memory Problems Early
in FPGA Designs

APPLICATION NOTES

667 Mbps DDR2 SDRAM
Interface Solution
Using Virtex-4 FPGAs

Dr. Howard Johnson
The world’s foremost
authority on
signal integrity

Fastest Memory Interfaces:
75 ps adaptive calibration

Supporting 667 Mbps DDR2 SDRAM interfaces, Virtex-4 FPGAs achieve the highest bandwidth
benchmark in the industry. Based on our unique ChipSync™ technology—built into every I/O—the
Virtex-4 family provides adaptive centering of the clock to the data valid window. By providing reliable
data capture (critical to high-performance memory interfacing), and 75 ps resolution for maximum
design margins, your memory design can now adapt to changing system conditions.

High Signal Integrity Means
High-Bandwidth Memory Interfaces
Virtex-4 FPGAs deliver the industry’s best signal integrity for high-speed and wide data bus
designs —7x less Simultaneous Switching Output (SSO) noise, a critical factor in ensuring reliable
interfaces. And Xilinx provides hardware verified interface solutions for popular memories such as
DDR2 SDRAM, QDR II SRAM, and more. The lab results speak for themselves. As measured by
industry expert Dr. Howard Johnson, no competing FPGA comes close to achieving the signal
integrity of Virtex-4 devices.

Visit www.xilinx.com/virtex4/memory today, and start your next memory interface design for
Virtex-4 FPGAs with the easy-to-use Memory Interface Generator software.

Dr. Howard Johnson, author of High-Speed Digital Design, frequently conducts technical workshops for digital engineers at Oxford University and other sites worldwide.
Visit www.sigcon.com to register.
The Programmable Logic CompanySM

View The
TechOnLine
Seminar Today

©2006 Xilinx, Inc. All rights reserved. XILINX, the Xilinx logo, and other designated brands included herein are trademarks of Xilinx, Inc. All other trademarks are the property of their respective owners.
Faster, But More Challenging

Memory Interfaces Solution Guide

PUBLISHER Forrest Couch
[email protected]
408-879-5270

EDITOR Charmaine Cooper Hussain

ART DIRECTOR Scott Blair

ADVERTISING SALES Dan Teie
1-800-493-5551

Welcome to the Memory Interfaces Solution Guide, an educational journal of memory interface design and implementation solutions from Xilinx. Engineers in the semiconductor and electronics design community tasked to create high-performance system-level designs know well the growing challenge of overcoming memory interface bottlenecks. This guide seeks to bring light to current memory interface issues, challenges, and solutions, especially as they relate to extracting maximum
performance in FPGA designs.

Toward the latter half of the 1990s, memory interfaces evolved from single-data-rate SDRAMs to
double-data-rate (DDR) SDRAMs, the fastest of which is currently the DDR2 SDRAM, running
at 667 Mbps/pin. Present trends indicate that these rates are likely to double every four years,
potentially reaching 1.6 Gbps/pin by 2010. These trends present a serious problem to designers in
that the time period during which you can reliably obtain read data – the data valid window –
is shrinking faster than the data period itself.

This erosion of the data valid window introduces a new set of design challenges that require a more
effective means of establishing and maintaining reliable memory interface performance.

Along with the performance issues that attend the new breed of high-performance memories,
designers face a new set of memory controller design issues as well. The complexities and intricacies
of creating memory controllers for these devices pose a wide assortment of challenges that suggest a
need for a new level of integration support from the tools accompanying the FPGA.

In this guide, we offer a foundational set of articles covering the broad selection of resources and
solutions Xilinx offers, including the latest silicon features in the Virtex™-4 FPGA family that
address the shrinking data valid window, the availability of hardware reference designs to accelerate
your design efforts, and two application notes that discuss the latest technology advances enabling
the design of DDR2 SDRAM interfaces running at 667 Mbps/pin.

Enjoy!

Adrian Cosoroaba
Marketing Manager, Product Solutions Marketing
Xilinx, Inc.

Xilinx, Inc.
2100 Logic Drive
San Jose, CA 95124-3400
Phone: 408-559-7778
FAX: 408-879-4780

© 2006 Xilinx, Inc. All rights reserved. XILINX, the Xilinx Logo, and other designated brands included herein are trademarks of Xilinx, Inc. PowerPC is a trademark of IBM, Inc. All other trademarks are the property of their respective owners.

The articles, information, and other materials included in this issue are provided solely for the convenience of our readers. Xilinx makes no warranties, express, implied, statutory, or otherwise, and accepts no liability with respect to any such articles, information, or other materials or their use, and any use thereof is solely at the risk of the user. Any person or entity using such information in any way releases and waives any claim it might have against Xilinx for any loss, damage, or expense caused thereby.
MEMORY INTERFACES SOLUTION GUIDE, ISSUE 1, MARCH 2006

CONTENTS

ARTICLES

Implementing High-Performance Memory Interfaces with Virtex-4 FPGAs................5

Successful DDR2 Design...............................................................................9

Designing a Spartan-3 FPGA DDR Memory Interface ......................................14

Xilinx/Micron Partner to Provide High-Speed Memory Interfaces .......................16

Meeting Signal Integrity Requirements in FPGAs with High-End Memory Interfaces .................18

How to Detect Potential Memory Problems Early in FPGA Designs .....................22

Interfacing QDR II SRAM with Virtex-4 FPGAs ................................................25

APPLICATION NOTES

Memory Interfaces Reference Designs...........................................................27

DDR2 SDRAM Memory Interface for Spartan-3 FPGAs ....................................30

DDR2 Controller (267 MHz and Above) Using Virtex-4 Devices........................42

High-Performance DDR2 SDRAM Interface Data Capture Using ISERDES and OSERDES .................55

EDUCATION

Signal Integrity for High-Speed Memory and Processor I/O .............................67

BOARDS

Virtex-4 Memory Interfaces .........................................................................68

Learn more about memory interfaces solutions from Xilinx at: www.xilinx.com/xcell/memory1/
Implementing High-Performance
Memory Interfaces with Virtex-4 FPGAs
You can center-align clock-to-read data at “run time” with ChipSync technology.

by Adrian Cosoroaba
Marketing Manager
Xilinx, Inc.
[email protected]

As designers of high-performance systems labor to achieve higher bandwidth while meeting critical timing margins, one consistently vexing performance bottleneck is the memory interface. Whether you are designing for an ASIC, ASSP, or FPGA, capturing source-synchronous read data at transfer rates exceeding 500 Mbps may well be the toughest challenge.

Source-Synchronous Memory Interfaces
Double-data-rate (DDR) SDRAM and quad-data-rate (QDR) SRAM memories utilize source-synchronous interfaces through which the data and clock (or strobe) are sent from the transmitter to the receiver. The clock is used within the receiver interface to latch the data. This eliminates interface control issues such as the time of signal flight between the memory and the FPGA, but raises new challenges that you must address.


One of these issues is how to meet the various read data capture requirements to implement a high-speed source-synchronous interface. For instance, the receiver must ensure that the clock or strobe is routed to all data loads while meeting the required input setup and hold timing. But source-synchronous devices often limit the loading of the forwarded clock. Also, as the data-valid window becomes smaller at higher frequencies, it becomes more important (and simultaneously more challenging) to align the received clock with the center of the data.

Traditional Read Data Capture Method
Source-synchronous clocking requirements are typically more difficult to meet when reading from memory compared with writing to memory. This is because the DDR and DDR2 SDRAM devices send the data edge-aligned with a non-continuous strobe signal instead of a continuous clock. For low-frequency interfaces up to 100 MHz, DCM phase-shifted outputs can be used to capture read data.

Capturing read data becomes more challenging at higher frequencies. Read data can be captured into configurable logic blocks (CLBs) using the memory read strobe, but the strobe must first be delayed so that its edge coincides with the center of the data valid window. Finding the correct phase-shift value is further complicated by process, voltage, and temperature (PVT) variations. The delayed strobe must also be routed onto low-skew FPGA clock resources to maintain the accuracy of the delay.

The traditional method used by FPGA, ASIC, and ASSP controller-based designs employs a phase-locked loop (PLL) or delay-locked loop (DLL) circuit that guarantees a fixed phase shift or delay between the source clock and the clock used for capturing data (Figure 1). You can insert this phase shift to accommodate estimated process, voltage, and temperature variations. The obvious drawback with this method is that it fixes the delay to a single value predetermined during the design phase. Thus, hard-to-predict variations within the system itself – caused by different routing to different memory devices, variations between FPGA or ASIC devices, and ambient system conditions (voltage, temperature) – can easily create skew whereby the predetermined phase shift is ineffectual.

Figure 1 – Traditional fixed-delay read data capture method (90 nm competitor). A fixed phase-shift delay cannot compensate for changing system conditions (process, voltage, and temperature), resulting in clock-to-data misalignment.

These techniques have allowed FPGA designers to implement DDR SDRAM memory interfaces. But very high-speed 267 MHz DDR2 SDRAM and 300 MHz QDR II SRAM interfaces demand much tighter control over the clock or strobe delay.

System timing issues associated with setup (leading edge) and hold (trailing edge) uncertainties further minimize the valid window available for reliable read data capture. For example, 267 MHz (533 Mbps) DDR2 read interface timings require FPGA clock alignment within a 0.33 ns window. Other issues also demand your attention, including chip-to-chip signal integrity, simultaneous switching constraints, and board layout constraints. Pulse-width distortion and jitter on clock or data strobe signals also cause data and address timing problems at the input to the RAM and the FPGA's I/O blocks (IOB) flip-flop. Furthermore, as a bidirectional and non-free-running signal, the data strobe has an increased jitter component, unlike the clock signal.

Clock-to-Data Centering Built into Every I/O
Xilinx® Virtex™-4 FPGAs with dedicated delay and clocking resources in the I/O blocks – called ChipSync™ technology – answer these challenges. These devices make memory interface design significantly easier and free up the FPGA fabric for other purposes. Moreover, Xilinx offers a reference design for memory interface solutions that center-aligns the clock to the read data at "run time" upon system initialization. This proven methodology ensures optimum performance, reduces engineering costs, and increases design reliability.

Figure 2 – Clock-to-data centering using ChipSync tap delays: the data lines (DQs) pass through IDELAY tap-delay blocks under state machine and IDELAY control, with 75 ps variable delay resolution. Calibration with ChipSync is the only solution that ensures accurate centering of the clock to the data-valid window under changing system conditions.


ChipSync technology enables clock-to-data centering without consuming CLB resources. Designers can use the memory read strobe purely to determine the phase relationship between the FPGA's own DCM clock output and the read data. The read data is then delayed to center-align the FPGA clock in the read data window for data capture. In the Virtex-4 FPGA architecture, the ChipSync I/O block includes a precision delay block known as IDELAY that can be used to generate the tap delays necessary to align the FPGA clock to the center of the read data (Figure 2).

Memory read strobe edge-detection logic uses this precision delay to detect the edges of the memory read strobe, from which the pulse center can be calculated in terms of the number of delay taps counted between the first and second edges. Delaying the data by this number of taps aligns the center of the data window with the edge of the FPGA DCM output. The tap delays generated by this precision delay block allow alignment of the data and clock to within 75 ps resolution.

The first step in this technique is to determine the phase relationship between the FPGA clock and the read data received at the FPGA. This is done using the memory read strobe. Based on this phase relationship, the next step is to delay read data to center it with respect to the FPGA clock. The delayed read data is then captured directly in input DDR flip-flops in the FPGA clock domain.

The phase detection is performed at run time by issuing dummy read commands after memory initialization. This is done to receive an uninterrupted strobe from the memory (Figure 3).

Figure 3 – Clock-to-data centering at "run time": first-edge and second-edge detection on the clock/strobe, first-edge and second-edge taps, center-aligned data delay taps, and the delayed read data relative to the internal FPGA clock.

The goal is to detect two edges or transitions of the memory read strobe in the FPGA clock domain. To do this, you must input the strobe to the 64-tap IDELAY block that has a resolution of 75 ps. Then, starting at the 0-tap setting, IDELAY is incremented one tap at a time until it detects the first transition in the FPGA clock domain. After recording the number of taps it took to detect the first edge (first-edge taps), the state machine logic continues incrementing the taps one tap at a time until it detects the second transition (second-edge taps) in the FPGA clock domain.

Having determined the values for first-edge taps and second-edge taps, the state machine logic can compute the required data delay. The pulse center is computed with these recorded values as (second-edge taps – first-edge taps)/2. The required data delay is the sum of the first-edge taps and the pulse center. Using this delay value, the data-valid window is centered with respect to the FPGA clock.

ChipSync features are built into every I/O. This capability provides additional flexibility if you are looking to alleviate board layout constraints and improve signal integrity.

Each I/O also has input DDR flip-flops required for read data capture either in the delayed memory read strobe domain or in the system (FPGA) clock domain. With these modes you can achieve higher design performance by avoiding half-clock-cycle data paths in the FPGA fabric.

Instead of capturing the data into a CLB-configured FIFO, the architecture provides dedicated 500 MHz block RAM with built-in FIFO functionality. These enable a reduction in design size, while leaving the CLB resources free for other functions.
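To make the tap arithmetic concrete, the following Python sketch models the calibration computation described above. It is an illustrative model only (the actual logic is a state machine in the FPGA); the 75 ps tap resolution and 64-tap range come from the article, while the edge-tap counts in the example are hypothetical values.

TAP_RESOLUTION_PS = 75   # IDELAY tap resolution: 75 ps per tap
NUM_TAPS = 64            # 64-tap IDELAY block

def compute_data_delay(first_edge_taps, second_edge_taps):
    """Compute the IDELAY setting that centers read data in the FPGA clock domain.

    first_edge_taps  -- taps needed to detect the first strobe transition
    second_edge_taps -- taps needed to detect the second strobe transition
    """
    # Pulse center, in taps, between the two detected strobe edges
    pulse_center = (second_edge_taps - first_edge_taps) // 2
    # Required data delay = first-edge taps + pulse center
    data_delay = first_edge_taps + pulse_center
    if data_delay >= NUM_TAPS:
        raise ValueError("required delay exceeds the 64-tap range")
    return data_delay

# Hypothetical calibration result: first edge found after 10 taps, second after 34
delay_taps = compute_data_delay(10, 34)
print(delay_taps, "taps =", delay_taps * TAP_RESOLUTION_PS, "ps")   # -> 22 taps = 1650 ps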


Clock-to-Data Phase Alignment for Writes
Although the read operations are the most challenging part of memory interface design, the same level of precision is required in write interface implementation. During a write to the external memory device, the clock/strobe must be transmitted center-aligned with respect to the data. In the Virtex-4 FPGA I/O, the clock/strobe is generated using the output DDR registers clocked by a DCM clock output (CLK0) on the global clock network. The write data is transmitted using the output DDR registers clocked by a DCM clock output that is phase-offset 90 degrees (CLK270) with respect to the clock used to generate the clock/strobe. This phase shift meets the memory vendor specification of centering the clock/strobe in the data window.

Another innovative feature of the output DDR registers is the SAME_EDGE mode of operation. In this mode, a third register clocked by a rising edge is placed on the input of the falling-edge register. Using this mode, both rising-edge and falling-edge data can be presented to the output DDR registers on the same clock edge (CLK270), thereby allowing higher DDR performance with minimal register-to-register delay.
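To illustrate the SAME_EDGE idea, here is a small behavioral sketch in Python (not RTL, and not the Xilinx primitive itself): both the clock-high and clock-low data values are handed over on the same rising clock edge, with the second value parked in a third register until the low half of the clock cycle.

def oddr_same_edge(d1_per_cycle, d2_per_cycle):
    # Behavioral model of an output DDR register pair in SAME_EDGE mode:
    # D1 and D2 are both presented on the rising clock edge; D2 is held in an
    # extra rising-edge register and driven out during the clock-low phase.
    out = []
    for d1, d2 in zip(d1_per_cycle, d2_per_cycle):
        d2_held = d2         # captured by the third (rising-edge) register
        out.append(d1)       # driven while the clock is high
        out.append(d2_held)  # driven while the clock is low
    return out

print(oddr_same_edge([1, 0, 1, 1], [0, 0, 1, 0]))   # -> [1, 0, 0, 0, 1, 1, 1, 0]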

Signal Integrity Challenge
One challenge that all chip-to-chip, high-speed interfaces need to overcome is signal integrity. Having control of cross-talk, ground bounce, ringing, noise margins, impedance matching, and decoupling is now critical to any successful design.

The Xilinx column-based ASMBL architecture enables I/O, clock, and power and ground pins to be located anywhere on the silicon chip, not just along the periphery. This architecture alleviates the problems associated with I/O and array dependency, power and ground distribution, and hard-IP scaling. Special FPGA packaging technology known as SparseChevron enables distribution of power and ground pins evenly across the package. The benefit to board designers is improved signal integrity.

The pin-out diagram in Figure 4 shows how Virtex-4 FPGAs compare with a competing Altera Stratix-II device that has many regions devoid of returns. The SparseChevron layout is a major reason why Virtex-4 FPGAs exhibit unmatched simultaneous switching output (SSO) performance. As demonstrated by signal integrity expert Howard Johnson, Ph.D., these domain-optimized FPGA devices have seven times less SSO noise and crosstalk when compared to alternative FPGA devices (Figure 5).

Figure 4 – Pin-out comparison between the Virtex-4 FF1148 package (returns spread evenly) and the Stratix-II F1020 package (many regions devoid of returns)

Figure 5 – Signal integrity comparison using the accumulated test pattern: 68 mV p-p for the Virtex-4 FPGA versus 474 mV p-p for the Stratix-II FPGA, both at 1.5V LVCMOS, measured on a Tek TDS6804B. Source: Dr. Howard Johnson.

Meeting I/O placement requirements and enabling better routing on a board requires unrestricted I/O placements for an FPGA design. Unlike competing solutions that restrict I/O placements to the top and bottom banks of the FPGA and functionally designate I/Os with respect to address, data, and clock, Virtex-4 FPGAs provide unrestricted I/O bank placements.

Finally, Virtex-4 devices offer a differential DCM clock output that delivers the extremely low jitter performance necessary for very small data-valid windows and diminishing timing margins, ensuring a robust memory interface design. These built-in silicon features enable high-performance synchronous interfaces for both memory and data communications in single or differential mode. ChipSync technology enables data rates greater than 1 Gbps for differential I/O and more than 600 Mbps for single-ended I/O.

Conclusion
As with most FPGA designs, having the right silicon features solves only part of the challenge. Xilinx also provides complete memory interface reference designs that are hardware-verified and highly customizable. The Memory Interface Generator, a free tool offered by Xilinx, can generate all of the FPGA design files (.rtl, .ucf) required for a memory interface through an interactive GUI and a library of hardware-verified designs. For more information, visit www.xilinx.com/memory.
Successful DDR2 Design
Mentor Graphics highlights design issues and solutions
for DDR2, the latest trend in memory design.
by Steve McKinney
HyperLynx Technical Marketing Engineer
Mentor Graphics
[email protected]

The introduction of the first SDRAM
interface, in 1997, marked the dawn of the
high-speed memory interface age. Since
then, designs have migrated through SDR
(single data rate), DDR (double data rate),
and now DDR2 memory interfaces to sus-
tain increasing bandwidth needs in prod-
ucts such as graphics accelerators and
high-speed routers. As a result of its high-
bandwidth capabilities, DDR and DDR2
technology is used in nearly every sector of
the electronics design industry – from
computers and networking to consumer
electronics and military applications.
DDR technology introduced the con-
cept of “clocking” data in on both a rising
and falling edge of a strobe signal in a
memory interface. This provided a 2x
bandwidth improvement over an SDR
interface with the same clock speed. This,
in addition to faster clock frequencies,
allowed a single-channel DDR400 inter-
face with a 200 MHz clock to support up
to 3.2 GB/s, a 3x improvement over the
fastest SDR interface. DDR2 also provided
an additional 2x improvement in band-
width over its DDR predecessor by dou-
bling the maximum clock frequency to 400
MHz. Table 1 shows how the progression
from SDR to DDR and DDR2 has
allowed today’s systems to maintain their
upward growth path.



Table 1 – The progression from SDR to DDR and DDR2 has allowed today's systems to maintain their upward growth path. Speed grades and bit rates are shown for each memory interface.

Single-channel bandwidth (GB/s):
SDR:   PC100 = 0.8,  PC133 = 1.1
DDR:   DDR-200 = 1.6,  DDR-266 = 2.1,  DDR-333 = 2.7,  DDR-400 = 3.2
DDR2:  DDR2-400 = 3.2,  DDR2-533 = 4.266,  DDR2-667 = 5.33,  DDR2-800 = 6.4
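The bandwidth figures in Table 1 follow directly from the clock rate, the number of data transfers per clock, and a 64-bit (8-byte) channel. The short Python sketch below reproduces them; it is illustrative arithmetic only, not part of any reference design.

def channel_bandwidth_gbs(clock_mhz, transfers_per_clock=2, bus_bytes=8):
    """Peak single-channel bandwidth in GB/s for a 64-bit memory channel."""
    return clock_mhz * transfers_per_clock * bus_bytes / 1000.0

print(channel_bandwidth_gbs(100, transfers_per_clock=1))   # PC100 SDR -> 0.8 GB/s
print(channel_bandwidth_gbs(200))                           # DDR-400   -> 3.2 GB/s
print(channel_bandwidth_gbs(266.67))                        # DDR2-533  -> ~4.27 GB/s
print(channel_bandwidth_gbs(400))                           # DDR2-800  -> 6.4 GB/s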

With any high-speed interface, as supported operating frequencies increase it becomes progressively more difficult to meet signal integrity and timing requirements at the receivers. Clock periods become shorter, reducing timing budgets to a point where you are designing systems with only picoseconds of setup or hold margins. In addition to these tighter timing budgets, signals tend to deteriorate because faster edge rates are needed to meet these tight timing parameters. As edge rates get faster, effects like overshoot, reflections, and crosstalk become more significant problems on the interface, which results in a negative impact on your timing budget. DDR2 is no exception, though the JEDEC standards committee has created several new features to aid in dealing with the adverse effects that reduce system reliability.

Some of the most significant changes incorporated into DDR2 include on-die termination for data nets, differential strobe signals, and signal slew rate derating for both data and address/command signals. Taking full advantage of these new features will help enable you to design a robust memory interface that will meet both your signal integrity and timing goals.

On-Die Termination
The addition of on-die termination (ODT) has provided an extra knob with which to dial in and improve signal integrity on the DDR2 interface. ODT is a dynamic termination built into the SDRAM chip and memory controller. It can be enabled or disabled depending on addressing conditions and whether a read or write operation is being performed, as shown in Figure 1. In addition to being able to turn termination off or on, ODT also offers the flexibility of different termination values, allowing you to choose an optimal solution for your specific design.

Figure 1 – An example of ODT settings for a write operation in a 2 DIMM module system where RTT = 150 Ohms

It is important to investigate the effects of ODT on your received signals, and you can easily do this by using a signal integrity software tool like Mentor Graphics' HyperLynx product. Consider the example design shown in Figure 2, which shows a DDR2-533 interface (266 MHz) with two unbuffered DIMM modules and ODT settings of 150 Ohms at each DIMM. You can simulate the effects of using different ODT settings and determine which settings would work best for this DDR2 design before committing to a specific board layout or creating a prototype.

Figure 2 – The HyperLynx free-form schematic editor shows a pre-layout topology of an unbuffered 2 DIMM module system. Transmission line lengths on the DIMM are from the JEDEC DDR2 unbuffered DIMM specification.
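The RTT values shown in Figure 1 are split terminations: a nominal DDR2 ODT of RTT is implemented on-die as one 2*RTT leg to VDDQ and one 2*RTT leg to VSS, whose parallel (Thevenin) equivalent is RTT. The short Python check below is illustrative only.

def thevenin_equivalent_ohms(r_pullup, r_pulldown):
    # Parallel (Thevenin) resistance of a split termination
    return (r_pullup * r_pulldown) / (r_pullup + r_pulldown)

rtt = 150.0                                  # nominal ODT setting from Figure 1
leg = 2 * rtt                                # each on-die leg is 2*RTT = 300 Ohms
print(thevenin_equivalent_ohms(leg, leg))    # -> 150.0 Ohms effective termination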


With the 150 Ohm ODT settings, Figure 3 shows significant signal degradation at the receiver, resulting in eye closure. The eye shows what the signal looks like for all bit transitions of a pseudo-random (PRBS) bitstream, which resembles the data that you might see in a DDR2 write transaction. Making some simple measurements of the eye where it is valid outside the VinhAC and VinlAC thresholds, you can see that there is roughly a 450 ps window of valid signal at the first DIMM module.

Figure 3 – The results of a received signal at the first DIMM in eye diagram form. Here, ODT settings of 150 Ohms are being used at both DIMM modules during a write operation. The results show there is an eye opening of approximately 450 ps outside of the VinAC switching thresholds.

It is appropriate to try to improve this eye aperture (opening) at the first DIMM if possible, and changing the ODT setting is one of the options available for this. To improve the signal quality at the first DIMM, you must change the ODT value at the second DIMM. Setting the ODT at the second DIMM to 75 Ohms and re-running the simulation, Figure 4 shows more than a 100 percent increase in the eye aperture at the first DIMM, resulting in a 1.06 ns eye opening. As you can see, being able to dynamically change ODT is a powerful capability to improve signal quality on the DDR2 interface.

Figure 4 – This waveform shows a significant improvement in the eye aperture with a new ODT setting. Here, the ODT setting is 150 Ohms at the first DIMM and 75 Ohms at the second DIMM. The signal is valid for 1.064 ns with the new settings, which is an increase of 614 ps from the previous ODT settings.

With respect to a DDR interface, ODT allows you to remove the source termination, normally placed at the memory controller, from the board. In addition, the pull-up termination to VTT at the end of the data bus is no longer necessary. This reduces component cost and significantly improves the layout of the board. By removing these terminations, you may be able to reduce layer count and remove unwanted vias on the signals used for layer transitions at the terminations.

Signal Slew Rate Derating
A challenging aspect of any DDR2 design is meeting the setup and hold time requirements of the receivers. This is especially true for the address bus, which tends to have significantly heavier loading conditions than the data bus, resulting in fairly slow edge rates. These slower edge rates can consume a fairly large portion of your timing budget, preventing you from meeting your setup and hold time requirements.

To enable you to meet the setup and hold requirements on address and data buses, DDR2's developers implemented a fairly advanced and relatively new timing concept to improve timing on the interface: "signal slew rate derating." Slew rate derating provides you with a more accurate picture of system-level timing on the DDR2 interface by taking into account the basic physics of the transistors at the receiver.

For DDR2, when any memory vendor defines the setup and hold times for their component, they use an input signal that has a 1.0V/ns input slew rate. What if the signals in your design have faster or slower slew rates than 1.0V/ns? Does it make sense to still meet that same setup and hold requirement defined at 1.0V/ns? Not really. This disparity drove the need for slew rate derating on the signals specific to your design.


To clearly understand slew rate derating, let's consider how a transistor works. It takes a certain amount of charge to build up at the gate of the transistor before it switches high or low. Consider the 1.0V/ns slew rate input waveform between the switching region, Vref to Vin(h/l)AC, used to define the setup and hold times. You can define a charge area under this 1.0V/ns curve that would be equivalent to the charge it takes to cause the transistor to switch. If you have a signal that has a slew rate faster than 1.0V/ns, say 2.0V/ns, it transitions through the switching region much faster and effectively improves your timing margin. You've added some amount of timing margin into your system, but that was with the assumption of using the standard setup and hold times defined at 1.0V/ns. In reality, you haven't allowed enough time for the transistor to reach the charge potential necessary to switch, so there is some uncertainty that is not being accounted for in your system timing budget. To guarantee that your receiver has enough charge built up to switch, you have to allow more time to pass so that sufficient charge can accumulate at the gate.

Once the signal has reached a charge area equivalent to the 1.0V/ns curve between the switching regions, you can safely say that you have a valid received signal. You must now look at the time difference between reaching the VinAC switching threshold and the amount of time it took for the 2.0V/ns signal to reach an equivalent charge area, and then add that time difference into your timing budget, as shown in Figure 5.

Figure 5 – A 1V/ns signal has a defined charge area under the signal between Vref and VinhAC. A 2V/ns signal would require a +Δt change in time to achieve the same charge area as the 1V/ns signal. A 0.5V/ns signal would require a -Δt change in time to achieve the same charge area as the 1V/ns signal. This change in time provides a clearer picture of the timing requirements needed for the receiver to switch.

Conversely, if you consider a much slower slew rate, such as 0.1V/ns, it would take a very long time to reach the switching threshold. You may never meet the setup and hold requirements in your timing budget with that slow of a slew rate through the transition region. This could cause you to overly constrain the design of your system, or potentially limit the configuration and operating speed that you can reliably support. But again, if you consider the charge potential at the gate with this slow slew rate, you would be able to subtract some time out of your budget (as much as 1.42 ns under certain conditions) because the signal reached an equivalent charge area earlier than when it crossed the VinAC threshold.

To assist you in meeting these timing goals, the memory vendors took this slew rate information into account and have constructed a derating table included in the DDR2 JEDEC specification (JESD79-2B on www.jedec.com). By using signal derating, you are now considering how the transistors at the receiver respond to charge building at their gates in your timing budgets. Although this adds a level of complexity to your analysis, it gives you more flexibility in meeting your timing goals, while also providing you with higher visibility into the actual timing of your system.

Determining Slew Rate
To properly use the derating tables, it is important to know how to measure the slew rate on a signal. Let's look at an example of a slew rate measurement for the rising edge of a signal under a setup condition.

The first step in performing signal derating is to find a nominal slew rate of the signal in the transition region between the Vref and Vin(h/l)AC thresholds. That nominal slew rate line is defined in the JEDEC specification as the line between the points where the received waveform crosses Vref and VinhAC for a rising edge, as shown in Figure 6.

Figure 6 – The waveform illustrates how a nominal slew rate is defined for a signal when performing a derating in a setup condition. The waveform is taken from the DDR2 JEDEC specification (JESD79-2B).

It would be a daunting task to manually measure each one of your signal edges to determine a nominal slew rate for use in the derating tables toward derating each signal. To assist with this process, HyperLynx simulation software includes built-in measurement capabilities designed specifically for DDR2 slew rate measurements. This can reduce your development cycle and take the guesswork out of trying to perform signal derating. The HyperLynx oscilloscope will automatically measure each of the edge transitions on the received waveform, reporting back the minimum and maximum slew rate values, which can then be used in the JEDEC derating tables. The scope also displays the nominal slew rate for each edge transition, providing confidence that the correct measurements are being made (see Figure 7).

Figure 7 – The HyperLynx oscilloscope shows an automated measurement of the nominal slew rate for every edge in an eye diagram with the DDR2 slew rate derating feature. The measurement provides the minimum and maximum slew rates that can then be used in the DDR2 derating tables in the JEDEC specification.
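To make the measurement concrete, the Python sketch below computes a setup-condition nominal slew rate from the two threshold crossing times, the way Figure 6 defines it. The threshold voltages are typical SSTL_18 assumptions and the derating lookup is a placeholder; the authoritative adjustment values are the derating tables in JESD79-2B.

def nominal_setup_slew_rate(t_vref_ns, t_vih_ac_ns, vref_v=0.9, vih_ac_v=1.15):
    # Rising-edge nominal slew rate between Vref(DC) and Vih(AC), per Figure 6
    return (vih_ac_v - vref_v) / (t_vih_ac_ns - t_vref_ns)

# Hypothetical crossing times read off a simulated waveform:
slew = nominal_setup_slew_rate(t_vref_ns=0.0, t_vih_ac_ns=0.125)
print(round(slew, 2), "V/ns")   # -> 2.0 V/ns, faster than the 1.0 V/ns reference condition

# Placeholder derating lookup (NOT the JEDEC values), keyed by slew rate in V/ns:
example_setup_derate_ps = {0.5: -150, 1.0: 0, 2.0: +100}
print("example setup-time adjustment:", example_setup_derate_ps[round(slew, 1)], "ps")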


The nominal slew rate is acceptable for use in the derating tables as long as the received signal meets the condition of always being above (for the rising edge) or below (for the falling edge) the nominal slew rate line for a setup condition. If the signal does not have clean edges – possibly having some non-monotonicity or "shelf"-type effect that crosses the nominal slew rate line – you must define a new slew rate. This new slew rate is a tangent line on the received waveform that intersects with VinhAC and the received waveform, as shown in Figure 8. The slew rate of this new tangent line now becomes your slew rate for signal derating.

Figure 8 – This waveform, taken from the DDR2 JEDEC specification, shows how a tangent line must be found if any of the signal crosses the nominal slew rate line. The slew rate of this tangent line would then be used in the DDR2 derating tables.

You can see in the example that if there is an aberration on the signal edge that would require you to find this new tangent line slew rate, HyperLynx automatically performs this check for you. If necessary, the oscilloscope creates the tangent line, which becomes part of the minimum and maximum slew rate results. As Figure 9 shows, the HyperLynx oscilloscope also displays all of the tangent lines, making it easier to identify whether this condition is occurring.

Figure 9 – The HyperLynx oscilloscope shows how the tangent line is automatically determined for you in the DDR2 slew rate derating feature. The slew rate lines in the display indicate that they are tangent lines because they no longer intersect with the received signal and Vref intersection. The oscilloscope determines the slew rate of these new tangent lines for you and reports the minimum and maximum slew rates to be used in the derating tables.

For a hold condition, you perform a slightly different measurement for the slew rate. Instead of measuring from Vref to the VinAC threshold, you measure from VinDC to Vref to determine the nominal slew rate (shown in Figure 10). The same conditions regarding the nominal slew rate line and the inspection of the signal to determine the necessity for a tangent line for a new slew rate hold true here as well.

Figure 10 – The oscilloscope shows how a derating for a hold condition is being performed on the received signal. The DC thresholds are used in place of the AC switching thresholds, which are noted in the DDR2 derating dialog.

Conclusion
With the new addition of ODT, you've seen how dynamic on-chip termination can vastly improve signal quality. Performing signal derating per the DDR2 SDRAM specification has also shown that you can add as much as 1.42 ns back into your timing budget, giving you more flexibility in your PCB design and providing you with a better understanding of system timing. Equipped with the right tools and an understanding of underlying technology, you will be able to move your designs from DDR to DDR2 in a reasonably pain-free process – realizing the added performance benefits and component-count reductions promised by DDR2.
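As a companion to the setup-condition sketch earlier, the hold-condition nominal slew rate described above can be computed the same way, just measured from the DC threshold to Vref. The threshold value below is a typical SSTL_18 assumption, not a value taken from this article.

def nominal_hold_slew_rate(t_vih_dc_ns, t_vref_ns, vref_v=0.9, vih_dc_v=1.025):
    # Hold-condition nominal slew rate for a falling edge, measured from
    # Vih(DC) down to Vref (Figure 10) rather than from Vref up to Vih(AC)
    return (vih_dc_v - vref_v) / (t_vref_ns - t_vih_dc_ns)

print(round(nominal_hold_slew_rate(t_vih_dc_ns=0.0, t_vref_ns=0.10), 2), "V/ns")  # -> 1.25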


Designing a Spartan-3 FPGA
DDR Memory Interface
Xilinx provides many tools to implement
customized DDR memory interfaces.

by Rufino Olay
Marketing Manager, Spartan Solutions
Xilinx, Inc.
[email protected]

Karthikeyan Palanisamy
Staff Engineer, Memory Applications Group
Xilinx, Inc.
[email protected]

Memory speed is a crucial component of system performance. Currently, the most common form of memory used is synchronous dynamic random access memory (SDRAM).

The late 1990s saw major jumps in SDRAM memory speeds and technology because systems required faster performance and larger data storage capabilities. By 2002, double-data-rate (DDR) SDRAM became the standard to meet this ever-growing demand, with DDR266 (initially), DDR333, and recently DDR400 speeds.

DDR SDRAM is an evolutionary extension of "single-data-rate" SDRAM and provides the benefits of higher speed, reduced power, and higher density components. Data is clocked into or out of the device on both the rising and falling edges of the clock. Control signals, however, still change only on the rising clock edge. DDR memory is used in a wide range of systems and platforms and is the computing memory of choice. You can use Xilinx® Spartan™-3 devices to implement a custom DDR memory controller on your board.

Interfacing Spartan-3 Devices with DDR SDRAMs
Spartan-3 platform FPGAs offer an ideal connectivity solution for low-cost systems, providing the system-level building blocks necessary to successfully interface to the latest generation of DDR memories. Included in all Spartan-3 FPGA input/output blocks (IOB) are three pairs of storage elements. The storage-element pair on either the output path or the three-state path can be used together with a special multiplexer to produce DDR transmission. This is accomplished by taking data synchronized to the clock signal's rising edge and converting it to bits synchronized on both the rising and falling edge. The combination of two registers and a multiplexer is referred to as a double-data-rate D-type flip-flop (FDDR).
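The FDDR arrangement described above can be pictured with a small behavioral model. The Python sketch below is illustrative only (the real element is a hardware primitive in the Spartan-3 IOB, not software): two registers feed a multiplexer switched by the clock level, so one value is driven during the clock-high half-cycle and the other during the clock-low half-cycle.

def fddr_output(rising_edge_data, falling_edge_data):
    # Behavioral model of a double-data-rate output flip-flop (FDDR):
    # one register holds the value for the clock-high phase, the other the value
    # for the clock-low phase, and a multiplexer selects between them by clock level.
    ddr_stream = []
    for d_high, d_low in zip(rising_edge_data, falling_edge_data):
        ddr_stream.append(d_high)   # mux output while the clock is high
        ddr_stream.append(d_low)    # mux output while the clock is low
    return ddr_stream

print(fddr_output([1, 0, 1], [0, 1, 1]))   # -> [1, 0, 0, 1, 1, 1]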


Memory Controllers Made Fast and Easy
Xilinx has created many tools to get designers quickly through the process of building and testing memory controllers for Spartan devices. These tools include reference designs and application notes, the Memory Interface Generator (MIG), and more recently, a hardware test platform.

Xilinx application note XAPP454, "DDR2 SDRAM Memory Interface for Spartan-3 FPGAs," describes the use of a Spartan-3 FPGA as a memory controller, with particular focus on interfacing to a Micron MT46v32M16TG-6T DDR SDRAM. This and other application notes illustrate the theory of operations, key challenges, and implementations of a Spartan-3 FPGA-based memory controller.

DDR memories use non-free-running strobes and edge-aligned read data (Figure 1). For 333 Mbps data speeds, the memory strobe must be used for higher margins. Using local clocking resources, a delayed strobe can be centered in the data window for data capture.

Figure 1 – Read operation timing diagram: DQS and DQ, with the DQS delayed internally or externally to capture DQ, or a phase-shifted DCM output used to capture DQ

To maximize resources within the FPGA, you can explore design techniques such as using the LUTs as RAMs for data capture – while at the same time minimizing the use of global clock buffers (BUFGs) and digital clock managers (DCMs) – as explained in the Xilinx application notes. Results are given with respect to the maximum data width per FPGA side for either right and left or top and bottom implementations. Implementation challenges such as these are mitigated with the new Memory Interface Generator.

Xilinx created the Memory Interface Generator (MIG 007) to take the guesswork out of designing your own controller. To create the interface, the tool requires you to input data including FPGA device, frequency, data width, and banks to use. The interactive GUI (Figure 2) generates the RTL, EDIF, SDC, UCF, and related document files. As an example, we created a DDR 64-bit interface for a Spartan XC3S1500-5FG676 using MIG. The results in Table 1 show that the implementation would use 17% of the slices, leaving more than 80% of the device free for data-processing functions.

Figure 2 – Using the MIG 007 to automatically create a DDR memory controller

Table 1 – Device utilization for a DDR 64-bit interface in an XC3S1500 FPGA

Feature                    Utilization            Percent Used
Number of Slices           2,277 out of 13,312    17%
Number of DCMs             1 out of 4             25%
Number of External IOBs    147 out of 487         30%

Testing Out Your Designs
The last sequence in a design is the verification and debug in actual hardware. After using MIG 007 to create your customized memory controller, you can implement your design on the Spartan-3 Memory Development Kit, HW-S3-SL361, as shown in Figure 3. The $995 kit is based on a Spartan-3 1.5M-gate FPGA (the XC3S1500) and includes additional features such as:

• 64 MB of DDR SDRAM (Micron MT5VDDT1672HG-335), with an additional 128 MB DDR SDRAM DIMM for future expansion
• Two-line LCD
• 166 MHz oscillator
• Rotary switches
• Universal power supply, 85V-240V, 50-60 Hz

Figure 3 – Spartan-3 memory development board (HW-S3-SL361)

Conclusion
With the popularity of DDR memory increasing in system designs, it is only natural that designers use Spartan-3 FPGAs as memory controllers. Implementing the controller need not be difficult.

For more information about the application notes, GUI, and development board, please visit www.xilinx.com/products/design_resources/mem_corner/index.htm.


Xilinx/Micron Partner to Provide
High-Speed Memory Interfaces
Micron’s RLDRAM II and DDR/DDR2 memory combines performance-critical features
to provide both flexibility and simplicity for Virtex-4-supported applications.
by Mike Black
Strategic Marketing Manager
Micron Technology, Inc.
[email protected]

With network line rates steadily increasing, memory density and performance are becoming extremely important in enabling network system optimization. Micron Technology's RLDRAM™ and DDR2 memories, combined with Xilinx® Virtex-4™ FPGAs, provide a platform designed for performance. This combination provides the critical features networking and storage applications need: high density and high bandwidth. The ML461 Advanced Memory Development System (Figure 1) demonstrates high-speed memory interfaces with Virtex-4 devices and helps reduce time to market for your design.

Micron Memory
With a DRAM portfolio that's among the most comprehensive, flexible, and reliable in the industry, Micron has the ideal solution to enable the latest memory platforms. Innovative new RLDRAM and DDR2 architectures are advancing system designs farther than ever, and Micron is at the forefront, enabling customers to take advantage of the new features and functionality of Virtex-4 devices.

RLDRAM II Memory
An advanced DRAM, RLDRAM II memory uses an eight-bank architecture optimized for high-speed operation and a double-data-rate I/O for increased bandwidth. The eight-bank architecture enables RLDRAM II devices to achieve peak bandwidth by decreasing the probability of random access conflicts.

In addition, incorporating eight banks results in a reduced bank size compared to typical DRAM devices, which use four. The smaller bank size enables shorter address and data lines, effectively reducing the parasitics and access time.

Although bank management remains important with RLDRAM II architecture, even at its worst case (burst of two at 400 MHz operation), one bank is always available for use. Increasing the burst length of the device increases the number of banks available.

I/O Options
RLDRAM II architecture offers separate I/O (SIO) and common I/O (CIO) options. SIO devices have separate read and write ports to eliminate bus turnaround cycles and contention. Optimized for near-term read and write balance, RLDRAM II SIO devices are able to achieve full bus utilization.

In the alternative, CIO devices have a shared read/write port that requires one additional cycle to turn the bus around. RLDRAM II CIO architecture is optimized for data streaming, where the near-term bus operation is either 100 percent read or 100 percent write, independent of the long-term balance. You can choose an I/O version that provides an optimal compromise between performance and utilization.

The RLDRAM II I/O interface provides other features and options, including support for both 1.5V and 1.8V I/O levels, as well as programmable output impedance that enables compatibility with both HSTL and SSTL I/O schemes. Micron's RLDRAM II devices are also equipped with on-die termination (ODT) to enable more stable operation at high speeds in multipoint systems. These features provide simplicity and flexibility for high-speed designs by bringing both end termination and source termination resistors into the memory device. You can take advantage of these features as needed to reach the RLDRAM II operating speed of 400 MHz DDR (800 MHz data transfer).

At high-frequency operation, however, it is important that you analyze the signal driver, receiver, printed circuit board network, and terminations to obtain good signal integrity and the best possible voltage and timing margins. Without proper terminations, the system may suffer from excessive reflections and ringing, leading to reduced voltage and timing margins. This, in turn, can lead to marginal designs and cause random soft errors that are very difficult to debug. Micron's RLDRAM II devices provide simple, effective, and flexible termination options for high-speed memory designs.

On-Die Source Termination Resistor
The RLDRAM II DQ pins also have on-die source termination. The DQ output driver impedance can be set in the range of 25 to 60 ohms. The driver impedance is selected by means of a single external resistor to ground that establishes the driver impedance for all of the device DQ drivers. As was the case with the on-die end termination resistor, using the RLDRAM II


on-die source termination resistor eliminates the need to place termination resistors on the board – saving design time, board space, material costs, and assembly costs, while increasing product reliability. It also eliminates the cost and complexity of end termination for the controller at that end of the bus. With flexible source termination, you can build a single printed circuit board with various configurations that differ only by load options, and adjust the Micron RLDRAM II memory driver impedance with a single resistor change.

DDR/DDR2 SDRAM
DRAM architecture changes enable twice the bandwidth without increasing the demand on the DRAM core, and keep the power low. These evolutionary changes enable DDR2 to operate between 400 MHz and 533 MHz, with the potential of extending to 667 MHz and 800 MHz. A summary of the functionality changes is shown in Table 1.

Modifications to the DRAM architecture include shortened row lengths for reduced activation power, burst lengths of four and eight for improved data bandwidth capability, and the addition of eight banks in 1 Gb densities and above.

New signaling features include on-die termination (ODT) and on-chip driver (OCD). ODT provides improved signal quality, with better system termination on the data signals. OCD calibration provides the option of tightening the variance of the pull-up and pull-down output driver at 18 ohms nominal. Modifications were also made to the mode register and extended mode register, including column address strobe (CAS) latency, additive latency, and programmable data strobes.

Conclusion
The built-in silicon features of Virtex-4 devices – including ChipSync™ I/O technology, SmartRAM, and Xesium differential clocking – have helped simplify interfacing FPGAs to very-high-speed memory devices. A 64-tap 80 ps absolute delay element as well as input and output DDR registers are available in each I/O element, providing for the first time a run-time center alignment of data and clock that guarantees reliable data capture at high speeds.

Xilinx engineered the ML461 Advanced Memory Development System to demonstrate high-speed memory interfaces with Virtex-4 FPGAs. These include interfaces with Micron's PC3200 and PC2-5300 DIMM modules, DDR400 and DDR2-533 components, and RLDRAM II devices. In addition to these interfaces, the ML461 also demonstrates high-speed QDR-II and FCRAM-II interfaces to Virtex-4 devices. The ML461 system, which also includes the whole suite of reference designs to the various memory devices and the memory interface generator, will help you implement flexible, high-bandwidth memory solutions with Virtex-4 devices.

Please refer to the RLDRAM information pages at www.micron.com/products/dram/rldram/ for more information and technical details.

Figure 1 – ML461 Advanced Memory Development System (interfaces shown: DDR SDRAM, DDR2 SDRAM, DDR SDRAM DIMM, DDR2 SDRAM DIMM, FCRAM II, QDR II SRAM, and RLDRAM II)

Table 1 – DDR/DDR2 feature overview

Feature/Option              DDR                     DDR2
Data Transfer Rate          266, 333, 400 MHz       400, 533, 667, 800 MHz
Package                     TSOP and FBGA           FBGA only
Operating Voltage           2.5V                    1.8V
I/O Voltage                 2.5V                    1.8V
I/O Type                    SSTL_2                  SSTL_18
Densities                   64 Mb-1 Gb              256 Mb-4 Gb
Internal Banks              4                       4 and 8
Prefetch (MIN Write Burst)  2                       4
CAS Latency (CL)            2, 2.5, 3 Clocks        3, 4, 5 Clocks
Additive Latency (AL)       No                      0, 1, 2, 3, 4 Clocks
READ Latency                CL                      AL + CL
WRITE Latency               Fixed                   READ Latency - 1 Clock
I/O Width                   x4/x8/x16               x4/x8/x16
Output Calibration          None                    OCD
Data Strobes                Bidirectional Strobe    Bidirectional Strobe
                            (Single-Ended)          (Single-Ended or Differential) with RDQS
On-Die Termination          None                    Selectable
Burst Lengths               2, 4, 8                 4, 8
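For reference, the latency relationships in Table 1 can be expressed in a couple of lines of Python. This is illustrative only and simply restates the table's definitions of additive latency (AL) and CAS latency (CL).

def ddr2_latencies(cl_clocks, al_clocks=0):
    # Table 1 relationships: READ latency = AL + CL, WRITE latency = READ latency - 1
    read_latency = al_clocks + cl_clocks
    write_latency = read_latency - 1
    return read_latency, write_latency

print(ddr2_latencies(cl_clocks=4, al_clocks=3))   # -> (7, 6) clocks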


Meeting Signal Integrity Requirements in
FPGAs with High-End Memory Interfaces
As valid signal windows shrink, signal integrity becomes a dominant
factor in ensuring that high-end memory interfaces perform flawlessly.

by Olivier Despaux
Senior Applications Engineer
Xilinx Inc.
[email protected]

Wider parallel data buses, increasing data rates, and multiple loads are challenges for high-end memory interface designers. The demand for higher bandwidth and throughput is driving the requirement for even faster clock frequencies. As valid signal windows shrink, signal integrity (SI) becomes a dominant factor in ensuring that memory interfaces perform flawlessly.

Chip and PCB-level design techniques can improve simultaneous switching output (SSO) characteristics, making it easier to achieve the signal integrity required in wider memory interfaces. EDA vendors are making a wide range of tools available to designers for optimizing the signal integrity quality of memory interfaces. Features that are integrated on the FPGA silicon die, such as digitally controlled impedance (DCI), simplify the PCB layout design and enhance performance. This article discusses these design techniques and hardware experiment results, illustrating the effect of design parameters on signal integrity.

Optimizing Timing in DDR SDRAM Interfaces
Shrinking data periods and significant memory timing uncertainties are making timing closure a real challenge in today's higher performance electronic systems. Several design practices help in preserving the opening of the valid data window. For example, in the case of interfaces with DDR2 SDRAM devices, the JEDEC standard allows the memory device suppliers to have a substantial amount of skew on the data transmitted to the memory controller. There are several components to this skew factor, including output access time, package and routing skew, and data-to-strobe skew. In the case where the memory controller is using a fixed phase shift to register data across the entire interface, the sum of the skew uncertainties must be accounted for in the timing budget. If the worst-case sum of skew uncertainties is high, it reduces the data valid window and thereby limits the guaranteed performance for the interface. Table 1 shows an example of timing parameters for a 267 MHz memory interface.

Assuming that data capture is based on the timing of the DQS signals, leveraging the source-synchronous nature of the interface, it is possible to compute the memory valid data window across the entire data bus as follows:

TMEMORY_VDW = TDATA_PERIOD - TDQSCK - TDQSQ - TQHS
            = 1687 ps - 900 ps - 300 ps - 400 ps = 87 ps

This equation sets the first condition for the data capture timing budget; the memory valid data window must be larger than the sampling window of the memory controller receiver, including setup and hold times and all the receiver timing uncertainties. Capturing data using a fixed delay or a fixed phase shift across the entire data bus is no longer sufficient to register data reliably. However, there are several other methods available. Among all the different options, the "direct clocking" data capture method is a very efficient way to register data bits and transfer them into the memory controller clock domain. This method consists of detecting transitions on DQS signals and implementing the appropriate delay on data bits to center-align DQ signals with the memory controller clock. This technique also has the advantage of making the clock domain transfers from the memory to the controller efficient and reliable. When the calibration is used for:

• The entire interface, the delay on data bits is automatically adjusted independently of the system parameters.
• One byte of data, the data hold skew factor, TDQSQ, and the strobe-to-data distortion parameter, TQHS, are removed from the timing budget equation.
• One bit of data, the DQS-to-DQ skew uncertainty, TDQSCK, is accounted for in addition to the data hold skew factor, TDQSQ, and the strobe-to-data distortion parameter, TQHS. In this case, the valid data window is equal to the data period as provided by the memory device.

In systems subject to voltage and temperature variations, dynamic calibration is required. In leading-edge interfaces, performing the calibration sequence periodically makes this scheme independent of voltage and temperature variations at all times.
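As a quick illustration, the valid-window computation above can be reproduced in a few lines of Python from the Table 1 values; this is arithmetic only, not part of any reference design.

# Timing parameters for the 267 MHz DDR2 example (Table 1), in picoseconds
t_clock = 3750                              # clock period
t_mem_dcd = 188                             # duty cycle distortion tolerance
t_data_period = t_clock // 2 - t_mem_dcd    # 1687 ps: half period minus the distortion
t_dqsck = 900                               # DQS output access time from CK/CK#
t_dqsq = 300                                # strobe-to-data distortion
t_qhs = 400                                 # hold skew factor

t_memory_vdw = t_data_period - t_dqsck - t_dqsq - t_qhs
print(t_memory_vdw, "ps")                   # -> 87 ps memory valid data window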


Table 1 – Memory parameters valid data window uncertainty summary (values in ps; uncertainty split before and after the clock edge)

TCLOCK         3750                          Clock period
TMEM_DCD        188  (0 before / 0 after)    Duty cycle distortion tolerance, subtracted from the clock phase to determine TDATA_PERIOD
TDATA_PERIOD   1687                          Data period is half the clock period minus 10% of duty cycle distortion
TDQSQ           300  (300 before / 0 after)  Strobe-to-data distortion specified by memory vendor
TQHS            400  (0 before / 400 after)  Hold skew factor specified by memory vendor
TDQSCK          900  (450 before / 450 after) DQS output access time from CK/CK#

Increasing Bandwidth with Wider Data Buses: Distributed Power and Ground Pins
Meeting higher bandwidth requirements can be achieved by widening data buses. Interfaces of 144 or 288 bits are not uncommon nowadays. Memory controller device packages with many I/Os are required to achieve those wide buses. Numerous bits switching simultaneously can create signal integrity problems. The SSO limit is specified by the device vendor and represents the number of pins that the user can use simultaneously per bank in the device. This limit increases for devices that are architected to support a large number of I/Os and contain distributed power and ground pins. These packages offer better immunity against crosstalk.

Figure 1 – Examples of advanced FPGA package pinouts: a package with distributed power and ground pins (HSTL/SSTL supported on all I/Os) versus a package with centered power and ground pins (HSTL/SSTL restricted to the top and bottom banks)

Examples of package pinouts from two FPGA vendors are shown in Figure 1. The dots represent power and ground pins, the crosses represent pins available for the user. These two devices have been used in an experiment emulating a 72-bit memory interface. The worst-case noise level on a user pin is six times smaller in the package with the distributed power and ground pins using SSTL 1.8V – the standard for DDR2 interfaces – with external terminations.


crosstalk and data dependent jitter can be proportional to the capacitive load (as
major contributors to the timing budget shown in the following equation); once
Double Rank
and cause setup and hold time violations. Registered DIMM the driver is saturated, rising and falling
The data dependent jitter depends on the edges become slower:
V= 680 mV
transitions on the data bus (examples of T= 962 ps
possible data transitions: “Z-0-1-Z”, “1-1- 1
v(t) = . ∫ i(t).dt
1-1” or “0-1-0-1” transitions). For design C
using I/Os extensively, distributed power
Single Rank
and ground packages and careful PCB Single Unbuffered DIMM The result is a limitation of the maxi-
design are elements that add to the stability mum clock frequency that can be achieved
and robustness of the electronic system. V= 390 mV with a fixed configuration: there is an
T= 800 ps
instance when the edges are slow to the
The Challenge of Capacitive Loading on the Bus point of limiting the performance of the
When designing a large memory interface interface. This limitation is presented in
system, cost, density, throughput, and Single Rank Figure 4, which shows the experimental
Two Unbuffered DIMMs
latency are key factors in determining the analysis of performance limitation versus
choice of interface architecture. In order to V= 25 mV capacitive load on the bus.
achieve the desired results, one solution is T= 610 ps
to use multiple devices driven by a com-
mon bus for address and command signals.
This corresponds, for example, to the case Figure 2 – ADDRCMD signals eye openings
obtained with IBIS simulations of DIMM
of a dense unbuffered DIMM interface. interfaces with different loads on the address bus
One interface with two 72-bit unbuffered
DIMMs can have a load of up to 36
receivers on the address and command
buses, assuming that each single rank Figure 4 – Maximum possible clock rate based
DIMM has 18 components. The maximum on address bus loading in Xilinx FPGAs
with DDR2 SDRAM devices
load recommended by JEDEC standards
and encountered in common systems is two
unbuffered DIMMs. The resulting capaci-
tive loading on the bus is extremely large. It There are several ways to resolve the
causes these signals to have edges that take capacitive loading issue:
more than one clock period to rise and fall
• Duplicate the signals that have an
resulting in setup and hold violations at the
excessive load across the interface.
memory device. The eye diagrams obtained
Figure 3 – Falling edge timing of ADDRCMD For example, replicating address and
by IBIS simulations using different config-
signals obtained with IBIS simulation of DIMM command signals every 4 or 8 loads
urations: one registered DIMM, one interfaces with different loads on the address bus can be very efficient in ensuring high
unbuffered DIMM, and two single rank
quality signal integrity characteristics
unbuffered DIMMs, are shown in Figure 2.
on these signals.
The capacitive loads range from 2 for the edges on the same signal under the same
registered DIMM to 36 for the unbuffered loading conditions using IBIS simulations • In applications where adding one
DIMM. are shown in Figure 3. clock cycle of latency on the inter-
These eye diagrams clearly show the This simple test case illustrates that the face is applicable, using Registered
effect of loading on the address bus; the loading causes the edges to slow down sig- DIMMs can be a good option. These
registered DIMMs offer a wide open valid nificantly and the eye to close itself past a DIMMs use a register to buffer heavily
window on the ADDRCMD bus. The eye certain frequency. In systems where the load loaded signals like address and com-
opening for one DIMM appears to be still on the bus cannot be reduced, lowering the mand signals. In exchange for one
good at 267 MHz; however, with 32 loads, frequency of operation is one way to keep additional latency cycle in the address
the ADDRCMD valid window is col- the integrity of the signals acceptable. and command signals, these modules
lapsed, and the conventional implementa- Each load has a small capacitance that drastically reduce the load on control
tion is no longer sufficient to interface adds up on the bus. However, the driver and address signals by a factor of 4 to
reliably to the 2 unbuffered DIMMs. has a fixed or limited current drive 18, thereby helping with the capacitive
The timing characteristics of falling- strength. Because the voltage is inversely loading problem.

20 Memory Interfaces Solution Guide March 2006


• Use the design technique based on two and command signals to rise and meet the For example, in cases of high-speed
clock periods on address and command setup and hold time memory requirements. interfaces, reducing the operating fre-
signals. More details on this method The control signals, such as Chip Select sig- quency of the interface by 50 percent can
are presented in the following section. nals, that have a load limited to compo- save design time or enable meeting tim-
nents of one rank of a DIMM, are used to ing on a complex controller state
Using Two Clock Periods indicate the correct time for the memory to machine. This can be done using dedi-
The use of unbuffered DIMMs can be load address and command signals. The cated SERDES features in FPGAs, for
required: design technique has been successfully test- example. This design technique is very
ed and characterized in multiple systems. advantageous when a large burst of con-
• When latency is the preponderant per-
tiguous locations is accessed. Depending
formance factor. If the memory access-
es are short and at random locations in Reduce the BOM and Simplify PCB Layout: on the SERDES configuration, there is a
the memory array, adding one clock Use On-Chip Terminations possibility that one clock cycle of latency
One way to reduce the design complexity is inserted in the interface. Although the
cycle of latency will degrade the data
and bill-of-materials (BOM) for a board is size of the bus is multiplied by two, the
bus utilization and the overall perform-
to use on-chip terminations. The JEDEC logic design can leverage the inherent
ance of the interface decreases. In this
industry standard for DDR2 SDRAM has structure of parallelism in the FPGA
case, a slower interface with minimal
defined the on-die termination (ODT fea- device and take advantage of a slower
cycles of latency can be more efficient
ture). This feature provides an embedded switching rate to meet timing more easi-
than an interface running faster with
termination apparatus on the die of the ly and to consume less dynamic power.
one more clock cycle of latency.
device that eliminates the need for external The state machine of the controller runs
However, when the memory is accessed
PCB terminations on data, strobe, and data slower, allowing the use of a more com-
in several bursts of adjacent locations,
mask (DQ, DQS and DM) signals. plex controller logic that can increase
the faster clock rate compensates for
However, the other signals still require overall efficiency and optimize bus uti-
the initial latency, and the overall per-
external termination resistors on the PCB. lization. This makes the logic design eas-
formance increases.
For DDR2, memory vendors have also ier to place and route, and the end result
• When cost is sensitive and the addi- increased the quality of signal integrity of is a more flexible and robust system that
tional premium for using a registered the signals on DIMM modules compared to is less susceptible to change.
DIMM is not a possibility. the original DDR1 devices. For example,
• When hardware is already available but they match all flight times and loading on a Conclusion
has to support a deeper memory inter- given signal, reducing the effect of stubs and With rising clocking rates and shrinking
face or a faster clock rate. reflection on the transmission lines. But valid windows, parallel buses for memory
using ODT also requires running addition- interfaces are becoming more challenging
• When the number of pins for the
al signal integrity simulations, because the for designers. All the stages of the design
memory controller is fixed by an exist-
configuration of terminations and loading and implementation should be consid-
ing PCB or a feature set, and the addi-
on the bus changes based on the number of ered carefully to tune the interface
tional pinout for registered DIMMs is
populated sockets. Based on the JEDEC parameters so as to determine the optimal
not available.
matrix for write operations or read opera- settings. Signal integrity simulations and
In these cases, the design technique tions, the memory controller should be able simultaneous switching output checks are
using two clock periods to transmit signals to turn the ODT terminations on or off key factors in tailoring the interface. The
on heavily loaded buses can be utilized to depending on how the memory bus is feature-rich silicon resources in devices
resolve the capacitive loading on the loaded. JEDEC recommends in the termi- on which memory controllers are imple-
address and command bus effectively. nation reference matrix that in the case mented, such as process, voltage, and
However, the controller will be able to where both sockets are loaded with DIMMs temperature compensated delays; dedi-
present data only every two clock cycles to with 2 ranks each, only the front side of the cated routing and registers; and specific
the memory, reducing the efficiency of the DIMM on the second slot be ODT clocking resources can also help in main-
interface in certain applications. enabled. IBIS simulations are the safest way taining or improving the memory inter-
The principle for the two period clock- to determine which ODT terminations face performance targets.
ing on address and command buses con- need to be turned on.
sists of pre-launching command and
address signals (ADDRCMD) one clock Running the Interface Twice as Fast This article is reprinted with permission from
period in advance of loading data bits and as the Internal Memory Controller CMP. It originally ran on the Programmable
keeps these signals valid for two clock peri- Feature-rich IC devices can facilitate Logic DesignLine (www.pldesignline.com)
ods. This leaves more time for the address meeting timing for memory interfaces. on January 18, 2006.

March 2006 Memory Interfaces Solution Guide 21


How to Detect Potential Memory
Problems Early in FPGA Designs
System compatibility testing for FPGA memory requires
methods other than traditional signal integrity analysis.

by Larry French Memory Design, Testing, and Verification Tools • Phase 4 – Production
FAE Manager You can use many tools to simulate or
Micron Semiconductor Products, Inc. • Phase 5 – Post-Production (in the
debug a design. Table 1 lists the five essen-
[email protected] form of memory upgrades or field
tial tools for memory design. Note that this
replacements)
is not a complete list as it does not include
As a designer, you probably spend a signif- thermal simulation tools; instead, it focus-
icant amount of time simulating boards The Value of SI Testing
es only on those tools that you can use to
and building and testing prototypes. It is SI is not a panacea and should be used
validate the functionality and robustness of
critical that the kinds of tests performed on judiciously. SI should not be overused,
a design. Table 2 shows when these tools
these prototypes are effective in detecting although it frequently is. For very early or
can be used most effectively.
problems that can occur in production or alpha prototypes, SI is a key tool for
This article focuses on the five phases
in the field. ensuring that your system is free of a
of product development, as shown in
DRAM or other memory combined in number of memory problems, including:
Table 2:
an FPGA system may require different test • Ringing and overshoot/undershoot
methodologies than an FPGA alone. • Phase 1 – Design (no hardware,
Proper selection of memory design, test, only simulation) • Timing violations, such as:
and verification tools reduces engineering – Setup and hold time
• Phase 2 – Alpha (or Early) Prototype
time and increases the probability of
(design and hardware changes likely to – Slew rate (weakly driven or
detecting potential problems. In this arti-
occur before production) strongly driven signals)
cle, we’ll discuss the best practices for thor-
oughly debugging a Xilinx® FPGA design • Phase 3 – Beta Prototype (nearly – Setup/hold time (data, clock,
that uses memory. “production-ready” system) and controls)

22 Memory Interfaces Solution Guide March 2006


Tool Example Tool Design Alpha Proto Beta Proto Production Post-Prod

Electrical Simulations SPICE or IBIS Simulation – Electrical Essential Very Valuable Limited Value Rarely Used No Value
Behavioral Simulations Verilog or VHDL Simulation – Behavioral Essential Very Valuable Limited Value Rarely Used No Value
Signal Integrity Oscilloscope and probes; Signal Integrity Unavailable Critical Limited Value Rarely Used No Value
possibly mixed-mode to
Margin Testing Unavailable Essential Essential Essential Essential
allow for more accurate
signal capture Compatibility Unavailable Valuable Essential Essential Essential
Margin Testing Guardband testing and Table 2 – Tools for verifying memory functionality versus design phase
four-corner testing by
variation of voltage
and temperature • SI is time-consuming. Probing 64-bit
or 72-bit data buses and taking scope
Compatibility Testing Functional software shots requires a great deal of time.
testing or system
reboot test • SI uses costly equipment. To gather
accurate scope shots, you need high-
Table 1 – Memory design, test, cost oscilloscopes and probes.
and verification tools
• SI takes up valuable engineering
resources. High-level engineering
– Clock duty cycle and differential
analysis is required to evaluate scope
clock crossing (CK/CK#) Figure 1 – Typical signal integrity shot shots.
– Bus contention from an oscilloscope
• SI does not find all errors. Margin and
By contrast, SI is not useful in the beta thousand scope shots in our SI lab dur- compatibility testing find errors that are
prototype phase unless there are changes to ing memory qualification testing. Based not detectable by SI.
the board signals. (After all, each signal net on this extensive data, we concluded The best tests for finding FPGA/
is validated in the alpha prototype.) that system problems are most easily memory issues are margin and compati-
However, if a signal does change, you can found with margin and compatibility bility testing.
use SI to ensure that no SI problems exist testing. Although SI is useful in the
with the changed net(s). Rarely – if ever – is alpha prototype phase, it should be Margin Testing
there a need for SI testing in production. replaced by these other tests during beta Margin testing is used to evaluate how sys-
SI is commonly overused for testing prototype and production. tems work under extreme temperatures
because electrical engineers are comfort- Here are some other results of our and voltages. Many system parameters
able looking at an oscilloscope and using SI testing: change with temperature/voltage, includ-
the captures or photographs as documen-
• SI did not find a single issue that ing slew rate, drive strength, and access
tation to show that a system was tested
was not identified by memory or time. Validation of a system at room tem-
(Figure 1). Yet extensive experience at
system-level diagnostics. In other perature is not enough. Micron found that
Micron Technology shows that much
words, SI found the same failures as another benefit of margin testing is that it
more effective tools exist for catching fail-
the other tests, thus duplicating the detects system problems that SI will not.
ures. In fact, our experience shows that SI
capabilities of margin testing and Four-corner testing is a best industry
cannot detect all types of system failures.
software testing. practice for margin testing. If a failure is
Limitations of SI Testing
SI testing has a number of fundamental How Does the Logic Analyzer (or Mixed-Mode Analysis) Fit In?
limitations. First and foremost is the
You may have noticed that Table 1 does not include logic analyzers. Although it is rare
memory industry migration to fine-pitch
to find a debug lab that does not include this tool as an integral part of its design and
ball-grid array (FBGA) packages.
debug process, we will not discuss logic analyzers in this article. Because of the cost and
Without taking up valuable board real
time involved, they are rarely the first tool used to detect a failure or problem in a sys-
estate for probe pins, SI is difficult or tem. Logic analyzers are, however, invaluable in linking a problem, after it has been
impossible because there is no way to identified, to its root cause. Like signal integrity (SI), logic analyzers should be used
probe under the package. after a problem has been detected.
Micron has taken several hundred

March 2006 Memory Interfaces Solution Guide 23


...margin and compatibility testing will identify more marginalities or
problems within a system than traditional methods such as SI.
going to occur during margin testing, it ily be written to identify a bit error, er in-spec commands when entering and
will likely occur at one of these points: address, or row – in contrast to the stan- exiting self-refresh; otherwise, you could
• Corner #1: high voltage, high dard embedded program that might not lose data.
temperature identify any memory failures. This pro- Like power-up cycling, self-refresh
gram could be run during margin testing. cycling is a useful compatibility test. If an
• Corner #2: high voltage, low It would be especially interesting for intermittent self-refresh enter or exit
temperature embedded applications where the memo- problem is present, repeated cycling can
• Corner #3: low voltage, high ry interface runs a very limited set of help detect it. Applications that do not
temperature operations. Likely, this type of test would use self-refresh should completely skip
have more value than extensive SI testing this test.
• Corner #4: low voltage, low
of the final product.
temperature
Sustaining Qualifications
There is one caveat to this rule. During Tests Not To Ignore One last area to consider is the test
the alpha prototype, margin testing may The following tests, if ignored, can lead methodology for sustaining qualifica-
not be of value because the design is still to production and field problems that are tions. That is, what tests should you per-
changing and the margin will be improved subtle, hard to detect, and intermittent. form to qualify a memory device once a
in the beta prototype. Once the system is system is in production? This type of test-
nearly production-ready, you should per- Power-Up Cycling ing is frequently performed to ensure that
form extensive margin testing. A good memory test plan should include an adequate supply of components will be
several tests that are sometimes skipped available for uninterrupted production.
Compatibility Testing and can lead to production or field prob- During production a system is stable
Compatibility testing refers simply to the lems. The first of these is power-up and unchanging. Our experience has
software tests that are run on a system. cycling. During power-up, a number shown that margin and compatibility
These can include BIOS, system operat- of unique events occur, including the testing are the key tests for sustaining
ing software, end-user software, embed- ramp-up of voltages and the JEDEC- qualifications. Because a system is stable,
ded software, and test programs. PCs are standard DRAM initialization sequence. SI has little or no value.
extremely programmable; therefore, you Best industry practices for testing
should run many different types of soft- PCs include power-up cycling tests to Conclusion
ware tests. ensure that you catch intermittent In this article, our intent has been to
In embedded systems where the FPGA power-up issues. encourage designers to rethink the way
acts like a processor, compatibility testing Two types of power-up cycling exist: they test and validate FPGA and memo-
can also comprise a large number of tests. cold- and warm-boot cycling. A cold boot ry interfaces. Using smart test practices
In other embedded applications where the occurs when a system has not been run- can result in an immediate reduction in
DRAM has a dedicated purpose such as a ning and is at room temperature. A warm engineering hours during memory quali-
FIFO or buffer, software testing by defini- boot occurs after a system has been run- fications. In addition, proper use of mar-
tion is limited to the final application. ning for awhile and the internal tempera- gin and compatibility testing will
Thorough compatibility testing (along ture is stabilized. You should consider identify more marginalities or problems
with margin testing) is one of the best both tests to identify temperature- within a system than traditional methods
ways to detect system-level issues or fail- dependent problems. such as SI. No “one-size-fits-all” test
ures in all of these types of systems. methodology exists, so you should iden-
Given the programmable nature of Self-Refresh Testing tify the test methodology that is most
Xilinx FPGAs, you might even consider a DRAM cells leak charge and must be effective for your designs.
special FPGA memory test program. This refreshed often to ensure proper opera- For more detailed information on test-
program would only be used to run tion. Self-refresh is a key way to save sys- ing memory, see Micron’s latest
numerous test vectors (checkerboard, tem power when the memory is not used DesignLine article, “Understanding the
inversions) to and from the memory to for long periods of time. It is critical that Value of Signal Integrity,” on our website,
validate the DRAM interface. It could eas- the memory controller provide the prop- www.micron.com.

24 Memory Interfaces Solution Guide March 2006


Interfacing QDR II SRAM with Virtex-4 FPGAs
QDR II SRAM devices provide a suitable solution for memory requirements when partnered with Virtex-4 FPGAs.

by Veena Kondapalli The reference design uses the phase-shifted – Divide the speed of the interface by
Applications Engineer Staff outputs of the DCM to clock the interface using multiple devices to achieve a
Cypress Semiconductor Corp. on the transmit side. This configuration gives given bandwidth
[email protected] the best jitter and skew characteristics. • Read: valid window worst-case 440 ps
QDR II devices include the following fea-
The growing demand for higher perform- tures: • Write: valid window worst-case 460 ps
ance communications, networking, and • Address and control signal timing
• Maximum frequency of operations -
DSP necessitates higher performance mem- analysis: command window worst-
250 MHz - tested up to 278 MHz
ory devices to support such applications. case 2360 ps
Memory manufacturers like Cypress have • Available in QDR II architecture with
developed specialized memory products burst of 2 or 4 Conclusion
such as quad data rate II (QDR II) SRAM • Supports simultaneous reads/writes For more information about QDR II and
devices to optimize memory bandwidth for and back-to-back transactions without Virtex-4 devices, see Xilinx application note
a specific system architecture. In this article, bus contention issues XAPP703, “QDR II SRAM Interface for
I’ll provide a general outline of a QDR II Virtex-4 Devices,” at www.xilinx.com/bvdocs/
SRAM interface implemented in a Xilinx® • Supports multiple QDR II SRAM
appnotes/xapp703.pdf, as well as Cypress
Virtex™-4 XC4VP25 FF6688-11 device. devices on the same bus to:
application note “Interfacing QDR-II
Figure 1 shows a block diagram of the – Increase the density of the memory SRAM with Virtex-4 Devices” at
QDR II SRAM design interface, with the resource www.cypress.com.
physical interface to the actual memory
device on the controller.

QDR II SRAM qdrII_mem_ctrl1.v / .vhd


QDR II can perform two data write and d II t l2 / hd

two data reads per clock cycle. It uses one USER_CLK0 QDR_K K
USER_CLK270 QDR_K_n K
port for writing data and one port for read-
USER_RESET
ing data. These unidirectional ports sup- CLK_DIV4 QDR_SA
18
SA
USER_W_n
port simultaneous reads and writes and 4
USER_BW_n QDR_W_n W
allow back-to-back transactions without (SDR) 18 4 (SDR)
USER_AD_WR QDR_BW_n BW
the bus contention that may occur with a QDR_D
36 (DDR)
D
(SDR) 36 QDR II SRAM
USER_DWL
single bidirectional data bus. (SDR) 36
Device
USER_DWH QDR_CQ CQ
USER_WR_FULL NC CQ
Clocking Scheme
QDR_R_n R
The FPGA generates all of the clock and USER_R_n
36 (DDR)
(SDR) 18 QDR_Q Q
USER_AD_RD
control signals for reads and writes to mem-
USER_RD_FULL
ory. The memory clocks are typically gener- DOFF

ated using a double-data-rate (DDR) USER_QEN_n


RD_STB_n_out
C
(SDR) 36 C
register. A digital clock manager (DCM) 36
USER_QRL
RD_STB_n_in
(SDR) USER_QRH (Optional)
generates the clock and its inverted version.
USER_QR_EMPTY
This has two advantages. First, the data, con-
trol, and clock signals all go through similar DLY_CLK_200
L

delay elements while exiting the FPGA. DLY_CAL_DO


L NE

Second, the clock-duty cycle distortion is


minimal when global clock nets are used for
the clock and the 180° phase-shifted clock. Figure 1 - Top-level architecture block diagram

March 2006 Memory Interfaces Solution Guide 25


SPECIAL ADVERTISEMENT

SIGNAL INTEGRITY
Xilinx Low-Noise FPGAs Meet SI Challenges
Unique chip package Essential noise
control
and I/Os accelerate To address these issues,
careful printed circuit-
system development board (PCB) design and
layout are critical for con-
trolling system-level noise

T
he good news is plentiful. Advances in and crosstalk. Another
silicon technology are enabling higher important consideration is
system performance to satisfy the the electrical characteris-
requirements of networking, wireless, tics of the components
video, and other demanding applications. At mounted on the PCB. With
the same time, an abundance of I/O pins its Virtex™-4 FPGAs,
with faster data edge-rates enables higher Xilinx, Inc., uses special
interface speeds. chip I/O and packaging
Design Example: 1.5 volt LVCMOS 4mA, I/O, 100 aggressors shown
However, every advance creates new technologies that signifi-
design challenges. Wide buses with hun- cantly improve signal-integrity not only at the Seven times less crosstalk
dreds of simultaneously switching outputs chip level, but also at the system-level. Analysis by independent signal-integrity expert
(SSOs) create crosstalk. The undesirable "Xilinx preempts signal-integrity issues Dr. Howard Johnson verifies the ability of
effects of this electrical noise include jitter at the chip level," says Xilinx Senior Director SparseChevron technology to control SSO
that threatens the stability of high-band- of Systems and Applications Engineering noise/crosstalk at the chip level. “High signal
width interfaces. Sharp edge-rates further Andy DeBaets. "This reduces board develop- integrity demands a low-noise chip” states
exacerbate noise problems. ment and debug effort, and may even make Johnson. “Compared to competing 90nm
the difference between a successful and FPGAs, the Virtex-4 FPGAs in SparseChevron
scrapped board design." packaging demonstrate seven times less SSO

“High
In high-speed systems, a significant noise,” Johnson stresses
signal integrity demands source of crosstalk is inductive coupling with- Xilinx Virtex-4 devices include several
a low-noise chip.

in the PCB via field under the BGA package. other features to help designers improve
In systems with sub-optimal design, noise system-level signal integrity prior to board
from simultaneously switching outputs can layout. Programmable edge rates and drive
reach levels that severely degrade perform- strength minimize noise while meeting other
ance. In extreme cases, this noise can even design objectives. Xilinx Digitally Controlled
lead to system failure. A properly designed Impedance (DCI) technology enables designers
BGA package minimizes inductive coupling to implement on-chip line termination for
between I/Os by placing a power/ground pin single-ended and differential I/Os. By eliminat-
Howard Johnson pair next to every signal pin. ing the need for external termination resistors,
The world’s foremost “Xilinx’s innovative SparseChevron™ DCI technology enables designers to minimize
authority on signal package design minimizes crosstalk prob- system component count and significantly
integrity lems that can degrade system performance. simplify board layout and manufacturing.
This is particularly important for wide, To learn more about how Virtex-4
high-speed DDR2 SDRAM or QDR II FPGAs can help control noise in your system,
SRAM memory designs,” DeBaets says. visit www.xilinx.com/virtex4.

For more information on signal integrity, go to


www.xilinx.com/platformsi
by Adrian Cosoroaba

Memory Interfaces Marketing Manager


Xilinx, Inc.
[email protected]

Reference Designs
Memory interfaces are source-synchronous inter-
faces in which the clock/strobe and data being
transmitted from a memory device are edge-
aligned. Most memory interface and controller
vendors leave the read data capture implementa-
tion as an exercise for the user. In fact, the read

Give your designs the Virtex-4 FPGA advantage. data capture implementation in FPGAs is the
most challenging portion of the design. Xilinx
provides multiple read data capture techniques
for different memory technologies and perform-
ance requirements. All of these techniques are
implemented and verified in Xilinx® FPGAs.
The following sections provide a brief overview
of prevalent memory technologies.

Double Data Rate Synchronous Dynamic


Random Access Memory (DDR SDRAM)
Key features of DDR SDRAM memories
include:

• Source-synchronous read and write interfaces


using the SSTL-2.5V Class I/II I/O standard

• Data available both on the positive and neg-


ative edges of the strobe

• Bi-directional, non-free-running, single-


ended strobes that are output edge-aligned
with read data and must be input center-
aligned with write data

• One strobe per 4 or 8 data bits

• Data bus widths varying between 8, 16, and


32 for components and 32, 64, and 72 for
DIMMs

• Supports reads and writes with burst lengths


of two, four, or eight data words, where each
data word is equal to the data bus width

• Read latency of 2, 2.5, or 3 clock cycles,


with frequencies of 100 MHz, 133 MHz,
166 MHz, and 200 MHz

• Row activation required before accessing col-


umn addresses in an inactive row

• Refresh cycles required every 15.6 μs

• Initialization sequence required after power


on and before normal operation

March 2006 Memory Interfaces Solution Guide 27


Double Data Rate Synchronous Dynamic Memory Technology
Supported FPGAs
Maximum Maximum
XAPP Title Data Capture Scheme
XAPP Number
and I/O Standard Performance Data Width
Random Access Memory (DDR 2 SDRAM)
Key features of DDR 2 SDRAM memories, the XAPP721 High Performance DDR 2 Read data is captured in the
SDRAM Interface Data delayed DQS domain and
second-generation DDR SDRAMs, include: Capture Using ISERDES transferred to the FPGA clock
DDR 2 SDRAM
Virtex-4 333 MHz 8 bits and OSERDES domain within the ISERDES.
• Source-synchronous read and write inter- SSTL-1.8V
(Components)
Class II
faces using the SSTL-1.8V Class I/II I/O XAPP723 DDR2 Controller
(267 MHz and Above)
standard Using Virtex-4 Devices

• Data available both on the positive and XAPP702 DDR 2 SDRAM Controller Read data delayed such that
negative edges of the strobe 16 bits Using Virtex-4 Devices FPGA clock is centered in
DDR 2 SDRAM (Components) data window.
• Bi-directional, non-free-running, differ- SSTL-1.8V Virtex-4 267 MHz XAPP701 Memory Interfaces Data
144-bit
Class II Capture Using Direct Memory read strobe used
ential strobes that are output edge-aligned Registered DIMM
Clocking Technique to determine amount
with read data and must be input center- of read data delay.

aligned with write data Read data delayed such that


16 bits FPGA clock is centered in
• One differential strobe pair per 4 or 8 DDR SDRAM DDR SDRAM Controller
(Components) XAPP709 data window.
data bits SSTL-2.5V Virtex-4 200 MHz Using Virtex-4 Devices
144-bit
Class I/II Memory read strobe used
Registered DIMM
• Data bus widths varying between 4, 8, to determine amount
of read data delay.
and 16 for components and 64 and 72
for DIMMs Read data delayed such
that FPGA clock is centered
• Supports reads and writes with burst QDR II SRAM 72 bits XAPP703 QDR II SRAM Interface
in data window.
Virtex-4 300 MHz
lengths of four or eight data words, where HSTL-1.8V (Components)
Memory read strobe used to
each data word is equal to the data bus determine amount of read
data delay.
width
Read data delayed such that
• Read latency is a minimum of three clock FPGA clock is centered in
XAPP710 Synthesizable CIO DDR data window.
cycles, with frequencies ranging from 200 RLDRAM II
Virtex-4
36 bits
RLDRAM II Controller for
300 MHz
HSTL-1.8V (Components) Memory read strobe used to
MHz to 400 MHz Virtex-4 FPGAs
determine amount of read
• Row activation required before accessing data delay.

column addresses in an inactive row


Table 1 – Virtex-4 memory interface application notes (XAPPs) currently available,
• Refresh cycles required every 7.8 μs with a brief description of the read data capture technique
• Initialization sequence required after
power on and before normal operation Number of
XAPP Number
Number of Number of Interfaces with Device(s) Used for Requirements
Memory Technology Performance
DCMs/DLLs BUFGs Listed DCMs and Hardware Verification
and I/O Standard
Quad Data Rate Synchronous BUFGs
Random Access Memory (QDR II SRAM) XAPP721
Key features of QDR II SRAM memories, the XAPP723 1 DCM Multiple at Same
333 MHz 6 XC4VLX25 –11 FF668 All Banks Supported
DDR2 SDRAM 2 PMCDs Frequency
second-generation QDR I SRAMs, include: SSTL-1.8V Class II

• Source-synchronous read and write inter- XAPP702


XAPP701 Multiple at Same
faces using the HSTL-1.8V I/O standard DDR2 SDRAM
267 MHz 1 6
Frequency
XC4VLX25 –11 FF668 All Banks Supported
SSTL-1.8V Class II
• Data available both on the positive and
negative edges of the strobe XAPP709
Multiple at Same
DDR SDRAM 200 MHz 1 6 XC4VLX25 –11 FF668 All Banks Supported
Frequency
• Uni-directional, free-running, differential SSTL-2.5V Class I/II

data/echo clocks that are edge-aligned XAPP703


Multiple at Same
QDR II SRAM 300 MHz 1 3 XC4VLX25 –11 FF668 All Banks Supported
with read data and center-aligned with Frequency
HSTL-1.8V
write data
XAPP710
Multiple at Same
• One differential strobe pair per 8, 9, 18, RLDRAM II 300 MHz 1 5
Frequency
XC4VLX25 –11 FF668 All Banks Supported
36, or 72 data bits HSTL-1.8V

• Data bus widths varying between 8, 9,


18, 36, and 72 for components (no QDR Table 2 – Resource utilization for all Virtex-4 memory interface
application notes currently available
II SDRAM DIMMs available)

28 Memory Interfaces Solution Guide March 2006


• Reads and writes with burst lengths of
two or four data words, where each data
word is equal to the data bus width
• Read latency is 1.5 clock cycles, with fre-
Get Published
quencies from 154 MHz to 300 MHz
• No row activation, refresh cycles, or
initialization sequence after power on
required, resulting in more efficient
memory bandwidth utilization

Reduced Latency Dynamic Random


Access Memory (RLDRAM II)
Key features of RLDRAM II memories
include:
• Source-synchronous read and write inter-
faces using the HSTL-1.8V I/O standard
• Data available both on the positive and
negative edges of the strobe
• Uni-directional, free-running, differential
memory clocks that are edge-aligned Would you like to write
with read data and center-aligned with
write data for Xcell Publications?
• One strobe per 9 or 18 data bits
• Data bus widths varying between 9, 18, It’s easier than you think!
and 36 for components and no DIMMs
• Supports reads and writes with burst Submit an article draft for our Web-based or printed
lengths of two, four, or eight data words, publications and we will assign an editor and a
where each data word is equal to the data
graphic artist to work with you to make
bus width
your work look as good as possible.
• Read latency of five or six clock cycles,
with frequencies of 200 MHz, 300 For more information on this exciting and highly
MHz, and 400 MHz
rewarding program, please contact:
• Data-valid signal provided by memory
device
• No row activation required; row and col-
umn can be addressed together
Forrest Couch
• Refresh cycles required every 3.9 μs
Publisher, Xcell Publications
[email protected]
• Initialization sequence required after
power on and before normal operation

Conclusion
For application notes on various memory
technologies and performance require-
ments, visit www.xilinx.com/memory. The
See all the new publications on our website.
summaries in Table 1 and Table 2 can help
you determine which application note is www.xilinx.com/xcell
relevant for a particular design.

March 2006 Memory Interfaces Solution Guide 29


Application Note: Spartan-3

R DDR2 SDRAM Memory Interface for


Spartan-3 FPGAs
XAPP454 (v1.0) December 6, 2004 Author: Karthikeyan Palanisamy

Summary This application note describes a DDR2 SDRAM memory interface implementation in a
Spartan-3 device, interfacing with a Micron DDR2 SDRAM device. This document provides a
brief overview of the DDR2 SDRAM device features, followed by a detailed explanation of the
DDR2 SDRAM memory interface implementation.

DDR2 SDRAM DDR2 SDRAM devices are the next generation DDR SDRAM devices. The DDR2 SDRAM
Device memory interface is source-synchronous and supports double-data rate like DDR SDRAM
memory. DDR2 SDRAM devices use the SSTL 1.8V I/O standard.
Overview
DDR2 SDRAM devices use a DDR SDRAM architecture to achieve high-speed operation. The
memory operates using a differential clock provided by the controller. (The reference design on
the web does not support differential strobes. Support for this is planned to be added later.)
Commands are registered at every positive edge of the clock. A bi-directional data strobe
(DQS) is transmitted along with the data for use in data capture at the receiver. DQS is a strobe
transmitted by the DDR2 SDRAM device during reads, and by the controller during writes. DQS
is edge-aligned with data for reads, and center-aligned with data for writes.
Read and write accesses to the DDR2 SDRAM device are burst oriented. Accesses begin with
the registration of an active command and are then followed by a read or a write command. The
address bits registered coincident with the active command are used to select the bank and
row to be accessed. The address bits registered with the read or write command are used to
select the bank and starting column location for the burst access.

Interface Model The DDR2 SDRAM memory interface is layered to simplify the design and make the design
modular. Figure 1 shows the layered memory interface. The three layers consist of an
application layer, an implementation layer, and a physical layer.

© 2004 Xilinx, Inc. All rights reserved. All Xilinx trademarks, registered trademarks, patents, and further disclaimers are as listed at http://www.xilinx.com/legal.htm. All other
trademarks and registered trademarks are the property of their respective owners. All specifications are subject to change without notice.
NOTICE OF DISCLAIMER: Xilinx is providing this design, code, or information "as is." By providing the design, code, or information as one possible implementation of this
feature, application, or standard, Xilinx makes no representation that this implementation is free from any claims of infringement. You are responsible for obtaining any rights you
may require for your implementation. Xilinx expressly disclaims any warranty whatsoever with respect to the adequacy of the implementation, including but not limited to any
warranties or representations that this implementation is free from claims of infringement and any implied warranties of merchantability or fitness for a particular purpose.

30 Memory Interfaces Solution Guide March 2006


R

DDR2 SDRAM Controller Modules

User Interface

Implementation Layer

Infrastructure Data Path Control

Physical Layer

xapp549_02_113004

Figure 1: Interface Layering Model

DDR2 SDRAM Figure 2 is a block diagram of the Spartan-3 DDR2 SDRAM memory interface. All four blocks
Controller shown in this figure are sub-blocks of the ddr2_top module. The function of each block is
explained in the following sections.
Modules

user_clk
Infrastructure

DDR2_IF
user_data IOBS
Data Path

Command & Address


Controller

xapp549_03_113004

Figure 2: DDR2 SDRAM Memory Interface Modules

Controller
The controller’s design is based on the design shown in XAPP253, Synthesizable 400 Mb/s
DDR SDRAM Controller, but is modified to incorporate changes for the DDR2 SDRAM
memory. It supports a burst length of four, and CAS latencies of three and four. The design is
modified to implement the write latency feature of the DDR2 SDRAM memory. The controller
initializes the EMR(2) and EMR(3) registers during the Load Mode command and also
generates differential data strobes.

March 2006 Memory Interfaces Solution Guide 31


R

DDR2 SDRAM Controller Modules

The controller accepts user commands, decodes these user commands, and generates read,
write, and refresh commands to the DDR2 SDRAM memory. The controller also generates
signals for other modules. Refer to XAPP253 for detailed design and timing analyses of the
controller module.

Data Path
The data path module is responsible for transmitting data to and receiving data from the
memories. Major functions include:
• Writing data to the memory
• Reading data from the memory
• Transferring the read data from the memory clock domain to the FPGA clock domain
For a description of data write and data read capture techniques, see XAPP768c, Interfacing
Spartan-3 Devices With 166 MHz or 333 Mb/s DDR SDRAM Memories. The write data and
strobe are clocked out of the FPGA. The strobe is center-aligned with respect to the data during
writes. For DDR2 SDRAM memories, the strobe is non-free running. To meet the requirements
specified above, the write data is clocked out using a clock that is shifted 90° and 270° from the
primary clock going to the memory. The data strobes are generated out of primary clocks going
to the memory.
Memory read data is edge-aligned with a source-synchronous clock. The DDR2 SDRAM clock
is a non-free running strobe. The data is received using the non-free running strobe and
transferred to the FPGA clock domain. The input side of the data uses resources similar to the
input side of the strobe. This ensures matched delays on data and strobe signals until the
strobe is delayed in the strobe delay circuit.

Infrastructure
The Infrastructure module generates the FPGA clocks and reset signals. A Digital Clock
Manager (DCM) is used to generate the clock and its inverted version. A delay calibration
circuit is also implemented in this module.
The delay calibration circuit is used to select the number of delay elements used to delay the
strobe lines with respect to read data. The delay calibration circuit calculates the delay of a
circuit that is identical in all respects to the strobe delay circuit. All aspects of the delay are
considered for calibration, including all the component and route delays. The calibration circuit
selects the number of delay elements for any given time. After the calibration is done, it asserts
the select lines for the delay circuit. Refer to XAPP768c for details about delay calibration.

IOBS
All FPGA input and output signals are implemented in the IOBS module. All address and
control signals are registered going into and coming out from the IOBS module.

32 Memory Interfaces Solution Guide March 2006


User Interface Signals
R

User Interface Table 1 shows user interface signal descriptions; all signal directions are with respect to the
Signals DDR2 SDRAM controller.

Table 1: User Interface Signals


Signal Name Direction Description

dip1 Input Clock enable signal for DDR2 SDRAM (active low)

This signal enables the dqs_div flop during DDR2 SDRAM memory
rst_dqs_div_in Input
read.

reset_in Input System reset

Write Data for DDR2 SDRAM, where 'n' is the width of the memory
user_input_data[(2n-1):0] Input
interface

user_input_address[addwidth:0] Input DDR2 SDRAM row and column address

user_bank_address[bankaddwidth:0] Input DDR2 SDRAM bank address

user_config_reg1[14:0] Input DDR2 SDRAM configuration data register1

user_config_reg2[12:0] Input DDR2 SDRAM configuration data register2

user_command_reg[3:0] Input User command register for DDR2 SDRAM controller

burst_done Input Burst data transfer done signal

This signal is externally connected to rst_dqs_div_in. This signal


rst_dqs_div_out Output
enables the dqs_div flop.

user_output_data[(2n-1):0] Output Read data from DDR2 SDRAM

This active Low signal indicates that read data from DDR2 SDRAM
user_data_valid Output
memory is valid.

user_cmd_ack Output Acknowledge signal for user_command

user_ODT_ack Output Acknowledge signal for ODT command

init_val Output Indicates DDR2 SDRAM is initialized

ar_done Output Indicates auto-refresh command is given to DDR2 SDRAM

clk_int Input Clock generated by DDR2 SDRAM controller

90 degrees phase-shifted clock generated by DDR2 SDRAM


clk90_int Input
controller

sys_rst Input This is generated with system reset input

sys_rst90 Input 90 degrees phase-shifted Reset generated with system reset input

sys_rst180 Input 180 degrees phase-shifted Reset generated with system reset input.

sys_rst270 Input 270 degrees phase-shifted Reset generated with system reset input.

Notes:
1. All signal directions are with respect to DDR2 SDRAM controller.

March 2006 Memory Interfaces Solution Guide 33


R

User Interface Signals

Signal Descriptions
user_input_data[(2n-1):0]
This is the write data to DDR2 SDRAM from the user interface. The data is valid on a DDR2
SDRAM write command, where n is the width of the DDR2 SDRAM memory. The DDR2
SDRAM controller converts single data rate to double data rate on the physical layer side.

user_input_address[addwidth:0]
This is the sum of row and column address for DDR2 SDRAM writes and reads. Depending on
address width variable selection, user_input_address is divided into row and column address bits.

user_bank_address[bankaddwidth:0]
Bank address for DDR2 SDRAM. There is a variable through which the bank address is selectable.

user_config_reg1[14:0]
Configuration data for DDR2 SDRAM memory initialization. The contents of this register are
loaded into the mode register during a Load Mode command. The format for user_config_reg1
is as follows:

14 13 11 10 9 7 6 4 3 2 0

PD WR TM Res Cas_latency BT Burst_length

Burst_length[2:0]
The controller supports only a burst length of four.

BT
This bit selects the burst type. The controller supports only sequential bursts. This bit is always
set to zero in the controller.

Cas_latency [6:4]
Bits 6:4 select the cas latency. The DDR2 SDRAM controller supports a cas latency of 3 and 4.

Res [9:7]
Bits 9:7 are reserved for future implementation.

TM
This bit is loaded into the TM bit of the Load Mode Register.

WR [13:11]
These three bits are written to WR (write recovery) bits of the Load Mode register.

PD
This bit is written to PD (Power Down Mode) bit of the Load Mode register.
Refer to the Micron DDR2 SDRAM data sheets for details on the Load Mode register.

user_config_reg2[12:0]
DDR2 SDRAM configuration data for the Extended Mode Register. The format of
user_config_reg2 is as follows.

12 11 10 9 7 6 4 3 2 1 0

OUT RDQS DQS OCD Posted CAS RTT ODS Res

Refer to the Micron DDR2 SDRAM data sheets for details on the Extended Mode register.

34 Memory Interfaces Solution Guide March 2006


R

User Interface Signals

user_command_reg[3:0]
This is the user command register. Various commands are passed to the DDR2 SDRAM
module through this register. Table 2 illustrates the various supported commands.

Table 2: User Commands


user_command_reg[3:0] User Command Description
0000 NOP
0010 Memory (DDR2 SDRAM) initialization
0011 Auto-refresh
0100 Write
0101 Load Mode (Only Load mode)
0110 Read
Others Reserved

burst_done
Users should enable this signal, for two clock periods, at the end of the data transfer. The
DDR2 SDRAM controller supports write burst or read burst for a single row. Users must
terminate on a column boundary and reinitialize on a column boundary for the next row of
transactions. The controller terminates a write burst or read burst by issuing a pre-charge
command to DDR2 SDRAM memory.

user_output_data[(2n-1):0]
This is the read data from DDR2 SDRAM memory. The DDR2 SDRAM controller converts DDR
SDRAM data from DDR2 SDRAM memory to SDR data. As the DDR SDRAM data is converted
to SDR data, the width of this bus is 2n, where n is data width of DDR2 SDRAM memory.

user_data_valid
The user_output_data[(2n-1):0] signal is valid on assertion of this signal.

user_cmd_ack
This is the acknowledgement signal for a user read or write command. It is asserted by the
DDR2 SDRAM controller during a read or write to DDR2 SDRAM. No new command should be
given to the controller until this signal is deasserted.

init_val
The DDR2 SDRAM controller asserts this signal after completing DDR2 SDRAM initialization.

ar_done
The DDR2 SDRAM controller asserts this signal for one clock cycle after the auto-refresh
command is given to DDR2 SDRAM.
Note: The output clock and reset signals can be used for data synchronization.
Table 3 shows memory interface signals.

March 2006 Memory Interfaces Solution Guide 35


Initializing DDR2 SDRAM Memory
R

Table 3: Memory Interface Signals


Signal Name Direction Description
ddr_dq[(datawidth –1):0] Inout Bidirectional DDR2 SDRAM memory data
Bidirectional DDR2 SDRAM memory data strobe
ddr_dqs[(dqswidth-1):0] Inout signals. The number of strobe signals depends on
the data width and strobe to data ratio.
ddr_cke Output Clock enable signal for DDR2 SDRAM memory
ddr_csb Output Active low memory chip select signal
ddr_rasb Output Active low memory row address strobe
ddr_casb Output Actie low memory column address strobe
ddr_web Output Active low memory write enable signal
ddr_dm Output Memory data mask signal
ddr_ba Output Memory bank address
ddr_address Output Memory address (both row and column address)
ddr2_clk* Output Memory differential clock signals
ddr_odt[4:0] Output Memory on-die termination signals.

Initializing Before issuing the memory read and write commands, the DDR2 SDRAM memory must be
DDR2 SDRAM initialized using the memory initialization command. The data written in the Mode Register and
in the Extended Mode Register should be placed on user_config_reg1 [14:0] and
Memory user_config_reg2 [12:0] until DDR2 SDRAM initialization is completed. Once the DDR2
SDRAM is initialized, the init_val signal is asserted by the DDR2 SDRAM controller. Figure 3
shows a timing diagram of the memory initialization command.

sys_clk

sys_clkb

user_config_reg1(14:0) Configuration Data


1 3
user_config_reg2(12:0) Configuration Data
2
user_command_reg Init Cmd
4

init_val
xapp549_09_120804

Figure 3: DDR2 SDRAM Memory Initialization

1. Two clocks prior to placing the initialization command on command_reg [2:0], the user
places valid configuration data on user_config_reg1[14:0] and user_config_reg2[12:0].
2. The user places the initialization command on command_reg [2:0]. This starts the
initialization sequence.
3. Data on user_config_reg1[14:0] and user_config_reg2[12:0] should not be changed for any
subsequent memory operations.
4. The controller indicates that the configuration is complete by asserting the init_val signal.

36 Memory Interfaces Solution Guide March 2006


DDR2 SDRAM Memory Write
R

DDR2 SDRAM Figure 4 shows a DDR2 SDRAM memory write timing diagram for a burst length of four. The
Memory Write waveform shows two successive bursts. Memory write is preceded by a write command to the
DDR2 SDRAM controller. In response to the write command, the DDR2 SDRAM controller
acknowledges with a user_cmd_ack signal on the rising edge of SYS_CLKb. Users should wait
for a user command acknowledged signal before proceeding to the next step.
Two and a half clock cycles after the user_cmd_ack signal assertion, the memory burst
address is placed on user_input_address[addwidth:0] lines. The user_input_address should be
asserted on the rising edge of SYS_CLK. The data to be written into memory should be
asserted with clk90_int_val and should be given to the controller before placing the memory
address on user_input_address. The user data width is twice that of the memory data width.
The controller converts it into double data rate before it is passed to memory.
For a burst length of four, two user_input_data[(2n-1):0] pieces of data are given to the DDR2
SDRAM controller with each user address. To terminate the write burst, burst_done is asserted
on the rising edge of SYS_CLK for two clocks. The burst_done signal should be asserted for
two clocks with the last memory address. Any further commands to the DDR2 SDRAM
controller should be given only after the user_cmd_ack signal is deasserted.

sys_clk

sys_clkb

clk90_int_val
1
user_command_reg[3:0] Write Command

2 6
user_cmd_ack
2.5 clks 4
user_input_address[21:0] Addr 1 Addr 2
3
user_input_data[(2n-1):0] Data 1 Data 2 Data 3 Data 4

5
burst_done
xapp549_05_120604

Figure 4: DDR2 SDRAM Memory Write Burst Length of Four

1. The user initiates a memory write by issuing a write command to the DDR2 SDRAM
controller. The write command must be asserted on the rising edge of the SYS_CLK.
2. The DDR2 SDRAM controller acknowledges the write command by asserting the
user_cmd_ack signal on the rising edge of the SYS_CLKb.
3. The user should place the data to be written into the memory onto the user_input_data pins
before placing the memory address on the user_input_address. The input data is asserted
with the clk90_int_val signal.
4. Two and half clocks after the user_cmd_ack signal assertion, the user should place the
memory address on user_input address [21:0]. The user_input_address signal should be
asserted on the rising edge of the SYS_CLK.
5. To terminate write burst, the user should assert the burst_done signal for two clocks with
the last user_input_address.
6. Any further commands to the DDR2 SDRAM controller should be given only after the
user_cmd_ack signal is de-asserted.

March 2006 Memory Interfaces Solution Guide 37


R

DDR2 SDRAM Memory Read

Figure 5 shows a memory read timing diagram for two successive bursts with a burst length of
four. The user initiates a memory read by sending a read command to the DDR2 SDRAM
controller.
The read command flow is similar to the write command. A read command is asserted on the
rising edge of SYS_CLK. The DDR2 SDRAM controller asserts the user_cmd_ack signal in
response to the read command on the rising edge of SYS_CLKb. After two and a half clock
cycles of user_cmd_ack, the memory burst read address is placed on
user_input_address[addwidth:0]. The user_input_address signal is asserted on the rising edge
of SYS_CLK.
The data read from the DDR2 SDRAM memory is available on user_output_data, which is
asserted with clk90_int_val. The data on user_output_data is valid only when user_data_valid
signal is asserted. As the DDR SDRAM data is converted to SDR data, the width of this bus is
2n, where n is the data width of the DDR2 SDRAM memory. For a read burst length of four, the
DDR2 SDRAM controller outputs two data words with each user address, each 2n bits wide.
To terminate the read burst, a burst_done signal is asserted for two
clock cycles on the rising edge of SYS_CLK. The burst_done signal should be asserted after
the last memory address. Any further commands to the DDR2 SDRAM controller should be
given after user_cmd_ack signal deassertion.

[Timing waveform showing sys_clk, sys_clkb, clk90_int_val, user_command_reg[3:0] (Read Command), user_cmd_ack, user_input_address[21:0] (Address 1, Address 2), burst_done, user_data_valid, and user_output_data[(2n-1):0] (Data 1 to Data 4); numbered markers 1 to 7 correspond to the steps below.]

Figure 5: DDR2 SDRAM Memory Read Burst Length of Four

The read command flow is similar to the write command flow:


1. The user inputs the read command. It is accepted on the rising edge of the SYS_CLK.
2. The DDR2 SDRAM controller asserts the user_cmd_ack signal on the rising edge of the
SYS_CLKb in response to the read command.
3. Two and a half clocks after user_cmd_ack, the user places the memory read address on
user_input_address[21:0]. The user_input_address signal is then accepted on the rising
edge of SYS_CLK.
4. The data on user_output_data is valid only when the user_data_valid signal is asserted.


5. The data read from the DDR2 SDRAM memory is available on user_output_data. The
user_output_data is asserted with clk90_int_val. Since the DDR SDRAM data is converted
to SDR data, the width of this bus is 2n, where n is the data width of the DDR2 SDRAM
memories. For a read burst length of four, with each user address the DDR2 SDRAM
controller outputs only two data words.
6. To terminate the read burst, the burst_done signal is asserted for two clocks on the rising
edge of SYS_CLK. The burst_done signal should be asserted with the last memory
address.
7. Any further commands to the DDR2 SDRAM controller should be given after the
user_cmd_ack signal is de-asserted.
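On the read side, the only obligation on the user logic is to sample user_output_data while user_data_valid is high. The Verilog sketch below assumes a burst length of four and a single clock named clk90 standing in for the clk90_int_val domain described above; the buffer registers are hypothetical.

module read_capture_example #(parameter N = 16) (
  input                clk90,            // clk90_int_val domain in the real design
  input                rst,
  input                user_data_valid,
  input  [2*N-1:0]     user_output_data, // each word is 2n bits wide
  output reg [2*N-1:0] rd_buf0, rd_buf1  // two words per address for burst length 4
);
  reg toggle;
  always @(posedge clk90) begin
    if (rst) toggle <= 1'b0;
    else if (user_data_valid) begin      // step 4: data is valid only while asserted
      if (!toggle) rd_buf0 <= user_output_data;
      else         rd_buf1 <= user_output_data;
      toggle <= ~toggle;
    end
  end
endmodule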

DDR2 SDRAM Memory Auto_Refresh
The DDR2 SDRAM controller does not support memory refresh on its own and must
periodically be provided with an auto_refresh command. The auto_refresh command is
asserted with SYS_CLK. The ar_done signal is asserted by the DDR2 SDRAM controller upon
completion of the auto_refresh command. The ar_done signal is asserted with SYS_CLKb.
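Because refresh is the user's responsibility with this controller, a simple free-running timer can be used to request auto_refresh commands. The Verilog sketch below assumes a 166 MHz SYS_CLK (6,000 ps period, as used in the timing tables that follow) and the standard 7.8 µs average refresh interval; the counter value, the refresh_req name, and the hand-off to the command logic are illustrative only.

module refresh_timer_example (
  input      sys_clk,
  input      rst,
  input      ar_done,                       // controller reports refresh complete
  output reg refresh_req                    // request logic should issue auto_refresh
);
  localparam integer REFRESH_CLKS = 1300;   // 7.8 us / 6 ns per clock
  reg [10:0] cnt;
  always @(posedge sys_clk) begin
    if (rst) begin
      cnt <= 11'd0; refresh_req <= 1'b0;
    end else if (cnt == REFRESH_CLKS-1) begin
      cnt <= 11'd0; refresh_req <= 1'b1;    // time to issue an auto_refresh command
    end else begin
      cnt <= cnt + 11'd1;
      if (ar_done) refresh_req <= 1'b0;     // clear once the controller signals done
    end
  end
endmodule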

Physical Layer and Delay Calibration
The physical layer for DDR2 SDRAM is similar to the DDR SDRAM physical layer described in
application note XAPP768c. The delay calibration technique described in XAPP768c is also
used in the DDR2 SDRAM interface.
Timing Calculations
Write Timing
Table 4: Write Data
Parameter | Value (ps) | Leading Edge Uncertainties (ps) | Trailing Edge Uncertainties (ps) | Meaning
Tclock | 6000 | | | Clock period
Tclock_phase | 3000 | | | Clock phase
Tdcd | 250 | | | Duty-cycle distortion of clock to memory
Tdata_period | 2750 | | | Total data period, Tclock_phase - Tdcd
Tclock_skew | 50 | 50 | 50 | Minimal skew, since the right/left sides are used and the bits are close together
Tpackage_skew | 90 | 90 | 90 | Skew due to package pins and board layout (this can be reduced further with tighter layout)
Tsetup | 350 | 350 | 0 | Setup time from memory data sheet
Thold | 350 | 0 | 350 | Hold time from memory data sheet
Tphase_offset_error | 140 | 140 | 140 | Offset error between different clocks from the same DCM
Tjitter | 0 | 0 | 0 | The same DCM is used to generate the clock and data; hence, they jitter together
Total uncertainties | 980 | 630 | 630 | Worst case for leading and trailing can never happen simultaneously
Window | 1490 | 630 | 2120 | Total worst-case window is 1490 ps
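As a quick check of the arithmetic behind Table 4 (assuming the window is simply the data period less the leading- and trailing-edge uncertainties):

Tdata_period = Tclock_phase - Tdcd = 3000 - 250 = 2750 ps
Window = 2750 - 630 (leading) - 630 (trailing) = 1490 ps, extending from 630 ps to 2120 ps within the data period.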


Read Timing
Table 5: Read Data
Parameter | Value (ps) | Leading Edge Uncertainties (ps) | Trailing Edge Uncertainties (ps) | Meaning
Tclock | 6000 | | | Clock period
Tclock_phase | 3000 | | | Clock phase
Tclock_duty_cycle_dist | 300 | 0 | 0 | Duty-cycle distortion of clock to memory
Tdata_period | 2700 | | | Total data period, Tclock_phase - Tdcd
Tdqsq | 350 | 350 | 0 | Strobe-to-data distortion from memory data sheet
Tpackage_skew | 90 | 90 | 90 | Worst-case package skew
Tds | 452 | 452 | 0 | Setup time from Spartan-3 -5 data sheet
Tdh | -35 | 0 | -35 | Hold time from Spartan-3 -5 data sheet
Tjitter | 100 | 0 | 0 | Data and strobe jitter together, since they are generated off of the same clock
Tlocal_clock_line | 20 | 20 | 20 | Worst-case local clock line skew
Tpcb_layout_skew | 50 | 50 | 50 | Skew between data lines and strobes on the board
Tqhs | 450 | 0 | 450 | Hold skew factor for DQ from memory data sheet
Total uncertainties | | 962 | 575 | Worst case for leading and trailing can never happen simultaneously
Window for DQS position for normal case | 1163 | 962 | 2125 | Worst-case window of 1163 ps

Notes:
1. The reference for Tdqsq and Tqhs is the Micron data sheet for MT47H64M4FT-37E, Rev C, 05/04 EN.
2. Reference for Spartan-3 timing is –5 devices, Speeds file version 1.33.

Address and Command Timing

Table 6: Address and Command Data


Parameter | Value (ps) | Leading Edge Uncertainties (ps) | Trailing Edge Uncertainties (ps) | Meaning
Tclock | 6000 | | | Clock period
Tclock_skew | 50 | 50 | 50 | Minimal skew, since right/left sides are used and the bits are close together
Tpackage_skew | 90 | 90 | 65 | Using the same bank reduces the package skew
Tsetup | 500 | 500 | 0 | Setup time from memory data sheet
Thold | 500 | 0 | 500 | Hold time from memory data sheet
Tphase_offset_error | 140 | 140 | 140 | Offset between different phases of the clock
Tduty_cycle_distortion | 0 | 0 | 0 | Duty-cycle distortion does not apply
Tjitter | 0 | 0 | 0 | Since the clock and address are generated using the same clock, the same jitter exists in both; hence, it does not need to be included
Total uncertainties | | 780 | 755 |
Command window | 3025 | 2220 | 5245 | Worst-case window of 3025 ps



References

Xilinx Application Notes:


• XAPP253, “Synthesizable 400 Mb/s DDR SDRAM Controller”
• XAPP768c, “Interfacing Spartan-3 Devices With 166 MHz or 333 Mb/s DDR SDRAM
Memories” (available under click license)
Xilinx Reference Designs:
• http://www.xilinx.com/bvdocs/appnotes/xapp253.zip
• http://www.xilinx.com/memory
Micron Data Sheet MT47H16M16FG-37E, available online at:
http://www.micron.com/products/dram/ddr2sdram/partlist.aspx?density=256Mb

Conclusion
It is possible to implement a high-performance DDR2 SDRAM memory interface for Spartan-3
FPGAs. This design has been simulated, synthesized (with Synplicity), and taken through the
Xilinx Project Navigator flow.

Revision History
The following table shows the revision history for this document.
Date        Version   Revision
12/06/04 1.0 Initial Xilinx release.



Application Note: Virtex-4 FPGAs

DDR2 Controller (267 MHz and Above) Using Virtex-4 Devices
XAPP723 (v1.3) February 8, 2006    Author: Karthi Palanisamy

Summary
DDR2 SDRAM devices offer new features that go beyond the DDR SDRAM specification and
enable the DDR2 device to operate at data rates of 666 Mb/s. High data rates require higher
performance from the controller and the I/Os in the FPGA. To achieve the desired bandwidth,
it is essential for the controller to operate synchronously with the operating speed of the
memory.

Introduction
This application note describes a 267 MHz and above DDR2 controller implementation in a
Virtex™-4 device interfacing to a Micron DDR2 SDRAM device. For performance levels of
267 MHz and above, the controller design outlined in this application note should be used
along with the read data capture technique explained in a separate application note entitled
XAPP721, High-Performance DDR2 SDRAM Interface Data Capture Using ISERDES and
OSERDES.
This application note provides a brief overview of DDR2 SDRAM device features followed by a
detailed explanation of the controller operation when interfacing to high-speed DDR2
memories. It also explains the backend user interface to the controller. A reference design in
Verilog is available for download from the Xilinx website:
http://www.xilinx.com/bvdocs/appnotes/xapp721.zip.

DDR2 SDRAM Overview
DDR2 SDRAM devices are the next generation devices in the DDR SDRAM family. DDR2
SDRAM devices use the SSTL 1.8V I/O standard. The following section explains the features
available in the DDR2 SDRAM devices and the key differences between DDR SDRAM and
DDR2 SDRAM devices.
DDR2 SDRAM devices use a DDR architecture to achieve high-speed operation. The memory
operates using a differential clock provided by the controller. Commands are registered at
every positive edge of the clock. A bidirectional data strobe (DQS) is transmitted along with the
data for use in data capture at the receiver. DQS is a strobe transmitted by the DDR2 SDRAM
device during Reads and by the controller during Writes. DQS is edge-aligned with data for
Reads and center-aligned with data for Writes.
Read and write accesses to the DDR2 SDRAM device are burst oriented; accesses begin with
the registration of an Active command, which is then followed by a Read or Write command.
The address bits registered with the Active command are used to select the bank and row to be
accessed. The address bits registered with the Read or Write command are used to select the
bank and the starting column location for the burst access.
The DDR2 controller reference design includes a user backend interface to generate the Write
address, Write data, and Read addresses. This information is stored in three backend FIFOs
for address and data synchronization between the backend and controller modules. Based on
the availability of addresses in the address FIFO, the controller issues the correct commands to
the memory, taking into account the timing requirements of the memory. The implementation
details of the logic blocks are explained in the following sections.

© 2005–2006 Xilinx, Inc. All rights reserved. XILINX, the Xilinx logo, and other designated brands included herein are trademarks of Xilinx, Inc.
All other trademarks are the property of their respective owners.


DDR2 SDRAM Commands Issued by the Controller


Table 1 explains the commands issued by the controller. The commands are detected by the
memory using the following control signals: Row Address Select (RAS), Column Address
Select (CAS), and Write Enable (WE) signals. Clock Enable (CKE) is held High after device
configuration, and Chip select (CS) is held Low throughout device operation. The Mode
Register Definition section describes the DDR2 command functions supported in the controller.
Table 1: DDR2 Commands
Step Function RAS CAS WE
1 Load Mode L L L
2 Auto Refresh L L H
3 Precharge (1) L H L
4 Bank Activate L H H
5 Write H L L
6 Read H L H
7 No Operation/IDLE H H H

Notes:
1. Address signal A10 is held High during Precharge All Banks and is held Low during single bank
precharge.

Mode Register Definition


The Mode register is used to define the specific mode of operation of the DDR2 SDRAM. This
includes the selection of burst length, burst type, CAS latency, and operating mode. Figure 1
explains the Mode register features used by this controller.

[Figure 1 contents: with BA1 BA0 = 0 0, the address bits map as A12 = PD, A11-A9 = Write Recovery (WR), A8 = DLL, A7 = TM, A6-A4 = CAS Latency, A3 = BT (burst type), and A2-A0 = Burst Length.
Burst Length (A2:A0): 010 = 4, 011 = 8, others reserved.
CAS Latency (A6:A4): 010 = 2, 011 = 3, 100 = 4, 101 = 5, others reserved.
Write Recovery (A11:A9): 001 = 2, 010 = 3, 011 = 4, 100 = 5, 101 = 6, others reserved.]

Figure 1: Mode Register
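As an illustration of how the fields in Figure 1 combine, the Verilog sketch below assembles a Mode register value for a burst length of 4 and a CAS latency of 3, using the encodings shown in the figure. It is not taken from the reference design; write recovery, DLL, TM, and PD are left at zero here and must be set per the memory data sheet.

module mode_reg_example (
  output [12:0] mode_reg_addr,
  output [1:0]  mode_reg_ba
);
  localparam [2:0] BURST_LEN_4 = 3'b010;   // A2:A0 encoding from Figure 1
  localparam [2:0] CAS_LAT_3   = 3'b011;   // A6:A4 encoding from Figure 1

  assign mode_reg_ba   = 2'b00;            // BA1:BA0 = 00 selects the Mode register
  assign mode_reg_addr = {1'b0,            // A12: PD
                          3'b000,          // A11:A9: write recovery (not set here)
                          1'b0,            // A8: DLL
                          1'b0,            // A7: TM
                          CAS_LAT_3,       // A6:A4: CAS latency
                          1'b0,            // A3: burst type (sequential)
                          BURST_LEN_4};    // A2:A0: burst length
endmodule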


Bank Addresses BA1 and BA0 select the Mode registers. Table 2 shows the Bank Address bit
configuration.
Table 2: Bank Address Bit Configuration
BA1 BA0 Mode Register
0 0 Mode Register (MR)
0 1 EMR1
1 0 EMR2
1 1 EMR3

Extended Mode Register Definition


The extended Mode register (Table 3) controls functions beyond those controlled by the Mode
register. These additional functions are DLL enable/disable, output drive strength, On Die
Termination (ODT), Posted CAS Additive Latency (AL), off-chip driver impedance calibration
(OCD), DQS enable/disable, RDQS/RDQS enable/disable, and OUTPUT disable/enable.
Off-chip Driver Calibration (OCD) is not used in this reference design.
Table 3: Extended Mode Register
BA1 BA0 A12 A11 A10 A9 A8 A7 A6 A5 A4 A3 A2 A1 A0
0 1 Out RDQS DQS OCD Program RTT Posted CAS RTT ODS DLL

Extended Mode Register 2 (EMR2)


Bank Addresses are set to 10 (BA1 is set High, and BA0 is set Low). The address bits are all
set to Low.

Extended Mode Register 3 (EMR3)


Bank Address bits are set to 11 (BA1 and BA0 are set High). Address bits are all set Low, as
in EMR2.

Initialization Sequence
The initialization sequence used in the controller state machine follows the DDR2 SDRAM
specifications. The voltage requirements of the memory need to be met by the interface. The
following is the sequence of commands issued for initialization.
1. After stable power and clock, a NOP or Deselect command is applied for 200 µs.
2. CKE is asserted.
3. Precharge All command after 400 ns.
4. EMR (2) command. BA0 is held Low, and BA1 is held High.
5. EMR (3) command. BA0 and BA1 are both held High.
6. EMR command to enable the memory DLL. BA1 and A0 are held Low, and BA0 is held
High.
7. Mode Register Set command for DLL reset. To lock the DLL, 200 clock cycles are required.
8. Precharge All command.
9. Two Auto Refresh commands.
10. Mode Register Set command with Low to A8, to initialize device operation.
11. EMR command to enable OCD default by setting bits E7, E8, and E9 to 1.
12. EMR command to enable OCD exit by setting bits E7, E8 and E9 to 0.


After the initialization sequence is complete, the controller issues a dummy write followed by
dummy reads to the DDR2 SDRAM memory for the datapath module to select the right number
of taps in the Virtex-4 input delay block. The datapath module determines the right number of
delay taps required and then asserts the dp_dly_slct_done signal to the controller. The
controller then moves into the IDLE state.

Precharge Command
The Precharge command is used to deactivate the open row in a particular bank. The bank is
available for a subsequent row activation a specified time (tRP) after the Precharge command
is issued. Input A10 determines whether one or all banks are to be precharged.

Auto Refresh Command


DDR2 devices need to be refreshed every 7.8 µs. The circuit to flag the Auto Refresh
commands is built into the controller. The controller uses the system clock divided by 16 to
drive the refresh counter. When asserted, the auto_ref signal flags the need for Auto Refresh
commands. The auto_ref signal is held High 7.8 µs after the previous Auto Refresh command.
The controller then issues the Auto Refresh command after it has completed its current burst.
Auto Refresh commands are given the highest priority in the design of this controller.

Active Command
Before any Read or Write commands can be issued to a bank within the DDR2 SDRAM
memory, a row in the bank must be activated using an Active command. After a row is opened,
Read or Write commands can be issued to the row subject to the tRCD specification. DDR2
SDRAM devices also support posted CAS additive latencies; these allow a Read or Write
command to be issued prior to the tRCD specification by delaying the actual registration of the
Read or Write command to the internal device using additive latency clock cycles.
When the controller detects a conflict, it issues a Precharge command to deactivate the open
row and then issues another Active command to the new row. A conflict occurs when an
incoming address refers to a row in a bank other than the currently opened row.

Read Command
The Read command is used to initiate a burst read access to an active row. The values on BA0
and BA1 select the bank address, and the address inputs provided on A0 - Ai select the starting
column location. After the read burst is over, the row is still available for subsequent access
until it is precharged.
Figure 2 shows an example of a Read command with an additive latency of zero. Hence, in this
example, the Read latency is three, the same as the CAS latency.

[Waveform: a READ command at T0 to bank a, column n, followed by NOPs; with RL = 3 (AL = 0, CL = 3), DQS toggles and DQ (DOn) appears three clocks after the command.]

Figure 2: Read Command Example


Write Command
The Write command is used to initiate a burst access to an active row. The value on BA0 and
BA1 select the bank address while the value on address inputs A0 - Ai select the starting
column location in the active row. DDR2 SDRAMs use a write latency equal to read latency
minus one clock cycle.
Write Latency = Read Latency – 1 = (Additive Latency + CAS Latency) – 1
For example, with an additive latency of 0 and a CAS latency of 3, the write latency is 2.
Figure 3 shows the case of a Write burst with a Write latency of 2. The time between the Write
command and the first rising edge of the DQS signal is determined by the write latency (WL).

[Waveform: a Write command at T0 to bank a, column b, followed by NOPs; after the write latency, DQS begins toggling (tDQSS nominal) with DQ (DIb) and DM aligned to it.]

Figure 3: Write Command Example

DDR2 SDRAM Interface Design
The user interface to the DDR2 controller (Figure 4) and the datapath are clocked at half the
frequency of the interface, resulting in improved design margin at frequencies above 267 MHz.
The operation of the controller at half the frequency does not affect the throughput or latency.
DDR2 SDRAM devices support a minimum burst size of 4, only requiring a command every
other clock. For a burst of 4, the controller issues a command every controller clock (the slow
clock). For a burst of 8, the controller issues a command every other controller clock (the slow
clock). All the FIFOs in the user interface are asynchronous FIFOs, allowing the user’s backend
to operate at any frequency. The I/Os toggle at the target frequency.


[Block diagram: the user backend (address and data generation plus a read data compare module) connects through the user interface FIFOs (read/write address FIFO, write data FIFOs, and read data FIFOs) to the DDR2 SDRAM controller and the physical layer; the physical layer drives CK/CK_N, address/controls, DQ, and DQS to the DDR2 SDRAM device. All blocks reside in the Virtex-4 FPGA, with controller-to-physical-layer signals such as ctrl_Wr_Disable, ctrl_Odd_Latency, ctrl_RdEn, ctrl_WrEn, Ctrl_Dummyread_Start, and Phy_Dly_Slct_Done.]

Figure 4: DDR2 Complete Interface Block Diagram

User Backend
The backend is designed to provide address and data patterns to test all the design aspects of
a DDR2 controller. The backend includes the following blocks: backend state machine, read
data comparator, and a data generator module. The data generation module generates the
various address and data patterns that are written to the memory. The address locations are
pre-stored in a block RAM, being used here as a ROM. The address values stored have been
selected to test accesses to different rows and banks in the DDR2 SDRAM device. The data
pattern generator includes a state machine issuing patterns of data. The backend state
machine emulates a user backend. This state machine issues the write or read enable signals
to determine the specific FIFO that will be accessed by the data generator module.

User Interface
The backend user interface has three FIFOs: the Address FIFO, the Write Data FIFO, and the
Read Data FIFO. The first two FIFOs are accessed by the user backend modules, while the
Read Data FIFO is accessed by the Datapath module used to store the captured Read data.


User-to-Controller Interface
Table 4 lists the signals between the user interface and the controller.
Table 4: Signals Between User Interface and Controller
Af_addr (36 bits): Output of the Address FIFO in the user interface. Mapping of these address bits: memory address (CS, bank, row, column) in [31:0]; reserved in [32]; dynamic command request in [35:33]. Notes: Monitor the FIFO-full status flag to write an address into the address FIFO.
Af_empty (1 bit): The user interface Address FIFO empty status flag output. The controller processes the address on the output of the FIFO when this signal is deasserted. Notes: FIFO16 empty flag.
ctrl_Waf_RdEn (1 bit): Read Enable input to the address FIFO in the user interface. Notes: This signal is asserted for one clock cycle when the controller state is Write, Read, Load Mode Register, Precharge All, Auto Refresh, or Active resulting from dynamic command requests.
ctrl_Wdf_RdEn (1 bit): Read Enable input to the Write Data FIFO in the user interface. Notes: The controller asserts this signal for one clock cycle after the first write state. This signal is asserted for two clock cycles for a burst length of 8. Sufficient data must be available in the Write Data FIFO associated with a write address for the required burst length before issuing a write command. For example, for a 64-bit data bus and a burst length of 4, the user should input two 128-bit data words in the Write Data FIFO for every write address before issuing the write command.

The memory address (Af_addr) includes the column address, row address, bank address, and
chip-select width for deep memory interfaces (Table 5).
Table 5: Af_addr Memory Address
Address Description
Column Address col_ap_width – 1:0

Row Address col_ap_width + row_address – 1:col_ap_width

Bank Address col_ap_width + row_address + bank_address – 1:col_ap_width + row_address

Chip Select col_ap_width + row_address + bank_address + chip_address – 1:col_ap_width + row_address + bank_address
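A sketch of how the Af_addr fields can be pulled apart per Table 5 is shown below. The field widths are examples only (they depend on the memory configuration), and the output names other than Af_addr are hypothetical.

module af_addr_decode_example (
  input  [35:0] Af_addr,
  output [9:0]  col,
  output [12:0] row,
  output [1:0]  bank,
  output        cs,
  output [2:0]  dynamic_cmd
);
  // Example widths; set these per the memory configuration in use.
  localparam COL_AP_WIDTH = 10, ROW_ADDRESS = 13, BANK_ADDRESS = 2, CHIP_ADDRESS = 1;

  assign col  = Af_addr[COL_AP_WIDTH-1:0];
  assign row  = Af_addr[COL_AP_WIDTH+ROW_ADDRESS-1:COL_AP_WIDTH];
  assign bank = Af_addr[COL_AP_WIDTH+ROW_ADDRESS+BANK_ADDRESS-1:COL_AP_WIDTH+ROW_ADDRESS];
  assign cs   = Af_addr[COL_AP_WIDTH+ROW_ADDRESS+BANK_ADDRESS+CHIP_ADDRESS-1:
                        COL_AP_WIDTH+ROW_ADDRESS+BANK_ADDRESS];
  assign dynamic_cmd = Af_addr[35:33];   // optional dynamic command request bits
endmodule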


Dynamic Command Request


Table 6 lists the optional commands. These commands are not required for normal operation of
the controller. The user has the option to request these commands when required by an
application.
Table 6: Optional Commands
Command Description
000 Load Mode Register
001 Auto Refresh
010 Precharge All
011 Active
100 Write
101 Read
110 NOP
111 NOP

Figure 5 describes four consecutive Writes followed by four consecutive Reads with a burst
length of 8. Table 7 lists the state signal values for Figure 5.

[Waveform: CLKdiv_0 with the controller State signal stepping through 0C, 0E, 0D, 0E, 0D, 0E, 0D, 0E, 16, 09, 0B, 0A, 0B, 0A, 0B, 0A, 0B, together with Ctrl_Waf_Rden, Ctrl_Wdf_Rden, and Ctrl_Waf_Empty.]

Figure 5: Consecutive Writes Followed by Consecutive Reads with Burst Length of 8

Table 7: Values for State Signals in Figure 5


State Description
0C First Write
0E Write Wait
0D Burst Write
16 Write Read
09 First Read
0B Read Wait
0A Burst Read


Controller to Physical Layer Interface


Table 8 lists the signals between the controller and the physical layer. Figure 6 describes the
timing waveform for control signals from the controller to the physical layer.
Table 8: Signals Between the Controller and Physical Layer
ctrl_WrEn (1 bit): Output from the controller to the write datapath. Write DQS and DQ generation begins when this signal is asserted. Notes: Asserted for two controller clock cycles for a burst length of 4 and three controller clock cycles for a burst length of 8. Asserted one controller clock cycle earlier than the WRITE command for CAS latency values of 4 and 5.
ctrl_wr_disable (1 bit): Output from the controller to the write datapath. Write DQS and DQ generation ends when this signal is deasserted. Notes: Asserted for one controller clock cycle for a burst length of 4 and two controller clock cycles for a burst length of 8. Asserted one controller clock cycle earlier than the WRITE command for CAS latency values of 4 and 5.
ctrl_Odd_Latency (1 bit): Output from controller to write datapath. Asserted when the selected CAS latency is an odd number. Required for generation of write DQS and DQ after the correct write latency (CAS latency – 1).
ctrl_Dummyread_Start (1 bit): Output from the controller to the read datapath. When this signal is asserted, the strobe and data calibration begin. Notes: This signal must be asserted when valid read data is available on the data bus. This signal is deasserted when the dp_dly_slct_done signal is asserted.
dp_dly_slct_done (1 bit): Output from the read datapath to the controller indicating the strobe and data calibration are complete. Notes: This signal is asserted when the data and strobe have been calibrated. Normal operation begins after this signal is asserted.
ctrl_RdEn (1 bit): Output from the controller to the read datapath for a read-enable signal. Notes: This signal is asserted for one controller clock cycle for a burst length of 4 and two controller clock cycles for a burst length of 8. The CAS latency and additive latency values determine the timing relationship of this signal with the read state.


[Waveform: CLKdiv_0 with the controller State sequence of Figure 5, together with Ctrl_Wr_En, Ctrl_Wren_Dis, Ctrl_Rden, Cas_latency = 5, and Additive_latency = 4.]
Figure 6: Timing Waveform for Control Signals from the Controller to the Physical Layer

Controller Implementation
The controller is clocked at half the frequency of the interface. Therefore, the address,
bank address, and command signals (RAS, CAS, and WE) are asserted for two clock cycles of
the fast memory interface clock. The control signals (CS, CKE, and ODT) are DDR of the
half-frequency clock, ensuring that the control signals are asserted for just one clock cycle of
the fast memory interface clock.
The controller state machine manages command issue in the correct sequence while meeting
the timing requirements of the memory.
Along with Figure 7, the following sections explain in detail the various stages of the controller
state machine.


[State diagram: from reset the controller performs Initialization and enters IDLE; from IDLE it services Auto Refresh requests, issues Active commands, and moves through First Write, Burst Write, and Write Wait or First Read, Burst Read, and Read Wait states, with Write-to-Read and Read-to-Write transition states and Precharge on conflicts or refresh requests.]

Figure 7: DDR2 Controller State Machine

Before the controller issues the commands to the memory:


1. The address FIFO is in first-word-fall-through mode (FWFT). In FWFT mode, the first
address written into the FIFO appears at the output of the FIFO. The controller decodes the
address.
2. The controller activates a row in the corresponding bank if all banks have been precharged,
or it compares the bank and row addresses to the already open row and bank address. If
there is a conflict, the controller precharges the open bank and then issues an Active
command before moving to the Read/Write states.
3. After arriving in the Write state, if the controller gets a Read command, the controller waits
for the write_to_read time before issuing the Read command. Similarly, in the Read state,
when the controller sees a Write command from the command logic block, the controller
waits for the read_to_write time before issuing the Write command. In the read or write
state, the controller also asserts the read enable to the address FIFO to get the next
address.
4. The commands are pipelined to synchronize with the Address signals before being issued
to the DDR2 memory.


Design Hierarchy
Figure 8 shows the design hierarchy beginning with a top-level module called
mem_interface_top.

[Design hierarchy tree. Top level: mem_interface_top. Second level: infrastructure, idelay_ctrl, main. Third level: top, test_bench. Fourth level: iobs, user_interface, data_path, ddr2_controller, backend_rom, cmp_rd_data. Fifth level: infrastr_iobs, controller_iobs, datapath_iobs, backend_fifos, rd_data, data_write, tap_logic, addr_gen, data_gen_16. Sixth level: idelay_rd_en_io, v4_dm_iob, v4_dqs_iob, v4_dq_iob, rd_wr_addr_fifo, wr_data_fifo_16, rd_data_fifo, tap_ctrl. Seventh level: RAM_D.]

Figure 8: Reference Design Hierarchy

Resource Utilization
The resource utilization for a 64-bit DDR2 SDRAM interface including the synthesizable test
bench is listed in Table 9.
Table 9: Resource Utilization
Resources Utilization Notes
Slices 3198 Includes the controller, synthesizable test bench, and the user
interface.
BUFGs 6 Includes one BUFG for the 200 MHz IDELAY block reference
clock.
BUFIOs 8 Equals the number of strobes in the interface.
DCMs 1
PMCDs 2
ISERDES 64 Equals the number of data bits in the interface.
OSERDES 90 Equals the sum of the data bits, strobes, and data mask signals.

The reference design for the 64-bit DDR2 SDRAM interface using the data capture technique is
available for download on the Xilinx website at:
http://www.xilinx.com/bvdocs/appnotes/xapp721.zip.


Conclusion
The DDR2 controller described in this application note, along with the data capture method
from XAPP721, provides a good solution for high-performance memory interfaces. This design
provides high margin because all the logic in the FPGA fabric is clocked at half the frequency
of the interface, eliminating critical paths. This design was verified in hardware.

Revision History
The following table shows the revision history for this document.
Date        Version   Revision
12/15/05 1.0 Initial Xilinx release.
12/16/05 1.1 Updated Table 8 and Table 9.
02/02/06 1.2 Updated Figure 4.
02/08/06 1.3 Updated Figure 4.



Application Note: Virtex-4 Series

High-Performance DDR2 SDRAM Interface Data Capture Using ISERDES and OSERDES
XAPP721 (v1.3) February 2, 2006    Author: Maria George

Summary
This application note describes a data capture technique for a high-performance DDR2
SDRAM interface. This technique uses the Input Serializer/Deserializer (ISERDES) and Output
Serializer/Deserializer (OSERDES) features available in every Virtex™-4 I/O. This technique
can be used for memory interfaces with frequencies of 267 MHz (533 Mb/s) and above.

Introduction
A DDR2 SDRAM interface is source-synchronous where the read data and read strobe are
transmitted edge-aligned. To capture this transmitted data using Virtex-4 FPGAs, either the
strobe or the data can be delayed. In this design, the read data is captured in the delayed
strobe domain and recaptured in the FPGA clock domain in the ISERDES. The received serial,
double data rate (DDR) read data is converted to 4-bit parallel single data rate (SDR) data at
half the frequency of the interface using the ISERDES. The differential strobe is placed on a
clock-capable IO pair in order to access the BUFIO clock resource. The BUFIO clocking
resource routes the delayed read DQS to its associated data ISERDES clock inputs. The write
data and strobe transmitted by the FPGA use the OSERDES. The OSERDES converts 4-bit
parallel data at half the frequency of the interface to DDR data at the interface frequency. The
controller, datapath, user interface, and all other FPGA slice logic are clocked at half the
frequency of the interface, resulting in improved design margin at frequencies of 267 MHz and
above.

Clocking Scheme
The clocking scheme for this design includes one digital clock manager (DCM) and two
phase-matched clock dividers (PMCDs) as shown in Figure 1. The controller is clocked at half the
frequency of the interface using CLKdiv_0. Therefore, the address, bank address, and
command signals (RAS_L, CAS_L, and WE_L) are asserted for two clock cycles (known as
"2T" timing), of the fast memory interface clock. The control signals (CS_L, CKE, and ODT) are
twice the rate (DDR) of the half frequency clock CLKdiv_0, ensuring that the control signals are
asserted for just one clock cycle of the fast memory interface clock. The clock is forwarded to
the external memory device using the Output Dual Data Rate (ODDR) flip-flops in the Virtex-4
I/O. This forwarded clock is 180 degrees out of phase with CLKfast_0. Figure 2 shows the
command and control timing diagram.

© 2005 – 2006 Xilinx, Inc. All rights reserved. XILINX, the Xilinx logo, and other designated brands included herein are trademarks of Xilinx, Inc.
All other trademarks are the property of their respective owners.


[Block diagram: the input clock (CLKfast) drives a DCM; CLK0 feeds PMCD #1 to produce CLKfast_0 (CLKA1) and CLKdiv_0 (CLKA1D2), and CLK90 feeds PMCD #2 to produce CLKfast_90 and CLKdiv_90; the system reset and DCM LOCKED output drive the PMCD resets.]

Figure 1: Clocking Scheme for the High-Performance Memory Interface Design

[Waveform: CLKdiv_0, CLKfast_0, the memory device clock, the command bus (WRITE, then IDLE), and the control signal CS_L.]

Figure 2: Command and Control Timing
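One plausible way to build the forwarded-clock path described in the clocking scheme above is an ODDR primitive driving a differential output buffer, as in the Verilog sketch below. This is not lifted from the reference design; it simply shows how driving D1 = 0 and D2 = 1 from CLKfast_0 yields a forwarded clock 180 degrees out of phase with CLKfast_0 (the Xilinx unisim library is assumed for the ODDR and OBUFDS primitives).

module ck_forward_example (
  input  CLKfast_0,
  output CK,
  output CK_N
);
  wire ck_int;

  // ODDR drives 0 on the rising edge and 1 on the falling edge of CLKfast_0,
  // producing a forwarded clock 180 degrees out of phase with CLKfast_0.
  ODDR #(
    .DDR_CLK_EDGE ("OPPOSITE_EDGE"),
    .INIT         (1'b0),
    .SRVAL        (1'b0)
  ) u_ck_oddr (
    .Q  (ck_int),
    .C  (CLKfast_0),
    .CE (1'b1),
    .D1 (1'b0),
    .D2 (1'b1),
    .R  (1'b0),
    .S  (1'b0)
  );

  // Differential output buffer for the memory clock pair.
  OBUFDS u_ck_obuf (.I(ck_int), .O(CK), .OB(CK_N));
endmodule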

Write Datapath
The write datapath uses the built-in OSERDES available in every Virtex-4 I/O. The OSERDES
transmits the data (DQ) and strobe (DQS) signals. The memory specification requires DQS to
be transmitted center-aligned with DQ. The strobe (DQS) forwarded to the memory is
180 degrees out of phase with CLKfast_0. Therefore, the write data transmitted using
OSERDES must be clocked by CLKfast_90 and CLKdiv_90 as shown in Figure 3. The timing
diagram for write DQS and DQ is shown in Figure 4.


[Block diagram: write data words 0-3 drive the OSERDES D1-D4 inputs; the OSERDES, part of the IOB ChipSync circuit, is clocked by CLKfast_90 (CLK) and CLKdiv_90 (CLKDIV) and drives DQ.]

Figure 3: Write Data Transmitted Using OSERDES

[Waveform: CLKfast_0, CLKfast_90, the clock forwarded to the memory device, command (WRITE, then IDLE), control (CS_L), strobe (DQS), and data (DQ) D0-D3 at the OSERDES output.]

Figure 4: Write Strobe (DQS) and Data (DQ) Timing for a Write Latency of Four


Write Timing Analysis


Table 1 shows the write timing analysis for an interface at 333 MHz (667 Mb/s).

Table 1: Write Timing Analysis at 333 MHz

Uncertainty Parameter | Value (ps) | Uncertainties before DQS (ps) | Uncertainties after DQS (ps) | Meaning
TCLOCK | 3000 | | | Clock period.
TMEMORY_DLL_DUTY_CYCLE_DIST | 150 | 150 | 150 | Duty-cycle distortion from the memory DLL is subtracted from the clock phase (equal to half the clock period) to determine TDATA_PERIOD.
TDATA_PERIOD | 1350 | | | Data period is half the clock period with 10% duty-cycle distortion subtracted from it.
TSETUP | 100 | 100 | 0 | Specified by memory vendor.
THOLD | 175 | 0 | 175 | Specified by memory vendor.
TPACKAGE_SKEW | 30 | 30 | 30 | PCB trace delays for DQS and its associated DQ bits are adjusted to account for package skew. The listed value represents dielectric constant variations.
TJITTER | 50 | 50 | 50 | Same DCM used to generate DQS and DQ.
TCLOCK_SKEW-MAX | 50 | 50 | 50 | Global clock tree skew.
TCLOCK_OUT_PHASE | 140 | 140 | 140 | Phase offset error between different clock outputs of the same DCM.
TPCB_LAYOUT_SKEW | 50 | 50 | 50 | Skew between data lines and the associated strobe on the board.
Total Uncertainties | | 420 | 495 |
Start and End of Valid Window | | 420 | 855 |
Final Window | 435 | | | Final window equals 855 – 420.

Notes:
1. Skew between output flip-flops and output buffers in the same bank is considered to be minimal over voltage and temperature.


Controller to Write Datapath Interface


Table 2 lists the signals required from the controller to the write datapath.

Table 2: Controller to Write Datapath Signals


ctrl_WrEn (1 bit): Output from the controller to the write datapath. Write DQS and DQ generation begins when this signal is asserted. Notes: Asserted for two CLKDIV_0 cycles for a burst length of 4 and three CLKDIV_0 cycles for a burst length of 8. Asserted one CLKDIV_0 cycle earlier than the WRITE command for CAS latency values of 4 and 5. Figure 5 and Figure 6 show the timing relationship of this signal with respect to the WRITE command.
ctrl_wr_disable (1 bit): Output from the controller to the write datapath. Write DQS and DQ generation ends when this signal is deasserted. Notes: Asserted for one CLKDIV_0 cycle for a burst length of 4 and two CLKDIV_0 cycles for a burst length of 8. Asserted one CLKDIV_0 cycle earlier than the WRITE command for CAS latency values of 4 and 5. Figure 5 and Figure 6 show the timing relationship of this signal with respect to the WRITE command.
ctrl_Odd_Latency (1 bit): Output from controller to write datapath. Asserted when the selected CAS latency is an odd number, e.g., 5. Required for generation of write DQS and DQ after the correct write latency (CAS latency – 1).


[Waveform: CLKdiv_0, the clock forwarded to the memory device, CLKdiv_90, CLKfast_90, command (WRITE, then IDLE), control (CS_L), ctrl_WrEn, ctrl_wr_disable, the user interface data FIFO output (D0, D1, D2, D3), the OSERDES D1-D4 and T1-T4 inputs, strobe (DQS), and data (DQ) D0-D3 at the OSERDES output.]

Figure 5: Write DQ Generation with a Write Latency of 4 and a Burst Length of 4

[Waveform: CLKdiv_0, CLKfast_0, the clock forwarded to the memory device, CLKdiv_180, command (WRITE, then IDLE), control (CS_L), ctrl_WrEn, ctrl_wr_disable, the OSERDES D1-D4 and T1-T4 inputs, and the strobe (DQS) at the OSERDES output.]

Figure 6: Write DQS Generation for a Write Latency of 4 and a Burst Length of 4


Read Datapath
The read datapath comprises the read data capture and recapture stages. Both stages are
implemented in the built-in ISERDES available in every Virtex-4 I/O. The ISERDES has three
clock inputs: CLK, OCLK, and CLKDIV. The read data is captured in the CLK (DQS) domain,
recaptured in the OCLK (FPGA fast clock) domain, and finally transferred to the CLKDIV
(FPGA divided clock) domain to provide parallel data.
• CLK: The read DQS routed using BUFIO provides the CLK input of the ISERDES as
shown in Figure 7.
• OCLK: The OCLK input of ISERDES is connected to the CLK input of OSERDES in
hardware. In this design, the CLKfast_90 clock is provided to the ISERDES OCLK input
and the OSERDES CLK input. The clock phase used for OCLK is dictated by the phase
required for write data.
• CLKDIV: It is imperative for OCLK and CLKDIV clock inputs to be phase-aligned for
correct functionality. Therefore, the CLKDIV input is provided with CLKdiv_90 that is
phase-aligned to CLKfast_90.

[Block diagram: DQ passes through an IDELAY (delay value determined using the training pattern) to the ISERDES data input; the delayed DQS drives the ISERDES CLK input through a BUFIO, CLKfast_90 drives OCLK, and CLKdiv_90 drives CLKDIV; the ISERDES outputs Q1-Q4 deliver read data words 3-0 to the user interface FIFOs.]

Figure 7: Read Data Capture Using ISERDES

Read Timing Analysis


To capture read data without errors in the ISERDES, read data and strobe must be delayed to
meet the setup and hold times of the flip-flops in the FPGA clock domain. Read data (DQ) and
strobe (DQS) are received edge-aligned at the FPGA. The differential DQS pair must be placed
on a clock-capable IO pair in order to access the BUFIO resource. The received read DQS is
then routed through the BUFIO resource to the CLK input of the ISERDES of the associated
data bits. The delay through the BUFIO and clock routing resources shifts the DQS to the right
with respect to data. The total delay through the BUFIO and clock resource is 595 ps in a -11
speed grade device and 555 ps in a -12 speed grade device.


Table 3 shows the read timing analysis at 333 MHz used to determine the delay required on the
DQ bits for centering DQS in the data valid window.
Table 3: Read Timing Analysis at 333 MHz

Parameter | Value (ps) | Meaning
TCLOCK | 3000 | Clock period.
TPHASE | 1500 | Clock phase for DDR data.
TSAMP_BUFIO | 350 | Sample window from the Virtex-4 data sheet for a -12 device. It includes setup and hold for an IOB FF, clock jitter, and 150 ps of tap uncertainty.
TBUFIO_DCD | 100 | BUFIO clock resource duty-cycle distortion.
TDQSQ + TQHS | 580 | Worst-case memory uncertainties that include VT variations and skew between DQS and its associated DQs. Because the design includes per-bit deskew, realistically only a percentage of this number should be considered.
TMEM_DCD | 150 | Duty-cycle distortion.
Tap Uncertainty | 0 | Tap uncertainty with 75 ps resolution. A window detection error of 75 ps can be on both ends of the window. This is already included in TSAMP_BUFIO.
Total Uncertainties | 1180 |
Window | 320 | Worst-case window.
Notes:
1. TSAMP_BUFIO is the sampling error over VT for a DDR input register in the IOB when using
the BUFIO clocking resource and the IDELAY.
2. All the parameters listed above are uncertainties to be considered when using the per bit
calibration technique.
3. Parameters like BUFIO skew, package_skew, pcb_layout_skew, and part of TDQSQ, and
TQHS are calibrated out with the per bit calibration technique. Inter-symbol interference and
crosstalk, contributors to dynamic skew, are not considered in this analysis.

Per Bit Deskew Data Capture Technique


To ensure reliable data capture in the OCLK and CLKDIV domains in the ISERDES, a training
sequence is required after memory initialization. The controller issues a WRITE command to
write a known data pattern to a specified memory location. The controller then issues
back-to-back read commands to read back the written data from this specified location. The DQ
bit 0 ISERDES outputs Q1, Q2, Q3, and Q4 are then compared with the known data pattern. If
they do not match, DQ and DQS are delayed by one tap, and the comparison is performed
again. The tap increments continue until there is a match. If there is no match even at tap 64,
then DQ and DQS are both reset to tap 0. DQS tap is set to one, and both DQS and DQ are
delayed in unit tap increments and the comparison is performed after each tap increment until
a match is found. With the first detected match, the DQS window count is incremented to 1.
DQS continues to be delayed in unit tap increments until a mismatch is detected. The DQS
window count is also incremented along with the tap increments to record the width of the data
valid window in the FPGA clock domain. DQS is then decremented by half the window count to
center DQS edges in the center of the data valid window. With the position of DQS fixed, each
DQ bit is then centered with respect to DQS. The dp_dly_slct_done signal is asserted when the
centering of all DQ bits associated with its DQS is completed.
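The following Verilog sketch condenses the DQS-centering portion of this procedure into a small state machine for one strobe. It is a greatly simplified illustration, not the reference design: the initial joint DQ/DQS alignment, the tap-64 rollover case, the per-bit DQ centering, and the settling time needed between tap changes are all omitted, and the match input is assumed to come from comparing the ISERDES outputs against the training pattern.

module dqs_center_sketch (
  input      clk, rst, calibrate, match,
  output reg dlyinc_dqs,   // advance the DQS IDELAY by one 75 ps tap
  output reg dlydec_dqs,   // back the DQS IDELAY off by one tap
  output reg done          // corresponds to dp_dly_slct_done in the design
);
  localparam SEEK=0, MEASURE=1, RECENTER=2, DONE=3;
  reg [1:0] state;
  reg [6:0] window, backoff;

  always @(posedge clk) begin
    dlyinc_dqs <= 1'b0; dlydec_dqs <= 1'b0;
    if (rst) begin state <= SEEK; window <= 7'd0; done <= 1'b0; end
    else if (calibrate) case (state)
      SEEK:     if (match) state <= MEASURE;          // first tap inside the window
                else dlyinc_dqs <= 1'b1;              // keep delaying DQS
      MEASURE:  if (match) begin                      // count taps across the window
                  window <= window + 7'd1; dlyinc_dqs <= 1'b1;
                end else begin
                  backoff <= window >> 1;             // half the window width
                  state <= RECENTER;
                end
      RECENTER: if (backoff != 0) begin
                  dlydec_dqs <= 1'b1; backoff <= backoff - 7'd1;
                end else state <= DONE;               // DQS centered; DQ bits next
      DONE:     done <= 1'b1;
    endcase
  end
endmodule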


Figure 8 shows the timing waveform for read data and strobe delay determination. The
waveforms on the left show a case where the DQS is delayed due to BUFIO and clocking
resource, and the ISERDES outputs do not match the expected data pattern. The waveforms
on the right show a case where the DQS and DQ are delayed until the ISERDES outputs match
the expected data pattern. The lower end of the frequency range useful in this design is limited
by the number of available taps in the IDELAY block, the PCB trace delay, and the CAS latency
of the memory device.

[Waveforms: on the left, DQS delayed only by the BUFIO and clocking resource, with undelayed DQ, produces the wrong data sequence at the ISERDES outputs (Q4, Q3, Q2, Q1 = D2, D3, D0, D1, no match); on the right, DQS and DQ delayed by the calibration delay produce the correct sequence (D0, D1, D2, D3).]

Figure 8: Read Data and Strobe Delay


Controller to Read Datapath Interface


Table 4 lists the control signals between the controller and the read datapath.

Table 4: Signals between Controller and Read Datapath


ctrl_Dummyread_Start (1 bit): Output from the controller to the read datapath. When this signal is asserted, the strobe and data calibration begin. Notes: This signal must be asserted when valid read data is available on the data bus. This signal is deasserted when the dp_dly_slct_done signal is asserted.
dp_dly_slct_done (1 bit): Output from the read datapath to the controller indicating the strobe and data calibration are complete. Notes: This signal is asserted when the data and strobe have been calibrated. Normal operation begins after this signal is asserted.
ctrl_RdEn_div0 (1 bit): Output from the controller to the read datapath used as the write enable to the read data capture FIFOs. Notes: This signal is asserted for one CLKdiv_0 clock cycle for a burst length of 4 and two clock cycles for a burst length of 8. The CAS latency and additive latency values determine the timing relationship of this signal with the read state. Figure 9 shows the timing waveform for this signal with a CAS latency of 5 and an additive latency of 0 for a burst length of 4.

[Waveform: CLKdiv_0, CLKfast_0, CLKdiv_90, CLKfast_90, the READ command, CS# at the memory, DQ and DQS at the memory device and at the ISERDES inputs (after round-trip, BUFIO, initial tap, and calibration delays), the parallel data D0-D3 at the ISERDES output, ctrl_RdEn_div0 entering the SRL16 clocked by CLKdiv_90, srl_out, and ctrl_RdEn (the write enable to the FIFOs) aligned with the ISERDES data output.]

Figure 9: Read-Enable Timing for CAS Latency of 5 and Burst Length of 4


The ctrl_RdEn signal is required to validate read data because the DDR2 SDRAM devices do
not provide a read valid or read-enable signal along with read data. The controller generates
this read-enable signal based on the CAS latency and the burst length. This read-enable signal
is input to an SRL16 (LUT-based shift register). The number of register stages required to align
the read-enable signal to the ISERDES read data output is determined during calibration. One
read-enable signal is generated for each data byte. Figure 10 shows the read-enable logic
block diagram.

[Block diagram: ctrl_RdEn_div0 enters an SRL16 whose number of register stages is selected during calibration; the SRL16 output (srl_out) is registered in a flip-flop clocked by CLKdiv_90 to produce ctrl_RdEn.]

Figure 10: Read-Enable Logic
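A behavioral equivalent of the read-enable alignment in Figure 10 is sketched below: the controller's read enable is pushed through a 16-deep shift register, and a stage selected during calibration (rd_en_dly, a hypothetical name) is registered to produce ctrl_RdEn. The reference design maps this onto an SRL16 and a flip-flop; a plain shift register is shown here for clarity.

module rd_en_align_sketch (
  input            clkdiv_90,
  input            ctrl_RdEn_div0,
  input      [3:0] rd_en_dly,      // number of stages, chosen during calibration
  output reg       ctrl_RdEn       // write enable to the read-data FIFOs
);
  reg [15:0] sr;
  always @(posedge clkdiv_90) begin
    sr        <= {sr[14:0], ctrl_RdEn_div0};
    ctrl_RdEn <= sr[rd_en_dly];    // registered output, as in Figure 10
  end
endmodule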

Reference Design
Figure 11 shows the hierarchy of the reference design. The mem_interface_top is the top-level
module. This reference design is available on the Xilinx website at:
http://www.xilinx.com/bvdocs/appnotes/xapp721.zip.

[Design hierarchy tree. Top level: mem_interface_top. Second level: infrastructure, idelay_ctrl, main. Third level: top, test_bench. Fourth level: iobs, user_interface, data_path, ddr2_controller, backend_rom, cmp_rd_data. Fifth level: infrastr_iobs, controller_iobs, datapath_iobs, backend_fifos, rd_data, data_write, tap_logic, addr_gen, data_gen_16. Sixth level: idelay_rd_en_io, v4_dm_iob, v4_dqs_iob, v4_dq_iob, rd_wr_addr_fifo, wr_data_fifo_16, rd_data_fifo, tap_ctrl, data_tap_inc. Seventh level: RAM_D.]

Figure 11: Reference Design Hierarchy


Reference Design Utilization
Table 5 lists the resource utilization for a 64-bit interface including the physical layer, the
controller, the user interface, and a synthesizable test bench.
Table 5: Resource Utilization for a 64-Bit Interface


Resources Utilization Notes
Slices 5861 Includes the controller, synthesizable test bench, and the user
interface.
BUFGs 6 Includes one BUFG for the 200 MHz reference clock for the
IDELAY block.
BUFIOs 8 Equals the number of strobes in the interface.
DCMs 1
PMCDs 2
ISERDES 64 Equals the number of data bits in the interface.
OSERDES 88 Equals the sum of the data bits, strobes, and data mask bits.

Conclusion
The data capture technique explained in this application note using ISERDES provides a good
margin for high-performance memory interfaces. The high margin can be achieved because all
the logic in the FPGA fabric is clocked at half the frequency of the interface, eliminating critical
paths.

Revision History
The following table shows the revision history for this document.
Date        Version   Revision
12/15/05 1.0 Initial Xilinx release.
12/20/05 1.1 Updated Table 1.
01/04/06 1.2 Updated link to reference design file.
02/02/06 1.3 Updated Table 4.



Signal Integrity for High-Speed
Memory and Processor I/O
SI20000-6-ILT (v1.0) Course Specification

Course Description
Learn how signal integrity techniques are applicable to high-speed interfaces between Xilinx FPGAs and semiconductor memories. This course teaches you about high-speed bus and clock design, including transmission line termination, loading, and jitter. You will work with IBIS models and complete simulations using CAD packages. Other topics include managing PCB effects and on-chip termination. This course balances lecture modules and practical hands-on labs.

Level – Intermediate
Course Duration – 2 days
Price – $1000 USD or 10 training credits
Course Part Number – SI20000-6-ILT
Who Should Attend? – Digital designers, board layout designers, or scientists, engineers, and technologists seeking to implement Xilinx solutions. Also end users of Xilinx products who want to understand how to implement high-speed interfaces without incurring the signal integrity problems related to timing, crosstalk, and overshoot or undershoot infractions.

Prerequisites
• Xilinx FPGA design experience preferred (equivalent of Fundamentals of FPGA Design course)

Software Tools
• Mentor Graphics HyperLynx®
• Cadence SPECCTRAQuest®

After completing this comprehensive training, you will have the necessary skills to:
• Identify when signal integrity is important and relevant
• Interpret an IBIS model and correct common errors
• Apply appropriate transmission line termination
• Understand the effect loading has on signal propagation
• Mitigate the impact of jitter
• Manage a memory data bus
• Understand the impact of selecting a PCB stackup
• Differentiate between on-chip termination and discrete termination

Course Outline
Day 1
• Introduction
• Transmission Lines
• Mentor or Cadence Lab 1
• IBIS Models
• Mentor or Cadence Lab 2
• Mentor or Cadence Lab 3
• High-Speed Clock Design
• Mentor or Cadence Lab 4
• SRAM Requirements
• Mentor or Cadence Lab 5
Day 2
• Physical PCB Structure
• On-Chip Termination
• SDRAM Design
• Mentor Lab 6
• Managing an Entire Design

Lab Descriptions
Note: Labs feature the Mentor Graphics or Cadence flow. For private training, please specify your flow to your registrar or sales contact. For public classes, flow will be determined by the instructor based upon class feedback.
• Mentor Lab 1: Opening the appropriate Mentor simulator
• Mentor Lab 2: Hands-on signal integrity observation of reflection and propagation effects
• Mentor Lab 3: Using an IBIS simulator to study basic transmission line effects
• Mentor Lab 4: Using saved simulation information to perform power calculation. Also, additional clock simulations
• Mentor Lab 5: Observing the effects of coupling on transmission lines
• Mentor Lab 6: Demonstrating how an SDRAM module can be handled with an EBD model
• Cadence Lab 1: Opening the appropriate Cadence simulator
• Cadence Lab 2: Analysis of a simple clock net
• Cadence Lab 3: Signal integrity effects caused by multidrop clock networks
• Cadence Lab 4: Crosstalk analysis
• Cadence Lab 5: Address and data analysis

Register Today
Xilinx delivers public and private courses in locations throughout the world. Please contact Xilinx Education Services for more information, to view schedules, or to register online.
Visit www.xilinx.com/education, and click on the region where you want to attend a course.
North America: send your inquiries to [email protected], or contact the registrar at 877-XLX-CLAS (877-959-2527). To register online, search by keyword "High-Speed" in the Training Catalog at https://xilinx.onsaba.net/xilinx.
Europe: send your inquiries to [email protected], call +44-870-7350-548, or send a fax to +44-870-7350-620.
Asia Pacific: contact our training providers at www.xilinx.com/support/training/asia-learning-catalog.htm, send your inquiries to [email protected], or call +852-2424-5200.
Japan: see the Japanese training schedule at www.xilinx.co.jp/support/training/japan-learning-catalog.htm, send your inquiries to [email protected], or call +81-3-5321-7772.
You must have your tuition payment information available when you enroll. We accept credit cards (Visa, MasterCard, or American Express) as well as purchase orders and training credits.

© 2005 Xilinx, Inc. All rights reserved. All Xilinx trademarks, registered trademarks, patents, and disclaimers are as listed at www.xilinx.com/legal.htm.
All other trademarks and registered trademarks are the property of their respective owners. All specifications are subject to change without notice.

What’s New

To complement our flagship publication Xcell Journal, we’ve recently


launched three new technology magazines:

• Embedded Magazine, focusing on the use of embedded processors in Xilinx® programmable logic devices.
• DSP Magazine, focusing on the high-performance capabilities of our FPGA-based reconfigurable DSPs.
• I/O Magazine, focusing on the wide range of serial and parallel connectivity options available in Xilinx devices.

In addition to these new magazines, we’ve created a family of Solution Guides,


designed to provide useful information on a wide range of hot topics such as
Broadcast Engineering, Power Management, and Signal Integrity.
Others are planned throughout the year.

www.xilinx.com/xcell/
Two speed grades faster with PlanAhead software and Virtex-4

[Chart: average logic performance of Xilinx ISE with PlanAhead versus Xilinx ISE alone and the nearest competitor, expressed in speed grades (1 to 2 speed grades). Based on benchmark data from a suite of 15 real-world customer designs targeting Xilinx and competing FPGA solutions.]

With our unique PlanAhead software tool, and our industry-leading Virtex-4 FPGAs, designers can now achieve a new level of performance. For complex, high-utilization, multi-clock designs, no other competing FPGA comes close to the Virtex-4 PlanAhead advantage:
• 30% better logic performance on average = 2 speed grade advantage
• Over 50% better logic performance for complex multi-clock designs

Meet Your Timing Budgets . . . Beat Your Competition To Market
Meeting timing budgets is the most critical issue facing FPGA designers*. Inferior tools can hit a performance barrier, impacting your timing goals, while costing you project delays and expensive higher speed grades. To maximize the Virtex-4 performance advantage, the new PlanAhead software tool allows you to quickly analyze, floorplan, and improve placement and timing of even the most complex designs. Now, with ISE and PlanAhead you can meet your timing budgets and reduce design iterations, all within an easy-to-use design environment.

Download a free eval today at www.xilinx.com/planahead, view the TechOnline web seminar, and prevent your next FPGA design from stalling.
* CMP: June 2005 FPGA EDA Survey

The Programmable Logic CompanySM

View The
TechOnLine
Seminar Today

©2006 Xilinx, Inc. All rights reserved. XILINX, the Xilinx logo, and other designated brands included herein are trademarks of Xilinx, Inc. All other trademarks are the property of their respective owners.
Memory Interfaces
Solution Guide

“As designers of high-performance


systems labor to achieve higher
bandwidth while meeting critical
timing margins, one consistently
vexing performance bottleneck
is the memory interface.”

www.xilinx.com/xcell/memory1/

Published by

Corporate Headquarters: Xilinx, Inc., 2100 Logic Drive, San Jose, CA 95124, Tel: (408) 559-7778, Fax: (408) 559-7114, Web: www.xilinx.com
European Headquarters: Xilinx, Citywest Business Campus, Saggart, Co. Dublin, Ireland, Tel: +353-1-464-0311, Fax: +353-1-464-0324, Web: www.xilinx.com
Japan: Xilinx, K.K., Shinjuku Square Tower 18F, 6-22-1 Nishi-Shinjuku, Shinjuku-ku, Tokyo 163-1118, Japan, Tel: 81-3-5321-7711, Fax: 81-3-5321-7765, Web: www.xilinx.co.jp
Asia Pacific: Xilinx, Asia Pacific Pte. Ltd., No. 3 Changi Business Park Vista, #04-01, Singapore 486051, Tel: (65) 6544-8999, Fax: (65) 6789-8886, RCB no: 20-0312557-M, Web: www.xilinx.com
Distributed By:

© 2006 Xilinx Inc. All rights reserved. The Xilinx name is a registered trademark; CoolRunner, Virtex, Spartan, Virtex-II Pro, RocketIO, System ACE, WebPACK, HDL Bencher, ChipScope, LogiCORE, AllianceCORE, MicroBlaze, and PicoBlaze are trademarks;
and The Programmable Logic Company is a service mark of Xilinx Inc. PowerPC is a trademark of International Business Machines Corporation in the United States, or other countries, or both. All other trademarks are the property of their owners.

Printed in U.S.A. PN 0010926
