DDR4 Simulation Guidelines

Introduction
These guidelines are intended for use with the existing Memory Design Guidelines. They describe the
simulations needed to generate the information that those guidelines depend on. This set of guidelines
was created using Arria 10 IBIS models.

Background Knowledge Source


The Altera External Memory Interface Handbook provides a thorough explanation of DDR4 topologies
and board design guidelines for DDR4 systems. The External Memory Interface (EMIF) Handbook is very
useful in understanding what needs to be done to create a successful system.

http://www.altera.com/literature/hb/external-memory/emi_plan.pdf

The EMIF handbook guidelines were created using practical values for trace spacing and other
system constraints. If a 2D simulator is available, project-specific figures can be created for signal
parallelism rules and length matching. Duty cycle distortion for clocks and DQS signals can also be
examined to avoid timing problems in systems with high data rates.

In the document below, the user will be led through the simulation process using the 2D HyperLynx
simulator to determine the optimum drive and termination levels for the pseudo open drain (POD)
DQ/DQS interface and to create informed parallelism constraints for the topology.

Contents
Introduction
  Background Knowledge Source
Chapter 1: Background
  Properties of DDR4 systems that have an effect on simulation
  General Simulation Method
  Topology
    Fly-by with equidistant memories
    Fly-by with “Ping Pong” configuration
  Pre-Layout Simulations
Chapter 2: Board data and Simulation Models
  Stackup
    Stackup setup in HyperLynx
  Simulation Models
    Altera IBIS models from the website
    Creating the Altera IBIS File from a Quartus Project
    Memory Vendor IBIS Files
    EBD Simulation Files
Chapter 3: Setting up HyperLynx
  Locating Model Directories
  Stackup Entry
  Opening a new schematic
Chapter 4: Command/Address Simulation and Analysis
  C/A Required simulation sets
  Basic C/A ISI, channel drive and termination simulation
    C/A Drive and Termination
    New Schematic
    Stackup entry
    Creating the topology
    Simulation lossy channel setup
    Nominal single channel ISI simulation run
  Crosstalk Simulation Setup for C/A Channels for Parallelism
    Crosstalk simulation theory
    C/A Parallelism Simulation and Analysis
Chapter 5: Clock Simulation and Analysis
  Basic Clock Schematic
  Pre-Layout Clock Simulation
  Pre layout clock crosstalk
Chapter 6: DQ/DQS Channel Simulation and Analysis
  Termination
    Initial setup
    Nominal Topology to the Memory Device
  Setting up and analyzing ISI termination simulations for DQ signals
    Nominal Topology from the Memory Device
    Transmit Eye Mask Opening
    Receive Eye Mask Opening
  Crosstalk Constraint Simulations
    Example of stripline crosstalk simulation for parallelism rules
    A crosstalk topology
    Setting up the simulations
    Running the Simulations
    Analysis of crosstalk for parallelism rules
    DQ/DQS Simulation for ISI and Crosstalk Effects
    ISI
    Stackup and Topology Setup in HyperLynx
    Termination variation simulation
    Channel Simulation for crosstalk effects
  Channel Simulation for SSN effects
Chapter 7: Further Investigations
  Post Layout Simulation
  Further Reading
    EMIF Guidelines for Layout and Timing
  Simulating for Timing Closure
  Check these things

Chapter 1: Background

Properties of DDR4 systems that have an effect on simulation


The following table of properties constrains what needs to be simulated. Your requirements will vary
due to device speeds and timing constraints.

Table 1

Property                             | Value                                   | Notes
-------------------------------------+-----------------------------------------+--------------------------------------
I/O Voltage                          | 1.2V                                    | L version could be 1.05
Data Rates                           | 1600 Mbps to 3200 Mbps                  | 800MHz to 1.6GHz Clock (See note 3)
                                     | [Arria 10 doesn’t support full range.   |
                                     | Current supported range is 1333 MHz max]|
Clock Signals                        | SSTL 1.2V differential                  | Externally Terminated (See note 1)
Address and Control Signal Standard  | SSTL 1.2V                               | Externally Terminated (See note 1)
Address and Control Signal Rate      | 1T or 2T                                | (See note 4)
DQ bus                               | POD12                                   | Lower SSN for systems
POD output drive strength            | 40 Ohms                                 | Altera Standard (See note 2)
Board trace characteristic impedance | 40 Ohms                                 | Strongly Recommended (Micron)

Notes for Table 1:

1. The Address, Command and Clock signals all use a threshold of 0.49*VCC. For data rates up to
2133 MT/s the thresholds are +/-125 mV. For 2400 MT/s the thresholds are +/-100 mV.
2. POD12 is the acronym for the Pseudo Open Drain 1.2V interface. This interface is terminated to
VCC (1.2V) to reduce simultaneous switching noise (SSN) and reduce the complexity of the system.
The threshold voltage is therefore not 1/2 VCC but something much higher. In the long term this
voltage also depends on the strength of the driver, so DDR4 memory devices have an
adaptable threshold built in, and bus inversion is included to help minimize DC drift in the
system. The DDR4 JEDEC specification for drive strength is 39 Ohms.
3. The specification for DDR4 gives a clock period range of 1.25 ns maximum to 0.625 ns minimum, or
800 MHz to 1600 MHz.
4. The data rate is equal to the clock rate for 1T and half the clock rate for 2T. Using the 2T timing
allows much more time for the signals to stabilize.
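
The arithmetic behind note 3 and the Data Rates row can be checked with a short sketch. The function names are illustrative only and are not part of any Altera or HyperLynx tool. The factor of two for the DQ bus assumes the usual double data rate transfer on both clock edges.

```python
def clock_mhz_from_period_ns(period_ns: float) -> float:
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns


def dq_rate_mtps(clock_mhz: float) -> float:
    """DQ transfer rate in MT/s, assuming two transfers per clock cycle."""
    return 2.0 * clock_mhz


if __name__ == "__main__":
    # Period limits quoted in note 3 of Table 1.
    for period_ns in (1.25, 0.625):
        clock = clock_mhz_from_period_ns(period_ns)
        print(f"{period_ns} ns -> {clock:.0f} MHz clock, {dq_rate_mtps(clock):.0f} MT/s on DQ")
```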

General Simulation Method


In order to assure that a memory project will be successful a set of simulations should be created. This
document covers the simulation methods necessary for DDR4.

The following subjects will be covered.

• Necessary Input
  o Topology
  o Data Rate
  o Stackup
  o Models
• Pre-layout Simulations
  o Command/Address group
  o Clock signals
  o DQ/DQS Groups
• Post-layout Simulations

Topology
DDR4 was designed to use a Fly-by topology for the Command/Address and Clock system. The DQ/DQS
systems use read and write leveling to provide accurate timing for the exchange of data. The simulation
set necessary for each group is unique.

The following topology details are necessary.

• The number of devices in the chain
• The expected distance between the FPGA, the memory devices, and the termination

Fly-by with equidistant memories

FPGA --A-- Mem0 --B-- Mem1 --B-- ... --B-- MemN ---- Termination

Fly-by with “Ping Pong” configuration

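[Figure: the FPGA drives segment A to a first via; segments C connect successive vias and the last one
runs to the termination. At each via, stubs (B) branch to a memory on each side of the board: Mem0
and Mem1 at the first via, Mem2 and Mem3 at the second, followed by additional sets of memory.]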

The challenge with the “Ping Pong” arrangement is lower impedance due to the capacitance of closely
associated memory parts and branching of the controlled impedance paths.
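
A rough feel for this effect comes from the standard loaded-line approximation, in which evenly spaced load capacitance is lumped into the trace capacitance. The sketch below is a first-order estimate only. The 50 Ohm unloaded impedance, 0.5 in pitch and 1.5 pF per device are assumed values for illustration and do not come from any datasheet. The delay of about 156 ps/in is taken from the 3.000 in, 468.677 ps transmission lines used in the schematics later in this document.

```python
import math


def loaded_impedance(z0_ohm: float, delay_ps_per_in: float,
                     pitch_in: float, c_load_pf: float) -> float:
    """Estimate the effective impedance of a trace with periodic capacitive loads.

    z0_ohm          unloaded characteristic impedance
    delay_ps_per_in unloaded propagation delay per inch
    pitch_in        distance between load points
    c_load_pf       total load capacitance at each load point
    """
    # Capacitance of one trace segment: delay / impedance (ps / ohm = pF).
    c_segment_pf = delay_ps_per_in * pitch_in / z0_ohm
    return z0_ohm / math.sqrt(1.0 + c_load_pf / c_segment_pf)


if __name__ == "__main__":
    # Assumed example values: one device per load point vs. two ("Ping Pong").
    single = loaded_impedance(50.0, 156.0, 0.5, 1.5)
    ping_pong = loaded_impedance(50.0, 156.0, 0.5, 3.0)
    print(f"one device per via:  {single:.1f} ohms")
    print(f"two devices per via: {ping_pong:.1f} ohms")
```

Doubling the capacitance at each load point lowers the estimate further, which is the trend described above. A field-solver based simulation is still needed for real numbers.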
Pre-Layout Simulations
Pre-layout simulations provide information on what is needed to create a successful topology. A
schematic is created using symbols for all of the components and connections in the desired topology.
Simulations are executed to provide information on the expected channel response, and a graphical
display is provided for analyzing the signal at the source and destination(s).

Chapter 2: Board data and Simulation Models

Stackup
In pre-layout simulations it is best to use the stackup that will be used for the PCB. If the final
stackup is not yet available, a simulation stackup can be created using the dielectric constant (Er)
and the dielectric loss tangent (tanδ) of the target dielectric. The Er only determines the
propagation delay, so its value changes only the timing of the wave. The tanδ is important
because it absorbs the high-frequency portions of the waveform. This loss removes harmonics,
smoothing the corners of the waveforms and increasing the rise time, and therefore delaying the time
of flight. A higher-loss dielectric can be used to reduce the edge speed of a signal and reduce
reflections, as long as it does not interfere with transceiver signals that may accidentally be placed on
the routing layers for memory signals.
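
The link between Er and timing can be sketched with the lossless relation delay = length * sqrt(Er) / c. The helper names below are hypothetical. The calculation ignores loss and assumes a homogeneous dielectric (stripline), so Er here is an effective value. The 3.000 in and 468.677 ps figures are the transmission line values that appear in the schematics later in this document.

```python
import math

# Speed of light in vacuum, converted from m/s to inches per picosecond.
C_IN_PER_PS = 299_792_458 / 0.0254 * 1e-12


def delay_ps(length_in: float, er_eff: float) -> float:
    """Lossless propagation delay of a trace in picoseconds."""
    return length_in * math.sqrt(er_eff) / C_IN_PER_PS


def effective_er(length_in: float, delay_ps_value: float) -> float:
    """Effective dielectric constant implied by a known length and delay."""
    return (delay_ps_value * C_IN_PER_PS / length_in) ** 2


if __name__ == "__main__":
    er = effective_er(3.0, 468.677)
    print(f"effective Er implied by 3.000 in / 468.677 ps: {er:.2f}")
    print(f"delay of a 5 in trace at that Er: {delay_ps(5.0, er):.1f} ps")
```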

For microstrip simulations it is critical that the characteristics of the soldermask be known.
Soldermask materials usually have a high tanδ, which noticeably degrades fast edges.

Stackup setup in HyperLynx


A stackup is needed to establish a baseline for the project. For this project the stackup data for the Arria
10 FPGA development kit (PCIe) was copied. Megtron 6 was used for the board because of the
transceiver requirements. Be sure to use the material that will be used to implement your project.
Dielectric values are readily available from PCB manufacturers as well as from dielectric vendors.

If you have significant high-speed signal runs on the microstrip layer then be sure to have accurate data
on the soldermask layer.
After the circuit board layout is created, it can be imported into HyperLynx and the stackup will come
with it. It is therefore important to be sure that the stackup for the board has accurate data on the
actual board parameters for Dk and Df (Er and tanδ).

Simulation Models
For pre-layout simulations there are two options for obtaining IBIS files for use with HyperLynx. Models
can be obtained from the Altera website or created from a Quartus® II project. Two types of models can
be used, depending on where you are in the design of the FPGA.

The first is a set of IBIS models for all the IO capability available for the FPGA. These models have a
generic set of values for the package RLC values.

The second is a set of IBIS models specific to the implementation, with accurate RLC values
for each pin. This file is much smaller than the one you will get from the website.

Altera IBIS models from the website


An IBIS file with models of all the possible I/O standards and strengths can be found by following the link
below. There are multiple files for Arria 10 but the one that will have the models for DDR4 is Arria10.zip.

https://www.altera.com/support/support-resources/download/board-layout-test/ibis/ibs-ibis_index.html

You will need to place the unzipped data from the file somewhere you can find easily from HyperLynx.
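
Because the website file contains models for every I/O standard, it can help to list the model names before assigning them in HyperLynx. The following is a minimal sketch, not an Altera or Mentor utility. It relies only on the IBIS `[Model]` keyword and `Model_type` subparameter, and it assumes the file uses the default `|` comment character. The second model name in the sample text is made up for the demonstration.

```python
import re


def list_ibis_models(text: str) -> list:
    """Return (model name, model type) pairs found in IBIS file text."""
    models = []
    for raw in text.splitlines():
        # Drop comments; '|' is the default IBIS comment character.
        line = raw.split("|", 1)[0].strip()
        model = re.match(r"\[model\]\s+(\S+)", line, re.IGNORECASE)
        mtype = re.match(r"model_type\s+(\S+)", line, re.IGNORECASE)
        if model:
            models.append([model.group(1), None])
        elif mtype and models and models[-1][1] is None:
            models[-1][1] = mtype.group(1)
    return [tuple(entry) for entry in models]


# Tiny stand-in for a real file; in practice pass the text of the unzipped .ibs file.
SAMPLE = """\
| example fragment, not a real model
[Model] sstl12_rtpio_r40_lv
Model_type I/O
[Voltage Range] 1.2 1.14 1.26
[Model] demo_input_only
Model_type Input
"""

if __name__ == "__main__":
    for name, kind in list_ibis_models(SAMPLE):
        print(f"{name:24s} {kind}")
```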
Creating the Altera IBIS File from a Quartus Project
To generate an IBIS file, the project must have reached the point where:

• A system topology has been decided
• The pin-connection guidelines have been applied to the project
• A Quartus® II project exists with the desired I/O defined
  o Often referred to as a “Golden Top” design
    ▪ With all of the I/O assigned to various pins
    ▪ And the pin characteristics assigned and saved in the .qsf file for the project
    ▪ Use the latest version of Quartus® II to ensure that the latest IBIS models are
      generated

From the Quartus® II project, under the compilation flow, select the “EDA Netlist Writer” section and
double click on “Edit Settings”:
Select the IBIS format and the latest IBIS version, then enable the model selector and the per-pin RLC model.

Where the selections for each arrow in this diagram are:


A. Format of the output file: one can select HSPICE or IBIS; for our use it is IBIS
B. IBIS Version: using 5.0 allows per pin RLC package model generation with mutual coupling
C. Enabling model selector creates an IBIS file with multiple models for what-if simulation
D. Per pin RLC package models deliver accurate package data

The resulting IBIS file, found in the directory for the Quartus project, will contain all of the models
needed for successful pre and post layout simulation.

Memory Vendor IBIS Files


IBIS simulation files are available from some memory vendors. These are usually generated from SPICE
models and correlated with the physical device characteristics. The characteristics for devices from
different vendors will be different.

EBD Simulation Files


EBD files are available from some memory vendors. These are wrapper files that are used for boards in
a system, such as DIMM modules.
Chapter 3: Setting up HyperLynx

Locating Model Directories


Once HyperLynx has been installed it is necessary to set up the project to point to the models being
used. The project can be created anywhere accessible by the computer being used. The IBIS and EBD
models for the devices to be used for the project can also be anywhere. What connects them
is the set of assignments made in HyperLynx using Models -> Edit Model Library Paths..., which lets
you indicate where the model files for the project are located. It is a good idea to start with
as many models as you can get from the manufacturers involved in the project, to avoid having
to search for them in the middle of assembling a simulation topology.

Click the Edit button to assign model directories to the Model-library


Add folders and sub-folders where the models are located. The Add buttons open an explorer window
to let you choose the ones you want. Clicking the Generate Model Index button loads the list of models
into the project and returns to Set Directories.

Stackup Entry
Next, set up the stackup with the materials you will be using. The stackup in the graphic above is about the minimum.

The remainder of the pre-layout section of this document will use the above stackup.

Opening a new schematic


Now a schematic can be created. Clicking an icon on the task bar opens a new, free-form, schematic.

You can place IBIS drivers and receivers, transmission lines, terminators and other topology artifacts.
Examples will be shown for each topology as we go.
Chapter 4: Command/Address Simulation and Analysis
The Command/Address (C/A) signals for DDR4 memories use SSTL signaling. DDR4 differs from previous
DDRx families in one respect: the 2T timing option for the C/A signals. With this option the data rate
can be ½ of the SDR rate, allowing a fairly long settling time.

C/A Required simulation sets


• Inter Symbol Interference (ISI) for Drive and Termination strengths
• Crosstalk for layout parallelism rules

Basic C/A ISI, channel drive and termination simulation


Setting up a sample C/A topology using the freeform schematic is easy.

C/A Drive and Termination


For these signals the topology usually has the FPGA on one end and more than one memory device
down the stream. The Fly-by topology is simple unless there are chips on both sides of the board in a
“Ping Pong” configuration because of concentrated additional capacitance.

New Schematic

1. Open a schematic window as shown above


2. Select File -> Save As and place the schematic in the simulation directory

Stackup entry

3. In the schematic, open the stackup and fill it out with the board’s
materials and dimensions. Be sure to use the dielectric tab to update the Loss Tangent. Then
click OK to save it

4. Since there will be more simulations coming up, use Setup -> Stackup -> Export to save the
stackup so you can import it into a new schematic and save having to re-enter stackup data

Creating the topology


1. Add components to the topology by clicking on the driver symbol on the ribbon

Add a symbol for each type of component in the topology being simulated

2. Double click on U1 and select the driver for the topology. NOTE: It is a good idea to open up the
IBIS file, arria10.ibs, and read the notes section to become familiar with the notation used for driver
capabilities

3. Select the desired driver

4. The typical RLC values for all pins in this IBIS file are: 289 milliohms, 2.16 nH, and 1.43 pF.
5. Repeat for U2 with the desired receiver

6. The RLC values for this pin are 191.0 milliohms, 1.111 nH and 0.299 pF; these values will be slightly
different from signal to signal
7. Set U1 up as an output
8. Click OK to return to the schematic where more accurate symbols have replaced the originals

9. Next, add transmission lines to the topology for the interconnect by selecting the transmission
line symbol

10. Place one on the schematic


11. Double click on the transmission line to edit its properties

12. Click on the Values tab, then place the trace on the proper layer and adjust the linewidth to get the
system near your target impedance

13. On clicking OK the dialog closes and the symbols can be copied and arranged to fill up the
topology, in this case an array of four memory chips. Connections are made by clicking on a
symbol pin and dragging to another pin
14. A termination needs to be added to the topology and wired in

15. Double click on the termination resistor and assign 50 for the value, then go to Setup -> Power
Supplies and change VpullUp to 0.6 to get the termination values correct

16. Double click on each transmission line segment and set the length value in the values tab to
those expected for the topology
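
As a sanity check on step 15, the steady-state voltage at the end of the chain can be estimated by treating the driver as an ideal resistor to its rail and the termination as a resistor to VpullUp. This is a simplified DC sketch under those assumptions. It ignores the nonlinear I-V curves in the IBIS model and any loading from the memory inputs, and the function name is illustrative.

```python
def terminated_levels(vddq: float, r_driver: float,
                      r_term: float, v_term: float) -> tuple:
    """Return (low, high) DC voltages at a parallel termination.

    The driver is modeled as r_driver to vddq (high) or to ground (low),
    and the termination as r_term to v_term.
    """
    divider = r_term / (r_driver + r_term)
    high = v_term + (vddq - v_term) * divider
    low = v_term - v_term * divider
    return low, high


if __name__ == "__main__":
    # 40 ohm driver, 50 ohm termination to 0.6 V, 1.2 V I/O supply.
    low, high = terminated_levels(1.2, 40.0, 50.0, 0.6)
    print(f"low = {low:.3f} V, high = {high:.3f} V, swing = {high - low:.3f} V")
```

With the values from step 15 the levels sit symmetrically about the 0.6 V termination voltage, which is why VpullUp is changed to half of the 1.2 V supply.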
Simulation lossy channel setup
DDR4 memory systems operate at high speed and the drivers have fast slew rates. It is important that
pre-layout simulations be run with realistic material callouts and lossy simulation enabled. Be sure that
the Enable Lossy Simulation icon is blue before continuing.

The simulations will be for a memory clock rate of 1066MHz.

Nominal single channel ISI simulation run


This first simulation will be done using just the single set of parameters entered so far. To start,
either select Simulate SI -> Run Interactive Simulation or click on the single waveform icon on the
task bar. This opens an oscilloscope view with many controls. The simulation needed to see whether
the channel will work is an eye diagram, and setting it up takes a few steps. There are many
choices on the right-hand side of the Digital Oscilloscope. When you are setting up an eye diagram it is
very useful to first click on the Oscillator radio button and enter the frequency for the timing. For the
C/A system this would be ½ the clock frequency because the C/A system is single data rate.

Note that, under “IC modeling” Typical is selected. Altera recommends using this setting.
For a 1066 MHz clock this works out to an oscillator frequency of 533.3333 MHz for the channel.

Then select the eye diagram option and the system should be set up for the simulation.
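
The oscillator frequency follows from the bit time: a single data rate signal carries one bit per clock period, so an alternating bit pattern toggles at half the clock frequency. The small helper below is a sketch with a made-up name. The 2T branch assumes, as described at the start of this chapter, that each C/A bit then lasts two clock periods.

```python
def ca_eye_oscillator_mhz(clock_mhz: float, timing: str = "1T") -> float:
    """Oscillator frequency for a C/A eye diagram, in MHz."""
    clocks_per_bit = {"1T": 1, "2T": 2}[timing]
    bit_rate_mbps = clock_mhz / clocks_per_bit
    # One full cycle of an alternating pattern spans two bit times.
    return bit_rate_mbps / 2.0


if __name__ == "__main__":
    clock = 3200.0 / 3.0  # the "1066" clock, i.e. 1066.67 MHz
    print(f"1T: {ca_eye_oscillator_mhz(clock, '1T'):.4f} MHz")
    print(f"2T: {ca_eye_oscillator_mhz(clock, '2T'):.4f} MHz")
```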

1. Select the devices you want to see the waveform for from the list. All of the inputs are of
interest just now so select U2 through U5 of the latest waveforms. We also want to see the
waveforms at the die instead of the pin because at these rates there is often a real difference

2. Run the simulation to see how things may look by clicking “Start Simulation” in the upper right
hand corner of the Digital Oscilloscope. Then zoom in on the resulting eye by clicking on the
zoom to extents button on the scope.

Adjust the sweep settings to something you are happy with.


The results look something like this

There seems to be some sort of problem here and it does not seem to be a reflection from the
end of the chain. The signal at U3 actually rings back to the threshold. Though it settles later
the eye opening will be compromised.
3. The only selections for the driver are 40 Ohms and 60 Ohms. We will now match the impedance
of the transmission lines to 40 Ohms, the termination to 39 Ohms, and see what the waveforms
look like
4. It looks like using a lower board impedance could be really good. There is a push to use lower
impedances for transceiver channels because matching to the BGA, vias and connector cutouts
is much easier when the board impedance is lower. It is a good idea to work with the whole
team when making these decisions.
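
The benefit of matching the driver, line and termination can be estimated to first order with the reflection coefficient (Z_load - Z_line) / (Z_load + Z_line). The sketch below uses the resistances discussed above. It treats the driver as a pure resistance and ignores the capacitive loading of the memory devices, so it is an illustration of the trend and not a substitute for the simulation.

```python
def reflection_coefficient(z_load: float, z_line: float) -> float:
    """Voltage reflection coefficient seen by a wave on z_line arriving at z_load."""
    return (z_load - z_line) / (z_load + z_line)


if __name__ == "__main__":
    cases = [
        ("40 ohm driver on 50 ohm line", 40.0, 50.0),
        ("60 ohm driver on 50 ohm line", 60.0, 50.0),
        ("40 ohm driver on 40 ohm line", 40.0, 40.0),
        ("39 ohm termination on 40 ohm line", 39.0, 40.0),
    ]
    for label, z_load, z_line in cases:
        print(f"{label:36s} {reflection_coefficient(z_load, z_line):+.3f}")
```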

Crosstalk Simulation Setup for C/A Channels for Parallelism


Now we will examine parallelism for the C/A channel. The victim trace will have four aggressors. This
topology can be created easily from one of the above topologies. The DDR4 parts and most of the
T-lines were removed.

[Schematic: Command_Address_prll.ffs in G:\_ddr4_simulation\Cmmd_Addr\, HyperLynx LineSim v9.2.
Five identical nets, drawn top to bottom as Net004, Net002, Net001, Net003 and Net005. On each net
an Arria10 sstl12_rtpio_r40_lv driver (U1 to U5, pin 1706) drives a transmission line (TL1 to TL5,
type Stackup, 49.9 ohms, 468.677 ps, 3.000 in) into a 50.0 ohm resistor (R1 to R5). VpullUp is 0.6V.]

When the parts have been arrayed and connected the T-lines will be coupled together by doing the
following.
1. Double click on TL4 to open the Edit Transmission Line pop-up and then click on the Stackup
option under Coupled
2. The pop-up will change to the coupled line version. Click OK to start the new coupled region.

3. Next, double click on TL2 and click on the Coupled Stackup button then click OK. Do not select
“(New-Coupling)” as we are adding the transmission line to the existing coupling.
4. Repeat the above sequence, progressing down the array of T-lines to complete the topology.
You will observe dashed lines indicating the coupling between the T-lines.
[Schematic: Command_Address_prll.ffs in G:\_ddr4_simulation\Cmmd_Addr\, HyperLynx LineSim v9.2,
after coupling. Five nets, drawn top to bottom as Net004, Net002, Net001, Net003 and Net005. On each
net an Arria10 sstl12_rtpio_r40_lv driver (U1 to U5, pin 1706) drives a transmission line (TL1 to
TL5, type Coupled Stackup, 468.677 ps, 3.000 in) into a 50.0 ohm resistor (R1 to R5). The outer lines
TL4 and TL5 show 49.9 ohms; the inner lines TL1, TL2 and TL3 show 49.8 ohms. VpullUp is 0.6V.]
5. Double click on any T-line to observe the position it has in the array for simulation.

There is a lot of information about the array in this tab of the pop-up. Observe TL1 is selected and the
position of TL1 is shaded in the physical view.

1. Close the pop-up

Crosstalk simulation theory


Now we will analyze the coupling between the C/A signals. Before starting this effort, a decision
needs to be made concerning the goals of the process. The C/A section performs like a source
synchronous interface: transitions of these signals from high to low and low to high occur very close
together in time. After this simultaneous switching event the bus is very calm. The transitions get a
little messy, then the eye settles to a very good steady-state opening. If you are designing a board
for the PCIe form factor and therefore need to pack the interface into a narrow channel on a crowded
layer, your needs will differ from those of someone designing a signal integrity demonstration board
with almost unlimited space. Many high performance memory systems have been designed with a
-20 dB crosstalk level for the C/A signals. Signals outside of this grouping may need more isolation
due to the possibility of transient events after the C/A signals should be settling. For this reason,
the following investigation will be focused on a -20 dB crosstalk limit.

C/A Parallelism Simulation and Analysis


To analyze the coupling between the aggressors and the victim in as short a time as possible,
the strategy is to sweep the parameters we can control: signal trace spacing and coupling
length.

Here is how to set it up:

2. Select the Run Interactive Sweeps… icon or select Simulate SI -> Run Interactive Sweeps
to open the Sweep Manager.
3. Select Coupling regions and then the coupling setup where all the T-lines are

4. Highlight Length then select Add Range


5. In the pop-up Sweeping window select By list and enter values for the breakout and some other
distances up to the longest expected trace length. You can run many simulations easily but it
becomes a problem of managing the data.

6. Next the separation between the T-lines will need to be swept. Select the first Separation
between… and click Add Range
7. Enter in some spacing values and increment data, then click OK

8. Now select the separation just finished then click Copy Range

9. Then select the next Separation and click on Paste Range as a Lock to connect them together.
10. Repeat for the other separation selections

11. Now everything is set up except for the drivers. Close the Sweep Manager and set the driver for
U1 as Stuck High

12. The other drivers are just left as Output to switch together and provide the highest aggressor
interference

13. Reopen the Sweep Manager and click Run Sweeps. Here we will use the settings from the
termination run and not worry about the frequency of the clock because the slew rate of the
drivers is the determining factor for coupling. Set up the simulation for eye diagram to get some
randomization of the coupling, select the termination resistor of the victim driver for analysis
and click Start Sweeps.
14. After the sweeps finish there should be an output that looks like this.

15. Now to analyze the data:


o Use the Save/Load button to export the data to .csv file for analysis

16. Open the .csv file in a spreadsheet for analysis. The spreadsheet will be very large, but there
is room at the top to calculate the crosstalk in dB for each column of data. For each column,
take the maximum voltage minus the minimum voltage, divide the result by 1.2 (the 1.2 V
interface voltage), and calculate 20*log10() of that value. It is up to the engineer to select the
parallelism value for the interface. Many systems have been created using the spacing that gives
-20 dB with good success.
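
The spreadsheet calculation in step 16 can also be scripted. The following is a minimal sketch in Python, assuming the exported .csv has one header row, time in the first column and one voltage column per sweep case. The real export layout may differ, and the function names and sample values are hypothetical.

```python
import csv
import io
import math

V_SWING = 1.2  # volts, the divisor used in step 16


def crosstalk_db(samples, v_swing=V_SWING):
    """Return 20*log10((max - min) / v_swing) for one column of voltages."""
    peak_to_peak = max(samples) - min(samples)
    if peak_to_peak <= 0:
        return float("-inf")  # a perfectly flat victim shows no crosstalk
    return 20.0 * math.log10(peak_to_peak / v_swing)


def crosstalk_per_column(csv_text):
    """Map each voltage column name to its crosstalk in dB.

    Assumes (hypothetically) a header row and time in the first column.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    result = {}
    for col in range(1, len(header)):
        samples = [float(row[col]) for row in data if row[col] != ""]
        result[header[col]] = crosstalk_db(samples)
    return result


# Made-up sample: a victim held near 1.2 V with 120 mV of peak-to-peak ripple.
example = "time,case1\n0,1.14\n1e-10,1.26\n2e-10,1.20\n"
for name, value in crosstalk_per_column(example).items():
    print(f"{name}: {value:.1f} dB")
```

A 120 mV peak-to-peak disturbance on a 1.2 V swing is a ratio of 0.1, which corresponds to the -20 dB value mentioned above.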
Chapter 5: Clock Simulation and Analysis
Clock simulation is very much like the C/A analysis except that it is differential and very sensitive to
Duty Cycle Distortion (DCD). Setting up the simulation is more complicated because it must start with
coupled differential signals of balanced length and use differential IBIS models. To this point we have
the tools for most of this, and the drivers are just named differently, as described in the IBIS file header.
Because the 40 Ohm drivers worked well with 40 Ohm transmission lines, that will be a good place to start.

Basic Clock Schematic

To place a differential IBIS part, select the differential driver symbol. Then assign a differential
driver to the symbol. When you add the differential model, the opposite polarity model is assigned to
the other output.

Once you have the Arria 10 model assigned, repeat the process for the memory devices by adding a
differential symbol from the ribbon and assigning the proper model from the memory IBIS file.
Once you have one instantiated, copy it for the remainder of the memory parts so you do not have to
repeat the assignment.

Place T-lines, adjust their lengths and couple the differential segments according to the board stackup
plan. Be careful to give each segment its own coupling region.

Then add the termination to the topology.


The finished schematic could look something like this:
[Figure: HyperLynx LineSim v9.2 clock schematic, design file Command_Address_basic_40.ffs. The Arria 10
differential driver U1 (model dsstl12_rtiop_r40c...) drives two coupled nets, Net001 and Net002. Each net
has a 2.000 in, 312.451 ps, 40.1 Ohm T-line (TL1, TL2) followed by four 0.400 in, 62.490 ps, 40.1 Ohm
T-lines (TL3 to TL10) that pass the CK_t inputs of four MT40L512M8HX memory devices (U2 to U5). Each net
ends in a 39.0 Ohm termination (R1, R2) to a 0.6 V VpullUp.]

Pre-Layout Clock Simulation


Because the clock is a continuous oscillation, the simulation setup is a little different.

1. Open the interactive simulation window.


2. Set up the simulation as Standard, Global, Oscillator, and 1066.667 MHz

3. Run the simulation to see the waveform. If there are not at least 10 cycles of the signal, then
increase the horizontal scale to obtain many cycles so that the DC balance is good. Then reduce
the horizontal scale and use the position bar on the display to scroll to the right end to observe
the waveforms for the clock.

4. There are a couple of things that are of interest in these waveforms. One of them is the
overshoot difference, especially with reference to the signal U2. The other is duty cycle
distortion (DCD). The DDR4 JEDEC specification indicates that DCD is constrained to fairly small
values. Turn off all the waveforms except U2 and analyze the zero crossing time for each half
cycle.
This turns out to be 468.6 ps for the positive side and 467.4 ps for the negative side, a
difference of only 1.2 ps. The DCD is very low, so a 40 Ohm driver, 40 Ohm board impedance
and 39 Ohm termination are probably good to go.
5. Other simulations can be run with different driver/board/termination impedance values to
see what happens. This is an optional exercise for the designer to see what works best.
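
The half-cycle measurement in step 4 can be turned into a duty cycle figure with a few lines of Python. This is a minimal sketch using the two zero-crossing intervals quoted above. Expressing DCD as the deviation of each half cycle from the mean half period is one common convention and is an assumption here, not necessarily the JEDEC definition; the function names are hypothetical.

```python
def duty_cycle_percent(t_high_ps, t_low_ps):
    """Duty cycle of the positive half cycle, in percent of the measured period."""
    return 100.0 * t_high_ps / (t_high_ps + t_low_ps)


def dcd_ps(t_high_ps, t_low_ps):
    """Deviation of each half cycle from the mean half period (assumed DCD convention)."""
    return abs(t_high_ps - t_low_ps) / 2.0


def ideal_period_ps(freq_mhz):
    """Ideal clock period in picoseconds for a frequency given in MHz."""
    return 1.0e6 / freq_mhz


# Zero-crossing intervals measured at U2 in step 4.
t_high, t_low = 468.6, 467.4

print(f"measured period: {t_high + t_low:.1f} ps")
print(f"ideal period   : {ideal_period_ps(1066.667):.1f} ps")
print(f"duty cycle     : {duty_cycle_percent(t_high, t_low):.3f} %")
print(f"DCD            : {dcd_ps(t_high, t_low):.2f} ps")
```

The same functions can be reused when other driver/board/termination combinations are simulated, so the combinations can be compared on one number.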

Pre layout clock crosstalk


1. First, set up the topology using the same widths as were used for the single line clock
simulation. Start with all of the T-lines uncoupled, as above, and go down through the
schematic, coupling the lines

2. Configure the drivers by double clicking on any driver


o The positive side of U1 to Stuck High
o The positive sides of U2 and U4 to Output
o The positive sides of U3 and U5 to Output Inverted to cause the most coupling

3. Set up the simulation for crosstalk by opening the interactive sweeps window
4. Add ranges for the length and couple separation, use the “Copy Range” and “Paste Range as a
Lock” buttons when doing more than one separation in sync. This limits the number of
simulations that need to be run
5. Click on Run Sweeps and set the oscilloscope up for Standard operation, Global Stimulus,
1066.666MHz Oscillator at 50% duty cycle, then double click on <Insert diff probe>

6. Select the two resistors for the victim channel, in the case of the schematic, above, these are R1
and R2 then click OK
7. The only signal that needs to be selected for the simulation is the one just added, the
differential probe

8. Select Start Sweeps and zoom in on the results. Notice that there seems to be very little coupling
when starting with the nominal 8 mil differential spacing

9. Click on Save/Load and save the results to a .csv file


10. Open the .csv file in a spreadsheet and analyze the crosstalk amounts
o Use the same sort of formulas as shown for the C/A system analysis, above
o Note that the maximum coupling is very small; it is always less than -84 dB
o This indicates that 8 mils is probably going to be OK, because 84 dB is quite a lot of isolation
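
To get a feel for what these decibel figures mean in volts, the formula from the C/A analysis can be inverted. The sketch below assumes the same 1.2 V reference swing used in that analysis; the function names are hypothetical.

```python
def db_to_ratio(db):
    """Convert a voltage ratio in dB back to a linear ratio."""
    return 10.0 ** (db / 20.0)


def coupled_noise_volts(db, v_swing=1.2):
    """Peak-to-peak coupled noise implied by a crosstalk figure in dB.

    v_swing is the reference swing in volts (assumed 1.2 V, as in the C/A analysis).
    """
    return v_swing * db_to_ratio(db)


for level_db in (-20.0, -84.0):
    noise_uv = coupled_noise_volts(level_db) * 1e6
    print(f"{level_db:6.1f} dB -> {noise_uv:10.1f} uV peak to peak")
```

At -20 dB the coupled noise is one tenth of the swing, while at -84 dB it is below 100 uV, which is why the nominal clock spacing is not a concern here.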

Chapter 6: DQ/DQS Channel Simulation and Analysis


The drive standard for the DQ/DQS system for DDR4 is specified to be Pseudo Open Drain (POD). In POD
the signal lines are terminated by an impedance to Vcc and not a synthesized Vtt for the system. The
reason for this is the reduction of simultaneous switching noise for the system. The impact that this
standard has on the system is profound. With no Vtt constantly establishing a nominal threshold for the
eye diagram things get a little fuzzy when it comes to determining the best driver/board/termination
values for a particular system.
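
Why the threshold moves with the drive and termination choice can be seen from an idealized DC model of a POD line. The sketch below assumes a purely resistive driver that pulls to ground for a low, a resistive termination to the supply, and a high level equal to the supply. The function name and the 1.2 V supply are assumptions for illustration, and real IBIS models are nonlinear, so the simulated levels will differ.

```python
def pod_levels(v_supply, r_driver, r_term):
    """Idealized DC levels of a POD line terminated to the supply.

    High: driver and termination both pull to the supply, so no divider forms.
    Low : the driver resistance and the termination form a resistive divider.
    Returns (v_high, v_low, v_mid) in volts.
    """
    v_high = v_supply
    v_low = v_supply * r_driver / (r_driver + r_term)
    v_mid = (v_high + v_low) / 2.0
    return v_high, v_low, v_mid


# 40 Ohm driver against several termination values (illustrative only).
for r_term in (34.0, 40.0, 48.0, 60.0):
    v_high, v_low, v_mid = pod_levels(1.2, 40.0, r_term)
    print(f"Rtt={r_term:4.0f} Ohm: high={v_high:.3f} V  low={v_low:.3f} V  mid={v_mid:.3f} V")
```

In this simple model the midpoint of the signal shifts every time the driver or termination value changes, which is why the receiver threshold has to be found for each configuration rather than fixed at a Vtt level.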

Termination
The nominal impedance for the DDR4 has been set to 40 Ohms during the C/A simulations. A DQ
topology, below, was created to determine what the optimal impedance would be for the drivers and
receivers.

Initial setup
Create a single-ended topology with the same T-line as the C/A system, with a single-ended driver on
each end, using the single line model placement on the top ribbon.

Selecting the driver for the Arria 10 is fairly straightforward. Double click on the symbol for U1.1,
select a POD model with 40 Ohm drive and termination, and click OK.

Repeat the process for U2 by selecting DQ0 from the device.

Nominal Topology to the Memory Device

The memory device IBIS file was created to use sub-models, and selecting them is done using the model
selector. The user should also be familiar with how the IBIS file provider meant for the models to be
used. The Micron models keep drive termination separate from receive termination to keep the
models clean. As a result, the simulation requires changing the sub-model and not just switching the
buffer from input to output.
1. Double click on U2 to bring up the Assign Models pop-up then click on Select to bring up the
Select IC Model pop-up

2. Click on the Model Selector to discover the options

3. Select the model needed, then click OK up through the pop-ups.

Setting up and analyzing ISI termination simulations for DQ signals

Open up the interactive sweep manager by clicking on its icon or using the dropdown:
Simulate SI -> Run Interactive Sweeps

1. FPGA Drive strengths of 34, 40, 48, and 60 Ohms and terminations of 34, 40, 48, and 60 Ohms
were selected for the memories. The impedance of the T-line is still 40. Select Run Sweeps to
get to the Oscilloscope window.
2. Set the simulation up for an eye diagram at 1066.667MHz and select U2 for analysis and click
“Start Sweeps” to get the job done.

3. When the simulation is complete, open the probes pop-up by clicking on the + indicated below

4. You can expand the pop-up width to see the characteristics of the drivers and receivers, and
make changes to the color of each to compare the waveforms
5. The best eye is given when the driver and termination match the T-line impedance

6. If you use imbalanced drivers and terminations to achieve an eye with more vertical height it
will not help in the long run. One must be careful not to introduce excessive overshoot or
deterministic jitter around the crossing point.
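
The observation in step 5 follows from the reflection coefficient at an impedance discontinuity. The sketch below treats each driver or termination value from the sweep as a lumped resistance against a 40 Ohm line, which ignores package parasitics and the nonlinear behavior of real buffers; the function name is hypothetical.

```python
def reflection_coefficient(z_load, z_line):
    """Voltage reflection coefficient of a resistive load on a transmission line."""
    return (z_load - z_line) / (z_load + z_line)


Z_LINE = 40.0                          # T-line impedance used in the sweep
SWEEP_VALUES = (34.0, 40.0, 48.0, 60.0)  # drive and termination values swept

for z in SWEEP_VALUES:
    gamma = reflection_coefficient(z, Z_LINE)
    print(f"{z:4.0f} Ohm on a {Z_LINE:.0f} Ohm line: gamma = {gamma:+.3f}")
```

Only the 40 Ohm case gives a coefficient of zero. A 60 Ohm termination reflects 20 percent of the incident wave, which reappears later as inter-symbol interference in the eye.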
Nominal Topology from the Memory Device
[Figure: HyperLynx LineSim v9.2 schematic, design file isi_pod.ffs. FPGA pin U1.46 (10AS048E5F29I3S...,
model pod12_rtpin_g48c) connects through T-line TL1 (49.4 Ohms, 468.677 ps, 3.000 in, coupled stackup)
to memory pin U2.C2 (MT40L512M8HX, DQ0) on Net001.]

When a set of sweeps was run as was done above for the transmit eye, the following waveforms were
observed at the die.
The same simulation created with the observation point at the pin of the FPGA looks like this.

This illustrates the apparent RLC interference with the signal when observed with an oscilloscope. This
simulation result should be provided in a signal integrity report for the interface so the engineers in the
lab will not be confused by the measurements they may make.
The best waveform set for the receive system was one that used 34 Ohm for the ODT and OCT values as
seen here.

Transmit Eye Mask Opening


The eye mask for the DQ/DQS systems in DDR4 is determined differently from previous generations of
DDR. For the eye opening height the number to use is Vcent_dq +/- the threshold value determined by
the data rate of the interface.
For data rates up to 2133 Mbps the value is +/-68 mV. For data at 2400 Mbps it is +/-65 mV, and above
that it is +/-62.5 mV.
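
The thresholds above can be captured in a small lookup for use when checking simulated eyes. This is a minimal sketch: the function names are hypothetical, and treating data rates between 2133 Mbps and 2400 Mbps as belonging to the 2400 Mbps bin is an assumption, since the text only gives the three cases listed.

```python
def dq_mask_half_height_mv(data_rate_mbps):
    """Half height of the DQ eye mask around Vcent_dq, in millivolts."""
    if data_rate_mbps <= 2133:
        return 68.0
    if data_rate_mbps <= 2400:  # assumption: rates between 2133 and 2400 use this bin
        return 65.0
    return 62.5


def eye_clears_mask(v_inner_high, v_inner_low, vcent_dq, data_rate_mbps):
    """True when the inner eye edges (in volts) stay outside the mask.

    v_inner_high: lowest level reached by a logic high inside the eye.
    v_inner_low : highest level reached by a logic low inside the eye.
    """
    half = dq_mask_half_height_mv(data_rate_mbps) / 1000.0
    return v_inner_high >= vcent_dq + half and v_inner_low <= vcent_dq - half


# Illustrative values only, not simulation results.
print(dq_mask_half_height_mv(2133))
print(eye_clears_mask(1.00, 0.80, 0.90, 2133))
```

Because Vcent_dq is established by training, the value passed in should be the center found for the simulated eye rather than a fixed supply fraction.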

The training that must take place during initialization for any DDR4 implementation establishes the best
level for the internal threshold voltage for the entire chip. As a result of this training the best possible
eye should result.

Receive Eye Mask Opening


The process used for determining the best reference voltage on the FPGA side should be identical to the
one used to set the DRAM threshold. The eye opening height is +/-70 mV from that value.

Crosstalk Constraint Simulations


Crosstalk is the coupling between two or more conductors running close to each other. Some teams use
the term “Simultaneous Switching Noise” (SSN) for this but here we will use crosstalk to differentiate it
from the effects on the signal due to power supply collapse normally indicated by SSN. The amount of
crosstalk is determined by several factors. On a printed circuit board with controlled impedance traces,
the coupling variables are the spacing between the traces and the length of the coupling. There will be
more than one set of constraints, one for microstrip traces and another for the stripline traces.

There may be more than one layer of stripline traces in the case where a dual stripline stackup is
involved. If the thickness of the board is not as much of an issue as the isolation between signals, then
single stripline is an excellent solution. In the case of a PCIe form factor board that can only be 0.062”
thick, dual stripline will be necessary.

Example of stripline crosstalk simulation for parallelism rules


Before the board is laid out a set of parallelism rules must be established. These rules are established to
guarantee that the signal will not be distorted beyond certain limits by nearby signals. Below is a
description of how to use HyperLynx to determine parallelism spacing rules.
A crosstalk topology
The topology is created on one layer of a dual-stripline stackup with one victim (U3->U8) and four
aggressors. The simulation is run with the output of U3 stuck high and all of the aggressors toggling at
once and in phase to yield the worst-case crosstalk. These simulations should be run using the drive
strength and termination found above for eye optimization.

This simulation is for a stripline layer. The simulation topology for a microstrip layer is similar.
[Figure: HyperLynx LineSim v9.2 schematic, design file pod_couple_4_40_7.ffs. Five parallel nets, Net001
to Net005. Each net has an Arria 10 POD driver (U1 to U5; model pod12_rtnio_g40c... on U1, U3, U4 and U5,
pod12_rtnio_g48c... on U2), a T-line (TL1 to TL5; 49.3 Ohms, 468.677 ps, 3.000 in, coupled stackup) and an
MT40L512M8HX DQ0 receiver at pin C2 (U6 to U10). The victim net is U3 to U8 on Net003.]

Setting up the simulations


Newer versions of HyperLynx can run interactive sweeps of various properties in the topology.

Open up the interactive sweep manager by clicking on its icon or using the dropdown:
Simulate SI -> Run Interactive Sweeps

For this system we will sweep:

• The length of the T-lines from 1 inch to 6 inches in 0.5 inch increments
• The spacing between the T-lines from 4 mils to 10 mils
The simulation was run at 1066.666 MHz for this system. The results for a faster or slower clock would
not differ much, because the coupling is related to the rise time of the signal and not the repetition rate.

You may need to adjust the sweep parameters to suit the needs of your system.
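
Before running the sweep it helps to know how many cases it will generate. The sketch below enumerates the grid described above. The 1 mil spacing increment is an assumption, since only the 4 mil to 10 mil range is stated, and it assumes the separations are locked together so that they count as a single swept variable.

```python
import itertools


def stepped_range(start, stop, step):
    """Inclusive list of values from start to stop, built without float drift."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 6) for i in range(count)]


lengths_in = stepped_range(1.0, 6.0, 0.5)     # coupling lengths in inches
spacings_mil = stepped_range(4.0, 10.0, 1.0)  # assumed 1 mil increment

# Every combination of length and (locked) spacing is one simulation case.
cases = list(itertools.product(lengths_in, spacings_mil))

print(f"{len(lengths_in)} lengths x {len(spacings_mil)} spacings = {len(cases)} cases")
```

Each case becomes one waveform column in the exported data, so a coarser list of lengths may be worth considering if the spreadsheet becomes unwieldy.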

Running the Simulations


Clicking on the “Run Sweeps” button takes you to the Digital Oscilloscope where the simulations are
executed. After the eye diagram simulation is set up and the latest sweeps are all checked, turn off the
display of all the traces except the input to the victim signal, in this case U8.
The results of this simulation are shown here.

This looks very messy for a good reason. The starting spacing of the lines for some cases is very close,
so the coupling is fairly high for those cases. The propagation time for each length is also different.

Next, export the waveform to a comma-separated values (.csv) file using the “Save/Load” button.

When the .csv file is opened with a spreadsheet program, formulas can be written to extract the peak to
peak noise contribution for each set of spacing and lengths. The amount of crosstalk that can be
tolerated should be determined by the system architect. The crosstalk will have an effect on the eye
opening so a simulation should be created for the topology under consideration to determine the
sensitivity of the channel. A simulation, below, will show the effects of the crosstalk.

Analysis of crosstalk for parallelism rules


When the “Saved” file is opened in a spreadsheet program the coupling values can be readily
determined. The data is used to find the minimum spacing for each coupling length that keeps the
crosstalk within a given amount. The crosstalk in decibels can be calculated by finding the peak-to-peak
maximum for each case and dividing it by the nominal voltage swing of the 1.2 V POD interface for the
drivers considered. Each designer has an idea about what a tolerable amount of coupling is. For this
instance a 40 Ohm driver was used and the nominal eye height was determined to be 950 mV.
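
Picking the minimum spacing for each coupling length can be sketched as a small search over the sweep results. The function name and the data layout are hypothetical, and the numbers in the example are placeholders chosen only to exercise the code; they are not results from the simulations described here.

```python
def min_spacing_per_length(results, limit_db):
    """Smallest spacing per coupling length whose crosstalk is at or below limit_db.

    results: iterable of (length_in, spacing_mil, crosstalk_db) tuples.
    Lengths with no passing spacing are left out of the returned dict.
    """
    best = {}
    for length, spacing, xtalk_db in results:
        if xtalk_db <= limit_db:
            if length not in best or spacing < best[length]:
                best[length] = spacing
    return best


# Placeholder values for illustration only.
placeholder_results = [
    (3.0, 5, -17.0), (3.0, 6, -19.0), (3.0, 7, -21.0), (3.0, 8, -23.0),
    (6.0, 7, -19.5), (6.0, 8, -21.5),
]

rules = min_spacing_per_length(placeholder_results, limit_db=-20.0)
for length in sorted(rules):
    print(f"up to {length} in of parallelism: at least {rules[length]} mils")
```

The resulting table of length against spacing is the form in which parallelism rules are usually handed to the layout tool.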
DQ/DQS Simulation for ISI and Crosstalk Effects
Above, the amount of crosstalk for a given parallelism was determined. If the engineer has decided that
-20 dB of crosstalk is acceptable, the spreadsheet above gives a spacing of 7 mils. It remains to be seen
how that crosstalk affects the system. First the disturbance caused by successive symbols on the eye
diagram (ISI) is analyzed, and then the effects of crosstalk are added to see if the resulting eye diagram
satisfies the system design goals. For more data on crosstalk please see the documentation at
http://www.alterawiki.com/wiki/Arria_10_EMIF_Simulation_Guidance

ISI
Inter Symbol Interference (ISI) is the interference between successive signals on a channel. Given a
good driver, a channel with smooth impedance, and a termination that matches the transmission line,
the signal should show no ISI. In the real world there are many things in the channel that are not
matched to the board characteristic impedance. In addition, a termination resistor value that is off can
cause a huge amount of interference. For more data on ISI distortion please see the documentation at
http://www.alterawiki.com/wiki/Arria_10_EMIF_Simulation_Guidance

Stackup and Topology Setup in HyperLynx


The stackup and topology are the same as those for crosstalk, above.

Termination variation simulation


Here we vary the ODT termination value between 40 Ohms, 48 Ohms and 60 Ohms with a 48 Ohm
driver. Only the topology for U3 to U8 is used, and the line spacing is set to 20 mils to eliminate
crosstalk effects.
Here, when examined closely, it is evident that the 40 Ohm ODT receiver has the least amount of jitter
at the crossing.
Because DDR4 has a calibration mode that must be used to set the threshold value for the individual
part, this should be the best possible eye.

Other topologies can be simulated using these techniques. Please refer to the last section of this
document titled “Further Reading” for references to additional material.

Channel Simulation for crosstalk effects


When we add the -20 dB spacing value determined from the crosstalk simulation above (7 mils) and run
the simulations for ISI and ISI with crosstalk, the signal looks fairly good. See the waveforms below.

The blue signal is ISI with aggressors quiet, the red signal is with aggressors running for a 3 inch
topology. Aggressors for this are out of phase with the victim.
This plot is the same topology with the memory devices driving the circuit. The difference here is the
impedance of the FPGA package.

Channel Simulation for SSN effects


Simultaneous switching noise is defined here as the effects on the power supply of many signals
switching together and the effect of this on the channel. This sort of simulation is not possible in
HyperLynx LineSim.

Chapter 7: Further Investigations

Post Layout Simulation


In order to determine the overall usefulness of the layout a post-layout simulation of the channel should
be executed.

It is important to use an IBIS model for the FPGA that is generated from a Quartus® II project that is
created for the system. The IBIS model created this way will have the correct package parasitic
values.

Further Reading
Altera has a wealth of information on developing and simulating DDR devices. The address and
command signals for DDR4 are similar to those for DDR3 except for the voltage level of the interface.
EMIF Guidelines for Layout and Timing
http://www.altera.com/literature/hb/external-memory/emi_plan.pdf

Simulating for Timing Closure


The Altera wiki site has a lot of information on this subject, including timing parameters. Please
refer to the following documents for detailed information.

http://www.alterawiki.com/wiki/Arria_10_EMIF_Simulation_Guidance

http://www.alterawiki.com/wiki/Measuring_Channel_Signal_Integrity

A definitive document on DDR4 voltages and timing is available from Micron Semiconductor.
Understanding the POD training is very valuable to the designer. A full PDF of the datasheet has all the
data in one place. Search on their website for “4Gb_DDR4_SDRAM” for a copy.
Check these things
Trace impedance plays an important role in signal integrity. Users should perform board level
simulation to determine the best characteristic impedance for their PCB. For example, it is possible that
for multi rank systems 40 Ohm would yield a better result than the traditional 50 Ohm characteristic
impedance.

To minimize PCB layer propagation variance, Altera recommends that you route signals from the same
net group on the same layer.

1. Use 45° angles (not 90° corners).


2. Disallow critical signals across split planes.
3. Route over appropriate VCC and GND planes.
4. Keep signal routing layers close to GND and power planes.
5. Avoid routing memory signals closer than 0.025 inch (0.635 mm) to memory clocks.
6. Match the (package + board) trace delays up to 20 ps of skew for DQ/DQS/DM signals within a
DQS group.
7. Details on how to do package de-skew are available in EMIF HB vol2 chapter 4.



8. All the address, command and control signals should match to within +/- 20 ps compared to the
mem_clk trace.
o For example if the mem_clk trace delay is 500 ps then the allowed range for any
address/command/control signal is 480 ps to 520 ps.
o For discrete components; make sure above recommendation is met for each component
in the fly-by chain.
o For DIMMs: For single or multiple DIMM configurations make sure this guideline is met
at each DIMM connector.

The timing between the DQS and clock signals on each device calibrates to meet tDQSS.

9. Make sure that the clock propagation delay is longer than the DQS propagation delay at each
device, so that DQS arrives before the clock:

o (CKi) – DQSi > 0; 0 < i < number of components – 1
10. Total skew of CLK and DQS signal between groups is less than one clock cycle:
o (CKi + DQSi) max – (CKi + DQSi) min < 1 × tCK

If you are using a DIMM topology, your delay and skew must take into consideration values for the
actual DIMM.
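
The delay rules in items 8 to 10 can be checked with a short script once the trace delays are extracted from the layout. This is a minimal sketch: the function names and the example delay values are hypothetical, all delays are in picoseconds, and the per-component lists are assumed to hold one CK and one DQS delay for each device in the fly-by chain.

```python
def ca_within_window(signal_delay_ps, clk_delay_ps, tol_ps=20.0):
    """Item 8: address/command/control delay within +/- tol_ps of mem_clk."""
    return abs(signal_delay_ps - clk_delay_ps) <= tol_ps


def ck_minus_dqs_positive(ck_ps, dqs_ps):
    """Item 9: (CKi) - DQSi > 0 at every component."""
    return all(ck - dqs > 0 for ck, dqs in zip(ck_ps, dqs_ps))


def total_skew_ok(ck_ps, dqs_ps, tck_ps):
    """Item 10: (CKi + DQSi) max - (CKi + DQSi) min < 1 x tCK."""
    sums = [ck + dqs for ck, dqs in zip(ck_ps, dqs_ps)]
    return max(sums) - min(sums) < tck_ps


# Example from item 8: a 500 ps mem_clk allows 480 ps to 520 ps.
print(ca_within_window(480.0, 500.0), ca_within_window(521.0, 500.0))

# Hypothetical four-component fly-by chain.
ck = [500.0, 560.0, 620.0, 680.0]
dqs = [450.0, 470.0, 480.0, 500.0]
tck = 1.0e6 / 1066.667  # clock period in ps for a 1066.667 MHz clock

print(ck_minus_dqs_positive(ck, dqs))
print(total_skew_ok(ck, dqs, tck))
```

For a DIMM topology the delays passed in would also need to include the values for the actual DIMM.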
