Altera
Performance
Agenda
Comparison Between DDR3 and DDR4
DRAM Technology Comparison
Feature | DDR3 | DDR4 | GDDR5
Voltage | 1.5 V / 1.35 V | 1.2 V | 1.5 V / 1.35 V
Strobe | Bi-directional differential | Bi-directional differential | Free-running differential WRITE clock
Strobe configuration | Per byte | Per byte | Per word
READ data capture | Strobe based | Strobe based | Clock data recovery
Data termination | VDDQ/2 | VDDQ | VDDQ
Address/command termination | VDDQ/2 | VDDQ/2 | VDDQ
Burst length | BC4, 8 | BC4, 8 | 8
Bank grouping | No | 4 | 4
On-chip error detection | No | Command/address parity; CRC for data bus | CRC for data bus
Configuration | x4, x8, x16 | x4, x8, x16 | x16, x32
Package | 78-ball / 96-ball FBGA | 78-ball / 96-ball FBGA | 170-ball FBGA
Data rate (Mbps/pin) | 800 – 2,133 | 1,600 – 3,200+ | 4,000 – 7,000
Component density | 1 Gb – 8 Gb | 2 Gb – 16 Gb | 512 Mb – 2 Gb
Stacking options | DDP, QDP | Up to 8H (128-Gb stack); single load | No
DDR4 Power Savings
DDR4 Power Savings Features
Creating a Data Valid Window
Timing Margins Are Shrinking
Generation | Data Valid Window (ps) | DRAM Chip Margin (ps) | Package/Board Margin (ps) | Margin (ps)
DDR1 | 2,500 | 900 | 800 | 800
DDR2 | 938 | 425 | 256 | 256
DDR3 | 469 | 188 | 140 | 140
DDR4 | 313 | 125 | 93 | 93
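The Data Valid Window column matches the bit time (unit interval, UI) of each generation's fastest speed grade. The sketch below assumes data rates of 400, 1,066, 2,133, and 3,200 Mbps, which is an inference from the window values rather than something the slide states, and checks that the three margin columns nearly fill each UI.

```python
# Bit time (unit interval, UI) versus the margin budget in the timing table.
# Assumption: each row corresponds to that generation's fastest speed grade.

def unit_interval_ps(data_rate_mbps: float) -> float:
    """One bit time in picoseconds: 1 / (data rate); 1e6 converts Mbps to ps."""
    return 1e6 / data_rate_mbps

# (generation, assumed data rate in Mbps, the three margin columns in ps)
BUDGETS = [
    ("DDR1", 400, (900, 800, 800)),
    ("DDR2", 1066, (425, 256, 256)),
    ("DDR3", 2133, (188, 140, 140)),
    ("DDR4", 3200, (125, 93, 93)),
]

for name, rate, margins in BUDGETS:
    ui = unit_interval_ps(rate)
    print(f"{name}-{rate}: UI = {ui:.1f} ps, margins sum to {sum(margins)} ps")
```

Each generation's margins sum to within a few ps of the UI, so every picosecond lost elsewhere comes directly out of these budgets.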
Shrinking the Window Even More: DDR4 VREF Training (1/2)
Shrinking the Window Even More: DDR4 VREF Training (2/2)
Shrinking the Window Even More: Duty Cycle Error
Clock timing (min / max, in tCK(avg); the same limits appear for all three speed bins shown):
− Average high pulse width, tCH(avg): 0.48 / 0.52
− Average low pulse width, tCL(avg): 0.48 / 0.52
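Since DDR data is launched on both clock edges, the shorter clock phase bounds one of the two bit times. A rough sketch, assuming DDR4-3200 (CK at 1,600 MHz, so tCK = 625 ps) and the 0.48 tCK(avg) limit above, of how much timing the duty cycle spec alone can consume:

```python
# Duty cycle error: how much a half-period can shrink when tCH(avg) or tCL(avg)
# sits at the 0.48 tCK(avg) limit instead of the ideal 0.50.

def worst_half_period_loss_ps(tck_ps: float, min_fraction: float = 0.48) -> float:
    """Timing lost by the shorter clock phase relative to a 50% duty cycle."""
    ideal = 0.5 * tck_ps
    worst = min_fraction * tck_ps
    return ideal - worst

# Assumption: DDR4-3200, so the clock runs at 1,600 MHz.
tck = 1e6 / 1600  # clock period in ps
loss = worst_half_period_loss_ps(tck)
print(f"tCK = {tck:.1f} ps; the shorter half-period loses {loss:.1f} ps "
      f"of its ideal {0.5 * tck:.1f} ps")
```

Losing 12.5 ps is about 4% of a 312.5-ps bit time, before any jitter or skew is counted.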
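Shrinking the Window Even More: Calculating the PLL Jitter
Inputs, each a function of frequency f: current profile I(f), PDN impedance Z(f), jitter sensitivity S(f), and PSRR of the PLL P(f). Their product is the jitter spectrum J(f); its inverse FFT gives the time interval error j_TIE(t), from which the peak-to-peak jitter is read:
$$J(f) = I(f) \times Z(f) \times S(f) \times P(f), \qquad j_{TIE}(t) = \mathrm{iFFT}\{J(f)\}$$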
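The transfer chain above can be sketched with NumPy. Every spectrum below is a hypothetical placeholder (a single current harmonic with flat Z, S, and P), and the spectra are treated as zero-phase magnitudes, which aligns all tones into a worst-case peak; a real analysis would use measured or simulated complex spectra.

```python
import numpy as np

# Hypothetical one-sided spectra on a common frequency grid (all values assumed).
N = 4096                        # number of time-domain samples
fs = 10e9                       # sample rate in Hz
f = np.fft.rfftfreq(N, d=1 / fs)

I = np.zeros_like(f)            # current profile I(f), A
k = 41                          # one assumed current harmonic at bin k
I[k] = 0.5
Z = 0.01 * np.ones_like(f)      # PDN impedance Z(f), ohms
P = 0.1 * np.ones_like(f)       # PLL PSRR P(f), as a linear ratio
S = 1e-9 * np.ones_like(f)      # jitter sensitivity S(f), s/V

# Jitter amplitude per frequency bin, in seconds.
J = I * Z * S * P

# irfft expects FFT-scaled bins: a cosine of amplitude A at bin k corresponds
# to a bin value of A * N / 2, so rescale before the inverse transform.
j_tie = np.fft.irfft(J * N / 2, n=N)

pp = j_tie.max() - j_tie.min()  # peak-to-peak TIE jitter, seconds
print(f"peak-to-peak jitter: {pp * 1e12:.3f} ps")
```

With one 0.5-ps tone the peak-to-peak value is simply twice the amplitude; the same code handles arbitrary spectra, where the iFFT step is what turns many small tones into a single time-domain number.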
DDR4 Bank Group Timing
Different timing within a group and between groups (tCCD, tWTR, tRRD)
− “Long” timing: bank-to-bank within a group
− “Short” timing: access to different bank groups
Long timings maintain array timing requirements within a bank group
Short timings maintain speed between different bank groups
[Figure: Bank Group 1 (Banks 2 and 3 shown), with long timings within the group and short timings to other groups]
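The long/short rule can be expressed as a small scheduler that computes the earliest issue cycle for a stream of column commands. tCCD_S = 4 nCK is the DDR4 short value; the long value of 6 nCK is an assumption (it varies by speed bin), and `schedule` is a hypothetical helper, not a controller API.

```python
# Column-to-column timing in clock cycles (nCK). TCCD_S is the DDR4 short value;
# TCCD_L = 6 is an assumed long value (the real one depends on speed bin).
TCCD_S = 4
TCCD_L = 6


def schedule(bank_groups: list[int]) -> list[int]:
    """Earliest issue cycle for each column command, in order.

    Consecutive column commands are at least tCCD_S apart ("short" timing);
    two commands to the same bank group are at least tCCD_L apart ("long"
    timing), even if commands to other groups were issued in between.
    """
    times: list[int] = []
    last_in_group: dict[int, int] = {}
    for bg in bank_groups:
        t = times[-1] + TCCD_S if times else 0
        if bg in last_in_group:
            t = max(t, last_in_group[bg] + TCCD_L)
        times.append(t)
        last_in_group[bg] = t
    return times


print(schedule([0, 0, 0, 0]))  # same bank group:    [0, 6, 12, 18]
print(schedule([0, 1, 0, 1]))  # alternating groups: [0, 4, 8, 12]
```

Alternating bank groups keeps back-to-back commands at the short spacing, which is how bank grouping maintains speed while still honoring array timing inside each group.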
Calibration Is Critical to Shrinking Margins
[Figure: margin (ns) contributions from FPGA effects, external effects, calibration effects, and calibration uncertainty]
What is Calibration?
VT compensation
− Data shifts due to voltage and temperature (VT) variations
− Tracking compensates for these shifts
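Tracking can be pictured as a slow control loop on top of the initial calibration: re-measure the passing window, then nudge the delay toward its new center by a bounded step. The `track` function, tap values, and drift pattern below are hypothetical, a sketch of the idea rather than the sequencer's algorithm.

```python
def track(setting: int, window: tuple[int, int], max_step: int = 1) -> int:
    """Move `setting` toward the center of `window` by at most `max_step` taps."""
    left, right = window
    center = (left + right) // 2
    step = max(-max_step, min(max_step, center - setting))
    return setting + step


# Assumed drift: as temperature rises, the passing window slides right by one
# tap per measurement, and the delay setting follows it.
setting = 20
for drift in range(6):
    window = (10 + drift, 30 + drift)
    setting = track(setting, window)
    print(f"window {window} -> delay setting {setting}")
```

Limiting the step size keeps a single noisy measurement from moving the sampling point far from where calibration left it.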
High-Level Output Topology
[Figure: output path with CLK at phases X and X+90 driving DQ through the DQ OUT1 and DQ OUT2 delay chains]
Calibration knobs
− DQ-out1 and DQ-out2 delay: control the delay applied to outgoing DQ pins
− DQS-out1 and DQS-out2 delay: control the delay applied to outgoing DQS pins
− Write leveling output: changes the delay on both DQ and DQS relative to the memory clock, in phase taps
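Write leveling, the last knob above, is typically run as a sweep: the DRAM samples CK on the rising edge of DQS and returns the sampled value on DQ, and the controller moves the DQS output delay until that feedback flips from 0 to 1. A minimal sketch with a simulated DRAM response; `write_level`, `simulated_feedback`, and the tap counts are hypothetical.

```python
from typing import Callable, Optional


def write_level(feedback: Callable[[int], int], num_taps: int) -> Optional[int]:
    """Return the first DQS delay tap where sampled CK goes 0 -> 1, or None."""
    prev = feedback(0)
    for tap in range(1, num_taps):
        cur = feedback(tap)
        if prev == 0 and cur == 1:
            return tap  # DQS rising edge is now aligned with the CK rising edge
        prev = cur
    return None


# Assumed model: one CK period spans 64 taps, and CK's rising edge sits 37 taps
# after DQS at tap 0, so CK reads high for the half-period starting at tap 37.
PERIOD = 64
EDGE = 37


def simulated_feedback(tap: int) -> int:
    """Value of CK that the simulated DRAM samples with DQS delayed by `tap`."""
    phase = (tap - EDGE) % PERIOD
    return 1 if phase < PERIOD // 2 else 0


print(write_level(simulated_feedback, PERIOD))
```

Searching for the 0-to-1 transition, rather than just any 1, skips the falling-edge region the sweep passes through first.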
High-Level Input Topology
[Figure: input path showing DQS Enable, DQS IN Delay, and the DQS Delay Chain on the DQS side, and DQ IN Delay and DDIOin on the DQ side; controls: vfifo, dqs_en ptap, DQS en dtap, DQS in dtap, DQ in dtap, and LFIFO]
Calibration Stages
Flow: Start → Wait for PLL/DLL locking → Initialize INST/AC ROM → Initialize the memory (mode registers, etc.) → Calibration loop: calibrate the memory interface
Calibration steps:
DQS-enable calibration
− Calibrate DQS enable (delayed read data valid) relative to DQS
Post-amble tracking
− Track DQS enable across temperature variation for all pins on this memory interface
Read data deskew
− Calibrate DQS relative to the read command (read leveling)
− Calibrate DQ versus DQS (per-bit deskew) for reads
LFIFO training
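Per-bit read deskew is commonly done by sweeping each pin's input delay, checking a known read pattern at every tap, and parking the delay in the middle of the passing window. A sketch of that sweep-and-center step under assumed pass/fail results; `center_of_longest_pass`, `window`, and the tap ranges are illustrative, not the sequencer's actual code.

```python
from typing import Optional


def center_of_longest_pass(results: list[bool]) -> Optional[int]:
    """Tap at the center of the longest run of passing taps (None if none pass)."""
    best_start, best_len = None, 0
    start = None
    for i, ok in enumerate(results + [False]):  # sentinel closes a trailing run
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    if best_start is None:
        return None
    return best_start + (best_len - 1) // 2


def window(lo: int, hi: int, taps: int = 32) -> list[bool]:
    """Assumed sweep result: the pattern reads back correctly for taps lo..hi."""
    return [lo <= t <= hi for t in range(taps)]


# Hypothetical sweeps for three DQ pins with different skews.
sweeps = {"DQ0": window(5, 14), "DQ1": window(8, 19), "DQ2": window(3, 10)}
settings = {pin: center_of_longest_pass(r) for pin, r in sweeps.items()}
print(settings)
```

Centering each bit independently is what removes pin-to-pin skew; using the longest run guards against isolated passing taps caused by noise.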
Good Layout Practices for DDR4
DDR4 Output Driver
[Figure: DDR4 output waveform annotated with overshoot and undershoot beyond VDD/VSS, jitter, high and low ringback, and the VIHac, VIHdc, Vref, VILdc, and VILac levels]
OCT Calibration Scheme to Support DDR4
OCT can be calibrated twice, using two sets of pins (DQ and CA)
DQ and CA pins have two different sets of calibration codes in DDR4
[Figure: OCT calibration scheme, DDR4 vs. DDR3]
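One way to picture the two OCT calibrations is as a code search against a reference: each run picks the code whose resistance best matches its target, so DQ and CA end up with separate codes. This toy model assumes a termination built from identical parallel legs (R = R_LEG / code) and made-up targets; it illustrates the search, not Altera's OCT circuit.

```python
# Toy OCT model (all values are assumptions): `code` identical legs of R_LEG
# ohms in parallel, so resistance falls monotonically as the code rises.
R_LEG = 2400.0
MAX_CODE = 127


def resistance(code: int) -> float:
    return R_LEG / code


def calibrate(target_ohms: float) -> int:
    """Binary-search the code whose resistance is closest to target_ohms."""
    lo, hi = 1, MAX_CODE
    while lo < hi:
        mid = (lo + hi) // 2
        if resistance(mid) > target_ohms:
            lo = mid + 1  # still too resistive: enable more legs
        else:
            hi = mid
    # lo is now the smallest code with R <= target; its lower neighbor may be closer.
    if lo > 1 and abs(resistance(lo - 1) - target_ohms) < abs(resistance(lo) - target_ohms):
        return lo - 1
    return lo


# Two calibrations, two codes (hypothetical targets for the DQ and CA pin sets).
dq_code = calibrate(48.0)
ca_code = calibrate(40.0)
print(dq_code, ca_code)
```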
General Layout Concerns
SSO (simultaneous switching output) noise
− Timing and noise issues caused by rapid changes in voltage and current when multiple circuits switch simultaneously in the same direction
Problems caused by SSO
− False triggers due to power/ground bounce
− Reduced timing margin due to SSO induced skew
− Reduced voltage margin due to power/ground noise
− Slew rate variation
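A first-order estimate shows why the number of switching drivers matters: N outputs switching the same way push their combined current change through a shared return inductance, giving V = N · L · di/dt. All values below are assumptions chosen for illustration.

```python
def ground_bounce_v(n_drivers: int, l_henry: float, di_amps: float, dt_s: float) -> float:
    """First-order SSO bounce: N drivers' current steps through one shared inductance."""
    return n_drivers * l_henry * di_amps / dt_s


# Assumed: one byte lane (8 DQ drivers), 0.1 nH of shared return inductance,
# each driver changing its current by 10 mA in 100 ps.
v = ground_bounce_v(8, 0.1e-9, 10e-3, 100e-12)
print(f"ground bounce ~ {v * 1e3:.0f} mV")
```

The bounce scales linearly with the number of drivers and inversely with edge time, which is why faster, wider DDR4 buses are more exposed to SSO.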
VREF noise
− Induces strobe-to-data skew and reduces voltage margins
− Power/ground plane noise
− Crosstalk
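The strobe-to-data skew follows from slew rate. Data inputs are single-ended and resolved against VREF, while the differential DQS is timed at its own crosspoint, so a VREF error ΔV moves the data edge by about ΔV / slew rate relative to the strobe. A sketch with assumed noise and slew values:

```python
def vref_noise_skew_ps(vref_noise_mv: float, data_slew_v_per_ns: float) -> float:
    """Shift of a single-ended data edge's VREF crossing, in ps.

    Units: mV / (V/ns) = ps. The differential strobe crosses at its own
    crosspoint, so it does not move with VREF; the data edge does.
    """
    return vref_noise_mv / data_slew_v_per_ns


# Assumed: 20 mV of VREF noise, data edges slewing at 2 V/ns and at 4 V/ns.
print(vref_noise_skew_ps(20.0, 2.0))
print(vref_noise_skew_ps(20.0, 4.0))
```

Slower edges turn the same VREF noise into more skew, so VREF noise and slew-rate variation compound each other.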
ISI
− Occurs when data is random
Clocks do not have ISI
− Multiple bits on the bus at the same time
Bus cannot settle from bit #1 before bit #2, etc.
− Signal edges jitter due to previous bit’s energy still on the bus
− Ringing due to impedance mismatches
− Low pass structures can cause ISI
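A toy model makes the "bus cannot settle" point concrete: a first-order low-pass channel sampled once per bit, y[n] = a·y[n-1] + (1 - a)·x[n]. The coefficient a = 0.5 is an assumption. The same transmitted 1 arrives at different levels depending on the bits before it:

```python
def channel(bits: list[int], a: float = 0.5) -> list[float]:
    """First-order low-pass response sampled once per bit (a = assumed memory)."""
    y, out = 0.0, []
    for b in bits:
        y = a * y + (1 - a) * b  # part of the previous bit's energy remains
        out.append(y)
    return out


# Identical final bit (1), different histories.
print(channel([0, 0, 0, 0, 1])[-1])  # after a run of zeros
print(channel([1, 1, 1, 0, 1])[-1])  # after mostly ones
```

The final 1 is received at 0.5 after a run of zeros but at 0.71875 after mostly ones; a clock, whose pattern never changes, always sees the same history and therefore no pattern-dependent shift.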
Minimize ISI
− Optimize layout
− Keep board/DIMM impedances matched
Drive impedance should be the same as Zo of the transmission line
− Terminate nets
Termination values should be the same as Zo of transmission line
− Select high-quality connector
Matched to board/DIMM impedance
Low mutual coupling
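Matching drive and termination impedance to Zo matters because any mismatch reflects a fraction Γ = (Z_L - Z0) / (Z_L + Z0) of the incident wave; the same formula applies at the driver end, with the drive impedance in place of Z_L. A quick calculation, assuming a 50-ohm line:

```python
def reflection_coefficient(z_load: float, z0: float) -> float:
    """Gamma = (Z_L - Z0) / (Z_L + Z0); 0 means no reflection."""
    return (z_load - z0) / (z_load + z0)


Z0 = 50.0  # assumed board impedance, ohms
for z in (50.0, 60.0, 40.0, 34.0):
    print(f"{z:5.1f} ohm -> gamma = {reflection_coefficient(z, Z0):+.3f}")
```

A 10-ohm mismatch in either direction reflects roughly a tenth of the wave, and that energy is what lingers on the bus as ISI.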
Crosstalk
− Coupling on the board, package, and connector from other signals, including coupling through return path discontinuities (RPDs)
Inductive coupling is typically stronger than capacitive coupling
− When aggressors fire at the same time as the victim (e.g. data-to-data coupling)
Victim edge speeds up or slows down, causing jitter
− When aggressors do not fire at the same time as the victim (e.g. data-to-command/address coupling)
Noise couples onto victim at time of aggressor switching
Minimize crosstalk
− Keep bits that switch on same “clock” edge routed together
Route data bits next to other data bits; never next to CMD/ADDR bits
− Isolate sensitive bits (strobes)
If need be, route next to signals that rarely switch
− Separate traces by at least two, preferably three, conductor widths (more precisely, spacing is set by trace pitch and height above the reference plane)
Example: a 5-mil trace located 5 mils from a reference plane should have a 15-mil gap to its nearest neighbors to minimize crosstalk
− Choose a high-quality connector
− Run traces as stripline (as opposed to microstrip)
Not at the cost of additional vias
− Maintain good references for signals and their return paths
− Avoid RPDs
− Keep driver impedance, board Zo, and ODT selections well matched
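The 5-mil / 15-mil spacing example can be sanity-checked with a widely used rule of thumb for coupled traces over a plane, crosstalk ∝ 1 / (1 + (D/H)²), where D is the centerline spacing and H the height above the reference plane. Treat the result as relative only: the proportionality constant depends on rise time and coupled length and is omitted here.

```python
def relative_crosstalk(pitch_mils: float, height_mils: float) -> float:
    """Rule-of-thumb coupling factor 1 / (1 + (D/H)^2), without the constant.

    pitch_mils: centerline-to-centerline spacing D
    height_mils: height H of the traces above the reference plane
    """
    return 1.0 / (1.0 + (pitch_mils / height_mils) ** 2)


width, height = 5.0, 5.0  # the 5-mil trace, 5 mils above the plane
for gap in (5.0, 10.0, 15.0):
    pitch = width + gap   # centerline spacing = trace width + edge-to-edge gap
    print(f"gap {gap:4.1f} mil -> relative crosstalk {relative_crosstalk(pitch, height):.3f}")
```

Going from a 5-mil to a 15-mil gap cuts the estimate from 0.2 to about 0.06, roughly a 3.4x reduction, which is why the guideline is stated relative to height above the plane.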
Cin mismatch
− Differing input capacitances on receiver pins
− Adds skew to input timings
RTT mismatch
− Termination resistors not at nominal value
− Internal ODT on data pins has smaller variation than in DDR2
It is calibrated (as is the DRAM's Ron)
− External termination resistor variation must be accounted for
Consider one-percent resistors
Ground Plane
TimeQuest DDR Timing: Read Capture
EMIF Debug Toolkit Features
TimeQuest-like GUI
Reports section
Tasks section
Commands run are shown in the console
“On-Chip” EMIF Debug Toolkit
Looking Ahead and Conclusion
Will There Be a DDR5?
Very unlikely
− SI for a parallel bus of 2 GHz and above would be very difficult
− Timing budget would be consumed in the package
PDN noise
Package skew
Conclusion
Thank You