Memory Interface Solutions User Guide
UG086
Xilinx is disclosing this Document and Intellectual Property (hereinafter “the Design”) to you for use in the development of designs to operate
on, or interface with Xilinx FPGAs. Except as stated herein, none of the Design may be copied, reproduced, distributed, republished,
downloaded, displayed, posted, or transmitted in any form or by any means including, but not limited to, electronic, mechanical,
photocopying, recording, or otherwise, without the prior written consent of Xilinx. Any unauthorized use of the Design may violate copyright
laws, trademark laws, the laws of privacy and publicity, and communications regulations and statutes.
Xilinx does not assume any liability arising out of the application or use of the Design; nor does Xilinx convey any license under its patents,
copyrights, or any rights of others. You are responsible for obtaining any rights you may require for your use or implementation of the Design.
Xilinx reserves the right to make changes, at any time, to the Design as deemed desirable in the sole discretion of Xilinx. Xilinx assumes no
obligation to correct any errors contained herein or to advise you of any correction if such be made. Xilinx will not assume any liability for the
accuracy or correctness of any engineering or technical support or assistance provided to you in connection with the Design.
THE DESIGN IS PROVIDED “AS IS” WITH ALL FAULTS, AND THE ENTIRE RISK AS TO ITS FUNCTION AND IMPLEMENTATION IS
WITH YOU. YOU ACKNOWLEDGE AND AGREE THAT YOU HAVE NOT RELIED ON ANY ORAL OR WRITTEN INFORMATION OR
ADVICE, WHETHER GIVEN BY XILINX, OR ITS AGENTS OR EMPLOYEES. XILINX MAKES NO OTHER WARRANTIES, WHETHER
EXPRESS, IMPLIED, OR STATUTORY, REGARDING THE DESIGN, INCLUDING ANY WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE, TITLE, AND NONINFRINGEMENT OF THIRD-PARTY RIGHTS.
IN NO EVENT WILL XILINX BE LIABLE FOR ANY CONSEQUENTIAL, INDIRECT, EXEMPLARY, SPECIAL, OR INCIDENTAL DAMAGES,
INCLUDING ANY LOST DATA AND LOST PROFITS, ARISING FROM OR RELATING TO YOUR USE OF THE DESIGN, EVEN IF YOU
HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. THE TOTAL CUMULATIVE LIABILITY OF XILINX IN CONNECTION
WITH YOUR USE OF THE DESIGN, WHETHER IN CONTRACT OR TORT OR OTHERWISE, WILL IN NO EVENT EXCEED THE
AMOUNT OF FEES PAID BY YOU TO XILINX HEREUNDER FOR USE OF THE DESIGN. YOU ACKNOWLEDGE THAT THE FEES, IF
ANY, REFLECT THE ALLOCATION OF RISK SET FORTH IN THIS AGREEMENT AND THAT XILINX WOULD NOT MAKE AVAILABLE
THE DESIGN TO YOU WITHOUT THESE LIMITATIONS OF LIABILITY.
The Design is not designed or intended for use in the development of on-line control equipment in hazardous environments requiring fail-
safe controls, such as in the operation of nuclear facilities, aircraft navigation or communications systems, air traffic control, life support, or
weapons systems (“High-Risk Applications”). Xilinx specifically disclaims any express or implied warranties of fitness for such High-Risk
Applications. You represent that use of the Design in such High-Risk Applications is fully at your risk.
© 2004–2010 Xilinx, Inc. All rights reserved. XILINX, the Xilinx logo, and other designated brands included herein are trademarks of Xilinx,
Inc. PowerPC is a trademark of IBM Corp. and is licensed for use. All other trademarks are the property of their respective owners.
Revision History
The following table shows the revision history for this document.
Memory Interface Solutions User Guide www.xilinx.com UG086 (v3.6) September 21, 2010
Date Version Revision
04/30/07 1.7.2 MIG 1.72 release. Added support for Spartan-3AN FPGAs.
07/05/07 1.7.3 MIG 1.73 release. Added support for Spartan-3A DSP FPGAs. Corrected minor
typographical errors.
09/18/07 2.0 MAJOR REVISION. MIG Wizard guide added to Chapter 1. Design Frequency
Range and Hardware Tested Configuration tables added in most chapters. Diagram
and table updates throughout.
01/09/08 2.1 MIG 2.1 release. Revisions and added material throughout, including new Chapter
12, Appendix B, and Appendix D.
03/03/08 2.2 MIG 2.2 release. Added Qimonda support. Updated screen captures in Chapter 1.
Added Spartan-3E FPGA support to Chapters 7 and 8. Added footnote to Table 9-1,
page 356. Added “Timing Analysis” in Appendix A. Added information on loading
of address, command, and control signals to “Pin Assignments” in Appendix A.
Replaced Appendix D.
09/19/08 2.3 MIG 2.3 release. Added Chapter 12, “Implementing DDRII SRAM Controllers.”
Updated screen captures in Chapter 1. Added Appendix E, “Debug Port.” Added
support for Virtex®-5 TXT devices. Minor updates throughout.
10/02/08 2.3.1 MIG 2.3 release. Added Appendix G, “Low Power Options.” Additional
miscellaneous typographical edits throughout.
04/24/09 3.0 MIG 3.0 release. Updated screen captures in Chapter 1, “Using MIG.” Added
Chapter 13, “Simulating MIG Designs.” Updated Appendix B, “Pinout-Related
UCF Constraints for Virtex-5 FPGA DDR2 SDRAMs.” Added Appendix D, “SSO for
Spartan FPGA Designs.” Updated content throughout.
06/24/09 3.1 MIG 3.1 release. Updated description and screen captures in Chapter 1, “Using
MIG.” Updated Chapter 13, “Simulating MIG Designs,” and Appendix A,
“Memory Implementation Guidelines.” Minor updates throughout.
09/16/09 3.2 MIG 3.2 release. Updated screen captures in Chapter 1, “Using MIG.” Combined
Verify UCF and Update UCF sections into “Verify UCF and Update Design and
UCF” in Chapter 1. Updated Table 6-2, page 254. Added Table 10-8, page 420.
Replaced “Signals of Interest” section with “Debugging Calibration Failures,” page
538. Updated “Enabling the Debug Port,” page 571. Minor updates throughout.
12/02/09 3.3 MIG 3.3 release. Fixed Pin Out feature description, and updated screen captures in
Chapter 1, “Using MIG.” Removed references to sim.exe in Chapter 13,
“Simulating MIG Designs.” Added Appendix F, “Analyzing MIG Designs in the
ChipScope Analyzer with CDC.” Minor updates throughout.
04/19/10 3.4 MIG 3.4 release. Updated screen captures, removed “Using MIG in Batch Mode,”
and updated “Implementing MIG Designs in ISE GUI Mode” in Chapter 1. Updated
Figure 8-8.
07/23/10 3.5 MIG 3.5 release. Removed wdf_almost_full signal from Figure 2-13. Updated
Table 6-13. Updated “Simulating the RLDRAM II Design” in Chapter 6. Added
“Changing the Refresh Rate” in Chapter 9. Added “Burst Length of Two Design
without FIFO Interface” in Chapter 10. Updated “User Interface Accesses,” “Write
Interface,” and “Read Interface” in Chapter 10. Added “Changing the Refresh Rate”
in Chapter 11. Updated “Design Notes” in Chapter 13. Updated “Memory-Specific
Guidelines” in Appendix A.
09/21/10 3.6 MIG 3.6 release. Updated ISE® Design Suite version to 12.3 throughout. Updated
“I/O Standards,” page 553.
Table of Contents
SECTION I: INTRODUCTION
Controller
Infrastructure
Idelay_ctrl
top_phy
IODELAY Performance Mode
Multicontrollers
DCI Cascading
CQ/CQ_n Implementation
Pinout Considerations
Test Bench
QDRII SRAM Initialization and Calibration
Clocking Scheme
Global Clock Architecture
QDRII Controller Interface Signals
User Interface Accesses
User Interface
Write Interface
Read Interface
QDRII SRAM Signal Allocations
Supported Devices
Simulating the QDRII SRAM Design
Hardware Tested Configurations
Introduction
Memory Implementation Guidelines
Calculate WASSO
Run SI Simulation Using IBIS
Verifying Design Implementation
Introduction
Behavioral Simulation
Verify Modifications to MIG Output
Changing the Pinout Provided in the Output UCF
Changing Design Parameters
Verify Successful Placement and Routing
Verify IDELAYCTRL Instantiation for Virtex-4 and Virtex-5 FPGA Designs
Virtex-5 FPGA Designs
Virtex-4 FPGA Designs
Verify TRACE Timing
Debugging the Spartan-3 FPGA Design
Introduction
Read Data Capture
Verify Placement and Routing
DQ Routing
DQS Routing
Debugging Physical Layer in Hardware
Loopback Timing
Incorrect DQS Delay
Proceed to General Board-Level Debug
Debugging the Virtex-4 FPGA Direct-Clocking Design
Introduction
Read Data Capture Timing Calibration
Signals of Interest
Proceed to General Board-Level Debug
Debugging the Virtex-4 FPGA SerDes Design
Introduction
Read Data Capture Timing Calibration
Signals of Interest
Proceed to General Board-Level Debug
Debugging the Virtex-5 FPGA Design
Introduction
Verify Placement and Routing
Debugging Calibration Failures
Physical Layer Debug Port
Proceed to General Board-Level Debug
General Board-Level Debug
Overall Flow
Isolating Bit Errors
Board Measurements
Supply Voltage Measurements
Clocking
Synthesizable Testbench
Varying Read Capture Timing
Preface
Guide Contents
This manual contains the following chapters:
• Section I: “Introduction”
• Chapter 1, “Using MIG,” shows how to install and use the MIG design tool.
• Section II: “Virtex-4 FPGA to Memory Interfaces”
• Chapter 2, “Implementing DDR SDRAM Controllers,” describes how to
implement DDR SDRAM interfaces that MIG creates for Virtex-4 FPGAs.
• Chapter 3, “Implementing DDR2 SDRAM Controllers,” describes how to
implement DDR2 SDRAM interfaces that MIG creates for Virtex-4 FPGAs.
• Chapter 4, “Implementing QDRII SRAM Controllers,” describes how to
implement QDRII SRAM interfaces that MIG creates for Virtex-4 FPGAs.
• Chapter 5, “Implementing DDRII SRAM Controllers,” describes how to
implement DDRII SRAM interfaces that MIG creates for Virtex-4 FPGAs.
• Chapter 6, “Implementing RLDRAM II Controllers,” describes how to implement
RLDRAM II interfaces that MIG creates for Virtex-4 FPGAs.
• Section III: “Spartan-3/3E/3A/3AN/3A DSP FPGA to Memory Interfaces”
• Chapter 7, “Implementing DDR SDRAM Controllers,” describes how to
implement DDR SDRAM interfaces that MIG creates for Spartan-3 FPGAs.
• Chapter 8, “Implementing DDR2 SDRAM Controllers,” describes how to
implement DDR2 SDRAM interfaces that MIG creates for Spartan-3 FPGAs.
• Section IV: “Virtex-5 FPGA to Memory Interfaces”
• Chapter 9, “Implementing DDR2 SDRAM Controllers,” describes how to
implement DDR2 SDRAM interfaces that MIG creates for Virtex-5 FPGAs.
References
The following documents provide supplementary material useful with this user guide:
1. Samsung Data Sheet k7i321884m_R04
http://www.samsung.com/Products/Semiconductor/SRAM/SyncSRAM/DDRII_CIO_SIO/36Mbit/K7I321884M/K7I321884M.htm
2. Micron Data Sheet MT47H16M16FG-37E
http://www.micron.com/products/dram/ddr2sdram/partlist.aspx
3. Samsung Data Sheet k7r323684m
http://www.samsung.com/Products/Semiconductor/common/product_list.aspx?family_cd=SRM020302
4. Micron Data Sheet MT49H16M18FM-25
http://www.micron.com/products/dram/rldram/part.aspx?part=MT49H16M18FM-25
5. Micron Data Sheet MT46V16M16FG-5B
http://www.micron.com/products/dram/ddrsdram/partlist.aspx
6. Xilinx® ChipScope™ Pro analyzer documentation
http://www.xilinx.com/literature/literature-chipscope.htm
Additional Resources
To search the database of silicon and software questions and answers, or to create a
technical support case in WebCase, see the Xilinx website at:
http://www.xilinx.com/support.
Typographical Conventions
This document uses the following typographical conventions. An example illustrates each
convention.
Section I: Introduction
Chapter 1
Using MIG
MIG is a tool used to generate memory interfaces for Xilinx® FPGAs. MIG generates
Verilog or VHDL RTL design files, user constraints files (UCF), and script files. The script
files are used to run simulations, synthesis, map, and par for the selected configuration.
This chapter describes the user interface details of all memory interfaces supported in
MIG. It provides MIG features, usage, and installation details and describes the output
files. This chapter also summarizes the changes and enhancements made from earlier
versions of MIG.
• Uncommon banks are faded out in the Bank Selection page when the user selects
compatible FPGAs, allowing only the common banks for pin allocation
• Attributes X_CORE_INFO and CORE_GENERATION_INFO support for all designs
• Updates to Virtex-5 FPGA designs:
• DDR2 SDRAM
- Converting UCF files from MIG 1.73 or earlier so that they are compatible
with designs from MIG 2.0 or later, using the Verify UCF feature
• QDRII SRAM
- BL2 support
- DCI cascade support
• Updates to Virtex-4 FPGA designs:
• DDR2 SDRAM Direct Clocking
- CAS latency 5 support
- Linear addressing support from the user interface
- Calibration algorithm modified to fix the low-frequency issues
• DDR2 SDRAM SerDes
- Linear addressing support from the user interface
• DDR SDRAM
- Linear addressing support from the user interface
• DDRII SRAM
- Two address FIFOs replaced by a common address FIFO for both write and
read commands
• Updates to Spartan FPGA designs:
• DDR2 SDRAM and DDR SDRAM
- Linear addressing support from the user interface
For MIG 2.1 release notes and a list of specific issues addressed in this release, consult
Xilinx Answer Record 29767.
• Pinout compatibility with MIG 1.6 and MIG 1.5 versions for Spartan-3 and
Spartan-3E devices. There are several limitations to this feature. Contact Xilinx
support for more details.
For MIG 1.7 release notes and a list of specific issues addressed in this release, consult
Xilinx Answer Record 25406.
Tool Features
The key features of MIG are listed below:
• Supported memory types for Virtex-5 FPGA interfaces:
• DDR2 SDRAM components and single-rank DIMMs
See “Supported Devices” in Chapter 9 for a complete listing of supported devices.
• QDRII SRAM and DDRII SRAM
See “Supported Devices” in Chapter 10 for a complete listing of supported QDRII
devices. See “Supported Devices” in Chapter 12 for a complete listing of
supported DDRII devices.
• DDR SDRAM components and single-rank DIMMs
See “Supported Devices” in Chapter 11 for a complete listing of supported
devices.
Both Verilog and VHDL RTL are generated. Additional devices can be created using
the “Create Custom Part” feature.
• Supported memory types for Virtex-4 FPGA interfaces:
• DDR SDRAM components, registered DIMMs, unbuffered DIMMs, and
SODIMMs.
See “Supported Devices” in Chapter 2 for a complete listing of supported devices.
• DDR2 SDRAM components and single-rank DIMMs. The DDR2 controller
supports deep memory configurations with depths from one to four.
See “Supported Devices” in Chapter 3 for a complete listing of supported devices.
• QDRII and DDRII SRAMs
See “Supported Devices” in Chapter 4 for a complete listing of supported QDRII
devices.
See “Supported Devices” in Chapter 5 for a complete listing of supported DDRII
devices.
• RLDRAM II CIO and SIO memories
See “Supported RLDRAM II Devices” in Chapter 6 for a complete listing of
supported devices.
Additional devices can be created using the “Create Custom Part” feature.
• Supported memory types for Spartan-3 FPGA interfaces:
• DDR SDRAM components, registered DIMMs, unbuffered DIMMs, and
SODIMMs.
See “Supported Devices” in Chapter 7 for a complete listing of supported devices.
• DDR2 SDRAM components, registered DIMMs, unbuffered DIMMs, and
SODIMMs.
See “Supported Devices” in Chapter 8 for a complete listing of supported devices.
Additional devices can be created using the “Create New Memory Part” feature.
• Supported memory types for Spartan-3E FPGA interfaces:
• DDR SDRAM components
See “Supported Devices” in Chapter 7 for a complete listing of supported devices.
Additional devices can be created using the “Create New Memory Part” feature.
• Supported memory types for Spartan-3A/3AN FPGA interfaces:
• DDR SDRAM components, registered DIMMs, unbuffered DIMMs, and
SODIMMs.
See “Supported Devices” in Chapter 7 for a complete listing of supported devices.
• DDR2 SDRAM components, registered DIMMs, unbuffered DIMMs, and
SODIMMs.
See “Supported Devices” in Chapter 8 for a complete listing of supported devices.
Additional devices can be created using the “Create New Memory Part” feature.
• Supported memory types for Spartan-3A DSP FPGA interfaces:
• DDR SDRAM components, unbuffered DIMMs, and SODIMMs.
See “Supported Devices” in Chapter 7 for a complete listing of supported devices.
• DDR2 SDRAM components, unbuffered DIMMs, and SODIMMs.
See “Supported Devices” in Chapter 8 for a complete listing of supported devices.
Additional devices can be created using the “Create New Memory Part” feature.
• Supported synthesis and place-and-route tools:
• XST (Xilinx ISE Design Suite 10.1) and Synplify Pro Version 8.8.0.4 are supported
for Virtex-5, Virtex-4, and Spartan-3/3E/3A/3AN/3A DSP FPGA interfaces
• All currently available Virtex-5, Virtex-4, Spartan-3A, Spartan-3AN, Spartan-3A DSP,
Spartan-3E, and Spartan-3 FPGAs are supported.
• DDR2 designs can use either the SerDes or the direct-clocking technique. The
individual bits are deskewed in the direct-clocking technique used in DDR2 designs.
The direct-clocking technique for other memories does not deskew each bit. Details
are explained in the appropriate application notes referenced in this document.
• Direct and SerDes clocking techniques for data capture for Virtex-4 FPGA interfaces.
Direct clocking using per-bit deskew is explained in XAPP701 [Ref 18]. With this
technique, it is not necessary to use clock-capable I/Os for strobes or read clocks.
SerDes clocking is explained in XAPP721 [Ref 23]. The use of clock-capable I/Os for
strobes and read clocks is recommended for maximum flexibility with higher
frequency designs (200 MHz and above).
• Local clocking technique for data capture for all Spartan-3, Spartan-3A/3AN/3A DSP,
and Spartan-3E FPGA interfaces.
The data capture technique using Spartan-3 FPGAs is explained in XAPP768c [Ref 24].
• VHDL and Verilog RTLs are supported for all designs.
• Variable data widths in multiples of 8 up to 144 bits.
The actual width depends upon the selected component. For a 9-bit wide component,
data widths of 9, 18, 36, and 72 are supported.
For DDR2 SDRAM, most of the components support up to a 144-bit data width. 16-bit
or 8-bit wide components can be used to create designs of any data width that is a
multiple of 8.
• User-selectable banks for address, data, system control, and system clock signals.
For QDRII SRAM and RLDRAM II (SIO) memories, the user selects the data banks for
reads and writes separately.
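As a rough illustration of the data-width rule in the list above, the following Python sketch enumerates candidate widths. The function name and the hard-coded cases are assumptions made for this example. Only the 9-bit case (9, 18, 36, 72) and the multiple-of-8 rule up to 144 bits come from the text; the widths that MIG actually offers depend on the selected part.

```python
MAX_DATA_WIDTH = 144  # upper limit stated in the feature list

def candidate_data_widths(component_width):
    """Return candidate interface widths for a component width (illustrative only)."""
    if component_width == 9:
        # Widths the guide lists explicitly for 9-bit wide components.
        return [9, 18, 36, 72]
    if component_width in (8, 16):
        # 8-bit and 16-bit components: any multiple of 8 up to the limit.
        return list(range(8, MAX_DATA_WIDTH + 1, 8))
    raise ValueError("component width not covered by this sketch")

print(candidate_data_widths(9))
print(candidate_data_widths(16)[:4])
```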
Design Tools
All MIG designs have been tested with ISE Design Suite 10.1 and Synplify Pro. MIG is
currently supported on the following operating systems: 64-bit/32-bit Microsoft Windows
XP, 64-bit/32-bit Linux Red Hat Enterprise 4.0, 32-bit Vista Business, and 64-bit SUSE 10
Enterprise.
Installation
MIG provides Xilinx CORE Generator™ tool reference designs and is included in the latest
IP update. IP updates are available through the Xilinx Download Center or WebUpdate.
Visit the Xilinx Download Center for the latest IP update and full documentation on both
installation methods at http://www.xilinx.com/download.
Getting Started
MIG is a self-explanatory tool. This section is intended to help with understanding the
various steps involved in using it.
The following steps launch MIG:
1. The CORE Generator system is launched by selecting Start → Xilinx ISE Design
Suite 12.3 → ISE → Accessories → CORE Generator.
2. Create a CORE Generator project.
3. The Xilinx part must be correctly set because it cannot be changed inside MIG.
Virtex-5, Virtex-4, and Spartan-3/Spartan-3E/Spartan-3A/3AN/3A DSP devices are
supported. Select the part via the part's Project Options menu in the CORE Generator
system. The Generation tab is used to select between Verilog or VHDL by “design
entry” under “flow”. The “flow settings” and “vendor” must be chosen appropriately.
The vendor choices are “Synplicity” for Synplify and “ISE” for XST.
4. Remember the location of the CORE Generator project directory. The “View by
Function” tab to the left shows the available cores organized into folders.
5. MIG is launched by selecting Memories & Storage Elements → Memory Interface
Generator → MIG.
6. The name of the module to be generated is entered in the Component Name text box.
After entering all the parameters in the GUI, click Generate to generate the module
files in a directory with the same name as the component name in the CORE Generator
project directory. After successful generation of the module files, the GUI is closed
automatically.
The “Generated IP” tab to the left lists the generated modules.
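The two conventions in the steps above (the vendor setting for each synthesis tool, and the output directory named after the component) can be sketched in Python as follows. The helper name and the example project and component names are hypothetical; the sketch only restates the rules from steps 3 and 6 and does not call any Xilinx tool.

```python
from pathlib import Path

# Vendor setting for each synthesis tool, as named in step 3.
VENDOR_FOR_TOOL = {"Synplify": "Synplicity", "XST": "ISE"}

def module_output_dir(project_dir, component_name):
    """Directory where the generated module files are expected to land (step 6)."""
    return Path(project_dir) / component_name

# Hypothetical project directory and component name.
out = module_output_dir("coregen_project", "mig_ddr2")
print(out.as_posix())
print(VENDOR_FOR_TOOL["XST"])
```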
Getting Help
At any point in time, the MIG user manual can be accessed by clicking the User Guide
button.
Version Information
The Version Info button lists the new features added and the bugs fixed in the current
version. It opens a web browser to display this information.
The CORE Generator Options screen displays the details of the CORE Generator options
that were selected before invoking MIG.
Note: CORE Generator project options are used in the generation of the memory controller.
Correct CORE Generator project options must be selected.
If the displayed CORE Generator Project Options are inaccurate, click the Cancel button
and reselect the CORE Generator Project Options.
Click Next to continue. A new window shows the MIG Output Options page.
The Spartan-3A DDR2 SDRAM 200 MHz Design option appears only for Spartan-3A
FPGA designs (see Figure 1-3).
Create Design
The Create Design option generates any design that is supported for the selected
FPGA family. For example, the Virtex-4 FPGA family supports DDR2 SDRAM, DDR
SDRAM, QDRII SRAM, DDRII SRAM, and RLDRAM II. The flow for creating a design
is as follows:
1. Pin Compatible FPGAs
2. Memory Selection
3. Controller Options
4. Memory Options
5. FPGA Options
6. Extended FPGA Options
7. Reserve Pins
8. Bank Selection
9. Summary
10. Memory Model License
11. PCB Information
12. Design Notes
All the options are described in this section.
Select any number of compatible FPGAs from the list. MIG uses only the pins that are
common to the target FPGA and the selected FPGAs. The name in the text box is the
selected target FPGA. When a Virtex-5 FXT device is selected as either a compatible or a
target device, the PPC440 checkbox is enabled, but only if the Number of Controllers is
set to 1 in the MIG Output Options page. If the Number of Controllers is more than 1, the
PPC440 checkbox is disabled and masked, and the user can no longer select this option.
Click Next to continue. The Memory Selection page is displayed.
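The two rules on this page can be expressed as a short Python sketch. The function names, the family label string, and the pin names are assumptions made for illustration; MIG's internal implementation is not described in this guide.

```python
def common_pins(target_pins, compatible_pin_sets):
    """Pins available in the target FPGA and in every selected compatible FPGA."""
    pins = set(target_pins)
    for other in compatible_pin_sets:
        pins &= set(other)
    return pins

def ppc440_checkbox_enabled(selected_families, num_controllers):
    """True when a Virtex-5 FXT device is selected and there is exactly one controller."""
    has_fxt = any(family == "Virtex-5 FXT" for family in selected_families)
    return has_fxt and num_controllers == 1

# Hypothetical pin names, for illustration only.
print(sorted(common_pins({"A1", "B2", "C3"}, [{"A1", "C3"}, {"C3", "D4"}])))
print(ppc440_checkbox_enabled(["Virtex-5 FXT"], 2))
```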
Memory Selection
This page displays all memory types that are supported by the selected FPGA family. An
example is shown in Figure 1-6 for Virtex-4 FPGA designs and in Figure 1-7 for Virtex-5
FPGA designs. In Virtex-5 FPGA designs, the user can select the combination of both
DDR2 SDRAM and QDRII SRAM interfaces for a multicontroller design.
For Virtex-5 FPGA designs, only the DDR2 SDRAM controller is shown when the PPC440
checkbox is enabled in the Pin Compatible FPGAs page. Select the appropriate option, and
then click Next to continue. The Controller Options window is displayed.
Controller Options
This page shows the various controller options that can be selected. If the design has
multiple controllers, this page is repeated for each of the controllers. The page is
partitioned into a maximum of nine sections. The number of partitions depends on the
type of selected memory.
• Capture Method. This feature deals with the data capture method. The DDR2 SDRAM
controller for Virtex-4 devices supports two types of capture method. For other
designs, the capture method is displayed, but it cannot be changed.
Click the pull-down menu button and select an option. Certain other options such as
frequency and ECC are restricted based on this selection.
• Frequency. This feature indicates the desired frequency for all the controllers. This
frequency block is limited by factors such as the selected FPGA, device speed grade,
and clocking type.
Click the pull-down menu combo box and select the memory type. Memory type
options marked with a warning symbol are not compatible with the selected
frequency.
• Memory Part. This feature selects the memory part for the design. The part can be
chosen from an existing list, or a new part can be created.
Select the appropriate memory part from the list. Select the larger memory parts to get
all address lines in the UCF. If the required part or its equivalent is unavailable, a new
memory part can be created. Parts marked with a warning symbol are not compatible
with the selected frequency. To create a custom part, select Create Custom Part
from the drop-down combo box. A new window appears as shown in Figure 1-12.
The Create Custom Part window includes all the details of the memory
component selected in Select Base Part. Enter the appropriate memory part name in
the text box. Select a suitable base part from the Select Base Part list. Edit the Value
column as needed. Select suitable values for the Row, Column, and Bank options
as required. After editing the required fields, click the Save button to save the
new part under the entered name. The new part is added to the Memory
Parts list as shown in Figure 1-13 and saved into the database for reuse and to produce
the design.
Note: The rank and speed grade of the custom part are the same as those of the base part
selected in the GUI. For example, if the base part selected in the GUI is dual-rank, the new part
created is also a dual-rank part.
Note: To use the MIG-generated design UCF for different densities of the same
memory type, select the highest-density memory part on the Memory Selection page. For different
density parts, only the number of address bits differs. Hence, if a design is generated for the
highest-density memory part, the same UCF can be used for a lower-density memory part of the same
memory type. For example, if a 128 x 16 DDR2 SDRAM design is generated, its UCF can be
used for a 64 x 16 DDR2 SDRAM design by connecting the most-significant address bit to ground.
This scenario works only if the different density memory parts have the same design parameters
(such as data width and other memory parameters).
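The address-bit relationship described in this note can be sketched as follows. This is a minimal illustration, not MIG output: the function name and the example widths (14 and 13 address bits) are assumptions chosen only to show which address bits would need to be tied to ground.

```python
def unused_address_bits(generated_width, target_width):
    """Return the address bit indices that exist in the generated UCF
    but are not used by the lower-density memory part."""
    if target_width > generated_width:
        # The UCF was generated for a smaller part, so it lacks address lines.
        raise ValueError("generate the design for the higher-density part")
    return list(range(target_width, generated_width))


# Hypothetical widths: design generated with 14 address bits, reused for a 13-bit part.
for bit in unused_address_bits(14, 13):
    print(f"tie address bit {bit} to ground")
```

If both parts have the same address width, the list is empty and the UCF can be reused unchanged, provided the other design parameters also match.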
• Data Width. The data width value can be selected here based on the memory type
selected earlier. The list shows all supported data widths for the selected part. Choose
one of them. These values are generally multiples of the individual device data
widths. In some cases, the width might not be an exact multiple. For example, though
16 bits is the default data width for x16 components, 8 bits is also a valid value.
Note that ECC selection is enabled only when the appropriate data width is selected.
The DDR2 SDRAM Virtex-4 FPGA design supports three ECC modes: ECC Disabled,
Unpipeline Mode, and Pipeline Mode, as shown in Figure 1-16. Select the appropriate
mode. The Pipeline mode improves frequency performance at the cost of an extra
pipeline stage.
For other Virtex-4 FPGA designs, this window is disabled as shown in Figure 1-17. For
Virtex-5 FPGA DDR2 SDRAM designs, the two options are ECC Enabled and ECC
Disabled.
Figure 1-18 shows the ECC option section for the Virtex-5 FPGA design GUI. For
Virtex-5 devices, ECC is supported for 72-bit or 144-bit DDR2 SDRAM designs.
• Data Mask. When the Data Mask checkbox is selected, the data mask pins are
allocated. When the checkbox is cleared, the data mask pins are not
allocated, which increases pin efficiency. This option is disabled and cannot be
changed for memory parts that do not support data masks. Similarly, this option is
disabled for ECC-enabled designs. This option is available only for DDR2 and DDR
SDRAMs.
• Memory Details. This section displays details about the selected memory. For
DIMMs, x4/x8/x16 indicates the memory width of the base device.
Click Next to continue. The Memory Options window is displayed for RLDRAM II,
DDR, and DDR2 SDRAM devices. For other memories, the next window displayed is
FPGA Options.
Memory Options
This feature allows selection of various memory options as supported by the controller
type.
Figure 1-23: Memory Options for Virtex-4 FPGA DDR2 Direct Clocking Design
to 333 MHz. Thus, the CAS latency value is not shown in a frequency range of 267 MHz to 333 MHz,
and the value selected for CAS latency is 5.
Click Next to continue. The FPGA Options window is displayed.
FPGA Options
This feature is partitioned into three or four sections based on the FPGA family selected:
DCM, DCI, SSTL Class, and Debug Signals Control. For Virtex-5 FPGA designs, the DCI
option appears in the Extended FPGA Options page.
• DCM. DCM allows design generation with or without a DCM in the design. This
option appears only for Virtex-4 and Spartan FPGA designs. When a design is
generated with DCM, all the required clocks for the design are generated out of the
DCM using the system clock inputs. When the DCM is disabled, the user must
implement a user clocking scheme to generate the required design clocks. Refer to the
Clocking Scheme section of a design for details about the clocks that the user must
generate when the DCM is not instantiated in the design.
• PLL. PLL allows design generation with or without a phase-locked loop (PLL) in the
design. This option appears only for Virtex-5 FPGA designs. In MIG 3.0 and later,
DCM is replaced with PLL for all Virtex-5 FPGA designs. When a design is generated
with PLL, all the required clocks for the design are generated out of the PLL using the
system clock inputs. When the PLL is disabled, the user must implement a user
clocking scheme to generate the required design clocks. Refer to the Clocking Scheme
section of a design for details about the clocks that the user must generate when the
PLL is not instantiated in the design.
• DCI. This feature indicates whether the Digitally Controlled Impedance (DCI) is Disabled or
Enabled. This option appears only for Virtex-4 FPGA designs. DCI can be enabled or
disabled for input, bidirectional, or output pins. The available DCI options depend on
the memory selected, as follows:
• DDR2 SDRAM — DCI for DQ/DQS and DCI for Address/Control
• DDR SDRAM — DCI for DQ/DQS and DCI for Address/Control
• RLDRAM II — DCI for Data, Read Clocks, and Data Valid Signals and DCI for
Address/Control
• QDRII SRAM — DCI for Data and Read Clocks
• DDRII SRAM — DCI for Data and Read Clocks
• SSTL Class Option. SSTL Class Option determines the I/O standard drive strength in
the UCF of DDR and DDR2 SDRAM. These I/O standards can be changed based on
their application.
• Debug Signals Control. Selecting this option enables the debug signals to be port-
mapped to the ChipScope™ analyzer modules in the design top module. This helps in
monitoring the debug signals on the ChipScope tool. When the generated design is
run in batch mode using ise_flow.bat in the design’s par folder, the CORE
Generator system is called to generate ChipScope analyzer modules (that is, EDIF files
are generated). Deselecting this option leaves the debug signals unconnected in the
design top module, with no ChipScope analyzer modules instantiated in the design
top module or generated by the CORE Generator system. In Virtex-4 FPGA
multicontroller designs, the Debug port is supported for the first controller.
• System Clock. This option selects the system clock type for the design
to be generated (Figure 1-31). It applies to both the system clock and the
IDELAYCTRL clock (200 MHz clock). When Differential is selected, only differential
clock pairs appear in the design top RTL file and in the design UCF. When
Single-Ended is selected, only single-ended clock input pins appear in the design top
RTL file and in the design UCF. This option is grayed out when the PLL/DCM
option is deselected, and the clock type remains single-ended (Figure 1-32).
Figure 1-31: System Clock Type Selection when PLL/DCM Option is Selected
Figure 1-32: System Clock Type Selection when PLL/DCM Option is Deselected
• High Performance Mode. This option selects the IODELAY element High Performance Mode
for Virtex-5 FPGA designs (Figure 1-33). It sets the IODELAY
to HIGH power or LOW power mode. The option can be changed only when
the design frequency is below the limit of the False Mode frequency range. When the
frequency set in the Controller Options page is above the specified False Mode frequency
range, the High Performance Mode option is grayed out with
the default value set to TRUE (Figure 1-34).
Figure 1-33: High Performance Mode Selection Type for Virtex-5 FPGA
Interface Single Controller Designs
Figure 1-34: High Performance Mode Default: Design Freq > False Freq Range
Figure 1-35: High Performance Mode Selection Type for Virtex-5 FPGA
Multiple Interface Designs
Refer to Appendix E, “Debug Port” for more information on False Mode frequency
ranges for all Virtex-5 FPGA memory interface designs.
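The availability rule for the High Performance Mode option can be sketched as below. This is an illustration under stated assumptions, not the MIG implementation: the function name is hypothetical, the frequency limit must be taken from the False Mode frequency ranges in Appendix E, and the handling of a frequency exactly equal to the limit is an assumption.

```python
def high_performance_mode_option(design_freq_mhz, false_mode_limit_mhz):
    """Describe how the GUI presents the High Performance Mode option.

    false_mode_limit_mhz is the upper end of the False Mode frequency range
    for the selected interface (look it up in Appendix E; none is assumed here).
    """
    if design_freq_mhz > false_mode_limit_mhz:
        # Above the False Mode range: option is grayed out and forced to TRUE.
        return {"selectable": False, "value": "TRUE"}
    # Within the range: the user chooses the mode (equality treated as selectable,
    # which is an assumption of this sketch).
    return {"selectable": True, "value": None}


# Hypothetical limit of 200 MHz, used only to exercise both branches.
print(high_performance_mode_option(300, 200))
print(high_performance_mode_option(150, 200))
```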
• Limit to 2 Bytes per Bank. Enabling this option allows only two bytes of data into a
single bank (Figure 1-36). This option is available only for Virtex-5 FPGA DDR2
SDRAM memory interface designs.
Figure 1-36: Limit to 2 Bytes per Bank in Virtex-5 FPGA DDR2 SDRAM
Interface Designs
Click Next to continue. For Virtex-5 FPGA designs, the Extended FPGA Options
window is displayed, and for other designs, the Reserve Pins window is displayed.
Figure 1-37: DCI Option for Multiple Interfaces Selected in Virtex-5 FPGA Designs
If DCI is enabled, the pins are characterized by the DCI I/O standards.
• DCI Cascading Information. This option appears only for QDRII Virtex-5 FPGA
designs. This option is necessary for generating 36-bit component designs with DCI
support.
after the user chooses the Pin/Bank selection mode. The Pin Selection feature is only
implemented for Virtex-5 FPGA devices in DDR and DDR2 SDRAMs, and QDR II SRAM.
Reserve Pins
This feature allows specific pins to be reserved for other applications. After selecting
suitable pins as necessary, the reserved pins are not used by the MIG tool while generating
the pinout for that particular design.
Select the pins from the Available Pins column, and click the Prohibit button. Each
selected pin is transferred to the Reserve Pins column along with its bank information, which
signifies that the pin has been reserved. To unreserve a pin, click the
pin in the Reserve Pins column, and then click the Allow button. The number
408 in the Available Pins header signifies the number of pins available for pinout, whereas
the number 16 in the Reserve Pins header signifies the number of pins selected to be
reserved.
The reserved pins information can be saved in a user defined file using the Save as button.
A browser window appears after clicking the Save as button. Set the file location here.
Use the Read UCF File button to read reserved pins from a UCF. When the Read UCF File
button is clicked, a new window pops up. Select the UCF to be read. After reserving the
pins, click Next to continue. The Bank Selection window is displayed.
After selecting the appropriate option, click the Next button to continue. If the Fixed Pin
out: Pre-existing pin out is known and fixed option is selected, the Pin Selection page is
displayed (“Pin Selection,” page 53). If the New Design: Pick the optimum banks for a
new design option is selected, the Bank Selection page is displayed (“Bank Selection,”
page 54).
Pin Selection
This page allows for the selection of pins for the various signals on the memory interface.
There are five columns in the pin selection table:
• Signal Name
• Signal Group
• Bank Number
• Pin Number
• Pin Name
• Signal Name. This column displays all the required signals for the specific memory
interface. The signal name cannot be edited or changed.
• Signal Group. This column displays the group name of the corresponding signal
name. The signal group cannot be edited or changed.
• Bank Number. This column displays all the banks available in the FPGA. Each cell of
this column contains a pull-down menu. The entries in the pull-down menu are the
banks of the corresponding FPGA. Select the appropriate banks for specific memory
signals. After the bank is selected in the bank number, the Pin Number pull-down
menu is automatically populated with a list of pins from the corresponding bank.
Note: It is not mandatory to select a bank number. When the pin number is selected, the bank
number pull-down menu is automatically updated with the corresponding bank number.
• Pin Number. This column displays all the pin numbers available in the FPGA. Each
cell of this column contains a pull-down menu that lists all the pins in the FPGA.
Select the appropriate pins for specific memory signals. After the pin number is
selected in the Pin Number menu, the Bank Number menu automatically changes to
the corresponding bank.
• Pin Name. This column displays the pin name (I/O type) of the user selected pin
number. The pin name cannot be edited or changed.
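The two-way link between the Bank Number and Pin Number menus can be sketched as follows. This is only an illustration: the pin-to-bank table is hypothetical (real values come from the selected FPGA package), and the function names are not part of MIG.

```python
# Hypothetical pin-to-bank table; real data depends on the selected device and package.
PIN_TO_BANK = {"D12": 11, "E13": 11, "N33": 19, "P34": 19}


def pins_in_bank(bank):
    """Selecting a bank narrows the Pin Number menu to the pins of that bank."""
    return sorted(pin for pin, pin_bank in PIN_TO_BANK.items() if pin_bank == bank)


def bank_of_pin(pin):
    """Selecting a pin directly updates the Bank Number menu to the pin's bank."""
    return PIN_TO_BANK[pin]


print(pins_in_bank(11))
print(bank_of_pin("N33"))
```

Either menu can be used as the starting point, which is why selecting a bank first is not mandatory.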
Use the Validate button to run the design rule checks on all pins assigned to memory signals.
After the checks complete, a DRC Validation Log message window is displayed with
a list of success and failure messages. This log can be saved using the Save Log
Message button.
After successfully assigning the memory interface signals to the appropriate pin, click the
Next button to continue. If there are no errors, the Summary page is displayed
(“Summary,” page 75).
Bank Selection
This feature allows selection of banks for the Memory interface. Banks can be selected for
different groups of memory signals. The different groups are:
• Address and Control Signals
• Data Signals
• System Control Signals
• System Clock
In certain banks, global clock pins are not allowed for system clock. This is because system
clock signals have different I/O standards as compared to those of any other signals in the
design. In such banks, global clock pins are left unused.
• Real-time pin allocation. As the user selects the banks, pin allocation is done
dynamically, and the number of pins required is updated for each group of signals.
• The red circle with a cross mark at each group indicates that sufficient pins are not
allocated, and additional pins are required for the selected configuration.
• The green circle with a tick mark at each group indicates that sufficient pins are
allocated for the selected configuration.
• The denominator in each group indicates the total number of pins required for
each group.
The user must select banks until the numerator equals the denominator. The user
cannot move to the next page unless sufficient pins are allocated for each group.
Figure 1-46 illustrates the conditions where sufficient banks are selected in order to
successfully generate the design.
Figure 1-47 indicates when sufficient banks are not allocated for each signal group.
Figure 1-48 indicates sufficient pins are allocated for System Control and System Clock
groups, but sufficient pins are not allocated for Data and Address groups.
Figure 1-48: Real-Time Pin Allocation: Insufficient Pins for Data/Address Groups
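The numerator/denominator check behind the red and green indicators can be sketched as below. This is a minimal sketch, not MIG code: the group names follow the list above, while the pin counts are invented purely to mirror the situation described for Figure 1-48.

```python
def allocation_status(groups):
    """groups maps a signal group name to (allocated_pins, required_pins).

    Returns a per-group flag (True = green tick, False = red cross) and whether
    the user may move to the next page (all groups sufficiently allocated).
    """
    status = {name: allocated >= required
              for name, (allocated, required) in groups.items()}
    return status, all(status.values())


# Hypothetical counts: System Control and System Clock are satisfied,
# Data and Address still need more banks.
example = {
    "Address and Control": (20, 28),
    "Data": (40, 88),
    "System Control": (4, 4),
    "System Clock": (2, 2),
}
status, can_continue = allocation_status(example)
print(status)
print("Next enabled:", can_continue)
```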
• Pin Allocation Priority. MIG allocates the pins starting with exclusive data banks
first, followed by data banks that combine with other groups.
Figure 1-49 indicates that data banks are selected in bank 11, bank 19, bank 20, and
bank 12. In bank 11, bank 20 and bank 12, only data is selected; in bank 19, data,
address, and system control are selected. Here, data is allocated first in bank 11,
bank 20 and bank 12, and then in bank 19. This Pin Allocation Priority is applicable
only for data group signals in Virtex-4 and Virtex-5 devices.
Note: Data group refers to Data group pins in CIO designs and Data Read group pins in SIO
designs.
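The priority rule can be sketched with the bank selection described for Figure 1-49. This is an illustration only: the data structure and function name are assumptions, and the order among the exclusive data banks simply follows the order in which the text lists them.

```python
def data_allocation_order(bank_groups):
    """bank_groups maps a bank number to the set of signal groups selected in it.

    Banks that carry only data come first; banks that share data with other
    groups come last. Python's sort is stable, so the original order is kept
    within each class.
    """
    data_banks = [bank for bank, groups in bank_groups.items() if "data" in groups]
    return sorted(data_banks, key=lambda bank: bank_groups[bank] != {"data"})


# Selection from the Figure 1-49 description: bank 19 is shared, the others are data only.
selection = {
    11: {"data"},
    19: {"data", "address", "system_control"},
    20: {"data"},
    12: {"data"},
}
print(data_allocation_order(selection))  # [11, 20, 12, 19]
```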
• Master Bank selection. This is applicable only for QDRII SRAM and DDRII SRAM
Virtex-5 FPGA designs when the DCI Cascading Information option is selected. A
Master bank should be selected in each column when a Data Read is selected in that
particular column. There is an exception for the middle column. The middle column is
divided into two parts: above zero bank and below zero bank. The middle column can
have two Master banks, depending on where the Read Data banks are selected. If the
Read Data bank is selected either above or below the Zero bank, only one Master Bank
is required. If the Read Data banks are selected both above and below Zero bank, two
Master banks are required.
Figure 1-50 shows Data Read selected in both columns, so the user must
select a Master Bank in each column. The Master Bank selection box lists all the
banks that can be selected as the Master Bank. MIG does not show the Master
Bank selection box for a column if that column does not have enough pins in its banks.
Figure 1-51 shows the Master Bank selection in the center column. It uses all the pins
for Read Data from the center column.
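The Master bank count per column can be sketched as follows. This is a simplified reading of the rule stated above, not MIG's implementation; the function name and the "above"/"below" labels for positions relative to the zero bank are assumptions of the sketch.

```python
def master_banks_required(is_middle_column, read_bank_positions):
    """Return how many Master banks a column needs.

    read_bank_positions lists, for each bank in the column where Data Read is
    selected, whether it lies "above" or "below" the zero bank. The position
    only matters for the middle column.
    """
    if not read_bank_positions:
        return 0  # no Data Read in this column, so no Master bank is needed
    if is_middle_column:
        # One Master bank per half of the middle column that holds Read Data.
        return len(set(read_bank_positions))
    return 1


print(master_banks_required(False, ["above", "below"]))  # 1
print(master_banks_required(True, ["above", "above"]))   # 1
print(master_banks_required(True, ["above", "below"]))   # 2
```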
• Bank Selections for Multiple Memory Interfaces in Virtex-5 FPGA Designs. For a
multiple interface design, a signal group can be selected in a bank only if its I/O
standard is compatible with the I/O standards already used in that bank. For example
(with the selected FPGA as XC5VLX220-FF1760), Controller 0 is DDR2 SDRAM (see
Figure 1-52) and Controller 1 is QDRII SRAM (see Figure 1-53). In DDR2 SDRAM, bank 19
is selected for Data, bank 19 and bank 15 are selected for Address, and bank 1 is selected
for System Control. In QDRII SRAM, Data Read cannot be selected in bank 19 or bank 15,
because the I/O standard for DDR2 SDRAM Data and Address is SSTL18_II_DCI, and the I/O
standard for QDRII SRAM Data Read is HSTL_I_DCI_18. These two I/O standards
are not compatible. Hence, MIG does not allow bank selection for a group of signals
that does not follow the I/O standard compatibility rules.
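The compatibility check in this example can be sketched as below. This is a simplified model, not the MIG rule set: only the one incompatible pair named in the text is listed, and the function and table names are hypothetical.

```python
# Only the incompatibility stated in the text; MIG's complete rules are not reproduced.
INCOMPATIBLE_PAIRS = {frozenset({"SSTL18_II_DCI", "HSTL_I_DCI_18"})}


def can_select(bank_standards, new_standard):
    """Return True if a group using new_standard may share a bank that already
    holds signals with the I/O standards in bank_standards."""
    return all(frozenset({existing, new_standard}) not in INCOMPATIBLE_PAIRS
               for existing in bank_standards)


# Bank 19 already holds DDR2 SDRAM Data and Address (SSTL18_II_DCI).
bank_19 = {"SSTL18_II_DCI"}
print(can_select(bank_19, "HSTL_I_DCI_18"))  # False: QDRII Data Read is rejected
print(can_select(set(), "HSTL_I_DCI_18"))    # True: an unused bank accepts it
```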
When PPC440 compatible pinouts are selected, MIG outputs the fixed pinout. The banks
that are checked indicate the banks used for PPC440 pinouts; the user does not have an
option to select the specific banks. For example, bank selection of a 16-bit design with a
XC5VFX100T-FF1136 target device and the PowerPC440 Block Selection value in the Pin
Compatible FPGAs page set to Top is shown in Figure 1-54.
After selecting the banks, click Next to continue. The Summary page window is displayed.
Summary
This window provides complete details about the CORE Generator options, Interface
parameters, FPGA options, and Bank selections of the active project (Figure 1-55).
For DDR2 SDRAM, DDR SDRAM, and RLDRAM II memory interface designs, click Next to move
to the License Agreement page of the Micron/Qimonda memory model for the selected memory.
For other memory interface designs, clicking the Next button
moves to the PCB Information page.
MIG outputs Micron and Qimonda memory models only. MIG does not output Samsung
and Cypress memory models; these should be downloaded from the vendors' respective
websites (Figure 1-57).
PCB Information
This page displays the PCB related information to be considered while designing the board
that uses MIG generated designs (Figure 1-58).
Design Notes
This page provides the design notes that should be taken into account while using MIG
generated designs (Figure 1-59).
Click Generate to generate the design files. MIG generates three output directories:
example_design, user_design, and docs. After generating the design, the MIG GUI closes.
To exit without generating the design, click Cancel. A Quit Confirmation window appears,
as shown in Figure 1-60.
Output Files
A MIG-generated design has the following output files and directory:
• A <component name>_xmdf.tcl file. This is the interface file between the ISE and
CORE Generator software. The ISE software uses this file to determine the files output
by the CORE Generator software for the core to be integrated into the ISE project.
• A <component name>.vho file, used for the core to be instantiated, created only
when a VHDL design is generated.
• A <component name>.veo file, used for the core to be instantiated, created only
when a Verilog design is generated.
• A <component name>_readme.txt file, which includes information about the files
generated and how they are used.
• A <component name> directory.
In the <component name> directory, three folders are created:
• docs
• example_design
• user_design
Any relevant documents, such as application notes, timing analysis spreadsheets, and user
guide are in the docs directory.
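The output layout described above can be summarized in a short sketch. It only restates the file and folder names listed in this section; the function name and the component name "mig_core" are hypothetical, and nothing here inspects a real MIG output directory.

```python
def expected_outputs(component_name, hdl):
    """List the top-level files and folders for a generated design.

    hdl is "vhdl" or "verilog"; it decides whether a .vho or a .veo
    instantiation template is expected.
    """
    template_ext = {"vhdl": ".vho", "verilog": ".veo"}[hdl]
    files = [
        f"{component_name}_xmdf.tcl",
        f"{component_name}{template_ext}",
        f"{component_name}_readme.txt",
    ]
    folders = [f"{component_name}/{sub}"
               for sub in ("docs", "example_design", "user_design")]
    return files, folders


files, folders = expected_outputs("mig_core", "verilog")  # hypothetical component name
print(files)
print(folders)
```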
The example_design and user_design folders contain several other folders and files.
They are:
• rtl — Contains all the RTL files (either VHDL or Verilog design files). The RTL files
generated for Virtex-5 FPGA designs do not have user design names (component
names) prepended to the RTL file names. MIG generates the same code for both the
XST and Synplicity tools. The generated RTL has separate XST and Synplicity
attributes. While running XST designs, Synplicity attributes might cause warning
messages to appear, and vice versa. The warning messages related to these attributes
can be ignored.
• par — Contains the UCF with constraints for the design, and two generated script
files (ise_flow.bat and create_ise.bat):
• ise_flow.bat — Double-click the ise_flow.bat script file to run
the design through synthesis, build, map, and par. This script file sets all the
required options. Users should refer to this file for the recommended build
options for the design. This file takes all synthesis options from the
xst_run.txt file located in the par folder. All map, place-and-route, and
Timing Reporter and Circuit Evaluator (TRCE) options are set in the
ise_flow.bat file. BITGEN options are taken from the UT file located in the
par folder. The options that are not listed out in these files (such as Synthesis and
others) are set to their default values. For more information about the allowed
option values, refer to the Development System Reference Guide in the Xilinx ISE
12.3 Design Suite Software Manuals and Help – PDF Collection at
http://www.xilinx.com/support/documentation/index.htm.
• create_ise.bat — The generated MIG design includes a create_ise.bat
script file in both the example_design/par and user_design/par
directories. Running the create_ise.bat script generates an ISE tools project
that incorporates the generated MIG design. If the file is run from the
example_design/par directory, the ISE tools project includes the
example_design. If the file is run from the user_design/par directory, the ISE
The pull-down menu includes a list of boards. Select the appropriate board. Details about
the particular board are displayed in the pane below. After selecting the board, click Next
to move to next page.
PCB Information
This page displays the PCB-related information to be considered while designing the
board that is to use a MIG generated design (Figure 1-63).
Design Notes
Click Generate to generate the board files for the specified Xilinx reference board
(Figure 1-64). After successful generation of the board files, the MIG GUI closes.
Click Cancel. A Quit Confirmation window appears, as shown in Figure 1-65. Click Yes to
exit or No to return to the current page.
to ensure that the update completes successfully. If a MIG design has already been
generated, locate the mig.prj file in the generated MIG project. Ensure that the banks
selected for this project match the banks of the desired pinout. This is required for the
update to complete successfully.
Open MIG either by invoking a new MIG project or by reopening the previously generated
MIG project. Verify the options displayed on screen one and click Next. On screen two,
provide a component name and select Verify UCF and Update Design and UCF.
Note: Verify UCF and Update Design and UCF verifies that the pinout adheres to the required
pinout rules outlined in Appendix A, “Memory Implementation Guidelines.” The pinout must meet
these guidelines for the tool to complete the Update Design option.
The flow is as follows:
1. Load mig.prj and UCF
2. Summary
3. Verification Report
4. Memory Model License
5. PCB Information
6. Design Notes
Select the appropriate files. After selecting the files, click Next to continue.
Summary
This page provides complete details about the bank selection, Interface parameters, CORE
Generator options and FPGA options of the project for which the UCF is to be verified.
Verification Report
This window indicates if the input UCF has been verified and provides warning or error
messages if the input UCF does not follow the pin allocation rules.
Click Next to move to the Memory Model License agreement page. If the Verification Report
contains only warning messages, MIG proceeds with updating the design.
When the input UCF does not follow the MIG pin allocation rules, that is, when the
Verification Report contains error messages, MIG does not proceed with updating the
design.
PCB Information
This page displays the PCB related information to be considered while designing the board
that uses MIG generated designs. Click Next to go to the Design Notes page.
Design Notes
This page provides the design notes that should be taken into account while using the MIG
generated designs.
Click the Generate button to generate the complete design with the loaded project settings and
modified UCF (the UCF is updated without affecting the pin allocation constraints). MIG
generates three output directories: example_design, user_design, and docs. The UCF files in
the example_design and user_design folders are updated to the input UCF. After
generating the design, the MIG GUI closes.
To exit without generating the design, click Cancel. A Quit Confirmation window appears,
as shown in Figure 1-72. Click Yes to exit or No to return to the current page.
• Bus notations:
• NET "ddr2_dq[0]"
• NET "ddr2_dq<0>"
• LOC constraints (supports various syntaxes, such as LOC, PROHIBIT and INHIBIT):
• LOC = "N33"
• LOC = N33
• IOSTANDARD:
• NET "ddr2_dq[*]" IOSTANDARD = SSTL18_II_DCI;
• NET "ddr2_dq[*]" LOC=N33 | IOSTANDARD = SSTL18_II_DCI;
• The MIG output UCF has spaces between the various KEYWORDs. The input UCF
need not have the same number of spaces, but each KEYWORD should be followed
by at least one space:
• NET "ddr2_dq[*]" IOSTANDARD = SSTL18_II_DCI;
• NET "ddr2_dq[*]"   IOSTANDARD   =   SSTL18_II_DCI;
All of the above elements of syntax are treated the same by the MIG tool.
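One way to picture this equivalence is a small normalizer that reduces each accepted spelling to a single canonical form. This is a sketch for illustration only and does not describe how MIG parses the UCF; the function name and the chosen canonical form (square brackets, no quotes, single spaces) are assumptions.

```python
import re


def normalize_ucf_line(line):
    """Reduce a NET constraint to one canonical spelling (illustrative only)."""
    text = line.strip().rstrip(";").strip()
    text = text.replace('"', "")                  # LOC = "N33" and LOC = N33 are equivalent
    text = re.sub(r"<([^<>]+)>", r"[\1]", text)   # ddr2_dq<0> -> ddr2_dq[0]
    text = re.sub(r"\s*=\s*", " = ", text)        # uniform spacing around '='
    text = re.sub(r"\s*\|\s*", " | ", text)       # uniform spacing around '|'
    text = re.sub(r"\s+", " ", text)              # collapse runs of spaces
    return text + ";"


a = normalize_ucf_line('NET "ddr2_dq<0>"   LOC="N33";')
b = normalize_ucf_line('NET "ddr2_dq[0]" LOC = N33 ;')
print(a)
print(a == b)
```

Both input lines reduce to the same string, which mirrors the statement that the different notations are treated the same.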
The following rules are verified from the input UCF file.
DDR2 SDRAM/DDR SDRAM Spartan FPGA designs:
1. Verifies the slice location constraints for DQ, DQS delayed col, FIFO write enable and
FIFO write address, and rst_dqs_div signals.
2. Verifies RLOC and BEL constraints for LUT delay calibration chain.
3. Verifies RLOC_ORIGIN constraints for LUT delay calibration chain.
4. Verifies AREA group constraint for calibration logic.
5. Verifies 5 Up/6 Down rule for DQ signals from DQS for left/right banks.
6. Verifies 5 Right rule for DQ signal from DQS for top/bottom banks.
7. Verifies that the DQ, DQS, DM, memory clocks, and rst_dqs_div (loopback) signals are on
the same side of the FPGA.
8. Verifies that the rst_dqs_div signal (loopback) is at the center of the DQS sets.
9. Verifies that system clock signals are allocated to the global clock pair of the device.
10. Verifies that memory clock signals and differential DQS are allocated to the differential
I/O pair of the device.
11. Verifies if reserved pins are used in the UCF.
12. Verifies VRN/VRP and VREF pins are not used in the UCF.
13. Verifies if the same tile is used for left/right banks for the following signals:
• DQ and DQS
• DQ and Address
• DQ and rst_dqs_div_out
• DM and DQS
• DM and Address
• DM and rst_dqs_div_out
The above signals cannot be used in the same tile because the two signals are in two
different clock phases.
14. Verifies all the pin and slice allocation rules if the user selects a compatible UCF.
DDR2 SDRAM Virtex-4/Virtex-5 FPGA designs:
1. Updates the IDELAYCTRL LOCs and slice constraints in the updated UCF.
2. Verifies the UCF and updates the design even with the compatible UCF.
3. Verifies the UCF and updates the design even with the user design UCF.
4. Verifies the UCF for system clocks allocated to global clock pins.
5. Updates the design for Virtex-4 FPGA DDR2 SDRAM multicontroller user design
UCF. If MIG does not find sufficient pins in a bank for allocating the ERROR signal, the
user must manually allocate the LOCs for the ERROR signal in the updated UCF file.
6. In designs containing an x4 memory part, MIG verifies the UCF only when the DM
is associated with the DQ higher nibble.
7. Verifies UCF for differential DQS signals allocated to differential pair pins in DDR2
SDRAM designs.
DDR SDRAM Virtex-4/Virtex-5 FPGA designs:
1. Updates the IDELAYCTRL LOCs and slice constraints in the updated UCF.
2. Verifies the UCF and updates the design even with the compatible UCF.
3. Verifies the UCF and updates the design even with the user design UCF.
4. Verifies the UCF for system clocks allocated to global clock pins.
5. Verifies the UCF for DQS to CC_P pin and corresponding DM to the CC_N for a CC
pair in Virtex-5 FPGA designs.
6. For parts with no DM, verifies the UCF for DQS to CC_P pin and the CC_N to any of
the output pins only in Virtex-5 FPGA designs.
7. In designs containing an x4 memory part, MIG verifies the UCF only when the DM
is associated with the DQ higher nibble.
QDRII SRAM/DDRII SRAM Virtex-4/Virtex-5 FPGA designs:
1. Updates the IDELAYCTRL LOCs in the updated UCF.
2. Verifies the UCF and updates the design even with the compatible UCF.
3. Verifies the UCF and updates the design even with the user design UCF.
4. Verifies the UCF for system clocks allocated to global clock pins.
5. Verifies the UCF for CQ and CQ_n allocation only to P-pins.
6. For DDRII SRAM CIO x36 Virtex-5 FPGA designs, MIG verifies the UCF only when
the K/K_n and C/C_n are associated with the most significant 18 bits of data.
RLDRAMII Virtex-4 FPGA design:
1. Updates the IDELAYCTRL LOCs in the updated UCF.
2. Verifies the UCF and updates the design even with the compatible UCF.
3. Verifies the UCF and updates the design even with the user design UCF.
4. Verifies the UCF for system clocks allocated to global clock pins.
The following rules are not verified from the input UCF file.
DDR2 SDRAM/DDR SDRAM Spartan FPGA designs:
1. MIG does not verify whether the user has changed the vector notation of a generate
statement in the instance hierarchy path.
2. If two pins are allocated to the same signal, MIG does not give any error or warning
message.
3. MIG does not verify whether the DQS signal is missing from the UCF and does not report
any message in the report.
4. MIG does not give any message if DIRT strings are missing in the UCF file for
top/bottom banks of XC3S2000, XC3S4000 and XC3S5000 devices.
DDR2 SDRAM Virtex-4/Virtex-5 FPGA designs:
1. Verify UCF for IDELAYCTRL LOCs and slice constraints.
2. Update design for the PPC supported designs.
3. Verify UCF and Update design for Virtex-5 FPGA multicontroller designs.
4. Update design using an already updated UCF. Doing so results in unknown error messages.
5. Verify UCF for differential system clocks and differential memory signals allocated to
differential pair.
6. Verify UCF for vacant VRN/VRP and VREF pins.
7. Verify UCF when a signal is allocated to two pins.
8. Verify UCF when reserved pins are used for pin allocation.
DDR SDRAM Virtex-4/Virtex-5 FPGA designs:
1. Verify UCF for IDELAYCTRL LOCs and slice constraints.
2. Updated design using the updated UCF. It will result in unknown error messages.
3. Verify UCF for differential system clocks and differential memory signals allocated to
differential pair.
4. Verify UCF for vacant VRN/VRP and VREF pins.
5. Verify UCF when a signal is allocated to two pins.
6. Verify UCF when reserved pins are used for pin allocation.
QDRII SRAM/DDRII SRAM Virtex-4/Virtex-5 FPGA designs:
1. Verify UCF for IDELAYCTRL LOCs.
2. Update design using the updated UCF. This results in unknown error messages.
3. Verify UCF for differential system clocks and differential memory signals allocated to
differential pair.
4. Verify UCF for vacant VRN/VRP and VREF pins.
5. Verify UCF for Master-Slave banks association.
6. Verify UCF when a signal is allocated to two pins.
7. Verify UCF when reserved pins are used for pin allocation.
RLDRAMII Virtex-4 FPGA design:
1. Verify UCF for IDELAYCTRL LOCs.
2. Update design using the updated UCF. This results in unknown error messages.
3. Verify UCF for differential system clocks and differential memory signals allocated to
differential pair.
4. Verify UCF for vacant VRN/VRP and VREF pins.
5. Verify UCF when a signal is allocated to two pins.
6. Verify UCF when reserved pins are used for pin allocation.
Error Messages
This section describes the different error messages that can be generated when verifying
the UCF.
The reference UCF must follow the MIG naming conventions (refer to the UCF generated
by MIG). For example, the Virtex-4 FPGA DDR2 SDRAM controller 0 should have
cntrl0_ddr2_dq[0] for data bits, and RLDRAM controller 0 should have cntrl0_rld2_dq[0]
for data bits.
• Uniqueness. If two signals are allocated to the same pins in the reference UCF, an error
message is listed in the directed file with a user-assigned name.
The error message format is “<signal_name1> and <signal_name2> are allocated to same pins.”
For example, if cntrl0_ddr2_dq[0] and cntrl0_ddr2_dqs[0] are allocated to the same pin,
such as:
NET "cntrl0_ddr2_dq[0]" LOC = "D12" ;
NET "cntrl0_ddr2_dqs[0]" LOC = "D12" ;
Then the following error message is printed:
ERROR: cntrl0_ddr2_dq[0] and cntrl0_ddr2_dqs[0] are allocated to the
same pins. Pins are not unique.
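The uniqueness rule can be illustrated with a minimal Python sketch. This is not MIG's implementation. The function names parse_net_locs and uniqueness_errors are hypothetical, and the sketch assumes that every pin assignment is written on one line in the NET "<signal>" LOC = "<pin>" ; form used in the example above.

```python
import re
from collections import defaultdict

# Matches UCF lines such as: NET "cntrl0_ddr2_dq[0]" LOC = "D12" ;
# Commented lines (starting with '#') do not match and are skipped.
NET_LOC = re.compile(r'^\s*NET\s+"([^"]+)"\s+LOC\s*=\s*"([^"]+)"\s*;')


def parse_net_locs(ucf_text):
    """Return (signal, pin) pairs in file order (hypothetical helper)."""
    pairs = []
    for line in ucf_text.splitlines():
        match = NET_LOC.match(line)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


def uniqueness_errors(pairs):
    """Report every pair of signals that share one pin."""
    signals_by_pin = defaultdict(list)
    for signal, pin in pairs:
        signals_by_pin[pin].append(signal)
    errors = []
    for signals in signals_by_pin.values():
        for i in range(len(signals)):
            for j in range(i + 1, len(signals)):
                errors.append(
                    "ERROR: %s and %s are allocated to the same pins. "
                    "Pins are not unique." % (signals[i], signals[j])
                )
    return errors


if __name__ == "__main__":
    ucf = (
        'NET "cntrl0_ddr2_dq[0]" LOC = "D12" ;\n'
        'NET "cntrl0_ddr2_dqs[0]" LOC = "D12" ;\n'
    )
    for message in uniqueness_errors(parse_net_locs(ucf)):
        print(message)
```

Grouping signals by pin first keeps the check to a single pass over the file before any pairs are compared.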
• Association. Signals in the same group should be allocated in the same bank,
otherwise MIG reports error messages. For CIO parts (such as DDR2 SDRAM, DDR
SDRAM, RLDRAMII, and DDRII SRAM CIO), this rule is applied on data group pins.
For SIO parts (such as QDRII SRAM, and DDRII SRAM SIO), this rule is applied only
for data read group pins and not for data write group pins.
The error message format is "<signal_name1> and <signal_name2> are not allocated in
the same banks."
For example, DDR2 SDRAM:
NET "cntrl0_ddr2_dq[0]" LOC = "D12" ; #bank 6
NET "cntrl0_ddr2_dq[1]" LOC = "C12" ; #bank 6
NET "cntrl0_ddr2_dq[2]" LOC = "B10" ; #bank 6
NET "cntrl0_ddr2_dq[3]" LOC = "C10" ; #bank 7
Assume cntrl0_ddr2_dq[3] and cntrl0_ddr2_dq[2] are allocated to pins of different
banks, such as bank 7 and bank 6, respectively. The following error messages are
printed:
ERROR: cntrl0_ddr2_dq[0] (6) and cntrl0_ddr2_dq[3] (7) are not
allocated in the same banks
ERROR: cntrl0_ddr2_dq[1] (6) and cntrl0_ddr2_dq[3] (7) are not
allocated in the same banks
ERROR: cntrl0_ddr2_dq[2] (6) and cntrl0_ddr2_dq[3] (7) are not
allocated in the same banks
For example, SIO part - QDRII SRAM:
NET "qdr_q[0]" LOC = "K24" ; #Bank 19
NET "qdr_q[1]" LOC = "L24" ; #Bank 19
NET "qdr_q[2]" LOC = "L25" ; #Bank 19
NET "qdr_q[3]" LOC = "E29" ; #Bank 15
Assume qdr_q[3] and qdr_q[2] are allocated to pins of different banks, such as bank 15
and bank 19, respectively. The following error messages are printed:
ERROR: qdr_q[0] (19) and qdr_q[3] (15) are not allocated in the same
banks
ERROR: qdr_q[1] (19) and qdr_q[3] (15) are not allocated in the same
banks
ERROR: qdr_q[2] (19) and qdr_q[3] (15) are not allocated in the same
banks
These error messages are reported for each pair of signals that belong to the same group
but are allocated to different banks.
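The pairwise reporting described above can be sketched in Python as follows. This is an illustration only, not MIG's implementation. The function name association_errors is hypothetical, and the bank number of each pin is assumed to be known already (here it is taken from the comments in the DDR2 SDRAM example), because the pin-to-bank mapping depends on the device and package.

```python
from itertools import combinations


def association_errors(group):
    """group: list of (signal, bank) tuples for one data group, in UCF order.

    Returns one message for each pair of signals placed in different banks.
    """
    errors = []
    for (sig_a, bank_a), (sig_b, bank_b) in combinations(group, 2):
        if bank_a != bank_b:
            errors.append(
                "ERROR: %s (%s) and %s (%s) are not allocated in the same banks"
                % (sig_a, bank_a, sig_b, bank_b)
            )
    return errors


if __name__ == "__main__":
    dq_group = [
        ("cntrl0_ddr2_dq[0]", 6),
        ("cntrl0_ddr2_dq[1]", 6),
        ("cntrl0_ddr2_dq[2]", 6),
        ("cntrl0_ddr2_dq[3]", 7),
    ]
    for message in association_errors(dq_group):
        print(message)
```

For the DDR2 SDRAM example, the sketch produces three messages, one for each of dq[0], dq[1], and dq[2] paired with dq[3].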
• Strobes vs. Data allocation for Virtex-4 FPGA DDR2 SDRAM SerDes designs. Data
(DQ) should be allocated within the same clock region as the corresponding strobe
(DQS), or within one clock region above or below it. If not, an error message is
displayed. This rule applies to Virtex-4 FPGA DDR2 SDRAM SerDes designs only.
For example, the XC4VLX100-FF1148 Virtex-4 FPGA DDR2 SDRAM SerDes design
has this pinout:
NET "cntrl0_ddr2_dq[0]" LOC = "F20" ; #Bank 1
NET "cntrl0_ddr2_dq[1]" LOC = "A15" ; #Bank 1
NET "cntrl0_ddr2_dq[2]" LOC = "B15" ; #Bank 1
NET "cntrl0_ddr2_dq[3]" LOC = "N19" ; #Bank 1
NET "cntrl0_ddr2_dqs[0]" LOC = "F13" ; #Bank 1
The clock region of cntrl0_ddr2_dq[0], cntrl0_ddr2_dq[1], and cntrl0_ddr2_dq[2] is
CLOCKREGION_X1Y8. The clock region of cntrl0_ddr2_dq[3] is
CLOCKREGION_X1Y5, while the clock region of cntrl0_ddr2_dqs[0] is
CLOCKREGION_X1Y6. The allowed clock regions for dq[0] to dq[7] are X1Y5, X1Y6,
and X1Y7. In this example, the data signals cntrl0_ddr2_dq[0], cntrl0_ddr2_dq[1], and
cntrl0_ddr2_dq[2] violate the rule. The MIG tool outputs the following error messages
for each signal that violates the clock region rule:
ERROR: cntrl0_ddr2_dq[0](1) is not allocated within the clock region
boundary of cntrl0_ddr2_dqs[0](1). (DQ should be in the same or one
above/below clock region of the corresponding DQS clock region).
ERROR: cntrl0_ddr2_dq[1](1) is not allocated within the clock region
boundary of cntrl0_ddr2_dqs[0](1). (DQ should be in the same or one
above/below clock region of the corresponding DQS clock region).
ERROR: cntrl0_ddr2_dq[2](1) is not allocated within the clock region
boundary of cntrl0_ddr2_dqs[0](1). (DQ should be in the same or one
above/below clock region of the corresponding DQS clock region).
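The clock region rule can be expressed as a short Python sketch. This is not MIG's implementation. The function names parse_region and dq_violations are hypothetical. The sketch also assumes that DQ and DQS must share the same X column and that "above or below" means a Y index that differs by at most one.

```python
import re

# Clock region names are of the form CLOCKREGION_X<col>Y<row>.
_REGION = re.compile(r"^CLOCKREGION_X(\d+)Y(\d+)$")


def parse_region(name):
    """Split a clock region name into integer (x, y) indices."""
    match = _REGION.match(name)
    if match is None:
        raise ValueError("not a clock region name: %r" % (name,))
    return int(match.group(1)), int(match.group(2))


def dq_violations(dq_regions, dqs_region):
    """Return the DQ signals outside the DQS region and its two neighbors.

    dq_regions: list of (signal, region_name) tuples.
    Assumption: same X column, Y index within one of the DQS Y index.
    """
    dqs_x, dqs_y = parse_region(dqs_region)
    violating = []
    for signal, region in dq_regions:
        x, y = parse_region(region)
        if x != dqs_x or abs(y - dqs_y) > 1:
            violating.append(signal)
    return violating


if __name__ == "__main__":
    dq = [
        ("cntrl0_ddr2_dq[0]", "CLOCKREGION_X1Y8"),
        ("cntrl0_ddr2_dq[1]", "CLOCKREGION_X1Y8"),
        ("cntrl0_ddr2_dq[2]", "CLOCKREGION_X1Y8"),
        ("cntrl0_ddr2_dq[3]", "CLOCKREGION_X1Y5"),
    ]
    print(dq_violations(dq, "CLOCKREGION_X1Y6"))
```

With the regions from the XC4VLX100-FF1148 example, the sketch flags dq[0], dq[1], and dq[2] and accepts dq[3].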
• Clock Capable I/Os for strobes/read clock. MIG checks for CC pins if Use CC for direct
clocking is selected. In this case, the strobe/read_clock signals should be allocated to
the CC pins only. If not, an error message is displayed.
The error message format is “<signal_name> should be allocated to the CC Pins.” For
example, cntrl0_ddr2_dqs[0] is a strobe. Assume it is allocated to the K12 pin, which is
not a clock capable I/O pin. The following error message is printed:
ERROR: cntrl0_ddr2_dqs[0] should be allocated to the CC Pins.
• Absence of signals. If one or more signal-pin pair is missing and/or commented in the
given UCF against the selected inputs, the verification result indicates the absence of
those signal-pin pairs as a warning.
The warning message format is “<signal_name> is expected, but not present in the
UCF.”
For example, assume the reference UCF has 8 bits (dq[0:7]), and the data width passed
through PRJ is 16 bits. While checking, MIG verifies only 8 bits and reports the other
expected bits as follows:
WARNING : cntrl0_ddr2_dq[8] is expected, but not present in the UCF.
WARNING : cntrl0_ddr2_dq[9] is expected, but not present in the UCF.
WARNING : cntrl0_ddr2_dq[10] is expected, but not present in the
UCF.
WARNING : cntrl0_ddr2_dq[11] is expected, but not present in the
UCF.
WARNING : cntrl0_ddr2_dq[12] is expected, but not present in the
UCF.
WARNING : cntrl0_ddr2_dq[13] is expected, but not present in the
UCF.
WARNING : cntrl0_ddr2_dq[14] is expected, but not present in the
UCF.
WARNING : cntrl0_ddr2_dq[15] is expected, but not present in the
UCF.
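The absence check can be sketched in Python as a comparison between the signals expected for the selected data width and the signals found in the UCF. This is an illustration only. The function name missing_signal_warnings is hypothetical, and only a single indexed bus is considered.

```python
def missing_signal_warnings(present_signals, bus_name, width):
    """Return one warning for each expected bus bit that is absent.

    present_signals: iterable of signal names found in the UCF.
    bus_name: bus prefix, for example "cntrl0_ddr2_dq".
    width: data width selected for the design (from the PRJ inputs).
    """
    present = set(present_signals)
    warnings = []
    for bit in range(width):
        name = "%s[%d]" % (bus_name, bit)
        if name not in present:
            warnings.append(
                "WARNING : %s is expected, but not present in the UCF." % name
            )
    return warnings


if __name__ == "__main__":
    # The UCF holds 8 data bits while the selected data width is 16.
    in_ucf = ["cntrl0_ddr2_dq[%d]" % bit for bit in range(8)]
    for message in missing_signal_warnings(in_ucf, "cntrl0_ddr2_dq", 16):
        print(message)
```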
• Bank selection. If one or more banks are not selected but one or more pins from those
banks are used, an error message is printed.
The error message format is “<signal_name> (<signal_group>) should not be allocated
to bank <bank_number>. The rule is, it can only be moved within the bank(s)
“<bank_numbers>” specified in the input mig.prj file for the <signal_group> group.”
For example:
NET "cntrl0_ddr2_dqs[0]" LOC = "D12" ;#bank 6
Bank 6 is not selected for Data (as cntrl0_ddr2_dqs[0] from Data). Assume that
cntrl0_ddr2_dqs[0], which belongs to the strobe group, is allocated to a pin belonging
to bank 6. The following error message is printed:
ERROR: cntrl0_ddr2_dqs[0](Strobe) should not be allocated to bank 6.
The rule is, it can only be moved within the bank(s)
"11,13,15,17,19,21" specified in the input mig.prj file for "Data"
group.
Chapter 2
Feature Summary
Supported Features
The DDR SDRAM controller design supports the following:
• Burst lengths of two, four, and eight
• Sequential and interleaved burst types
• CAS latencies of 2, 2.5, and 3
• Precharge based on the row to be accessed or the precharge command given by the
user
• Registered DIMMs, unbuffered DIMMs, and SODIMMs
• Different memories (density/speed)
• Auto refresh
• Linear addressing
• VHDL and Verilog
• With and without a testbench
• With and without a DCM
• Data mask
• System clock, differential and single-ended
The supported features are described in more detail in “Architecture.”
Unsupported Features
• Dual Rank DIMMs
• Deep Memory
• Auto Precharge
• Bank Management
• Multi Controller
Architecture
Interface Model
DDR SDRAM interfaces are source-synchronous and double data rate. They transfer data
on both edges of the clock cycle. A memory interface can be modularly represented as
shown in Figure 2-1. A modular interface has many advantages. It allows designs to be
ported easily and also makes it possible to share parts of the design across different types
of memory interfaces.
[Figure 2-1 block labels: Xilinx FPGA, Control Layer, Physical Layer, Memories.]
Implemented Features
This section provides details on the supported features of the DDR SDRAM controller.
Based on user selection, the tool generates a parameter file, which is used to set various
features of the memory and to generate the control signals accordingly.
The parameter file provides the settings for burst length, CAS latency, sequential or
interleaved addressing, number of row address bits, number of column address bits, bank
address, and the timing parameters based on the frequency and the speed grade selected
from the GUI. The DDR SDRAM controller uses these parameters directly.
The user issues a command through the FIFOs (user_interface). The user address (i.e.,
APP_AF_ADDR that is written into the FIFO as shown in Figure 2-11 or Figure 2-13) is
decoded in a sequence. The total width of the Read/Write Address FIFO
(rd_wr_addr_fifo) is 36 bits. The user writes the column address (least-significant bits),
row address, bank address, chip address [31:0], and the command to be issued [34:32]. The
36th bit (APP_AF_ADDR[35]) is reserved by the design to track whether the row to be
accessed is the same as the previous row. The user-supplied value of APP_AF_ADDR[35] is a
don't care for the design. The controller takes the row and column address bits based on
the selected component. The “Write Interface” and “Read Interface” sections provide
further details on how to issue the write and read commands, respectively.
Table 2-2 lists the commands that the user can issue through the User interface. If the user
issues an invalid command, the state of the controller is undefined. The functionality is not
guaranteed when an invalid command is issued.
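As a minimal sketch of the address layout described above, the following Python function packs a command and address into a 36-bit app_af_addr value. It is an illustration, not part of the generated design. The function name pack_app_af_addr is hypothetical, and the field widths are parameters because they depend on the selected memory. The command encodings are the ones this guide lists for bits [34:32] (001 Auto Refresh, 010 Precharge, 100 Write, 101 Read). Bit 35 is left at 0 because the design ignores the user-supplied value.

```python
# Command encodings for app_af_addr[34:32], as listed in this guide.
CMD_AUTO_REFRESH = 0b001
CMD_PRECHARGE = 0b010
CMD_WRITE = 0b100
CMD_READ = 0b101


def pack_app_af_addr(cmd, chip, bank, row, col, col_bits, row_bits, bank_bits):
    """Pack command and address fields into a 36-bit app_af_addr value.

    Field order from the least-significant bit: column, row, bank, chip.
    The widths are assumptions supplied by the caller for the chosen part.
    """
    for name, value, width in (
        ("col", col, col_bits),
        ("row", row, row_bits),
        ("bank", bank, bank_bits),
    ):
        if not 0 <= value < (1 << width):
            raise ValueError("%s does not fit in %d bits" % (name, width))
    addr = col
    addr |= row << col_bits
    addr |= bank << (col_bits + row_bits)
    addr |= chip << (col_bits + row_bits + bank_bits)
    if not 0 <= addr < (1 << 32):
        raise ValueError("address does not fit in bits [31:0]")
    if cmd not in (CMD_AUTO_REFRESH, CMD_PRECHARGE, CMD_WRITE, CMD_READ):
        raise ValueError("invalid command")
    # Bit 35 stays 0; the controller ignores the user-supplied value.
    return (cmd << 32) | addr


if __name__ == "__main__":
    value = pack_app_af_addr(
        CMD_WRITE, chip=0, bank=1, row=2, col=3,
        col_bits=10, row_bits=13, bank_bits=2,
    )
    print("%09X" % value)
```

Rejecting unknown command codes in the sketch mirrors the warning above that controller behavior is undefined for invalid commands.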
Burst Length
Bits M0:M3 of the Mode Register define the burst length and burst type. Read and write
accesses to the DDR SDRAM are burst-oriented. The burst length is programmable to
either 2, 4, or 8 from the GUI. It determines the maximum number of column locations
accessed for a given READ or WRITE command.
The DDR SDRAM ddr_controller module implements the user-selected burst length from
MIG.
CAS Latency
Bits M4:M6 of the Mode Register define the CAS latency (CL). CL is the delay in clock
cycles between the registration of a READ command and the availability of the first bit of
output data. CL can be set to 2, 2.5, or 3 clocks from the GUI.
The controller supports CAS latencies of 2, 2.5, and 3.
During read data operations, the generation of the read_en signal varies according to the
CL in the ddr_controller module.
Registered DIMMs
The DDR SDRAM controller supports registered DIMMs. This feature is implemented in the
ddr_controller module. For registered DIMMs, the READ and WRITE commands and
address have one more clock of latency than for unbuffered DIMMs. The controller
therefore delays the data and the strobe by one clock to match the latency added by
the register in the DIMM.
Precharge
The PRECHARGE command is issued before the next read or write is issued for a different
row, but not if the read or write is in the same row. The PRECHARGE command checks the
row address, bank address, and chip selects. The DDR Virtex-4 FPGA controller issues a
PRECHARGE command if there is a change in any address where a read or write
command is to be issued. The AUTO PRECHARGE command via the A10 column bit is
not supported.
Auto Refresh
The DDR SDRAM controller issues AUTO REFRESH commands at specified intervals for
the memory to refresh the charge required to retain the data in the memory. The user can
also issue a REFRESH command through the user interface by setting bits 34, 33, and 32 of
the app_af_addr signal in the user_interface module to 3’b001. If there is a refresh request
during an ongoing read or write burst, the controller issues a REFRESH command
after completing the current read or write burst command.
Linear Addressing
The DDR SDRAM controller supports linear addressing. Linear addressing refers to the
way the user provides the address of the memory to be accessed. For Virtex-4 FPGA DDR
SDRAM controllers, the user provides the address information through the app_af_addr
signal. As the densities of the memory devices vary, the number of column address bits
and row address bits also changes. In any case, the row address bits in the app_af_addr
signal always start from the next-higher bit, where the column address ends. This feature
increases the number of devices that can be supported with the design.
Data Mask
MIG supports a data mask option. If this option is checked in the GUI, MIG generates a
design with data mask pins. This option can be chosen if the selected part has data
masking.
System Clock
MIG supports a differential or single-ended system clock option. Based on the selection in
the GUI, input system clocks and IDELAY clocks are differential or single-ended.
Table 2-3 lists the timing parameters for components, and Table 2-4 lists the timing
parameters for DIMMs.
Table 2-4: Timing Parameters for DIMMs (Unbuffered and Registered) (Cont’d)
Parameter  Description                           Micron 128 MB  Micron 256 MB  Micron 512 MB  Micron 1 GB
                                                 -40            -40            -40            -40
TRCD       ACTIVE to READ or WRITE Delay         15 ns          15 ns          15 ns          15 ns
TRAS       ACTIVE to PRECHARGE Command           40 ns          40 ns          40 ns          40 ns
TRC        ACTIVE to ACTIVE (Same Bank) Command  55 ns          55 ns          55 ns          55 ns
TWTR       WRITE to READ Command Delay           2 * TCK        2 * TCK        2 * TCK        2 * TCK
TWR        WRITE Recovery Time                   15 ns          15 ns          15 ns          15 ns
Note: For the latest timing information, refer to the vendor memory data sheets.
Hierarchy
Figure 2-2 shows the hierarchical structure of the DDR SDRAM design generated by MIG
with a testbench and a DCM. The physical and control layers are clearly separated in this
figure. MIG generates the entire DDR SDRAM controller as shown in this hierarchy,
including the testbench. MIG also generates a parameter file where all user input
parameters or some parameters used internally by the design are defined.
[Figure 2-2 module labels: <top_module>, top*, test_bench*, v4_dq_iob, v4_dm_iob,
v4_dqs_iob, tap_ctrl*, data_tap_inc*, rd_data_fifo*, pattern_compare, wr_data_fifo,
rd_wr_addr_fifo*]
Figure 2-2: Hierarchical Structure of the Virtex-4 FPGA DDR SDRAM Design
Design clocks and resets are generated in the infrastructure module. The DCM clock is
instantiated in the infrastructure module for designs with a DCM. The inputs to this
module are the differential design clock and a 200 MHz differential clock for the
IDELAYCTRL module. A user reset is also input to this module. From the input clocks and
reset signals, this module generates the system clocks and the system reset that are used
in the design.
The DCM primitive is not instantiated in this module if the Use DCM option is unchecked.
So, the system operates on the user-provided clocks. The system reset is generated in the
infrastructure module using the dcm_lock input signal.
[Figure 2-3 labels. System Clocks and Reset: clk200_p, clk200_n, sys_clk_p, sys_clk_n,
sys_reset_in_n. Blocks: Infrastructure, idelay_ctrl, main_0. Internal signals: clk200,
clk_0, clk_90, sys_rst, sys_rst90, sys_rst_rt, idelay_ctrl_rdy. Memory Device: ddr_ras_n,
ddr_cas_n, ddr_we_n, ddr_cs_n, ddr_cke, ddr_dm, ddr_ba, ddr_a.]
Figure 2-3: Top-Level Block Diagram of the DDR SDRAM Design with a DCM and a Testbench
The error output signal indicates whether the case passes or fails. The testbench module
does writes and reads, and also compares the read data with the written data. The error
signal is driven High on data mismatches.
The init_done signal indicates the completion of initialization and calibration of the design.
All the signals listed under the Memory Device category do not necessarily appear in the
top-level block port list. The port list varies according to the memory type selected, such as
a component or a registered DIMM. For example, a component does not have the
ddr_reset_n signal.
Figure 2-4 shows a block diagram representation of the top-level module for a design with
a DCM but without a testbench.
[Figure 2-4 labels. System Clocks and Reset: clk200_p, clk200_n, sys_clk_p, sys_clk_n,
sys_reset_in_n. Blocks: Infrastructure, idelay_ctrl, top_0. Internal signals: clk200,
clk_0, clk_90, sys_rst, sys_rst90, sys_rst_rt, idelay_ctrl_rdy. User Application:
app_af_addr, app_af_wren, app_wdf_data, app_mask_data, app_wdf_wren, wdf_almost_full,
af_almost_full, burst_length_div2, read_data_valid, read_data_fifo_out, clk_tb, reset_tb,
init_done. Memory Device: ddr_ras_n, ddr_cas_n, ddr_we_n, ddr_cs_n, ddr_cke, ddr_dm,
ddr_ba, ddr_a, ddr_ck, ddr_ck_n, ddr_dq, ddr_dqs, ddr_reset_n.]
Figure 2-4: Top-Level Block Diagram of the DDR SDRAM Design with a DCM but without a Testbench
The DCM clock module is instantiated in the infrastructure module. Using the differential
sys_clk_p and sys_clk_n signals, the internal DCM generates all the required clocks for the
design. The differential clk200_p and clk200_n are used by the idelay_ctrl element. The
active-Low system reset signal is sys_reset_in_n. All design resets are generated using the
input reset signal gated by the dcm_lock signal.
The init_done signal indicates the completion of initialization and calibration of the design.
The application’s user interface signals are listed in Figure 2-4. The design provides the
clk_tb and reset_tb signals to the user to synchronize with the design.
Figure 2-5 shows a block diagram representation of the top-level module for a design
without a DCM or a testbench. There is no DCM instantiated in the infrastructure module.
All the clocks and dcm_lock should be given as inputs from the user interface. Resets are
generated using the sys_reset_in_n signal gated by the dcm_lock signal in the
infrastructure module. Clk200 is used by the idelay_ctrl element. All the clocks should be
single-ended. The user application must have a DCM primitive instantiated in the design.
The init_done signal indicates the completion of initialization and calibration of the design.
The user interface signals are also listed in the <top_module> module. The design
provides the clk_tb and reset_tb signals to the user to synchronize with the design.
[Figure 2-5 labels. System Reset and User DCM Clocks: clk_200, clk_0, clk_90,
sys_reset_in_n, dcm_lock. Blocks: Infrastructure, idelay_ctrl, top_0. Internal signals:
sys_rst, sys_rst90, sys_rst_r1, idelay_ctrl_rdy. User Application: app_af_addr,
app_af_wren, app_wdf_data, app_mask_data, app_wdf_wren, wdf_almost_full, af_almost_full,
burst_length_div2, read_data_valid, read_data_fifo_out, clk_tb, reset_tb, init_done.
Memory Device: ddr_ras_n, ddr_cas_n, ddr_we_n, ddr_cs_n, ddr_cke, ddr_dm, ddr_ba, ddr_a,
ddr_ck, ddr_ck_n, ddr_dq, ddr_dqs, ddr_reset_n.]
Figure 2-5: Top-Level Block Diagram of the DDR SDRAM Design without a DCM or a Testbench
Figure 2-6 shows a block diagram representation of the top-level module for a design with
a testbench but without a DCM. The user should provide all the clocks and the dcm_lock
signal. These clocks should be single-ended. The active-Low system reset signal is
sys_reset_in_n. All design resets are gated by the dcm_lock signal.
[Figure 2-6 labels. System Reset and User DCM Clocks: clk_200, clk_0, clk_90,
sys_reset_in_n, dcm_lock. Blocks: Infrastructure, idelay_ctrl, main_0. Internal signals:
sys_rst, sys_rst90, sys_rst_rt, idelay_ctrl_rdy. Status Signals: error, init_done.
Memory Device: ddr_ras_n, ddr_cas_n, ddr_we_n, ddr_cs_n, ddr_cke, ddr_dm, ddr_ba, ddr_a,
ddr_ck, ddr_ck_n, ddr_dq, ddr_dqs, ddr_reset_n.]
Figure 2-6: Top-Level Block Diagram of the DDR SDRAM Design with a Testbench but without a DCM
The error output signal indicates whether the case passes or fails. The testbench module
does writes and reads, and also compares the read data with the written data. The error
signal is driven High on data mismatches. The init_done signal indicates the completion of
initialization and calibration of the design.
Figure 2-7 shows the expanded block diagram of the design. The top module is expanded
to show various internal blocks. The functions of these blocks are explained in the
subsections following the figure.
[Figure 2-7 labels. Blocks: Infrastructure, idelay_ctrl, Controller, User Interface,
Datapath, IOBs. Clocks and Resets: sys_clk_p, sys_clk_n, sys_clk, clk200_p, clk200_n,
idly_clk_200, sys_reset_in_n, clk_200, reset, idelay_ctrl_rdy. User-side signals:
init_done, clk_tb, reset_tb, wdf_almost_full, af_almost_full, burst_length_div2[2:0],
read_data_valid, read_data_fifo_out, app_af_addr, app_af_wren, app_wdf_data,
app_mask_data, app_wdf_wren. Internal signals: af_addr, af_empty, ctrl_af_rden,
ctrl_wdf_rden, ctrl_rden, burst_length_div2, wdf_data, mask_data, ctrl_ddr_address,
ctrl_ddr_ba, ctrl_ddr_ras_l, ctrl_ddr_cas_l, ctrl_ddr_we_l, ctrl_ddr_cs_l, ctrl_ddr_cke,
dqs_delayed, data_idelay_inc, data_idelay_ce, data_idelay_rst, dqs_idelay_inc,
dqs_idelay_ce, dqs_idelay_rst, rising_first, dqs_rst, dqs_en, wr_en, wr_data_rise,
wr_data_fall, mask_data_rise, mask_data_fall. Memory-side signals: clk, clk_n, ba,
address, ras_n, cas_n, we_n, cke_n, cs_n, dq, dqs, dm, reset_n.]
Controller
The DDR SDRAM controller initializes the memory, accepts and decodes user commands,
and generates READ, WRITE, and REFRESH commands. The DDR SDRAM controller also
generates signals for other modules. The memory is initialized and powered-up using a
defined process. The controller state machine handles the initialization process upon
power-up. If the AUTO REFRESH command is to be issued between any user read or write
commands, then the read or write command is suspended until the ref_done flag is
deasserted.
Datapath
This module transmits data to the memories. Its major functions include storing the write
data and calculating the tap value for the read datapath. The data_write and
data_path_IOBs modules do the actual write functions. The Idelay_ctrl, tap_ctrl and
data_tap_inc modules do the calibration.
User Interface
This module stores write data in its Write Data FIFO (wr_data_fifo), stores write and read
addresses in its Read/Write Address FIFO (rd_wr_addr_fifo), and stores received read
data from memory in its Read Data FIFO (rd_data_fifo). The width of the Write Data FIFO
is twice the data width and mask width of the memory. For example, for a 16-bit width, the
width of the FIFO is 36 because the data width is 32 and the mask width is 4. The
rd_wr_addr_fifo and wr_data_fifo modules store the data and address in block RAMs. The
rd_data_fifo module captures the data in the LUT-based RAMs.
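The Write Data FIFO width calculation above can be written as a one-line Python sketch. The function name write_data_fifo_width is hypothetical. The 16-bit case assumes a mask width of 2 (one mask bit per data byte), which is consistent with the mask portion of 4 quoted above.

```python
def write_data_fifo_width(data_width, mask_width):
    """Width of one Write Data FIFO word: twice the data width plus twice
    the mask width, because each word holds rising-edge and falling-edge
    data together with their mask bits."""
    return 2 * data_width + 2 * mask_width


if __name__ == "__main__":
    # 16-bit interface with an assumed 2-bit mask: 32 + 4 = 36
    print(write_data_fifo_width(16, 2))
```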
The FIFOs are built using FIFO16 primitives in the rd_wr_addr_fifo, wr_data_fifo_16, and
wr_data_fifo_8 modules. FIFO16 has two FIFO threshold attributes,
ALMOST_EMPTY_OFFSET and ALMOST_FULL_OFFSET, that are set to 7 and F,
respectively, in the RTL by default. These values can be altered to match the application.
For valid FIFO threshold offset values, refer to UG070 [Ref 7].
The controller also generates user commands, such as READ and WRITE.
The pattern_compare module registers the delay between the command and the data
received from the IOBs. This delay is then applied to the Rden signal generated from the
ddr_controller module during the actual read to register the valid data in the internal
FIFOs.
Test Bench
The MIG tool generates two RTL folders, example_design and user_design. The
example_design folder includes the synthesizable test bench, while user_design does not
include the test bench modules. The MIG test bench performs eight write commands and
eight read commands in an alternating fashion. The number of words in a write command
depends on the burst length. For a burst length of 4, the test bench writes a total of 32 data
words for all eight write commands (16 rise data words and 16 fall data words). For a burst
length of 8, the test bench writes a total of 64 data words. It writes the data pattern of FF,
00, AA, 55, 55, AA, 99, 66 in a sequence of which FF, AA, 55, and 99 are rise data words and
00, 55, AA, and 66 are fall data words for an 8-bit design. The falling edge data is the
complement of the rising edge data. For a burst length of 4, the data sequence for the first
write command is FF, 00, AA, 55, and the data sequence for the second write command is
55, AA, 99, 66. For a burst length of 8, the data pattern for the first write command is FF,
00, AA, 55, 55, AA, 99, 66 and the same pattern is repeated for all the remaining write
commands. This data pattern is repeated in the same order based on the number of data
words written. For data widths greater than 8, the same data pattern is concatenated for
the other bits. For a 32-bit design and a burst length of 8, the data pattern for the first write
command is FFFFFFFF, 00000000, AAAAAAAA, 55555555, 55555555, AAAAAAAA,
99999999, 66666666.
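The data pattern described above can be reproduced with a short Python sketch. It is not the generated test bench RTL. The names widen and write_command_words are hypothetical, and only burst lengths of 4 and 8 are covered because those are the cases this section spells out.

```python
# Base 8-bit pattern: FF, AA, 55, 99 are rise words; 00, 55, AA, 66 are
# the complementary fall words.
BASE_PATTERN = [0xFF, 0x00, 0xAA, 0x55, 0x55, 0xAA, 0x99, 0x66]


def widen(byte, data_width):
    """Concatenate the same byte across a data width that is a multiple of 8."""
    if data_width % 8 != 0 or data_width <= 0:
        raise ValueError("data_width must be a positive multiple of 8")
    word = 0
    for _ in range(data_width // 8):
        word = (word << 8) | byte
    return word


def write_command_words(command_index, burst_length, data_width=8):
    """Return the data words of one write command (0-based command_index)."""
    if burst_length not in (4, 8):
        raise ValueError("sketch covers burst lengths 4 and 8 only")
    start = (command_index * burst_length) % len(BASE_PATTERN)
    return [
        widen(BASE_PATTERN[(start + i) % len(BASE_PATTERN)], data_width)
        for i in range(burst_length)
    ]


if __name__ == "__main__":
    for index in range(2):
        words = write_command_words(index, burst_length=4)
        print(index, ["%02X" % word for word in words])
    words = write_command_words(0, burst_length=8, data_width=32)
    print(["%08X" % word for word in words])
```

For a burst length of 4, consecutive commands alternate between the two halves of the pattern. For a burst length of 8, every command carries the full pattern.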
Address generation logic generates eight different addresses for eight write commands.
The same eight address locations are repeated for the following eight read commands. The
read commands are performed at the same locations where the data is written. There are a
total of 32 different address locations for 32 write commands, and the same address
locations are generated for 32 read commands. Upon completion of a total of 64
commands, including both writes and reads (eight writes and eight reads repeated four
times), address generation rolls back to the first address of the first write command and the
same address locations are repeated. The MIG test bench exercises only a certain memory
area. The address is formed such that all address bits are exercised. During writes, a new
address is generated for every burst operation on the column boundary.
During reads, comparison logic compares the read pattern with the pattern written, i.e., the
FF, 00, AA, 55, 55, AA, 99, 66 pattern. For example, for an 8-bit design of burst length 4, the
data written for a single write command is FF, 00, AA, 55. During reads, the read pattern is
compared with the FF, 00, AA, 55 pattern. Based on a comparison of the data, a status
signal error is generated. If the data read back is the same as the data written, the error
signal is 0, otherwise it is 1.
Infrastructure
The infrastructure module generates the FPGA clock and reset signals. When differential
clocking is used, sys_clk_p, sys_clk_n, clk200_p, and clk200_n signals appear. When
single-ended clocking is used, sys_clk and idly_clk_200 signals appear. In addition, clocks
are available for design use and a 200 MHz clock is provided for the idelayctrl primitive.
Differential and single-ended clocks are passed through buffers before connecting to a
DCM. For differential clocking, the output of the sys_clk_p/sys_clk_n buffer is single-
ended and is provided to the DCM input. Likewise, for single-ended clocking, sys_clk is
passed through a buffer and its output is provided to the DCM input. The clock outputs of
the DCM are clk_0 and clk_90. After the DCM is locked, the design is in the reset state for
at least 25 clocks. The infrastructure module also generates all of the reset signals required
for the design.
Idelay_ctrl
This module instantiates the IDELAYCTRL primitive of the Virtex-4 FPGA. The
IDELAYCTRL primitive is used to continuously calibrate the individual delay elements in
its region to reduce the effect of process, temperature, and voltage variations. A 200 MHz
clock has to be fed to this primitive.
The MIG tool instantiates the required number of IDELAYCTRLs in the RTL and uses the
LOC constraints in the UCF file to fix their locations. The number of IDELAYCTRLs is
defined by the IDELAYCTRL_NUM parameter in the idelay_ctrl module. In the RTL,
IDELAY_CTRL_RDY is generated by doing a logical AND of the RDY signals of every
IDELAYCTRL block.
IDELAYCTRL LOC constraints should be checked in the following cases:
• The MIG design is used with other IP cores or user designs that also require the use of
IDELAYCTRL and IDELAYs.
• Previous ISE® software releases 8.2.03i and 9.1i had an issue with IDELAYCTRL block
replication or trimming. When using these revisions of the ISE software, the user must
instantiate and constrain the location of each IDELAYCTRL individually.
See UG070 [Ref 7] for more information on the requirements of IDELAYCTRL placement.
IOBS Module
All DDR SDRAM address, control, and data signals are transmitted and received
through the input and output buffers.
Clocking Scheme
Figure 2-9, page 111 shows the clocking scheme for this design. Global and local clock
resources are used.
The global clock resources consist of a DCM, two global clock buffers (BUFG) on DCM
output clocks, and one BUFG for clk_200. The local clock resources consist of regional I/O
clock networks (BUFIO). The global clock architecture is discussed in this section.
The MIG tool allows the user to customize the design such that the DCM is not included.
In this case, clk_0, clk_90 and IDELAYCTRL clock clk_200 must be supplied by the user.
Notes:
1. See “User Interface Accesses,” page 114 for timing requirements and restrictions on the user interface
signals.
[Figure 2-9 (clocking scheme) labels: CLK_FB, CLK90, BUFG, clk_90.]
Table 2-6: DDR SDRAM System Interface Signals for Designs with DCM
Signal Name Direction Description
sys_clk_p, sys_clk_n Input Differential input clock to the DCM. The DDR SDRAM controller
and memory operate on this frequency.
sys_reset_in_n Input Active-Low reset to the DDR SDRAM controller.
clk200_p, clk200_n Input Differential clock used in the idelay_ctrl logic.
Table 2-7 shows the system interface signals for designs without the DCM. The clk_0,
clk_90, and clk_200 signals are the single-ended input clocks. The clk_90 signal must have
a phase difference of 90° with respect to clk_0. The clk_200 signal is the clock used for the
IDELAYCTRL primitives in Virtex-4 FPGAs.
Table 2-7: System Interface Signals for Designs without the DCM
Signal Direction Description
clk_0 Input The DDR SDRAM controller and memory operate on this clock.
sys_reset_in_n Input Active-Low reset to the DDR SDRAM controller. This signal is used
to generate a synchronous system reset.
clk_90 Input 90° phase-shifted clock with the same frequency as clk_0.
clk_200 Input 200 MHz single-ended input clock for the IDELAYCTRL primitive of
the Virtex-4 FPGA.
dcm_lock Input The status signal indicating whether the DCM is locked or not. It is
used to generate the synchronous system reset.
Table 2-8 describes the DDR SDRAM user interface signals for designs without the
testbench.
Table 2-8: DDR SDRAM User Interface Signals for Designs without the Testbench Case
Signal Name (1) Direction Description
clk_tb Output All user interface signals must be synchronized with respect to
clk_tb.
reset_tb Output Active-High system reset for the user interface.
burst_length_div2[2:0] Output Indicates the number of bursts that can be written to or read from
the memory.
001: burst length = 2
010: burst length = 4
100: burst length = 8
read_data_valid Output Status of the Read Data FIFO. This signal is asserted when read data
is available in the Read Data FIFO.
read_data_fifo_out Output Read data from memory, where n is the data width of the interface.
[2n–1:0] The read data is stored into the Read Data FIFO. This data can be
read from the FIFO depending upon the status of the FIFO.
wdf_almost_full Output ALMOST FULL status of the Write Data FIFO. When this signal is
asserted, the user can write 5 more locations into the FIFO in designs
generated with a testbench and 14 more locations in designs without
a testbench.
af_almost_full Output ALMOST FULL status of the Read Address FIFO. The user can issue
eight more locations into the FIFO after AF_ALMOST_FULL is
asserted.
app_af_addr[35:0] (2) Input Memory address and command. Bit 35 is used internally by the
controller. The controller ignores this bit from the user interface. Bits
[34:32] are used for dynamic commands as follows:
001: Auto Refresh
010: Precharge
100: Write
101: Read
Bits [31:0] form the memory chip select, bank address, row address,
and column address. The positioning of the chip, bank, row, and
column addresses changes based on the memory configuration.
app_af_wren Input Write-enable signal to the Write Address FIFO. This signal is
synchronized with the write address. The write address is written to
the Write Address FIFO only when this signal is asserted High.
app_mask_data[2m–1:0] Input User mask data, where m indicates the data mask width of the
interface. The mask data is twice the mask width of the interface.
The mask data is written into the Write Data FIFO along with the
write data.
app_wdf_data[2n–1:0] Input User write data to the memory, where n indicates the data width of
the interface. The user write data is twice the data width of the
interface. The most-significant bits contain the rising-edge data, and
the least-significant bits contain the falling-edge data. Memory write
data is written into the Write Data FIFO, and the write address is
written into the Write Address FIFO from the user interface. The
DDR SDRAM controller reads the Write Address FIFO and Write
Data FIFO.
Table 2-8: DDR SDRAM User Interface Signals for Designs without the Testbench Case (Cont’d)
Signal Name (1) Direction Description
app_wdf_wren Input Write-enable signal to the Write Data FIFO. This signal is
synchronized with the write data. The write data is written to the
Write Data FIFO only when this signal is asserted High.
Notes:
1. All user interface signal names are prepended with a controller number, for example, cntrl0_APP_WDF_DATA. DDR SDRAM
devices currently support only one controller. See “User Interface Accesses,” page 114 for timing requirements and restrictions on
the user interface signals.
2. Linear addressing is used, i.e., the row address immediately follows the column address bits, and the bank address follows the row
address bits, thus supporting more devices. The number of address bits used depends on the density of the memory part. The
controller ignores the unused bits, which can all be tied High.
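The app_af_addr layout in Table 2-8 can be illustrated with a minimal Python sketch. The helper names pack_app_af_addr and unpack_app_af_addr are hypothetical and are not part of the MIG output; the sketch only assumes the bit positions and command codes listed in the table, and it leaves the chip, bank, row, and column split of bits [31:0] to the caller because that split depends on the memory configuration.

```python
# Command codes for app_af_addr bits [34:32], as listed in Table 2-8.
CMD_AUTO_REFRESH = 0b001
CMD_PRECHARGE = 0b010
CMD_WRITE = 0b100
CMD_READ = 0b101
VALID_COMMANDS = (CMD_AUTO_REFRESH, CMD_PRECHARGE, CMD_WRITE, CMD_READ)


def pack_app_af_addr(command: int, address: int) -> int:
    """Build a 36-bit app_af_addr word (hypothetical helper, not MIG code)."""
    if command not in VALID_COMMANDS:
        raise ValueError("unknown command code")
    if not 0 <= address < (1 << 32):
        raise ValueError("address must fit in bits [31:0]")
    # Bit 35 is left at 0; the controller ignores it on the user interface.
    return (command << 32) | address


def unpack_app_af_addr(word: int) -> tuple:
    """Split a word into (command, address), dropping bit 35."""
    return (word >> 32) & 0b111, word & 0xFFFFFFFF
```

For example, pack_app_af_addr(CMD_READ, 0x1234) places the Read code 101 in bits [34:32] above the 32-bit address field.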
Table 2-9 describes the status signals that are available to the user.
Write Interface
Figure 2-10 shows the user interface block diagram for write operations.
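[Figure 2-10: user interface block diagram for write operations. Recoverable labels: app_af_addr and app_af_wren from the user interface into the Address FIFO (FIFO16, 512 x 36); af_almost_full back to the user interface; af_addr and af_empty to the controller; ctrl_af_rden from the controller.]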
asserted when Address FIFO is full, and similarly wdf_almost_full is asserted when
Write Data FIFO is full.
7. Both the Address FIFO and Write Data FIFO Full flags are deasserted at power-on.
8. The user should assert the Address FIFO write-enable signal app_af_wren along with
address app_af_addr to store the write address and write command into the Address
FIFO.
9. The user should assert the Data FIFO write-enable signal app_wdf_wren along with
write data app_wdf_data and mask data app_mask_data to store the write data and
mask data into the Write Data FIFO. The user should provide both rise and fall data
together for each write to the Data FIFO.
10. The controller reads the Address FIFO by issuing the ctrl_af_rden signal. The
controller reads the Write Data FIFO by issuing the ctrl_wdf_rden signal after the
Address FIFO is read. It decodes the command part after the Address FIFO is read.
11. The write command timing diagram in Figure 2-11 is derived from the MIG-generated
testbench. As shown (burst length of 4), each write to the Address FIFO must be
coupled with two writes to the Data FIFO. Similarly, for a burst length of 8, every write
to the Address FIFO must be coupled with four writes to the Data FIFO. Failure to
follow this rule can cause unpredictable behavior.
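The pairing rule between Address FIFO writes and Data FIFO writes can be sketched as follows. This is an illustration only: the function names are hypothetical, and the burst length of 2 case is inferred from the fact that each Data FIFO write carries one rise word and one fall word, whereas the text above states the rule explicitly only for burst lengths of 4 and 8.

```python
def data_fifo_writes_per_command(burst_length: int) -> int:
    """Data FIFO writes that must accompany one Address FIFO write.

    Each Data FIFO write carries rise and fall data together, so one
    write covers two words of the burst.
    """
    if burst_length not in (2, 4, 8):
        raise ValueError("burst length must be 2, 4, or 8")
    return burst_length // 2


def total_data_fifo_writes(write_commands: int, burst_length: int) -> int:
    """Total app_wdf_wren assertions needed for a run of write commands."""
    return write_commands * data_fifo_writes_per_command(burst_length)
```

With a burst length of 4, four write commands such as A0 to A3 in Figure 2-11 need eight Data FIFO writes in total.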
[Timing diagram. Signals shown: clk_tb, reset_tb, wdf_almost_full, init_done, app_wdf_wren, app_af_wren, app_af_addr (A0, A1, A2, A3).]
Figure 2-11: DDR SDRAM Write Burst for Four Bursts (BL = 4)
Read Interface
Figure 2-12 shows a block diagram of the read interface.
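[Figure 2-12: read interface block diagram. Recoverable labels: app_af_addr and app_af_wren from the user interface into the Address FIFO (FIFO16, 512 x 36); af_almost_full back to the user interface; af_addr and af_empty to the controller; ctrl_af_rden from the controller; Read Data FIFO (RAM16 x 1D) fed by read_data_rise and driving read_data_fifo_out.]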
The following steps describe the architecture of the Read Data FIFOs and show how to
perform a burst read operation from DDR SDRAM from the user interface.
1. The read user interface consists of an Address FIFO and a Read Data FIFO. The
Address FIFO is common to both read and write operations. The Read Data FIFOs are
constructed using Virtex-4 FPGA Distributed RAMs with a 16 x 1 configuration. MIG
instantiates a number of RAM16Ds depending on the data width. For example, for
8-bit data width, MIG instantiates a total of 16 RAM16Ds, 8 for rising-edge data and 8
for falling-edge data. Similarly, for 72-bit data width, MIG instantiates a total of 144
RAM16Ds, 72 for rising-edge data and 72 for falling-edge data.
2. The user can initiate a read to memory by writing to the Address FIFO when the
FIFO Full flag af_almost_full is deasserted.
3. To write the read address and read command into the Address FIFO, the user should
issue the Address FIFO write-enable signal app_af_wren along with read address
app_af_addr.
4. The controller reads the Address FIFO containing the address and command. After
decoding the command, the controller generates the appropriate control signals to
memory.
5. Prior to the actual read and write commands, the design calibrates the latency (number
of clock cycles) from the time the read command is issued to the time data is received.
Using this pre-calibrated delay information, the controller generates the write-enable
signals to the Read Data FIFOs.
6. The read_data_valid signal is asserted when data is available in the Read Data FIFOs.
7. Figure 2-13 shows a user interface timing diagram for a burst length of 4, a CAS latency
of 3 at 175 MHz, and a memory TRCD value of 20 ns. The read latency is
calculated from the point when the Read command is given by the user to the point
when the data is available with the read_data_valid signal. The minimum latency in
this case is 26 clocks, where no precharge is required, no auto-refresh request is
pending, the user commands are issued after initialization is completed, and the first
command issued is a Read command. The controller executes the commands only
after initialization is done as indicated by the init_done signal.
[Timing diagram. Signals shown: CLK_TB, app_af_wren, app_af_addr (A0, A1, A2, A3), read_data_valid, read_data_fifo_out (D2D3, D6D7, D10D11, D14D15). The interval from the Read command to read_data_valid is marked as 26 clocks.]
Figure 2-13: DDR SDRAM Read Burst for Four Bursts (BL = 4)
8. After the address and command are loaded into the Address FIFO, it takes 26 clock
cycles minimum for CL = 3 at a frequency of 175 MHz for the controller to assert the
read_data_valid signal.
9. Read data is available only when the read_data_valid signal is asserted. The user
should access the read data on every positive edge of the read_data_valid signal.
The read latency for the case where (1) CL = 3, (2) the read is written to an empty
address/command FIFO, (3) the read targets an unopened bank/row, and (4) the
frequency is 175 MHz, is broken down as indicated in Table 2-10.
Note: Timing has been verified for most of the MIG generated configurations. For the best timing
results, adjacent banks in the same column of the FPGA should be used. Banks that are separated
by unbonded banks should be avoided because these can cause timing violations.
Simulation Violations
Simulation can report timing violations at frequencies such as 150 MHz, where the clock
period (6.666... ns) cannot be represented exactly. At 150 MHz, the clock period in the simulation
testbench is 6.66 ns, whereas the MIG tool rounds it to 6.67 ns. Consider a memory TRCD value
of 20 ns. MIG calculates the TRCD count value from its own clock period and writes
RCD_COUNT_VALUE = 20/6.67 ≈ 2.998, rounded to 3, in the design parameter
file. In the testbench, 3 clock cycles last 3 × 6.66 = 19.98 ns, which falls short of TRCD by
20 ps. The violation comes from the difference between the clock period in the external simulation
testbench and the clock period used by the MIG tool. This is only one example; other timing
parameters can be affected in the same way. These are simulation warnings only. Functionally, there should be no
issues. To remove these warnings, the related count value can be increased by one.
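The arithmetic behind this example can be reproduced with a short Python sketch. The function name is hypothetical, the values are expressed in picoseconds to avoid floating-point noise, and the use of round() stands in for whatever rounding MIG applies internally, which this guide describes only as "rounding off."

```python
def trcd_margin_ps(trcd_ps: int, tool_period_ps: int, sim_period_ps: int,
                   extra_cycles: int = 0) -> tuple:
    """Return (count, margin_ps) for a timing parameter.

    count is derived from the tool's clock period; margin_ps is the time
    actually waited in simulation minus the required time. A negative
    margin corresponds to a simulation timing violation.
    """
    count = round(trcd_ps / tool_period_ps) + extra_cycles
    waited_ps = count * sim_period_ps
    return count, waited_ps - trcd_ps


# 150 MHz example: tool period 6.67 ns, testbench period 6.66 ns, TRCD 20 ns.
count, margin = trcd_margin_ps(20000, 6670, 6660)
# Increasing the count by one removes the violation.
fixed_count, fixed_margin = trcd_margin_ps(20000, 6670, 6660, extra_cycles=1)
```

Here count is 3 and margin is -20 ps, matching the 20 ps violation described above; with one extra cycle the margin becomes positive.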
Supported Devices
The design generated by MIG is independent of the memory package, so the
package part of the memory component name is replaced with XX or XXX, which
indicates a don't care condition. The tables below list the components (Table 2-12) and
DIMMs (Table 2-13 through Table 2-15) supported by MIG for DDR SDRAM. In supported
devices, XX in the memory component column denotes either one or two alphanumeric
characters. For example, MT46V32M4XX-75 can be either MT46V32M4P-75 or
MT46V32M4BN-75. An X in the DIMM columns (for Unbuffered, Registered, and SO
DIMMs) denotes a single alphanumeric character. For example, MT9VDDF3272X-40B can
be either MT9VDDF3272G-40B or MT9VDDF3272Y-40B. Similarly MT4VDDT1664AX-40B
can be either MT4VDDT1664AG-40B or MT4VDDT1664AY-40B. Pin mapping for x4
RDIMMs is provided in Appendix G, “Low Power Options.”
Chapter 3
Interface Model
DDR2 SDRAM interfaces are source-synchronous and double data rate. They transfer data
on both edges of the clock cycle. A memory interface can be modularly represented as
shown in Figure 3-1. A modular interface has many advantages. It allows designs to be
ported easily and also makes it possible to share parts of the design across different types
of memory interfaces.
[Figure 3-1: modular memory interface representation. Recoverable labels: Xilinx FPGA, Control Layer, Physical Layer, Memories.]
Direct-Clocking Interface
Feature Summary
This section summarizes the supported and unsupported features of the direct-clocking
DDR2 SDRAM controller design.
Supported Features
The DDR2 SDRAM controller design supports the following:
• Burst lengths of four and eight
• Sequential and interleaved burst types
• CAS latencies of 3, 4, and 5
• Additive latencies of 0, 1, and 2
• Differential and single-ended DQS
• On-Die Termination (ODT)
• Up to four deep memories
• Memory components
• Registered DIMMs (up to 240 MHz)
• Unbuffered DIMMs (up to 200 MHz)
• Unbuffered SODIMMs (up to 200 MHz)
• Different memories (density/speed)
• Byte-wise data masking
• Precharge and auto refresh
• Linear addressing
• ECC support
• Verilog and VHDL
• With and without a testbench
• With and without a DCM
• Multicontrollers (up to eight)
• Data Mask
• System clock, differential and single-ended
The supported features are described in more detail in “Architecture.”
Unsupported Features
The DDR2 SDRAM controller design does not support:
• Additive latencies of 3 and 4
• Redundant DQS (RDQS)
• Unbuffered DIMMs (greater than 200 MHz)
• Unbuffered SODIMMs (greater than 200 MHz)
Architecture
Implemented Features
This section provides details on the supported features of the DDR2 SDRAM controller.
Burst Length
The DDR2 SDRAM controller supports burst lengths of four and eight. The burst length
can be selected through the Set mode register(s) option from the GUI. For a design
without a testbench (user design), the user has to provide bursts of the input data based on
the chosen burst length. Bits M2:M0 of the Mode Register define the burst length, and bit
M3 indicates the burst type (see the Micron data sheet). Read and write accesses to the
DDR2 SDRAM are burst-oriented. The burst length determines the maximum number of column
locations accessed for a given READ or WRITE command.
CAS Latency
The DDR2 SDRAM controller supports CAS latencies (CLs) of three, four, and five. CL can be
selected in the Set mode register(s) option from the GUI. The CAS latency is
implemented in the ddr2_controller module. During data write operations, the generation
of the ctrl_Dqs_En and ctrl_Dqs_Rst signals varies according to the CL in the
ddr2_controller module. During data read operations, the generation of the ctrl_RdEn
signal varies according to the CL in the ddr2_controller module. Bits M4:M6 of the Mode
Register define the CL (see the Micron data sheet). CL is the delay in clock cycles between
the registration of a READ command and the availability of the first bit of output data.
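As an illustration of the Mode Register fields named above (M2:M0 for burst length, M3 for burst type, and the CL field in M6:M4), the following Python sketch assembles the low seven bits of the register. The field encodings are taken from the standard JEDEC DDR2 mode register definition as an assumption of this sketch, not from this guide, and should be confirmed against the memory vendor data sheet. The function name is hypothetical, and all other Mode Register bits are left out.

```python
# Field encodings assumed from the JEDEC DDR2 mode register definition;
# verify against the vendor data sheet before relying on them.
BURST_LENGTH_CODES = {4: 0b010, 8: 0b011}                  # M2:M0
BURST_TYPE_CODES = {"sequential": 0, "interleaved": 1}     # M3
CAS_LATENCY_CODES = {3: 0b011, 4: 0b100, 5: 0b101}         # M6:M4


def mode_register_low_bits(burst_length: int, burst_type: str,
                           cas_latency: int) -> int:
    """Assemble bits M6:M0 of the Mode Register (hypothetical helper)."""
    bl = BURST_LENGTH_CODES[burst_length]
    bt = BURST_TYPE_CODES[burst_type]
    cl = CAS_LATENCY_CODES[cas_latency]
    return bl | (bt << 3) | (cl << 4)
```

For a burst length of 4, sequential bursts, and CL = 3, the sketch yields 0b0110010.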
Additive Latency
DDR2 SDRAM devices support a feature called posted CAS additive latency (AL). The
DDR2 SDRAM controller supports additive latencies of 0, 1, and 2. AL can be selected in the Set
mode register(s) option from the GUI. Additive latency is implemented in the
ddr2_controller module. The ddr2_controller module issues READ/WRITE commands
prior to tRCD (minimum) depending on the user-selected AL value in the Extended Mode
Register. This feature allows the READ command to be issued prior to tRCD (minimum) by
delaying the internal command to the DDR2 SDRAM by AL clocks. Posted CAS AL makes
the command and data bus efficient for sustainable bandwidths in DDR2 SDRAM. Bits
E3:E5 of the Extended Mode Register define the value of AL (see the Micron data sheet).
Registered DIMMs
The DDR2 SDRAM controller supports registered DIMMs. This feature is implemented in the
ddr2_controller module. For registered DIMMs, the READ and WRITE commands and
address have one additional clock cycle of latency compared with unbuffered DIMMs.
Multicontrollers
MIG supports multicontrollers for DDR2 SDRAMs. A maximum of eight controllers can be
selected by the user from the tool. In multicontroller designs, MIG supports the same
frequency for all the controllers.
TRTP READ to PRECHARGE command delay 7.5 7.5 7.5 7.5 7.5 7.5
TRAS ACTIVE to PRECHARGE command 40 40 40 40 40 40 40 40 40 40
TRTP READ to PRECHARGE command delay 7.5 7.5 7.5 7.5 7.5 7.5 7.5 7.5 7.5 7.5
Note: For the latest timing information, refer to the vendor memory data sheets.
Data Masking
The DDR2 SDRAM design supports data masking per byte. Masking per nibble is not
supported because of a limitation of the internal block RAM based FIFOs. The mask data
is stored in the Data FIFO along with the actual data.
MIG supports a data mask option. If this option is checked in the GUI, MIG generates a
design with data mask pins. This option can be chosen if the selected part has data
masking. DDR2 SDRAM designs do not support read-modify-write operations in ECC
mode. The mask bits to the SDRAM should never be asserted while in the ECC mode.
Thus, when ECC is selected, the data masking selection is disabled in the GUI.
Precharge
The PRECHARGE command is used to close the open row in a bank when another command
is to be issued to the same bank. The Virtex-4 FPGA DDR2 controller checks the row address, bank
address, and chip address and issues a PRECHARGE
command if any of these addresses changes for the next read or write command.
The auto precharge function is not supported.
Auto Refresh
The DDR2 SDRAM controller issues AUTO REFRESH commands at specified intervals for
the memory to refresh the charge required to retain the data in the memory. The user can
also issue a REFRESH command through the user interface by setting bits 34, 33, and 32 of
the app_af_addr signal in the user_interface module to 3’b001. If there is a refresh request
while there is an ongoing read or write burst, the controller issues a refresh command after
completing the current read or write burst command.
Linear Addressing
The DDR2 SDRAM controller supports linear addressing. Linear addressing refers to the
way the user provides the address of the memory to be accessed. For Virtex-4 FPGA DDR2
SDRAM controllers, the user provides address information through the app_af_addr bus.
As the densities of the memory devices vary, the number of column address bits and row
address bits also change. In any case, the row address bits in the app_af_addr bus always
start from the next higher bit, where the column address ends. This feature increases the
number of devices that can be supported with the design.
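A minimal Python sketch of linear addressing follows. The function name and the field widths in the example are hypothetical. The row field is placed directly above the column field as described above; placing the bank field above the row field follows the linear addressing note in the DDR SDRAM chapter and is assumed here to apply to DDR2 as well. The chip select field is omitted.

```python
def pack_linear_address(column: int, row: int, bank: int,
                        col_bits: int, row_bits: int, bank_bits: int) -> int:
    """Pack column, row, and bank into a linear user address.

    Column occupies the lowest bits, the row starts at the next higher
    bit, and the bank follows the row (assumed ordering).
    """
    for value, width in ((column, col_bits), (row, row_bits), (bank, bank_bits)):
        if not 0 <= value < (1 << width):
            raise ValueError("field value does not fit in its width")
    return column | (row << col_bits) | (bank << (col_bits + row_bits))


# Example widths only: 10 column bits, 13 row bits, 2 bank bits.
address = pack_linear_address(column=0x3FF, row=1, bank=2,
                              col_bits=10, row_bits=13, bank_bits=2)
```

Changing col_bits or row_bits for a different density moves the row and bank fields without changing how the user forms the address, which is the point of the scheme.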
On-Die Termination
The DDR2 SDRAM controller supports on-die termination (ODT). Through the Set mode
register(s) option from the GUI, the user can disable ODT or can choose 75 Ω, 150 Ω, or 50 Ω.
ODT can turn the termination on and off as needed to improve signal integrity in the
system. Because the DDR2 design supports a maximum memory depth of four, a maximum of four
ODT signals is supported. Several examples follow:
1. Four single-rank DIMMs or components populated in four different slots. If the user selects
deep memory = 4, the memory component sequence is 0, 1, 2, and 3. During write
operations, the ODT is enabled for component 3 when writing into 0, 1, or 2, otherwise
it is enabled for component 2 when writing into component 3. During read operations,
the ODT is enabled for component 3 when reading from 0, 1, or 2, otherwise it is
enabled for component 2 for reading from component 3.
Two dual-rank DIMMs populated in two different slots. Rank 1 and rank 2 of slot 1 are
referred to as CS0 and CS1. Rank 1 and rank 2 of slot 2 are referred to as CS2 and CS3.
ODT is enabled for CS0 when writing into CS2 or CS3 and enabled for CS2 when
writing into CS0 or CS1. ODT is enabled for CS0 when reading from CS2 or CS3 and
enabled for CS2 when reading from CS0 or CS1. ODT0, ODT1, ODT2, and ODT3
should be connected to the ODT signals of CS0, CS1, CS2, and CS3, respectively.
2. Three single-rank DIMMs or components populated in three different slots. If the user selects
deep memory = 3, the memory component sequence is 0, 1, and 2. During write
operations, the ODT is enabled for component 2 when writing into 0 or 1, otherwise it
is enabled for component 1 when writing into component 2. During read operations,
the ODT is enabled for component 2 when reading from 0 or 1, otherwise it is enabled
for component 1 for reading from component 2. ODT0, ODT1, and ODT2 should be
connected to the ODT signals of CS0, CS1, and CS2, respectively.
3. Two single-rank DIMMs or components populated in two different slots. If the user selects
deep memory = 2, the memory component sequence is 0 and 1. During write
operations, the ODT is enabled for component 1 when writing into 0, otherwise it is
enabled for component 0 when writing into component 1. During read operations, the
ODT is enabled for component 1 when reading from 0, otherwise it is enabled for
component 0 for reading from component 1.
One dual-rank DIMM populated in a single slot. Rank 1 and rank 2 of slot 1 are
referred to as CS0 and CS1. ODT is enabled for CS0 when writing into CS0 or CS1. During
read operations, the ODT is disabled. ODT0 and ODT1 should be connected to the
ODT signals of CS0 and CS1, respectively.
4. If the user selects deep memory = 1, the memory component sequence is 0. During
write operations, the ODT is enabled for component 0 when writing into 0. During
read operations, the ODT is disabled. ODT0 should be connected to the ODT signal of
CS0.
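The single-rank rules in the examples above reduce to a small selection function, sketched here in Python. The function name is hypothetical, and the sketch covers only the single-rank cases (deep memory = 1 to 4); the dual-rank DIMM cases follow different rules and are not modeled.

```python
from typing import Optional


def odt_component(depth: int, target: int, is_write: bool) -> Optional[int]:
    """Return the component whose ODT is enabled, or None if ODT is off.

    depth is the deep memory setting (1 to 4) and target is the component
    being written to or read from (0 to depth - 1).
    """
    if depth not in (1, 2, 3, 4):
        raise ValueError("depth must be 1 to 4")
    if not 0 <= target < depth:
        raise ValueError("target must be below depth")
    if depth == 1:
        # One component: ODT on component 0 for writes, disabled for reads.
        return 0 if is_write else None
    last = depth - 1
    # The last component terminates accesses to all others; the component
    # before it terminates accesses to the last one. Same for read and write.
    return last if target != last else last - 1
```

For deep memory = 4, a write to component 1 enables ODT on component 3, and a write to component 3 enables ODT on component 2.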
Table 3-4 shows ODT control during write operations.
Notes:
1. Dual rank.
2. Single rank.
Notes:
1. Dual rank.
2. Single rank.
Deep Memories
The MIG DDR2 SDRAM controller supports depths up to 4. Through the Depth option, the
user can select the depth value. For deep memory implementations, MIG generates
chip selects, CKE signals, and ODT signals for each memory. The clock widths (ck and
ck_n) are a multiple of the depth chosen in MIG. This feature increases
the depth of the memory. For example, if the user selects a 256 Mb component and deep
memory = 4 from MIG, the tool generates a memory interface for a 1 Gb design.
Deep memory logic is implemented in the ddr2_controller module. With deep memories,
DDR2 SDRAMs are initialized one after the other to avoid loading the address and control
buses, and the calibration is done on the last memory. Apart from initialization, the DDR2
SDRAM controller module also demultiplexes the column, row, and bank addresses from
the user address. The module also decodes the chip selects and rank addresses for
components and DIMMs.
The formats of user read/write addresses for a 256 Mb component and 2 GB and 4 GB
DIMMs are given in “Deep Memory Configurations.”
ECC Support
The DDR2 SDRAM controller supports ECC. ECC is supported for the following data
widths:
• 40-bit (32-bit data and a 0 prepended to 7-bit parity)
• 72-bit (64-bit data and 8-bit parity)
• 144-bit (128-bit data and 16-bit parity)
The user can completely disable the ECC or can generate the design for the above data
widths by choosing either the Unpipeline mode or the Pipeline mode from the GUI.
ECC is based on XAPP645 [Ref 17]. The design can detect and correct all single bit errors,
and it can detect double bit errors in the data. This design utilizes Hamming code for the
ECC operations. The Pipeline mode improves the frequency performance at the cost of an
extra pipeline stage.
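The parity widths listed above are consistent with the textbook bound for a Hamming code extended with one overall parity bit (single-error correction, double-error detection). The Python sketch below computes that bound; it is not the XAPP645 implementation, and the function names are hypothetical. The reading of the 144-bit case as two 64-bit code words is an inference from the 16-bit parity width, not a statement from this guide.

```python
def hamming_check_bits(data_bits: int) -> int:
    """Smallest r with 2**r >= data_bits + r + 1 (single-error correction)."""
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r


def secded_check_bits(data_bits: int) -> int:
    """Hamming check bits plus one overall parity bit for double detection."""
    return hamming_check_bits(data_bits) + 1
```

The bound gives 7 check bits for 32 data bits (stored as 8 bits with a 0 prepended in the 40-bit case) and 8 check bits for 64 data bits. A single 128-bit code word would need only 9, so the 16 parity bits of the 144-bit width suggest two independent 64-bit code words.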
System Clock
MIG supports differential and single-ended system clocks. Based on the selection in the
GUI, input system clocks and IDELAY clocks are differential or single-ended.
Hierarchy
Figure 3-2 shows the hierarchical structure of the DDR2 SDRAM controller.
[Figure 3-2: hierarchy diagram. Recoverable module names: <top_module>, top*, test_bench*, v4_dq_iob, v4_dm_iob, v4_dqs_iob, tap_ctrl*, data_tap_inc*, encoder_32/64, rd_data_fifo*, pattern_compare, wr_data_fifo, decoder_32/64, rd_wr_addr_fifo*.]
Figure 3-2 shows the hierarchical structure of the DDR2 SDRAM design generated by MIG
with a testbench and a DCM. The modules are classified as follows:
• Design modules
• Testbench modules
• Clocks and reset generation modules
There is a parameter file generated with the design that has all the user input and design
parameters selected from MIG.
MIG can generate four different DDR2 SDRAM designs:
• With a testbench and a DCM
• Without a testbench and with a DCM
• With a testbench and without a DCM
• Without a testbench and without a DCM
A design without a testbench (user_design) does not have testbench modules. The
<top_module> module has the user interface signals for designs without a testbench. The
list of user interface signals is provided in Table 3-9, page 142.
Design clocks and resets are generated in the infrastructure module. The DCM is
instantiated in the infrastructure module for designs with a DCM. The inputs to this
module are the differential design clock and a 200 MHz differential clock for the
IDELAYCTRL module. A user reset is also input to this module. Using the input clocks and
reset signals, this module generates the system clocks and the system reset that are
used in the design.
The DCM primitive is not instantiated in this module if the Use DCM option is unchecked.
So, the system operates on the user-provided clocks. The system reset is generated in the
infrastructure module using the dcm_lock input signal.
For ECC enabled designs, the corresponding pink shaded modules are present in the
design. ECC data is generated from these modules.
[Block diagram. System clocks and reset inputs: clk200_p, clk200_n, sys_clk_p, sys_clk_n, sys_reset_in_n. Blocks: idelay_ctrl, Infrastructure, main_0. Internal signals: clk200, sys_rst200, idelay_ctrl_rdy, clk_0, clk_90, sys_rst, sys_rst90. Status signals: error, init_done. Memory device ports: ddr2_ras_n, ddr2_cas_n, ddr2_we_n, ddr2_cs_n, ddr2_odt, ddr2_cke, ddr2_dm, ddr2_ba, ddr2_a, ddr2_ck, ddr2_ck_n, ddr2_dq, ddr2_dqs, ddr2_dqs_n, ddr2_reset_n.]
Figure 3-3: Top-Level Block Diagram of the DDR2 SDRAM Design with a DCM and a Testbench
Not all Memory Device ports appear in every MIG-generated design. For
example, port ddr2_reset_n appears in the port list for Registered DIMM designs only.
Similarly, ddr2_dqs_n does not appear for single-ended DQS designs. Port DDR2_DM
appears only for parts that contain a data mask; a few RDIMMs have no data mask, and
DDR2_DM does not appear in the port list for them.
Figure 3-4 shows a top-level block diagram of a DDR2 SDRAM design with a DCM but
without a testbench. The sys_clk_p and sys_clk_n pair are differential input system clocks.
A DCM is instantiated in the infrastructure module that generates the required design
clocks. The differential clk200_p and clk200_n are used for the idelay_ctrl element. The
active-Low system reset signal is sys_reset_in_n. All design resets are gated by the
dcm_lock signal. The user has to drive the user application signals. The design provides
the CLK_TB and RESET_TB signals to the user to synchronize with the design. The
INIT_DONE signal indicates the completion of initialization and calibration of the design.
[Block diagram. System clocks and reset inputs: clk200_p, clk200_n, sys_clk_p, sys_clk_n, sys_reset_in_n. Blocks: idelay_ctrl, Infrastructure, top_0. Internal signals: CLK200, sys_rst200, idelay_ctrl_rdy, clk_0, clk_90, sys_rst, sys_rst90. User application signals: app_af_addr, app_af_wren, app_wdf_data, app_mask_data, app_wdf_wren, wdf_almost_full, af_almost_full, burst_length_div2, read_data_valid, read_data_fifo_out, clk_tb, reset_tb, init_done. Memory device ports: ddr2_ras_n, ddr2_cas_n, ddr2_we_n, ddr2_cs_n, ddr2_odt, ddr2_cke, ddr2_dm, ddr2_ba, ddr2_a, ddr2_ck, ddr2_ck_n, ddr2_dq, ddr2_dqs, ddr2_dqs_n, ddr2_reset_n.]
Figure 3-4: Top-Level Block Diagram of the DDR2 SDRAM Design with a DCM but without a Testbench
Figure 3-5 shows a top-level block diagram of a DDR2 SDRAM design without a DCM or
a testbench. The user should provide all the clocks and the dcm_lock signal. These clocks
should be single-ended. The active-Low system reset signal is sys_reset_in_n. All design
resets are gated by the dcm_lock signal. The user application must have a DCM primitive
instantiated in the design. All user clocks should be driven through BUFGs. The user has to
drive the user application signals. The design provides the CLK_TB and RESET_TB signals
to the user to synchronize with the design. The INIT_DONE signal indicates the
completion of initialization and calibration of the design.
[Block diagram. System reset and user DCM clock inputs: clk_200, clk_0, clk_90, sys_reset_in_n, dcm_lock. Blocks: idelay_ctrl, Infrastructure, top_0. Internal signals: sys_rst200, idelay_ctrl_rdy, sys_rst, sys_rst90. User application signals: app_af_addr, app_af_wren, app_wdf_data, app_mask_data, app_wdf_wren, wdf_almost_full, af_almost_full, burst_length_div2, read_data_valid, read_data_fifo_out, clk_tb, reset_tb, init_done. Memory device ports: ddr2_ras_n, ddr2_cas_n, ddr2_we_n, ddr2_cs_n, ddr2_odt, ddr2_cke, ddr2_dm, ddr2_ba, ddr2_a, ddr2_ck, ddr2_ck_n, ddr2_dq, ddr2_dqs, ddr2_dqs_n, ddr2_reset_n.]
Figure 3-5: Top-Level Block Diagram of the DDR2 SDRAM Design without a DCM or a Testbench
Figure 3-6 shows a top-level block diagram of a DDR2 SDRAM design with a testbench but
without a DCM. The user should provide all the clocks and the dcm_lock signal. These
clocks should be single-ended. The active-Low system reset signal is sys_reset_in_n. All
design resets are gated by the dcm_lock signal. The user application must have a DCM
primitive instantiated in the design. All user clocks should be driven through BUFGs. The
ERROR output signal indicates whether the case passes or fails. The testbench module
does writes and reads, and also compares the read data with the written data. The ERROR
signal is driven High on data mismatches. The INIT_DONE signal indicates the
completion of initialization and calibration of the design.
[Block diagram. System reset and user DCM clock inputs: clk_200, clk_0, clk_90, sys_reset_in_n, dcm_lock. Blocks: idelay_ctrl, Infrastructure, main_0. Internal signals: sys_rst200, idelay_ctrl_rdy, sys_rst, sys_rst90. Status signals: error, init_done. Memory device ports: ddr2_ras_n, ddr2_cas_n, ddr2_we_n, ddr2_cs_n, ddr2_odt, ddr2_cke, ddr2_dm, ddr2_ba, ddr2_a, ddr2_ck, ddr2_ck_n, ddr2_dq, ddr2_dqs, ddr2_dqs_n, ddr2_reset_n.]
Figure 3-6: Top-Level Block Diagram of the DDR2 SDRAM Design with a Testbench but without a DCM
[Block diagram; caption not recoverable. Recoverable labels: burst_length_div2, init_done, clk_tb, reset_tb, Virtex-4 FPGA.]
Controller
The ddr2_controller module accepts and decodes user commands and generates
read, write, and refresh commands. It also generates signals
for other modules. The memory is initialized and powered up using a defined process. The
controller state machine handles the initialization process upon power-up. After memory
initialization, the controller issues dummy read commands. During dummy reads, the
tap_logic module calibrates and delays the data to center-align it with the FPGA clock. After
the calibration is done, the controller issues a dummy write command and a pattern read command to
measure the delay between the read command and the IOB output data.
The delay, calculated in number of clocks, is used to generate the write-enable signal to the read data
FIFOs. For deep designs, the DQ calibration and pattern calibration are done only on the
last memory. For example, for four deep designs, the fourth memory is used for
calibration. The fourth memory has no special role; the controller simply retains the
last chip select used during memory initialization, so the same chip select is used for
calibration. XAPP701 [Ref 18] provides more details about the calibration architecture.
User Interface
This module stores write data, write addresses, and read addresses in FIFOs and receives
read data from the memory. The rd_data and rd_data_fifos modules capture the data in the
LUT-based RAMs. The rd_wr_addr_fifo and wr_data_fifo modules store the data and
address in block RAMs. The FIFOs are built using FIFO16 primitives in rd_wr_addr_fifo,
wr_data_fifo_16, and wr_data_fifo_8 modules. Each FIFO has two FIFO threshold
attributes, ALMOST_EMPTY_OFFSET and ALMOST_FULL_OFFSET, that are set to 7 and
F, respectively, in the RTL by default. These values can be changed as needed. For valid
FIFO threshold offset values, refer to UG070 [Ref 7].
Once the calibration is done, the controller issues a pattern_write command with a known
pattern (0xAA559966) to the memory. Then the controller issues a pattern_read command
from the same location and compares the read data with the known pattern in the
pattern_compare8 or the pattern_compare4 module. During the pattern_read command,
the controller generates the ctrl_rden signal, which is delayed in the pattern_compare
module to synchronize with the read data. This delay is applied to the ctrl_rden signal
generated from the ddr2_controller module during a normal read to register the valid data
in the internal FIFOs.
The FIRST_RISING logic is implemented in the pattern_compare module. FIRST_RISING
is asserted when the first data is captured with respect to the falling edge of FPGA clock.
This signal is used in rd_data_fifo to swap rise and fall data. In addition to the ODDR used
to register output data (DQ) in each I/O, a second ODDR in the IOB controls the 3-state
enable for the I/O. This is used to enable the write data output one-half clock cycle before
the first data word and disable the write data one-half clock cycle after the last data word.
Test Bench
The MIG tool generates two RTL folders, example_design and user_design. The
example_design folder includes the synthesizable test bench, while user_design does not
include the test bench modules. The MIG test bench performs eight write commands and
eight read commands in an alternating fashion. The number of words in a write command
depends on the burst length. For a burst length of 4, the test bench writes a total of 32 data
words for all eight write commands (16 rise data words and 16 fall data words). For a burst
length of 8, the test bench writes a total of 64 data words. It writes the data pattern FF,
00, AA, 55, 55, AA, 99, 66 in sequence, in which FF, AA, 55, and 99 are rise data words and
00, 55, AA, and 66 are fall data words for an 8-bit design. The falling edge data is the
complement of the rising edge data. For a burst length of 4, the data sequence for the first
write command is FF, 00, AA, 55, and the data sequence for the second write command is
55, AA, 99, 66. For a burst length of 8, the data pattern for the first write command is FF,
00, AA, 55, 55, AA, 99, 66 and the same pattern is repeated for all the remaining write
commands. This data pattern is repeated in the same order based on the number of data
words written. For data widths greater than 8, the same data pattern is concatenated for
the other bits. For a 32-bit design and a burst length of 8, the data pattern for the first write
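The pattern rules for the 8-bit case can be modeled in a few lines of Python. This is an illustrative sketch, not the MIG test bench RTL: the function names are hypothetical, and the byte-replication rule used for wider buses is an assumption based on the statement that the same pattern is concatenated for the other bits.

```python
# Illustrative model of the test bench data pattern (not MIG RTL).
# Words alternate rise, fall, rise, fall, ...
PATTERN_8BIT = [0xFF, 0x00, 0xAA, 0x55, 0x55, 0xAA, 0x99, 0x66]


def write_command_words(cmd_index, burst_length):
    """Return the 8-bit data words written by one write command."""
    if burst_length == 8:
        # Every command writes the full eight-word pattern.
        return list(PATTERN_8BIT)
    if burst_length == 4:
        # Commands alternate between the first and second half of the pattern.
        half = cmd_index % 2
        return PATTERN_8BIT[4 * half:4 * half + 4]
    raise ValueError("burst length must be 4 or 8")


def widen(word, data_width):
    """Replicate an 8-bit word across a wider bus (assumed concatenation rule)."""
    if data_width % 8 != 0:
        raise ValueError("data width must be a multiple of 8")
    value = 0
    for _ in range(data_width // 8):
        value = (value << 8) | word
    return value


def total_words(burst_length, commands=8):
    """Total data words written by the given number of write commands."""
    return sum(len(write_command_words(i, burst_length)) for i in range(commands))
```

For example, total_words(4) counts the 32 words quoted above for a burst length of 4, and widen(0xAA, 32) gives the assumed 32-bit form of the AA word.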
Direct-Clocking Interface
Infrastructure
The infrastructure module generates the FPGA clock and reset signals. When differential
clocking is used, sys_clk_p, sys_clk_n, clk_200_p, and clk_200_n signals appear. When
single-ended clocking is used, sys_clk and idly_clk_200 signals appear. In addition, clocks
are available for design use and a 200 MHz clock is provided for the IDELAYCTRL
primitive. Differential and single-ended clocks are passed through global clock buffers
before connecting to a DCM. For differential clocking, the output of the
sys_clk_p/sys_clk_n buffer is single-ended and is provided to the DCM input. Likewise,
for single-ended clocking, sys_clk is passed through a buffer and its output is provided to
the DCM input. The outputs of the DCM are clk_0 (0° phase-shifted version of the input
clock) and clk_90 (90° phase-shifted version of the input clock). After the DCM is locked,
the design is in the reset state for at least 25 clocks. The infrastructure module also
generates all of the reset signals required for the design.
Idelay_ctrl
This module instantiates the IDELAYCTRL primitive of the Virtex-4 FPGA. The
IDELAYCTRL primitive is used to continuously calibrate the individual delay elements in
its region to reduce the effect of process, temperature, and voltage variations. A 200 MHz
clock has to be fed to this primitive. For more information on IDELAYCTRLs, refer to
“Verify IDELAYCTRL Instantiation for Virtex-4 and Virtex-5 FPGA Designs” in Chapter
14.
The sel_done port in the tap_logic module indicates the completion of the per-bit
calibration. After the per-bit calibration is done, the controller does a read enable
calibration. This calibration is used to determine the delay from read command to read
data at rd_data_fifo. The delay between read command and read data is affected by the
CAS latency and additive latency parameters, the PCB traces, and the I/O buffer delays.
This in turn is used to generate a write enable to rd_data_fifo so that valid data is
registered. The controller writes a known fixed pattern and reads back the data. The read
data is compared against the known fixed pattern. The comp_done port in the rd_data
module indicates the completion of the read enable calibration.
The init_done port indicates the completion of both per-bit calibration and read enable
calibration. After initialization and calibration are done, the controller can start issuing user
commands to the memory.
Notes:
1. See “User Interface Accesses,” page 143 for timing requirements and restrictions on the user interface
signals.
Table 3-7: DDR2 SDRAM Controller System Interface Signals (with a DCM)
Signal Name             Direction   Description
sys_clk_p, sys_clk_n    Input       Differential input clock to the DCM. The DDR2 controller and memory operate at this frequency.
clk200_p, clk200_n      Input       Differential clock used in the idelay_ctrl logic.
sys_reset_in_n          Input       Active-Low reset to the DDR2 controller.
Table 3-8 shows the system interface signals for designs without a DCM. clk_0, clk_90, and
clk_200 are single-ended input clocks. The clk_90 signal must have a phase difference of
90° with respect to clk_0. The clk_200 signal is the clock used for the IDELAYCTRL
primitives in Virtex-4 FPGAs.
Table 3-8: DDR2 SDRAM Controller System Interface Signals (without a DCM)
Signal           Direction   Description
clk_0            Input       The DDR2 SDRAM controller and memory operate on this clock.
sys_reset_in_n   Input       Active-Low reset to the DDR2 SDRAM controller. This signal is used to generate the synchronous system reset.
clk_90           Input       90° phase-shifted clock with the same frequency as clk_0.
clk_200          Input       200 MHz single-ended input clock for the IDELAYCTRL primitive of Virtex-4 FPGAs.
dcm_lock         Input       This status signal indicates whether the DCM is locked or not. It is used to generate the synchronous system reset.
is used during calibration to hold the training patterns for the various stages of
calibration.
• When issuing a write command, the first write data word must be written to the Write
Data FIFO no more than two clock cycles after the write command is issued. This
restriction arises from the fact that the controller assumes write data is available when
it receives the write command from the user.
• The clk_tb signal is connected to clk_0 in the controller. If the user clock domain is
different from clk_0 / clk_tb of the MIG, the user should add FIFOs for all data inputs
and outputs of the controller in order to synchronize them to the clk_tb.
Write Interface
Figure 3-9 shows the user interface block diagram for write operations.
[Figure 3-9: block diagram of the Address FIFO (FIFO16, 512 x 36) between the user interface and the controller. Signals shown: app_af_addr, app_af_wren, af_almost_full, af_addr, af_empty, ctrl_af_rden.]
The following steps describe the architecture of the Address and Write Data FIFOs and
show how to perform a write burst operation to DDR2 SDRAM from the user interface.
1. The user interface consists of an Address FIFO and a Write Data FIFO. These FIFOs are
constructed using Virtex-4 FPGA FIFO16 primitives with a 512 x 36 configuration. The
36-bit architecture comprises one 32-bit port and one 4-bit port. For Write Data FIFOs,
the 32-bit port is used for data bits and the 4-bit port is used for mask bits. Mask bits
are available only when supported by the memory part and when data mask is enabled
in the MIG GUI. Some memory parts, such as Registered DIMMs of x4 parts, do not
support mask bits.
2. The Common Address FIFO is used for both write and read commands, and comprises
a command part and an address part. Command bits discriminate between write and
read commands.
3. User interface data width app_wdf_data is twice that of the memory data width. For
an 8-bit memory width, the user interface is 16 bits consisting of rise data and fall data.
For every 8 bits of data, there is a mask bit. For 72-bit memory data, the user interface
data width app_wdf_data is 144 bits, and the mask data app_mask_data is 18 bits.
4. The minimum configuration of the Write Data FIFO is 512 x 36 for a memory data
width of 8 bits. For an 8-bit memory data width, the least-significant 16 bits of the data
port are used for write data and the least-significant two bits of the 4-bit port are used
for mask bits. The controller internally pads all zeros for the most-significant 16 bits of
the 32-bit port and the most-significant two bits of the 4-bit port.
5. Depending on the memory data width, MIG instantiates multiple FIFO16s to gain the
required width. For designs using 8-bit data width, one FIFO16 is instantiated; for
72-bit data width, a total of five FIFO16s are instantiated. The bit architecture
comprises 16 bits of rising-edge data, 2 bits of rising-edge mask, 16 bits of falling-edge
data, and 2 bits of falling-edge mask, which are all stored in a FIFO16. MIG routes the
app_wdf_data and app_mask_data to FIFO16s accordingly.
6. The user can initiate a write to memory by writing to the Address FIFO and the Write
Data FIFO when the FIFO Full flags are deasserted and after the init_done signal is
asserted. The af_almost_full status signal is asserted when the Address FIFO is full, and
similarly wdf_almost_full is asserted when the Write Data FIFO is full.
7. Both the Address FIFO and Write Data FIFO Full flags are deasserted at power-on.
8. The user should assert the Address FIFO write-enable signal app_af_wren along with
address app_af_addr to store the write address and write command into the Address
FIFO.
9. The user should assert the Data FIFO write-enable signal app_wdf_wren along with
write data app_wdf_data and mask data app_mask_data to store the write data and
mask data into the Write Data FIFO. The user should provide both rise and fall data
together for each write to the Data FIFO.
10. The controller reads the Address FIFO by issuing the ctrl_af_rden signal. The
controller reads the Write Data FIFO by issuing the ctrl_wdf_rden signal after the
Address FIFO is read. It decodes the command part after the Address FIFO is read.
[Timing diagram. Signals shown: clk_tb, reset_tb, wdf_almost_full, init_done, app_af_wren, app_af_addr[35:0] (A0 through A3), app_wdf_wren.]
Figure 3-10: DDR2 SDRAM Write Burst (BL = 4) for Four Bursts
11. The write command timing diagram in Figure 3-10 is derived from the MIG-generated
testbench. As shown (burst length of 4), each write to the Address FIFO must be
coupled with two writes to the Data FIFO. Similarly, for a burst length of 8, every write
to the Address FIFO must be coupled with four writes to the Data FIFO. Failure to
follow this rule can cause unpredictable behavior.
Note: The user can start filling the Write Data FIFO two clocks after the Address FIFO is
written, because there is a two-clock latency between the command fetch and reading the Data
FIFO. Using the terms shown in Figure 3-11, therefore, the user can assert the A0 address two
clocks before D0D1.
[Timing diagram. Signals shown: clk_tb, reset_tb, wdf_almost_full, init_done, app_af_wren, app_af_addr[35:0] (A0 and A1), app_wdf_wren.]
Figure 3-11: DDR2 SDRAM Write Burst (BL = 8) for Two Bursts
12. The write command timing diagram in Figure 3-11 is derived from the MIG-generated
testbench. As shown (burst length of 8), each write to the Address FIFO must be
coupled with four writes to the Data FIFO. Because the controller first reads the
address and command together, the address need not coincide with the last data. After
the command is analyzed (nearly two clocks later for a worst-case timing scenario), the
controller sequentially reads the data in four clocks. Thus, there are six clocks from the
time the address is read to the time the last data is read.
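The width and pairing rules in the steps above can be checked with a short Python sketch. The function names are hypothetical, and the FIFO16 count formula (user data width divided by the 32-bit data port, rounded up) is an assumption that happens to reproduce the two cases quoted in the text; it is not taken from the MIG source.

```python
import math

FIFO16_DATA_BITS = 32  # 32-bit data port of one 512 x 36 FIFO16
FIFO16_MASK_BITS = 4   # 4-bit port used for mask bits


def write_interface_widths(memory_width):
    """Return (app_wdf_data width, app_mask_data width, FIFO16 count)."""
    data_bits = 2 * memory_width   # rise data plus fall data
    mask_bits = data_bits // 8     # one mask bit for every 8 data bits
    # Assumed rule: enough FIFO16s to hold all user data bits.
    fifo_count = math.ceil(data_bits / FIFO16_DATA_BITS)
    return data_bits, mask_bits, fifo_count


def data_fifo_writes_per_address(burst_length):
    """Writes to the Write Data FIFO that must accompany one Address FIFO write."""
    if burst_length not in (4, 8):
        raise ValueError("burst length must be 4 or 8")
    # Each Data FIFO write carries one rise word and one fall word.
    return burst_length // 2
```

For a 72-bit memory interface, write_interface_widths(72) returns the 144-bit data width, 18-bit mask width, and five FIFO16s described in steps 3 and 5.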
Read Interface
Figure 3-12 shows a block diagram of the read interface.
[Figure 3-12: block diagram of the Address FIFO (FIFO16, 512 x 36) and the Read Data FIFO (RAM16 x 1D) between the user interface and the controller. Signals shown: app_af_addr, app_af_wren, af_almost_full, af_addr, af_empty, ctrl_af_rden, read_data_rise, read_data_fifo_out.]
The following steps describe the architecture of the Read Data FIFOs and show how to
perform a burst read operation from DDR2 SDRAM from the user interface.
1. The read user interface consists of an Address FIFO and a Read Data FIFO. The
Address FIFO is common to both read and write operations. The Read Data FIFOs are
constructed using Virtex-4 FPGA Distributed RAMs with a 16 x 1 configuration. MIG
instantiates a number of RAM16Ds depending on the data width. For example, for
8-bit data width, MIG instantiates a total of 16 RAM16Ds, 8 for rising-edge data and 8
for falling-edge data. Similarly, for 72-bit data width, MIG instantiates a total of 144
RAM16Ds, 72 for rising-edge data and 72 for falling-edge data.
2. The user can initiate a read to memory by writing to the Address FIFO when the
FIFO Full flag af_almost_full is deasserted and after init_done is asserted.
3. To write the read address and read command into the Address FIFO, the user should
issue the Address FIFO write-enable signal app_af_wren along with read address
app_af_addr.
4. The controller reads the Address FIFO containing the address and command. After
decoding the command, the controller generates the appropriate control signals to
memory.
5. Prior to the actual read and write commands, the design calibrates the latency (number
of clock cycles) from the time the read command is issued to the time data is received.
Using this pre-calibrated delay information, the controller generates the write-enable
signals to the Read Data FIFOs.
6. The read_data_valid signal is asserted when data is available in the Read Data FIFOs.
[Timing diagram. Signals shown: clk_tb, reset_tb, af_almost_full, app_af_wren, app_af_addr[35:0] (A0 through A3), read_data_valid. Marked interval: 25 clocks.]
Figure 3-13: DDR2 SDRAM Read Burst (BL = 4) for Four Bursts
7. Figure 3-13 shows the user interface timing diagram for a burst length of 4, and
Figure 3-14 shows the user interface timing diagram for a burst length of 8. Both the
cases shown here are for a CAS latency of 3 at 200 MHz. The read latency is calculated
from the point when the read command is given by the user to the point when the data
is available with the read_data_valid signal. The minimum latency in this case is
25 clocks, where no precharge is required, no auto-refresh request is pending, the user
commands are issued after initialization is completed, and the first command issued is
a Read command. The controller executes the commands only after initialization is done,
as indicated by the init_done signal.
8. After the address and command are loaded into the Address FIFO, it takes 25 clock
cycles minimum for the controller to assert the read_data_valid signal.
9. Read data is available only when the read_data_valid signal is asserted. The user
should access the read data on every positive edge of the read_data_valid signal.
[Timing diagram. Signals shown: clk_tb, reset_tb, af_almost_full, app_af_wren, app_af_addr[35:0] (A0 and A1), read_data_valid. Marked interval: 25 clocks.]
Figure 3-14: DDR2 SDRAM Read Burst (BL = 8) for Two Bursts
The 25 clocks from the read command to the read data, as shown in Figure 3-13 and
Figure 3-14, are broken up as indicated in Table 3-10.
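As a quick check of the numbers in the read interface steps, the following Python sketch computes the number of distributed RAMs in the Read Data FIFOs and converts the minimum read latency from clocks to time. The function names are hypothetical and the sketch only restates arithmetic given in the text.

```python
def read_fifo_ram_count(memory_width):
    """Distributed RAMs instantiated for the Read Data FIFOs."""
    # One 16 x 1 RAM per data bit, for rising-edge and falling-edge data.
    return 2 * memory_width


def clocks_to_ns(clocks, frequency_mhz):
    """Convert a latency in clock cycles to nanoseconds."""
    return clocks * 1000.0 / frequency_mhz
```

At 200 MHz, clocks_to_ns(25, 200) gives 125 ns for the 25-clock minimum latency.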
The memory address (af_addr) includes the column address, row address, bank address,
and chip select for deep memory interfaces. In the bit ranges below, each name stands for
the width in bits of that field.
Column Address: [column_address - 1:0]
Row Address: [column_address + row_address - 1:column_address]
Bank Address: [column_address + row_address + bank_address - 1:column_address + row_address]
Chip Select: [column_address + row_address + bank_address + chip_address - 1:column_address + row_address + bank_address]
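The field layout above amounts to packing the column, row, bank, and chip-select values end to end, starting at bit 0. The Python sketch below illustrates that packing; the function name and the example widths are hypothetical, and the command bits of app_af_addr are deliberately left out.

```python
def pack_af_addr(column, row, bank, chip=0, *,
                 column_bits, row_bits, bank_bits, chip_bits=0):
    """Pack address fields into one integer, column field at bit 0."""
    fields = [
        (column, column_bits),
        (row, row_bits),
        (bank, bank_bits),
        (chip, chip_bits),
    ]
    addr = 0
    shift = 0
    for value, width in fields:
        if value < 0 or value >= (1 << width):
            raise ValueError("field value does not fit in its width")
        addr |= value << shift   # place the field above the previous one
        shift += width
    return addr
```

With the illustrative widths column_bits=10, row_bits=13, and bank_bits=2, row bit 0 lands at bit 10 and bank bit 0 at bit 23, which matches the ranges listed above.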
Figure 3-15 describes four consecutive writes followed by four consecutive reads with a
burst length of 8.
[Waveform. Signals shown: top_00/clk_0, top_00/af_empty_w, top_00/ctrl_af_rden, top_00/ctrl_wr_df_rden.]
Figure 3-15: Consecutive Writes Followed by Consecutive Reads with Burst Length of 8
Table 3-13: Signals between the Controller and Physical Layer (Cont’d)
Port Name: ctrl_Dqs_En
Port Width: 1
Port Description: Output from the controller to the physical layer for a write strobe.
Notes: This signal is asserted for three clock cycles during a write with a burst length of 4 and for five clock cycles with a burst length of 8. The CAS latency and AL values determine how many clock cycles after the first write or burst write state this signal is asserted. Figure 3-16 shows the timing waveform for this signal with CAS latency of 3 and AL of 0 for four back-to-back writes with a burst length of 8.
Port Name: ctrl_WrEn
Port Width: 1
Port Description: Output from the controller to the physical layer for write data three-state control.
Notes: This signal is asserted for two clock cycles during a write with a burst length of 4 and for four clock cycles with a burst length of 8. The CAS latency and AL values determine how many clock cycles after the first write or burst write state this signal is asserted. Figure 3-16 shows the timing waveform for this signal with CAS latency of 3 and AL of 0 for four back-to-back writes with a burst length of 8.
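The assertion lengths quoted for ctrl_WrEn and ctrl_Dqs_En fit a simple pattern: half the burst length for the write enable, and one clock more for the strobe enable. The Python sketch below encodes that observation. It is only a fit to the two burst lengths the table covers, not a statement about the controller RTL, and the function names are hypothetical.

```python
def ctrl_wren_cycles(burst_length):
    """Clock cycles ctrl_WrEn stays asserted for one write."""
    if burst_length not in (4, 8):
        raise ValueError("burst length must be 4 or 8")
    return burst_length // 2


def ctrl_dqs_en_cycles(burst_length):
    """Clock cycles ctrl_Dqs_En stays asserted for one write."""
    # One clock longer than the write enable in both quoted cases.
    return ctrl_wren_cycles(burst_length) + 1
```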
Figure 3-16 describes the timing waveform for control signals from the controller to the
physical layer.
[Waveform for Additive Latency 0 and CAS Latency 3. Signals shown: top_00/clk_0, top_00/ctrl_wr_en, top_00/ctrl_rden, top_00/ctrl_dqs_enable, top_00/ctrl_dqs_reset.]
Figure 3-16: Timing Waveform for Control Signals from the Controller to the Physical Layer
Components
Case 1: 256 Mb (x4 component)
35 32 31 29 28 27 26 25 24 12 11 10 9 0
35 32 31 28 27 26 25 24 23 11 10 9 0
35 32 31 30 29 28 27 25 24 11 10 9 0
DIMMs
Case 1: 2 GB
Density 1 GB (1 x 2 = 2 GB)
Depth 2
Row address 14
Column address 10
Bank address 3
Rank/chip + deep address 2
35 32 31 30 29 28 27 25 24 11 10 9 0
Case 2: (8 GB)
Density 4 GB (4 x 2 = 8 GB)
Depth 2
Row address 14
Column address 11
Bank address 3
Rank/chip + deep address 2
35 32 31 30 29 28 26 25 12 11 10 9 0
Note: Timing has been verified for most of the MIG generated configurations. For the best timing
results, adjacent banks in the same column of the FPGA should be used. Banks that are separated
by unbonded banks should be avoided because these can cause timing violations.
Deep-Design Violations
Simulation violations occur for a depth of 4 at frequencies less than or equal to 127 MHz.
An auto-refresh command is issued during memory initialization. The next auto-refresh
command is issued upon the auto-refresh request after completing calibration
(INIT_DONE asserted). Because the controller issues the refresh command to the memory
only after calibration is completed, even though there is a pending auto-refresh request, a
MAX TRFC violation occurs for CS0 for a depth of 4 and at frequencies between 125 MHz and
127 MHz. These violations can be ignored because no read or write commands are
issued to the memory, that is, CS0.
Simulation Violations
There might be simulation violations for frequencies such as 150 MHz where the clock
period is not an integer value. At 150 MHz, the clock period value in the simulation
testbench is 6.66 ns and the MIG tool rounds it to 6.67 ns. Consider a memory TRCD value
of 20 ns. MIG calculates the TRCD count value based on the clock period:
RCD_COUNT_VALUE = 20/6.67 = 2.998, which is rounded to 3 in the design parameter
file. In the testbench, three clock cycles last 3 × 6.66 ns = 19.98 ns, which falls short
of the 20 ns requirement by 20 ps. The difference between the clock period in the external
simulation testbench and the MIG tool causes the timing violations. This is only one example case. There might be more
such scenarios. These are only simulation warnings. Functionally, there should be no
issues. To remove these warnings, the related count value can be increased by one.
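The rounding effect described above is easy to reproduce. The following Python sketch recomputes the example; the function names are hypothetical, and the rounding rule is an assumption inferred from the single example in the text rather than from the MIG tool itself.

```python
import math


def count_value(t_ns, tool_period_ns):
    """Cycle count as derived from the tool's rounded clock period."""
    return round(t_ns / tool_period_ns)


def shortfall_ps(t_ns, count, sim_period_ns):
    """How far the simulated interval falls short of the requirement, in ps."""
    return max(0.0, (t_ns - count * sim_period_ns) * 1000.0)


def safe_count(t_ns, sim_period_ns):
    """Smallest cycle count that meets the requirement in simulation."""
    return math.ceil(t_ns / sim_period_ns)
```

With a 20 ns TRCD, a 6.67 ns tool period, and a 6.66 ns testbench period, count_value gives 3, the shortfall is about 20 ps, and safe_count gives 4, which corresponds to increasing the count value by one.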
Supported Devices
The design generated by MIG is independent of the memory package, so the package
part of the memory component name is replaced with XX or XXX, where XX or XXX indicates a
don't care condition. The tables below list the components (Table 3-15) and DIMMs
(Table 3-16 through Table 3-18) supported by the tool for DDR2 direct-clocking designs. In
the supported devices, an X in the components column (for Components and Unbuffered
DIMMs) denotes a single alphanumeric character. For example, MT47H128M4XX-3 can be
either MT47H128M4BP-3 or MT47H128M4B6-3. Similarly, MT16HTF25664AX-40E can be
either MT16HTF25664AY-40E or MT16HTF25664AG-40E. An XX for Registered DIMMs
denotes one or two alphanumeric characters. For example, MT9HTF3272XX-667 can be
either MT9HTF3272Y-667 or MT9HTF3272DY-667. An XXX for Registered DIMMs denotes
two or three alphanumeric characters. For example, MT18HTF12872XXX-667 can be either
MT18HTF12872DY-667 or MT18HTF12872PDY-667. Pin mapping for x4 RDIMMs is
provided in Appendix G, “Low Power Options.”
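The wildcard convention can be expressed as a regular expression. The Python sketch below is a hypothetical helper, not part of MIG. It assumes that the wildcard is the run of X characters immediately before the hyphen, that wildcard characters are uppercase letters or digits, and that a single X on a registered DIMM (a case the text does not describe) matches exactly one character.

```python
import re


def part_regex(pattern, registered):
    """Build a regex for a part-number pattern with an X wildcard run."""
    m = re.fullmatch(r"(.*?)(X+)(-.*)", pattern)
    if m is None:
        raise ValueError("no X wildcard run before the hyphen")
    prefix, xs, suffix = m.groups()
    n = len(xs)
    if registered and n >= 2:
        # Registered DIMMs: XX means 1 or 2 characters, XXX means 2 or 3.
        quantifier = "{%d,%d}" % (n - 1, n)
    else:
        # Components and unbuffered DIMMs: each X is exactly one character.
        quantifier = "{%d}" % n
    return re.compile(re.escape(prefix) + "[A-Z0-9]" + quantifier + re.escape(suffix))
```

For example, part_regex("MT9HTF3272XX-667", registered=True) matches both MT9HTF3272Y-667 and MT9HTF3272DY-667.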
Feature Summary
This section summarizes the supported and unsupported features of the SerDes clocking
DDR2 SDRAM controller design.
Supported Features
The DDR2 SDRAM controller design supports:
• Burst lengths of four and eight
• Sequential and Interleaved burst types
• CAS latencies of 4 and 5
• Different memories (density/speed)
• Components
• Additive latencies 0, 1, and 2
• Verilog and VHDL
• Differential and single-ended DQS
• Linear addressing
• Without a testbench
• On Die Termination (ODT)
• DIMMs (registered DIMMs up to 300 MHz and unbuffered DIMMs up to 266 MHz)
• Data mask
• System clock, differential and single-ended
The supported features are described in more detail in “Architecture.”
Unsupported Features
The DDR2 SDRAM controller design does not support:
• CAS latency of 3
• Additive latencies of 3 and 4
• Redundant DQS (RDQS)
• Auto precharge
• Deep memories
• ECC support
• Without a DCM
• Multicontroller
Architecture
Implemented Features
This section provides details on the supported features of the DDR2 SDRAM controller.
Burst Length
The DDR2 SDRAM controller supports burst lengths of four and eight. The burst length
can be selected through the Set mode register(s) option in MIG. For a design without a
testbench (user design), the user has to provide bursts of the input data based on the
chosen burst length. Bits M2:M0 of the Mode Register define the burst length, and bit M3
indicates the burst type (see the Micron data sheet). Read and write accesses to the DDR2
SDRAM are burst-oriented. The burst length determines the maximum number of column
locations accessed for a given READ or WRITE command.
CAS Latency
The DDR2 SDRAM controller supports CAS latencies (CLs) of four and five. CL can be
selected in the Set mode register(s) option from the GUI. The CAS latency is
implemented in the ddr2_controller module. During data write operations, the generation
of the ctrl_WrEn, ctrl_WrEn_Dis, and ctrl_Odd_Latency signals varies according to the CL
in the ddr2_controller module. During data read operations, the generation of the
ctrl_RdEn_div0 signal varies according to the CL in the ddr2_controller module. Bits
M4:M6 of the Mode Register define the CL (see the Micron data sheet). CL is the delay in
clock cycles between the registration of a READ command and the availability of the first
bit of output data.
Additive Latency
DDR2 SDRAM devices support a feature called posted CAS additive latency (AL). The
DDR2 SDRAM controller supports additive latencies of 0, 1, and 2. AL can be selected in the Set
mode register(s) option. Additive latency is implemented in the ddr2_controller module.
The ddr2_controller module issues READ/WRITE commands prior to tRCD (minimum)
depending on the user-selected AL value in the Extended Mode Register. This feature
allows the READ command to be issued prior to tRCD (minimum) by delaying the internal
command to the DDR2 SDRAM by AL clocks. Posted CAS AL makes the command and
data bus efficient for sustainable bandwidths in DDR2 SDRAM. Bits E3:E5 of the Extended
Mode Register define the value of AL (see the Micron data sheet).
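The register fields named in the three subsections above can be assembled as follows. This Python sketch is illustrative only: the function names are hypothetical, the field encodings are the standard DDR2 values as commonly documented and should be confirmed against the memory vendor data sheet, and every other register bit is left at zero. Only the values this design supports are included.

```python
# Field encodings (assumed from the standard DDR2 register definition).
BURST_LENGTH_CODE = {4: 0b010, 8: 0b011}                  # M2:M0
BURST_TYPE_CODE = {"sequential": 0, "interleaved": 1}     # M3
CAS_LATENCY_CODE = {4: 0b100, 5: 0b101}                   # M6:M4
ADDITIVE_LATENCY_CODE = {0: 0b000, 1: 0b001, 2: 0b010}    # E5:E3


def mode_register_fields(burst_length, burst_type, cas_latency):
    """Return the Mode Register bits for burst length, burst type, and CL."""
    return (BURST_LENGTH_CODE[burst_length]
            | BURST_TYPE_CODE[burst_type] << 3
            | CAS_LATENCY_CODE[cas_latency] << 4)


def extended_mode_register_al(additive_latency):
    """Return the Extended Mode Register bits for the additive latency."""
    return ADDITIVE_LATENCY_CODE[additive_latency] << 3
```

An unsupported setting, such as a CAS latency of 3, raises KeyError in this sketch because it has no entry in the lookup tables.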
Registered DIMMs
The DDR2 SDRAM controller supports registered DIMMs. This feature is implemented in the
ddr2_controller module. For registered DIMMs, the address and command signals are
registered at the DIMM and therefore have one more clock cycle of latency than with
unbuffered DIMMs.
Notes:
1. For the latest timing information, refer to the vendor memory data sheets.
Data Masking
The DDR2 SDRAM design supports data masking per byte. Masking per nibble is not
supported due to a limitation of the internal block RAM based FIFOs. The mask data is
stored in the Data FIFO along with the actual data.
MIG supports a data mask option. If this option is checked in the GUI, MIG generates
a design with data mask pins. This option can be chosen if the selected part has data
masking.
Precharge
The PRECHARGE command is used to close the open row in a bank if there is a command
to be issued to a different row in the same bank. The Virtex-4 FPGA DDR2 controller
checks the row address, bank address, and chip address, and issues a PRECHARGE
command if any of them changes for the next read or write command. The
auto-precharge function is not supported.
Auto Refresh
The DDR2 SDRAM controller issues AUTO REFRESH commands at specified intervals for
the memory to refresh the charge required to retain the data in the memory. The user can
also issue a REFRESH command through the user interface by setting bits 34, 33, and 32 of
the app_af_addr signal in the user_interface module to 3’b001. If there is a refresh request
while there is an ongoing read or write burst, the controller issues a REFRESH command
after completing the current read or write burst command.
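The user-issued refresh amounts to writing 3'b001 into bits 34:32 of the word presented on app_af_addr. A minimal Python sketch of that bit manipulation follows; the helper names are hypothetical, and only the refresh encoding is taken from the text.

```python
REFRESH_COMMAND = 0b001   # value for bits 34:32 given in the text
COMMAND_SHIFT = 32
COMMAND_MASK = 0b111 << COMMAND_SHIFT


def with_command(af_word, command):
    """Return af_word with bits 34:32 replaced by the 3-bit command."""
    if not 0 <= command <= 0b111:
        raise ValueError("command must fit in 3 bits")
    return (af_word & ~COMMAND_MASK) | (command << COMMAND_SHIFT)


def command_of(af_word):
    """Extract the 3-bit command field from bits 34:32."""
    return (af_word >> COMMAND_SHIFT) & 0b111
```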
Linear Addressing
The DDR2 SDRAM controller supports linear addressing. Linear addressing refers to the
way the user provides the address of the memory to be accessed. For Virtex-4 FPGA DDR2
SDRAM controllers, the user provides the address information through the app_af_addr
signal. As the densities of the memory devices vary, the number of column address bits
and row address bits also change. In any case, the row address bits in the app_af_addr
signal always start from the next higher bit, where the column address ends. This feature
increases the number of devices that can be supported with the design.
On-Die Termination
The DDR2 SDRAM controller supports on-die termination (ODT). Through the Set mode
register(s) option from the GUI, the user can disable ODT or can choose 75 Ω, 150 Ω, or 50 Ω.
ODT can turn the termination on and off as needed to improve the signal integrity in the
system. ODT is only enabled on writes to DDR2 memory. It is disabled on read operations.
System Clock
MIG supports differential and single-ended system clocks. Based on the selection in the
GUI, input system clocks and IDELAY clocks are differential or single-ended.
Hierarchy
Figure 3-17 shows the hierarchical structure of the DDR2 SDRAM controller.
[Hierarchy diagram. Blocks recoverable from the figure: <top_module>, top*, test_bench*, v4_dq_iob, v4_dm_iob, v4_dqs_iob, tap_ctrl*, data_tap_inc*, idelay_rd_en_io, wr_data_fifo, rd_data_fifo*, rd_wr_addr_fifo*, RAM_D. Legend: Design Modules, Test Bench Modules, Clocks and Reset Generation Modules.]
Note: A block with a * has a parameter file included.
Figure 3-17: Hierarchical Structure of the DDR2 SDRAM Design (SerDes Clocking)
Figure 3-17 shows the hierarchical structure of the DDR2 SDRAM design generated by
MIG with a testbench and a DCM. The modules are classified as follows:
• Design modules
• Testbench modules
• Clocks and reset generation modules
There is a parameter file generated with the design that has all the user input and design
parameters selected from MIG.
MIG can generate two different DDR2 SDRAM designs:
• With a testbench and a DCM
• Without a testbench and with a DCM
A design without a testbench (user_design) does not have testbench modules. The
<top_module> module has the user interface signals for designs without a testbench. The
list of user interface signals is provided in Table 3-24.
Design clocks and resets are generated by using the DCM in the infrastructure module.
The inputs to this module are the differential design clock and a 200 MHz differential clock
for the IDELAYCTRL module. A user reset is also input to this module. Using the input
clocks and reset signals, the system clocks and the system reset are generated in this
module, which is used in the design.
[Block diagram. Blocks shown: idelay_ctrl, infrastructure, main_0, memory device. Clock and reset signals shown: clk200_p, clk200_n, clk200, rst200, idelay_ctrl_rdy, sys_clk_p, sys_clk_n, sys_reset_in_n, clk, clk90, clkdiv_0, clkdiv_90, sys_rst, sys_rst_90, sys_rst_270. Memory signals shown: ddr2_ras_n, ddr2_cas_n, ddr2_we_n, ddr2_cs_n, ddr2_odt, ddr2_cke, ddr2_dm, ddr2_ba, ddr2_a, ddr2_ck, ddr2_ck_n, ddr2_dq.]
Figure 3-18: Top-Level Block Diagram of the DDR2 SDRAM Design with a DCM and a Testbench
Not all Memory Device ports appear in every MIG-generated design. For
example, port ddr2_reset_n appears in the port list for registered DIMM designs only.
Similarly, ddr2_dqs_n does not appear for single-ended DQS designs. Port ddr2_dm
appears only for parts that contain a data mask; a few RDIMMs have no data mask, and
ddr2_dm does not appear in the port list for them.
Figure 3-19 shows a top-level block diagram of a DDR2 SDRAM design with a DCM but
without a testbench. The sys_clk_p and sys_clk_n pair are differential input system clocks.
The DCM is instantiated in the infrastructure module that generates the required design
clocks. The differential clk200_p and clk200_n pair are used for the idelay_ctrl element. The
active-Low system reset signal is sys_reset_in_n. All design resets are gated by the
dcm_lock signal. The user has to drive the user application signals. The design provides
the clk_tb and reset_tb signals to the user to synchronize with the design. The
init_complete signal indicates the completion of initialization and calibration of the design.
[Block diagram. Blocks shown: idelay_ctrl, infrastructure, top_0, memory device. Clock and reset signals shown: clk200_p, clk200_n, clk200, rst200, idelay_ctrl_rdy, sys_clk_p, sys_clk_n, sys_reset_in_n, clk, clk90, clkdiv_0, clkdiv_90, sys_rst, sys_rst_90, sys_rst_270. User signals shown: app_af_addr, app_af_wren, app_wdf_data, app_mask_data, app_wdf_wren, wdf_almost_full, af_almost_full. Memory signals shown: ddr2_ras_n, ddr2_cas_n, ddr2_we_n, ddr2_cs_n, ddr2_odt, ddr2_cke, ddr2_dm, ddr2_ba, ddr2_a, ddr2_ck, ddr2_ck_n, ddr2_dq, ddr2_dqs, ddr2_dqs_n.]
Figure 3-19: Top-Level Block Diagram of the DDR2 SDRAM Design with a DCM but without a Testbench
[Block diagram. Blocks shown: User Backend (Read/Write Address and Data Generation, Read Data Compare Module), User Interface, Backend FIFOs (Address FIFO, Write Data FIFOs, Read Data FIFOs), DDR2 SDRAM Controller, Physical Layer, DDR2 SDRAM. Signals shown: init_complete, app_af_addr, app_af_wren, app_wdf_data, app_wdf_wren, app_mask_data, wdf_almost_full, af_almost_full, af_almost_empty, af_addr, wdf_data, burst_length_div2, ctrl_waf_rden, ctrl_wdf_rden, ctrl_wren, ctrl_wr_dis, ctrl_odd_latency, ctrl_rden, ctrl_rden_valid, cntl_dummyread_start, dp_dly_sel_done, address/controls, dq, dqs, read_data0 through read_data3, read_data0_fifo_out through read_data3_fifo_out, read_data_valid.]
Controller
The DDR2 SDRAM ddr2_controller accepts and decodes user commands and generates
read, write, and refresh commands. The DDR2 SDRAM controller also generates signals
for other modules. The memory is initialized and powered up using a defined process. The
controller state machine handles the initialization process upon power-up. When the
initialization is over, the controller starts doing a dummy write and continuous dummy
reads. During these dummy reads, the tap_logic module calibrates DQ and DQS by
varying the delay to center-align the data with the FPGA clock. Then the tap_logic module
asserts the dp_dqs_dq_calib_done signal. After this assertion, the controller does one more
write and read to the memory for read-enable calibration to determine the delay between
the read command and data. Then dp_dly_slct_done is asserted to start writing to and
reading from the memory.
The ddr2_controller is clocked at half the frequency of the interface using CLKDIV_0 and
CLKDIV_90 and CLK_90. Therefore the address and bank address are driven and the
command signals (RAS_L, CAS_L, and WE_L) are asserted for two clock cycles of the fast
memory interface clock. The control signals (CS_L, CKE, and ODT) are DDR of the half
frequency clock CLKDIV_0, ensuring that the control signals are asserted for just one clock
cycle of the fast memory interface clock. Figure 3-21 shows the command and control
timing diagram for unbuffered DIMMs and components in which CS_L is deasserted 3/4T
earlier when the write command is at the positive edge of the device clock to the memory.
For registered DIMMs, CS_L is deasserted T/2 earlier only.
[Timing diagram: clkdiv_0, clk, memory device clock, and control (cs_l) waveforms, with a 3/4T offset marked on cs_l]
Figure 3-21: Command and Control Timing from Controller to DDR2 Memory
Physical Layer
This module transmits data to and receives data from the memories. Its major functions
include processing the data in the write datapath, and calibrating the data in the read
datapath. The write datapath function is implemented in the data_write module and the
read datapath function is implemented in the tap_ctrl, data_tap_inc, and idelay_rd_en_io
modules.
To start calibration in the read datapath, the write datapath first generates the training
pattern (known data) and writes it to the memory during dummy writes. Calibration is
done during the dummy reads. The read datapath expects the training pattern. When the
received training pattern is correct, then DQ and DQS are aligned with the FPGA clock to
capture the data without errors during actual writes and reads. After this calibration is
finished, dp_dqs_dq_calib_done is asserted to start read-enable calibration to find the
delay between the read command and data at the input of the Read Data FIFO. The read
enable generated by the controller with the read command is then delayed by the same
amount and is used as the write enable to the Read Data FIFO for normal reads. Once this
read-enable calibration is complete, dp_dly_slct_done is asserted, which initiates writes
and reads to the memory.
User Interface
This module stores write data and write addresses, writes the data into a location specified
by the write address, stores read addresses used to read from a specific location, and also
stores data read from the memory in FIFOs. The rd_data and rd_data_fifos modules store
the data in LUT-based RAMs. The rd_wr_addr_fifo and wr_data_fifo modules store the
data and address in block RAMs.
The FIFOs are built using FIFO16 primitives in the rd_wr_addr_fifo and wr_data_fifo_16
modules. Each FIFO has a threshold attribute called ALMOST_FULL_OFFSET, whose
value is set to 7, by default, in the RTL. This value can be changed as needed. For valid
FIFO threshold offset values, refer to UG070 [Ref 7].
The width of the data stored by the wr_data_fifo module is four times the interface data
width, because the data corresponding to four edges is given in one clock cycle.
Test Bench
The MIG tool generates two RTL folders, example_design and user_design. The
example_design folder includes the synthesizable test bench, while user_design does not
include the test bench modules. The MIG test bench performs eight write commands and
eight read commands in an alternating fashion. The number of words in a write command
depends on the burst length. For a burst length of 4, the test bench writes a total of 32 data
words for all eight write commands (16 rise data words and 16 fall data words). For a burst
length of 8, the test bench writes a total of 64 data words. For an 8-bit design, it writes the
data pattern FF, 00, AA, 55, 55, AA, 99, 66 in sequence, in which FF, AA, 55, and 99 are rise
data words and 00, 55, AA, and 66 are fall data words. The falling-edge data is the
complement of the rising-edge data. For a burst length of 4, the data sequence for the first
write command is FF, 00, AA, 55, and the data sequence for the second write command is
55, AA, 99, 66. For a burst length of 8, the data pattern for the first write command is FF,
00, AA, 55, 55, AA, 99, 66, and the same pattern is repeated for all the remaining write
commands. This data pattern is repeated in the same order based on the number of data
words written. For data widths greater than 8, the same data pattern is concatenated for
the other bits. For a 32-bit design and a burst length of 8, the data pattern for the first write
command is FFFFFFFF, 00000000, AAAAAAAA, 55555555, 55555555, AAAAAAAA,
99999999, 66666666.
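The pattern rules above can be checked with a small software model. The following Python sketch is illustrative only and is not part of the MIG output; the names BASE_PATTERN, widen, write_burst, and fmt_words are hypothetical, and the model assumes a data width that is a multiple of 8.

```python
# Base 8-bit pattern from the text: rise/fall pairs, fall = complement of rise.
BASE_PATTERN = [0xFF, 0x00, 0xAA, 0x55, 0x55, 0xAA, 0x99, 0x66]


def widen(byte, data_width):
    """Concatenate one pattern byte across a data_width-bit word."""
    if data_width % 8 != 0:
        raise ValueError("this sketch assumes a data width that is a multiple of 8")
    word = 0
    for _ in range(data_width // 8):
        word = (word << 8) | byte
    return word


def write_burst(command_index, burst_length, data_width=8):
    """Return the data words for write command number command_index (0-based)."""
    if burst_length not in (4, 8):
        raise ValueError("burst length must be 4 or 8")
    # The pattern continues where the previous command stopped and wraps after 8 words.
    start = (command_index * burst_length) % len(BASE_PATTERN)
    return [
        widen(BASE_PATTERN[(start + i) % len(BASE_PATTERN)], data_width)
        for i in range(burst_length)
    ]


def fmt_words(words, data_width):
    """Format words as fixed-width uppercase hex strings."""
    digits = data_width // 4
    return [f"{w:0{digits}X}" for w in words]


if __name__ == "__main__":
    print(fmt_words(write_burst(0, 4), 8))       # first BL = 4 write command
    print(fmt_words(write_burst(1, 4), 8))       # second BL = 4 write command
    print(fmt_words(write_burst(0, 8, 32), 32))  # 32-bit design, BL = 8
```

With a burst length of 4, consecutive commands alternate between the two halves of the pattern; with a burst length of 8, every command carries the full pattern.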
Address generation logic generates eight different addresses for eight write commands.
The same eight address locations are repeated for the following eight read commands. The
read commands are performed at the same locations where the data is written. There are a
total of 32 different address locations for 32 write commands, and the same address
locations are generated for 32 read commands. Upon completion of a total of 64
commands, including both writes and reads (eight writes and eight reads repeated four
times), address generation rolls back to the first address of the first write command and the
same address locations are repeated. The MIG test bench exercises only a certain memory
area. The address is formed such that all address bits are exercised. During writes, a new
address is generated for every burst operation on the column boundary.
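The command ordering described above can be sketched as a generator. This Python model is an assumption-laden illustration: the real address values are not given in this section, so each location is represented by an abstract index 0 through 31, and the name command_schedule is hypothetical.

```python
from itertools import islice


def command_schedule():
    """Yield (operation, address_index) pairs in the order described in the text.

    Eight writes are followed by eight reads of the same locations; this is
    repeated for four groups (64 commands), then the sequence starts over.
    """
    while True:
        for group in range(4):
            addresses = [group * 8 + i for i in range(8)]  # abstract indices only
            for addr in addresses:
                yield ("WR", addr)
            for addr in addresses:
                yield ("RD", addr)


if __name__ == "__main__":
    for op, addr in islice(command_schedule(), 20):
        print(op, addr)
```

After 64 commands the generator returns to the first address of the first write command, matching the rollover behavior described above.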
During reads, comparison logic compares the read pattern with the pattern written, i.e., the
FF, 00, AA, 55, 55, AA, 99, 66 pattern. For example, for an 8-bit design with a burst length of 4, the
data written for a single write command is FF, 00, AA, 55. During reads, the read pattern is
compared with the FF, 00, AA, 55 pattern. Based on a comparison of the data, a status
signal error is generated. If the data read back is the same as the data written, the error
signal is 0, otherwise it is 1.
Infrastructure Module
The infrastructure module generates the necessary FPGA clock and reset signals. The
clocking scheme used for this design includes one DCM and one PMCD, as shown in
Figure 3-22. When differential clocking is used, sys_clk_p, sys_clk_n, clk_200_p, and
clk_200_n signals appear. When single-ended clocking is used, sys_clk and idly_clk_200
signals appear. In addition, clocks are available for design use and a 200 MHz clock is
provided for the IDELAYCTRL primitive. Differential and single-ended clocks are passed
through global clock buffers before connecting to a DCM. For differential clocking, the
output of the sys_clk_p/sys_clk_n buffer is single-ended and is provided to the DCM
input. Likewise, for single-ended clocking, sys_clk is passed through a buffer and its
output is provided to the DCM input. The clock outputs of the DCM are passed through
the PMCD. The outputs of the PMCD are clk (0° phase-shifted version of the input clock),
clk_90 (90° phase-shifted version of the input clock), clkdiv_0 (half the frequency of the
input clock and phase-aligned with clk), and clkdiv_90 (half the frequency of the input
clock and phase-aligned with clk_90). After the DCM is locked, the design is in the reset
state for at least 25 clocks. The
infrastructure module also generates all of the reset signals required for the design.
[Block diagram: DCM (CLKIN, CLKFB, RST, CLK0, CLK90, CLKDV, LOCKED) and PMCD (CLKA, CLKB, CLKC, REL, RST, CLKA1, CLKA1D2, CLKB1, CLKC1), with inputs sys_clk_in and sys_reset and outputs clk, clk_90, clkdiv_0, and clkdiv_90]
Figure 3-22: Clocking Scheme for the High-Performance Memory Interface Design
Note: SerDes design is not supported for FPGAs that do not have PMCDs. Unsupported FPGAs for
SerDes design are:
XC4VLX15-FF668 XC4VFX12-FF668 XC4VSX25-FF668
XC4VLX15-FF676 XC4VFX12-SF363 XC4VSX25-FF676
XC4VLX15-SF363 XC4VFX20-FF672
Idelay_ctrl
This module instantiates the IDELAYCTRL primitive of the Virtex-4 FPGA. The
IDELAYCTRL primitive is used to continuously calibrate the individual delay elements in
its region to reduce the effect of process, temperature, and voltage variations. A 200 MHz
clock has to be fed to this primitive. For more information on IDELAYCTRLs, refer to
“Verify IDELAYCTRL Instantiation for Virtex-4 and Virtex-5 FPGA Designs” in Chapter
14.
After the per-bit calibration is done, the controller does a read enable calibration. This
calibration is used to determine the delay from read command to read data at rd_data_fifo.
The delay between read command and read data is affected by the CAS latency and
additive latency parameters, the PCB traces, and the I/O buffer delays. This in turn is used
to generate a write enable to rd_data_fifo so that valid data is registered. The controller
issues a dummy read command and compares the read data with a fixed known pattern.
The training_done port in the tap_logic module indicates the completion of the read enable
calibration.
The init_complete port indicates the completion of DQS to FPGA clock calibration, per-bit
calibration, and read enable calibration. After initialization and calibration are done, the
controller can start issuing user commands to the memory.
Notes:
1. All user interface signal names are prepended with a controller number for the without testbench case, because SerDes clocking
supports only a single controller. See “User Interface Accesses,” page 143 for timing requirements and restrictions on the user
interface signals.
2. Linear addressing is used, i.e., the row address immediately follows the column address bits, and the bank address follows the row
address bits, thus supporting more devices. The number of address bits used depends on the density of the memory part. The
controller ignores the unused bits, which can all be tied High.
Write Interface
Figure 3-23 shows the user interface block diagram for write operations.
[Block diagram: Address FIFO (FIFO16, 512 x 36) and Write Data FIFOs (FIFO16, 512 x 36) between the user interface and the controller and physical layer. Signals shown: app_af_addr, app_af_wren, af_addr, af_empty, af_almost_full, ctrl_af_rden, app_wdf_wren, app_wdf_data, app_mask_data, wdf_data, mask_data, wdf_almost_full, ctrl_wdf_rden]
are available only when supported by the memory part and when the data mask
option is enabled in the MIG GUI. Some memory parts, such as Registered DIMMs of
x4 parts, do not support mask bits.
2. The Common Address FIFO is used for both write and read commands, and comprises
a command part and an address part. Command bits discriminate between write and
read commands.
3. User interface data width app_wdf_data is four times that of the memory data width.
For an 8-bit memory width, the user interface is 32 bits consisting of two rising-edge
data and two falling-edge data. For every 8 bits of data, there is a mask bit. For 72-bit
memory data, the user interface data width app_wdf_data is 288 bits, and the mask
data app_mask_data is 36 bits.
4. The minimum configuration of the Write Data FIFO is 512 x 36 for a memory data
width of 8 bits.
5. Depending on the memory data width, MIG instantiates multiple FIFO16s to gain the
required width. For designs using 8-bit data width, one FIFO16 is instantiated; for
72-bit data width, a total of nine FIFO16s are instantiated. The bit architecture
comprises 16 bits of rising-edge data, 2 bits of rising-edge mask, 16 bits of falling-edge
data, and 2 bits of falling-edge mask, which are all stored in a FIFO16. MIG routes the
app_wdf_data and app_mask_data to FIFO16s accordingly.
6. The user can initiate a write to memory by writing to the Address FIFO and the Write
Data FIFO when the FIFO Full flags are deasserted. Status signal af_almost_full is
asserted when Address FIFO is full, and similarly wdf_almost_full is asserted when
Write Data FIFO is full.
7. Both the Address FIFO and Write Data FIFO Full flags are deasserted with power-on.
8. The user should assert the Address FIFO write-enable signal app_af_wren along with
address app_af_addr to store the write address and write command into the Address
FIFO.
9. The user should assert the Data FIFO write-enable signal app_wdf_wren along with
write data app_wdf_data and mask data app_mask_data to store the write data and
mask data into the Write Data FIFO. The user should provide two rising-edge and two
falling-edge data together for each write to the Data FIFO.
10. The controller reads the Address FIFO by issuing the ctrl_af_rden signal. The
controller reads the Write Data FIFO by issuing the ctrl_wdf_rden signal after the
Address FIFO is read. It decodes the command part after the Address FIFO is read.
[Timing diagram: clkdiv_0, reset0, af_almost_full, app_af_wren, app_af_addr[35:0] (A1, A2, A3, A4), app_wdf_wren, app_wdf_data[4n-1:0] (D0 D1 D2 D3 for each burst), app_mask_data[4m-1:0] (M0 M1 M2 M3 for each burst), wdf_almost_full]
Figure 3-24: DDR2 SDRAM Write Burst (BL = 4) for Four Bursts
11. The write command timing diagram in Figure 3-24 is derived from the MIG-generated
testbench. As shown (burst length of 4), each write to the Address FIFO must be
coupled with one write to the Data FIFO.
Note: The user can start filling the Write Data FIFO two clocks after the Address FIFO is
written, because there is a two-clock latency between the command fetch and reading the Data
FIFO. Using the terms shown in Figure 3-24 and Figure 3-25, therefore, the user can assert the
A1 address two clocks before D0D1D2D3. Similarly, A2, A3, and A4 can be advanced by two
clocks.
[Timing diagram: clkdiv_0, reset0, af_almost_full, app_af_wren, app_af_addr[35:0] (A1, 0, A2, 0), app_wdf_wren, app_wdf_data (D0 through D7 for each burst), app_mask_data (M0 through M7 for each burst), wdf_almost_full]
Figure 3-25: DDR2 SDRAM Write Burst (BL = 8) for Two Bursts
12. The write command timing diagram in Figure 3-25 is derived from the MIG-generated
testbench. As shown (burst length of 8), each write to the Address FIFO must be
coupled with two writes to the Data FIFO. Because the controller first reads the address
and command together, the address need not coincide with the last data. After the
command is analyzed (nearly two clocks later for a worst-case timing scenario), the
controller sequentially reads the data in four clocks. Thus, there are six clocks from the
time the address is read to the time the last data is read.
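The width and burst relationships in the list above (items 3, 5, 11, and 12) reduce to simple arithmetic. The Python sketch below is a hedged illustration, not MIG code; write_interface_sizes is a hypothetical name, and the model assumes a memory data width that is a multiple of 8 with one mask bit per data byte.

```python
def write_interface_sizes(mem_width, burst_length):
    """Derive user-interface sizes for the write path from the rules in the text."""
    if mem_width % 8 != 0:
        raise ValueError("this sketch assumes a memory width that is a multiple of 8")
    if burst_length not in (4, 8):
        raise ValueError("burst length must be 4 or 8")
    data_bits = 4 * mem_width          # two rising-edge and two falling-edge words
    mask_bits = data_bits // 8         # one mask bit for every 8 bits of data
    bits_per_fifo16 = 16 + 2 + 16 + 2  # rise data, rise mask, fall data, fall mask
    fifo16_count = -(-(data_bits + mask_bits) // bits_per_fifo16)  # ceiling division
    data_writes_per_address = burst_length // 4  # one for BL = 4, two for BL = 8
    return {
        "app_wdf_data_bits": data_bits,
        "app_mask_data_bits": mask_bits,
        "fifo16_count": fifo16_count,
        "data_writes_per_address": data_writes_per_address,
    }


if __name__ == "__main__":
    print(write_interface_sizes(8, 4))
    print(write_interface_sizes(72, 8))
```

For a 72-bit interface this gives 288 data bits, 36 mask bits, and nine FIFO16s, which agrees with the figures quoted in the list.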
Read Interface
Figure 3-26 shows a block diagram of the read interface.
[Block diagram: Address FIFO (FIFO16, 512 x 36) between the user interface and the controller, with app_af_addr, app_af_wren, af_addr, af_empty, af_almost_full, and ctrl_af_rden; Read Data FIFOs 0 through 3 (RAM16 x 1D) with read_data_valid and outputs read_data0_fifo_out and read_data1_fifo_out shown]
The following steps describe the architecture of the Read Data FIFOs and show how to
perform a burst read operation from DDR2 SDRAM from the user interface.
1. The read user interface consists of an Address FIFO and a Read Data FIFO. The
Address FIFO is common to both read and write operations. These FIFOs are
constructed using Virtex-4 FPGA Distributed RAMs with a 16 x 1 configuration. MIG
instantiates a number of RAM16Ds depending on the data width. For example, for
8-bit data width, MIG instantiates a total of 32 RAM16Ds, 16 for first and second
rising-edge data and 16 for first and second falling-edge data. Similarly, for 72-bit data
width, MIG instantiates a total of 288 RAM16Ds, 144 for first and second rising-edge
data and 144 for first and second falling-edge data.
2. The user can initiate a read to memory by writing to the Address FIFO when the
FIFO Full flag af_almost_full is deasserted.
3. To write the read address and read command into the Address FIFO, the user should
issue the Address FIFO write-enable signal app_af_wren along with read address
app_af_addr.
4. The controller reads the Address FIFO containing the address and command. After
decoding the command, the controller generates the appropriate control signals to
memory.
5. Prior to the actual read and write commands, the design calibrates the latency (number
of clock cycles) from the time the read command is issued to the time data is received.
Using this pre-calibrated delay information, the controller generates the write-enable
signals to the Read Data FIFOs.
After the power-up calibration is done, dummy reads are executed to set up the delay
between the read command and read data from the memory. During the time these
dummy reads are in progress, the read enable is generated with each read command
and is delayed until the read data matches the write data. This delay includes CAS
latency, trace delay, and path delay. This precalculated delay is used for asserting the
read-enable signals that latch the data into the Read Data FIFOs. The delays are
calculated on a per-DQS basis. For example, if a bank has two DQS signals, there are
two read enables used to latch the read data to the FIFOs. The strobe (DQS), data (DQ),
and clock (CK/CK#) signals should be matched in trace length from the FPGA to the
memory device. MIG ensures that a DQS and its corresponding DQ signals do not
cross a bank boundary.
6. The read_data_valid signal is asserted when data is available in the Read Data FIFOs.
[Timing diagram: clkdiv_0, reset0, af_almost_full, app_af_wren, app_af_addr[35:0] (A1, A2), read_data_valid, and read_data0_fifo_out through read_data3_fifo_out [n-1:0] (D0 through D3 for each burst), with a 25-clock interval marked]
Figure 3-27: DDR2 SDRAM Read Burst (BL = 4) for Two Bursts
[Timing diagram: clkdiv_0, reset0, af_almost_full, app_af_wren, app_af_addr[35:0] (A1, A2), read_data_valid, and read_data0_fifo_out through read_data3_fifo_out [n-1:0] (D0 through D7 for each burst, two words per output), with a 25-clock interval marked]
Figure 3-28: DDR2 SDRAM Read Burst (BL = 8) for Two Bursts
7. Figure 3-27 shows the user interface timing diagram for a burst length of 4, and
Figure 3-28 shows the user interface timing diagram for a burst length of 8. Both cases
shown here are for a CAS latency of 4 at 200 MHz. The read latency is calculated from
the point when the read command is given by the user to the point when the data is
available with the read_data_valid signal. The minimum latency in this case is 25
clocks, where no precharge is required, no auto-refresh request is pending, the user
commands are issued after initialization is completed, and the first command issued is
a Read command. The controller executes the commands only after initialization is done,
as indicated by the init_done signal.
8. After the address and command are loaded into the Address FIFO, it takes 25 clock
cycles minimum for the controller to assert the read_data_valid signal.
9. Read data is available only when the read_data_valid signal is asserted. The user
should access the read data on every positive edge of the read_data_valid signal.
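The read-enable calibration described in step 5 can be pictured with a behavioral model. The Python sketch below is not the RTL and makes several assumptions: samples stands for the words seen at the Read Data FIFO input on successive clocks after a read command, the function names are hypothetical, and one such delay would be found for each DQS group.

```python
def find_read_enable_delay(samples, pattern, max_delay=32):
    """Return the smallest delay (in clocks) at which samples match the known pattern.

    Returns None if no alignment is found within max_delay clocks.
    """
    n = len(pattern)
    for delay in range(max_delay + 1):
        if samples[delay:delay + n] == pattern:
            return delay
    return None


def capture(samples, delay, count):
    """Model the delayed read enable acting as the Read Data FIFO write enable."""
    return samples[delay:delay + count]


if __name__ == "__main__":
    pattern = [0xFF, 0x00, 0xAA, 0x55]
    # Five clocks of unrelated data stand in for CAS latency, trace, and path delay.
    samples = [0x12] * 5 + pattern + [0x34] * 3
    delay = find_read_enable_delay(samples, pattern)
    print(delay, capture(samples, delay, len(pattern)))
```

Once the delay is known, the same offset is applied to every later read so that only valid words are written into the FIFO.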
Table 3-25 shows how the 25 clocks from the read command to the read data are broken up.
The memory address (Waf_addr) includes the column address, row address, bank address,
and chip-select width for deep memory interfaces.
Column Address: [`column_address - 1:0]
Row Address: [(`row_address + `column_address) - 1:`column_address]
Bank Address: [(`bank_address + `row_address + `column_address) - 1:(`column_address + `row_address)]
Chip Select: [(`cs_width + `bank_address + `row_address + `column_address) - 1:(`bank_address + `row_address + `column_address)]
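The bit ranges above follow directly from stacking the fields from the least significant bit upward. The Python sketch below computes the ranges and packs an address; the function names are hypothetical, and the widths used in the example (10 column bits, 13 row bits, 2 bank bits, 1 chip-select bit) are assumptions chosen for illustration, since the real values depend on the memory part.

```python
def address_fields(column_bits, row_bits, bank_bits, cs_bits=0):
    """Return {field: (msb, lsb)} with column lowest, then row, bank, chip select."""
    fields = {}
    lsb = 0
    layout = (
        ("column", column_bits),
        ("row", row_bits),
        ("bank", bank_bits),
        ("chip_select", cs_bits),
    )
    for name, width in layout:
        if width:
            fields[name] = (lsb + width - 1, lsb)
            lsb += width
    return fields


def pack_address(values, fields):
    """Pack per-field values into one address word using the given bit ranges."""
    word = 0
    for name, (msb, lsb) in fields.items():
        width = msb - lsb + 1
        value = values.get(name, 0)
        if value >> width:
            raise ValueError(f"{name} value does not fit in {width} bits")
        word |= value << lsb
    return word


if __name__ == "__main__":
    layout = address_fields(10, 13, 2, 1)
    print(layout)
    print(hex(pack_address({"column": 0x3FF, "row": 1, "bank": 2}, layout)))
```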
Figure 3-29 describes two consecutive writes followed by two consecutive reads with a
burst length of 8. Table 3-28 lists the state signal values for Figure 3-29.
[Timing diagram: clkdiv_0, af_almost_empty, addr_controller/state (03 04 07 08 0C 0C 0D 0C 16 09 0B 0A 0B), ctrl_waf_rden, ctrl_wdf_rden]
Figure 3-29: Controller Read of Command and Data from User Interface FIFOs for a Burst Length of 8
Figure 3-30 describes the timing waveform for control signals from the controller to the
physical layer with a CAS latency of 4 and an additive latency of 0.
[Timing diagram: clkdiv_0, ctrl_wren, ctrl_wr_disable, ctrl_odd_latency, ctrl_rden_div0]
Figure 3-30: Timing Waveform for Control Signals from the Controller to the Physical Layer
Note: Timing has been verified for most of the MIG generated configurations. For the best timing
results, adjacent banks in the same column of the FPGA should be used. Banks that are separated
by unbonded banks should be avoided because these can cause timing violations.
model. Allocating the full memory range might exceed the memory of the operating
system, thus causing memory allocation failure in simulations.
Simulation Violations
There might be simulation violations for frequencies such as 150 MHz where the clock
period is not an integer value. At 150 MHz, the clock period value in the simulation
testbench is 6.66 ns and the MIG tool rounds it to 6.67 ns. Consider a memory TRCD value
of 20 ns. MIG calculates the TRCD count value based on its own clock period:
RCD_COUNT_VALUE = 20/6.67 ≈ 2.998, which is rounded to 3 in the design parameter
file. In the testbench, 3 clock cycles last 3 × 6.66 = 19.98 ns, which falls short of the 20 ns
TRCD requirement by 20 ps. The difference between the clock period in the external simulation testbench versus
the MIG tool causes timing violations. This is only one example case. There might be more
such scenarios. These are only simulation warnings. Functionally, there should be no
issues. To remove these warnings, the related count value can be increased by one.
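The arithmetic in this example can be reproduced as follows. This Python sketch works in integer picoseconds to avoid floating-point noise; the round-to-nearest behavior of cycles_for is an assumption based on the phrase "after rounding off", and all names are hypothetical.

```python
def cycles_for(t_ps, period_ps):
    """Round t/period to the nearest whole clock cycle (assumed rounding mode)."""
    return (t_ps + period_ps // 2) // period_ps


TRCD_PS = 20_000        # memory TRCD requirement: 20 ns
MIG_PERIOD_PS = 6_670   # clock period used by the MIG tool at 150 MHz
TB_PERIOD_PS = 6_660    # clock period used by the simulation testbench

count = cycles_for(TRCD_PS, MIG_PERIOD_PS)
shortfall_ps = TRCD_PS - count * TB_PERIOD_PS
# Increasing the count by one, as suggested in the text, removes the violation.
fixed_shortfall_ps = TRCD_PS - (count + 1) * TB_PERIOD_PS

if __name__ == "__main__":
    print(count, shortfall_ps, fixed_shortfall_ps)
```

A positive shortfall means the simulated delay is shorter than TRCD; a negative value means there is margin.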
Supported Devices
The design generated out of MIG is independent of memory package, hence the package
part of the memory component is replaced with XX, where XX indicates a don't care
condition. The tables below list the components (Table 3-31) and DIMMs (Table 3-32
through Table 3-34) supported by the tool for DDR2 SerDes clocking designs.
In supported devices, an X in the component column denotes a single alphanumeric
character. For example MT47H128M4XX-3 can be either MT47H128M4BP-3 or
MT47H128M4B6-3. An XX for Registered DIMMs denotes one or two alphanumeric
characters. For example, MT9HTF3272XX-667 can be either MT9HTF3272Y-667 or
MT9HTF3272DY-667. Pin mapping for x4 RDIMMs is provided in Appendix G, “Low
Power Options.”
Chapter 4
Feature Summary
The QDRII controller design supports the following:
• A maximum frequency of 250 MHz
• 9-bit, 18-bit, 36-bit, and 72-bit data widths
• Burst lengths of two and four
• Implementation using different Virtex-4 devices
• Operation with any 9-bit, 18-bit, and 36-bit memory component
• Verilog and VHDL
• With and without a testbench
• With and without a DCM
Limitations
Four different FIFOs are accessible from the user interface: the Read Address FIFO, Read
Data FIFO, Write Address FIFO, and Write Data FIFO. The Read Address FIFO is used to
store the read command and read address. The Write Address FIFO is used to store the
write command and write address. The Write Data FIFO is used to store the write data
from the user interface. The controller stores the read data from the memory to the Read
Data FIFO. The controller executes read commands only when the Read Address FIFO is
not empty and the Read Data FIFO is not full. Similarly, the controller executes write
commands only when the Write Address FIFO and Write Data FIFO are not empty. The
sequence of commands executed by the controller might not be the same as the sequence of
commands that are stored in the Read Address and Write Address FIFOs. The controller
executes write and read commands alternately when it finds valid write and read
commands, irrespective of the sequence of commands that are written to the FIFOs from
the user interface. Consider an example in which 10 write commands followed by 10 read
commands are issued from the user interface, but the controller executes write, read, write,
read… and so on. If the Read Address FIFO is empty or the Read Data FIFO is full, and the
Write Address FIFO is not empty, the controller executes all write commands sequentially.
Similarly, if the Write Address FIFO is empty, the Read Address FIFO is not empty, and the
Read Data FIFO is not full, the controller executes all read commands sequentially.
The controller remains in the IDLE state when the Write Address FIFO is empty, and either
the Read Address FIFO is empty or the Read Data FIFO is full.
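The command selection rules above can be summarized in a few lines. The Python sketch below is a behavioral illustration rather than the controller RTL; the names next_command and run are hypothetical, and starting with a write when both command types are valid is an assumption taken from the "write, read, write, read" example in the text.

```python
def next_command(waf_empty, wdf_empty, raf_empty, rdf_full, last):
    """Pick the next controller action from the FIFO status flags."""
    write_ok = not waf_empty and not wdf_empty
    read_ok = not raf_empty and not rdf_full
    if write_ok and read_ok:
        # Alternate when both are valid; assumed to start with a write.
        return "READ" if last == "WRITE" else "WRITE"
    if write_ok:
        return "WRITE"
    if read_ok:
        return "READ"
    return "IDLE"


def run(n_writes, n_reads):
    """Execute queued commands, assuming the Read Data FIFO never fills."""
    order = []
    last = None
    while True:
        cmd = next_command(n_writes == 0, n_writes == 0, n_reads == 0, False, last)
        if cmd == "IDLE":
            return order
        if cmd == "WRITE":
            n_writes -= 1
        else:
            n_reads -= 1
        order.append(cmd)
        last = cmd


if __name__ == "__main__":
    print(run(10, 10))  # interleaved, regardless of the order the user queued them
    print(run(3, 0))    # only writes pending: executed back to back
```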
Architecture
Figure 4-1 shows a top-level block diagram of the QDRII memory controller. One side of
the QDRII memory controller connects to the user interface denoted as Block Application.
The other side of the controller interfaces to QDRII memory. The memory interface data
width is selectable.
Data is double-pumped to QDRII SRAM on both the positive and the negative clock edges.
The HSTL_18 Class I I/O standard is used for the data, address, and control signals.
[Block diagram: Block Application, QDRII Memory Controller, and QDRII Memory]
QDRII SRAM interfaces are source-synchronous and double data rate like DDR SDRAM
interfaces.
The key advantage of QDRII devices is that they have separate data buses for reads from
and writes to the SRAM.
Interface Model
The memory interface is layered to simplify the design and make the design modular.
Figure 4-2 shows the layered memory interface in the QDRII memory controller. The three
layers are the application layer, the implementation layer, and the physical layer.
[Layer diagram: User Interface, Implementation Layer, Physical Layer]
The application layer comprises the user interface, which initiates memory writes and
reads by writing data and memory addresses to the User Interface FIFOs. The
implementation layer comprises the infrastructure, datapath, and control logic.
• The infrastructure logic consists of the DCM and reset logic generation circuitry.
• The datapath logic consists of the calibration logic by which the data from the
memory component is captured using the FPGA clock.
• The control logic determines the type of data transfer, that is, read/write with the
memory component, depending on the User Interface FIFO’s status signals.
The physical layer comprises the I/O elements of the FPGA. The controller communicates
with the memory component using this layer. The I/O elements (such as IDDRs, ODDRs,
and IDELAY elements) are associated with this layer.
Hierarchy
Figure 4-3 shows the QDRII SRAM controller hierarchy.
[Hierarchy diagram. Modules shown: <top_module>, infrastructure_top*, idelay_ctrl, main*, top*, test_bench*, qdr_d_iob*, qdr_q_iob*, qdr_cq_iob*, address_burst*, qdr_rd_enable*, bw_burst*, dly_cal_sm, data_tap_inc, wr_data_interface*, wr_addr_interface*, rd_addr_interface*, rd_data_interface*]
Figure 4-3 shows the hierarchical structure of the QDRII SRAM design generated by MIG
with a testbench and a DCM. The modules are classified as follows:
• Design modules
• Testbench modules
• Clocks and reset generation modules
There is a parameter file generated with the design that has all the user input and design
parameters selected from MIG.
MIG can generate QDRII SRAM designs in four different ways:
• With a testbench and a DCM
• Without a testbench and with a DCM
• With a testbench and without a DCM
• Without a testbench and without a DCM
MIG outputs both an example_design and a user_design. The MIG-generated
example_design includes the entire memory controller design along with a synthesized
testbench (example user application). This testbench generates sample writes and reads
and then uses comparison logic to verify that the data patterns written are the same as
those received. This example_design can be used to test functionality both in simulation
and in hardware. The user_design includes the memory controller design only. This design
allows users to connect the MIG memory controller design to a user developed testbench
(user application). Refer to Table 4-5, page 213 for user interface signals and to “Write
Interface,” page 215 and “Read Interface,” page 218 for timing restrictions on user interface
signals.
Design clocks and resets are generated in the infrastructure_top module. When the
Use DCM option is checked in MIG, a DCM primitive and the necessary clock buffers are
instantiated in the infrastructure_top module. The inputs to this module are the
differential design clock and a 200 MHz differential clock required for the IDELAYCTRL
module. A user reset is also input to this module. Using the input clocks and reset signals,
the system clocks and the system resets used in the design are generated in this module.
When the Use DCM option is unchecked in MIG, the infrastructure_top module does not
have the DCM and the corresponding clock buffer instantiations; therefore, the system
operates on the user-provided clocks. The system reset is generated in the
infrastructure_top module using the dcm_lock signal and the ready signal of the
IDELAYCTRL element. For more information on the clocking structure, refer to “Clocking
Scheme,” page 209.
Figure 4-4 shows a top-level block diagram of a QDRII SRAM design with a DCM and a
testbench. Inputs to the design are referenced to a differential clock pair (refclk_p and
refclk_n) for the controller design, a 200 MHz differential clock pair (dly_clk_200_p and
dly_clk_200_n) for the IDELAYCTRL element, and the system reset signal, sys_rst_n. All
design resets are generated using the dcm_locked signal, the sys_rst_n signal, and the
dly_ready signal of the IDELAYCTRL element. The compare_error output signal indicates
whether the design passes or fails. The dly_cal_done signal indicates the completion of
initialization and calibration of the design. Because the DCM is instantiated in the
infrastructure module, it generates the required clocks and reset signals for the design.
[Block diagram: reference clocks and reset, IBUFGDS, infrastructure_top, idelay_ctrl, main0, and the memory device. Signals shown: refclk_p, refclk_n, dly_clk_200_p, dly_clk_200_n, sys_rst_n, clk_200_p, clk_200_n, clk_200, clk_0, clk_270, user_reset, user_reset200, user_reset270, dly_ready, qdr_dll_off_n, qdr_w_n, qdr_r_n, qdr_k, qdr_k_n, qdr_c, qdr_c_n, qdr_sa, qdr_bw_n, qdr_d]
Figure 4-4: Top-Level Block Diagram of the QDRII SRAM Design with a DCM and a Testbench
Figure 4-5 shows a top-level block diagram of a QDRII SRAM design without a DCM but
with a testbench. The user should provide all the clocks and the dcm_locked signal. These
clocks should be single-ended. sys_rst_n is the system reset signal. All design resets are
generated using the dcm_locked signal, the sys_rst_n signal, and the dly_ready signal of
the IDELAYCTRL element. The user application must have a DCM primitive instantiated
in the design, and all user clocks should be driven through BUFGs. The compare_error
signal, which is the output of the design, indicates whether the design passes or fails. The
testbench module does writes and reads, and also compares the read data with written
data. The compare_error signal is set High on data mismatches. The dly_cal_done signal
indicates the completion of initialization and calibration of the design.
[Block diagram: user DCM clocks and reset, infrastructure_top, idelay_ctrl, main0, status signals, and the memory device. Signals shown: clk_200, clk_0, clk_270, dcm_locked, sys_rst_n, user_reset, user_reset200, user_reset270, dly_ready, compare_error, dly_cal_done, qdr_dll_off_n, qdr_w_n, qdr_r_n, qdr_k, qdr_k_n, qdr_c, qdr_c_n, qdr_sa, qdr_bw_n, qdr_d, qdr_q, qdr_cq]
Figure 4-5: Top-Level Block Diagram of the QDRII SRAM Design with a Testbench but without a DCM
Figure 4-6 shows a top-level block diagram of a QDRII SRAM design with a DCM but
without a testbench. refclk_p and refclk_n are differential input reference clocks. The DCM
is instantiated in the infrastructure module that generates the required design clocks.
dly_clk_200_p and dly_clk_200_n are used for the IDELAYCTRL element. sys_rst_n is the
system reset signal. All design resets are generated using the dcm_locked signal, the
sys_rst_n signal, and the dly_ready signal of the IDELAYCTRL element. The user has to drive
the user application signals. The design provides the user_clk and user_rst signals to the
user to synchronize the user application signals with the design. The signal user_clk is
connected to the clk0 clock signal in the controller. If the user clock domain is different from
clk0/user_clk, the user should add FIFOs for all the inputs and outputs of the controller
(user application signals) in order to synchronize them to the user_clk clock. The
dly_cal_done signal indicates the completion of initialization and calibration of the design.
[Block diagram: reference clocks and reset, IBUFGDS, infrastructure_top, idelay_ctrl, main0, user application, and the memory device. Signals shown: refclk_p, refclk_n, dly_clk_200_p, dly_clk_200_n, sys_rst_n, clk_200_p, clk_200_n, clk_200, clk_0, clk_270, user_reset, user_reset200, user_reset270, dly_ready, dly_cal_done, user_qen_n, user_wr_full, user_rd_full, user_qr_empty, user_wr_err, user_rd_err, user_qr_err, user_clk, user_rst, user_dwl, user_dwh, user_qrl, user_qrh, user_bwl_n, user_bwh_n, user_ad_wr, user_ad_rd, user_r_n, user_w_n, qdr_dll_off_n, qdr_w_n, qdr_r_n, qdr_k, qdr_k_n, qdr_c, qdr_c_n, qdr_sa, qdr_bw_n, qdr_d, qdr_q, qdr_cq]
Figure 4-6: Top-Level Block Diagram of the QDRII SRAM Design with a DCM but without a Testbench
Architecture
Figure 4-7 shows a top-level block diagram of a QDRII SRAM design without a DCM or a
testbench. The user should provide all the clocks and the dcm_locked signal. These clocks
should be single-ended. sys_rst_n is the system reset signal. All design resets are generated
using the dcm_locked signal, the sys_rst_n signal, and the dly_ready signal of the
IDELAYCTRL element. The user application must have a DCM primitive instantiated in
the design, and all user clocks should be driven through BUFGs. The user must drive the
user application signals. The design provides the user_clk and user_rst signals so that the
user can synchronize the user application signals with the design. The user_clk signal is
connected to the clk_0 clock in the controller. If the user clock domain differs from
clk_0/user_clk, the user should add FIFOs on all controller inputs and outputs
(user application signals) to synchronize them to user_clk. The
dly_cal_done signal indicates the completion of initialization and calibration of the design.
[Block diagram: user clocks and reset, idelay_ctrl, infrastructure_top, main0, user application signals (user_*), and memory device signals (qdr_*).]
Figure 4-7: Top-Level Block Diagram of the QDRII SRAM Design without a DCM or a Testbench
[Block diagram UG086_c4_08_090607: User_clk, User_fifo_status, Infrastructure_top, and QDRII Memory Controller.]
Figure 4-9 shows the QDRII memory controller modules with a 36-bit interface.
[Figure 4-9 block diagram: Write Path, Read Path, Delay Calibration, and State Machine blocks between the user signals and the qdr_* memory signals.]
Controller
The QDRII memory controller initiates alternate Write and Read commands to the
memory as long as the User Write Data FIFOs, the User Write Address FIFO, and the User
Read Address FIFO are not empty, and the User Read Data FIFOs are not full.
The user writes the write data and the write address into the User Write Data FIFOs and
the User Write Address FIFO, respectively. When neither the User Write Data FIFOs nor
the User Write Address FIFO is empty, the QDRII controller generates a write-enable signal
to the memory. When the write enable is asserted, the write data and the write address are
transferred to memory from the User Write Data FIFOs and the User Write Address FIFO,
respectively.
The user stores the memory address to be read in
the User Read Address FIFO. The QDRII memory controller generates a read-enable signal
to the memory when the User Read Address FIFO is not empty and the User Read Data
FIFOs are not full. When the read enable is asserted, the read address from the Read
Address FIFO is transferred to memory. The captured read data from the memory
corresponding to the read address is stored in the User Read Data FIFOs. The user can
access the data read from memory by reading the User Read Data FIFOs.
Figure 4-10 shows the QDRII memory controller state machine for burst lengths of four.
The controller state machine is in the IDLE state when the calibration is complete. When
the User Write Data FIFO and the User Write Address FIFO are not empty (that is, when
there are user-written write data and write address bits in the corresponding FIFOs), the
state machine goes to the WRITE state, initiating a memory write of one complete burst.
[State diagram: IDLE, READ (R_n=0, W_n=1), and WRITE (R_n=1, W_n=0) states, with transitions labeled RD and WR.]
Figure 4-10: QDRII Memory Controller State Machine with Burst Lengths of 4
When the User Read Address FIFO is not empty (that is, the user has written read address
bits into the User Read Address FIFO) and either Read Data FIFO is not full, the state
machine goes to the READ state, initiating a memory read of one burst.
From the IDLE state, the QDRII memory controller can go to either the WRITE state or the
READ state. It goes to the WRITE state when the Write Address FIFO and the Write Data
FIFOs are not empty, and to the READ state when the Read Address FIFO is not empty and
the Read Data FIFOs are not full. Writes are given priority. In the WRITE state, a memory
write is initiated, and the User Read Address FIFO not-empty status and the User Read Data
FIFOs full status are checked to decide whether to move to the READ state. When the User
Read Address FIFO is empty, or the User Read Data FIFOs are full, the state machine goes
to the IDLE state.
In the READ state, a memory read is initiated, and the User Write Data and the User Write
Address FIFO Not Empty status is checked before going to the WRITE state. If the FIFOs
are empty, the state machine goes to the IDLE state.
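The burst-length-four transitions described above can be summarized in a small behavioral model. The following Python sketch is illustrative only and is not the MIG RTL. The names State, next_state_bl4, write_ready, and read_ready are hypothetical stand-ins for the FIFO status checks described in the text.

```python
from enum import Enum


class State(Enum):
    IDLE = "IDLE"
    WRITE = "WRITE"
    READ = "READ"


def next_state_bl4(state, write_ready, read_ready):
    """Return the next controller state for a burst length of 4.

    write_ready: User Write Data FIFOs and User Write Address FIFO not empty.
    read_ready:  User Read Address FIFO not empty and
                 User Read Data FIFOs not full.
    Both argument names are assumptions made for this sketch.
    """
    if state is State.IDLE:
        # Writes are given priority when leaving IDLE.
        if write_ready:
            return State.WRITE
        return State.READ if read_ready else State.IDLE
    if state is State.WRITE:
        # After a write burst, only the read condition is checked.
        return State.READ if read_ready else State.IDLE
    # READ state: after a read burst, only the write condition is checked.
    return State.WRITE if write_ready else State.IDLE
```

With both conditions continuously true, this model alternates WRITE and READ, which matches the alternating command behavior described for the controller.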
Figure 4-11 shows the state machine of the QDRII memory controller for burst lengths of
two. When calibration is complete, the state machine is in the IDLE state. It leaves the
IDLE state in either of two cases. When the User Write Data FIFO and the User Write
Address FIFO are not empty (that is, when there are user-written write data and write
address bits in the corresponding FIFOs), the state machine goes to the READ_WRITE state
and initiates a memory write of one complete burst. When the User Read Address FIFO is
not empty (that is, the user has written read address bits into it) and the User Read Data
FIFOs are not full, the state machine goes to the READ_WRITE state and initiates a memory
read of one complete burst.
[State diagram: IDLE and READ_WRITE (R_n=0, W_n=0) states.]
Figure 4-11: QDRII Memory Controller State Machine with Burst Lengths of 2
From the IDLE state, the QDRII memory controller goes to the READ_WRITE state if either:
• the User Write Address FIFO and the User Write Data FIFO are not empty, or
• the User Read Address FIFO is not empty and the User Read Data FIFOs are not full
In the READ_WRITE state, the User Read Address FIFO not-empty status and the User Read
Data FIFOs not-full status are checked to initiate a memory read. To initiate a memory
write in the READ_WRITE state, the User Write Data FIFOs and User Write Address FIFO
not-empty status are checked. If the User Write Data FIFOs and the User Write Address
FIFO are empty, and either the User Read Address FIFO is empty or the User Read Data
FIFOs are full, the state machine goes to the IDLE state. If the User Write Data FIFO and
User Write Address FIFO are not empty, or the User Read Address FIFO is not empty and
the User Read Data FIFO is not full, the state machine remains in the READ_WRITE state
to issue memory writes or reads.
The FIFOs are built using FIFO16 primitives in the data_bw_fifo, data_fifo_mem,
rd_addr_interface, wr_addr_interface, and wr_data_fifo modules. Each FIFO has a
threshold attribute called ALMOST_FULL_OFFSET whose value is set to F, by default, in
the RTL. This value can be changed as needed. For valid FIFO threshold offset values,
refer to UG070 [Ref 7].
Refer to XAPP703 [Ref 20] for detailed design and timing analysis of the QDRII memory
controller module.
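For burst lengths of two, the write and read conditions are evaluated independently, so a write and a read can be issued together. The sketch below models this in Python under that reading of the text. The function name bl2_step and its arguments are hypothetical and do not correspond to RTL signal names.

```python
def bl2_step(write_ready, read_ready):
    """Model one evaluation of the burst-length-2 state machine.

    write_ready: User Write Data FIFOs and User Write Address FIFO not empty.
    read_ready:  User Read Address FIFO not empty and
                 User Read Data FIFOs not full.
    Returns (next_state, issue_write, issue_read).
    """
    issue_write = bool(write_ready)
    issue_read = bool(read_ready)
    # The machine stays in (or enters) READ_WRITE while any command can be
    # issued, and otherwise returns to IDLE.
    next_state = "READ_WRITE" if (issue_write or issue_read) else "IDLE"
    return next_state, issue_write, issue_read
```

The same function serves for both the IDLE and READ_WRITE states because the text gives the same entry and stay conditions for READ_WRITE.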
Datapath
The Datapath module transmits and receives data to and from the memories. Its major
functions are listed below:
• Asserts a write-enable signal for memories with burst lengths of two or four
• Asserts a read-enable signal to memory and a write-enable signal to the User Read
Data FIFO
• Generates increment/decrement signals (tap count) for IDELAY elements in the IOBS
• Center-aligns the data window to the FPGA clock
Refer to XAPP703 [Ref 20] for techniques on data writes to memory and data captures from
memory. For burst lengths of two, the write-enable signal to memory is asserted at the
same time that write data is driven. For burst lengths of four, the write-enable signal is
asserted one clock before the write data is driven on the memory bus. The data is driven on
both edges of the clock. The address to memory is driven for one full clock cycle for burst
lengths of 4 and on both the edges of the clock cycle for burst lengths of 2.
Memory read data is edge-aligned with the source-synchronous clock, CQ, which is a
free-running strobe. The CQ strobe from the memory is captured using the FPGA clock to
find the relationship between the two, and CQ is then center-aligned with the FPGA clock.
The same logic is applied to the read data (Q) window, which is center-aligned with the
same FPGA clock. Consequently, the same number of tap delays is applied to both Q and CQ
through IDELAY elements to center-align the Q and CQ windows with respect to the FPGA
clock. Center-aligning the read data window with the FPGA clock completes the data
capture logic.
The delay calibration circuit generates the delay reset, delay select, and delay increment
values for IDELAY elements used in delaying strobes and data read from memory. The
strobe is center-aligned with the FPGA clock, which results in the data window falling to
the center of the FPGA clock. Refer to XAPP703 [Ref 20] for details about the delay
calibration.
Infrastructure
The infrastructure (infrastructure_top) module generates the FPGA clock and reset signals.
When differential clocking is used, refclk_p, refclk_n, dly_clk_200_p, and dly_clk_200_n
signals appear. When single-ended clocking is used, refclk and idly_clk_200 signals
appear. In addition, clocks are available for design use and a 200 MHz clock is provided for
the IDELAYCTRL primitive. Differential and single-ended clocks are passed through
global clock buffers before connecting to a DCM. For differential clocking, the output of the
refclk_p/refclk_n buffer is single-ended and is provided to the DCM input. Likewise, for
single-ended clocking, refclk is passed through a buffer and its output is provided to the
DCM input. The outputs of the DCM are 0° and 270° phase-shifted versions of the input
clock. After the DCM is locked, the design is in the reset state for at least 25 clocks. The
infrastructure module also generates all of the reset signals required for the design.
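The reset hold time after DCM lock can be pictured with a simple counter model. The Python sketch below is an assumption-laden illustration, not the infrastructure_top RTL: the function name reset_trace and the counter scheme are hypothetical, only the dcm_locked input is modeled (sys_rst_n and dly_ready are ignored), and the hold time is fixed at the 25-clock minimum stated above.

```python
def reset_trace(locked_per_clock, hold_cycles=25):
    """Return the reset value (True = in reset) for each clock.

    locked_per_clock: iterable of booleans, the DCM lock status per clock.
    hold_cycles: clocks that reset stays asserted after lock (25 per the text).
    """
    count = 0
    trace = []
    for locked in locked_per_clock:
        if not locked:
            count = 0  # losing lock restarts the hold period
        trace.append(count < hold_cycles)
        if locked and count < hold_cycles:
            count += 1
    return trace
```

For example, with three unlocked clocks followed by a locked DCM, the model keeps reset asserted for those three clocks plus the next 25.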
Idelay_ctrl
This module instantiates the IDELAYCTRL primitive of the Virtex-4 FPGA. The
IDELAYCTRL primitive is used to continuously calibrate the individual delay elements in
its region to reduce the effect of process, temperature, and voltage variations. A 200 MHz
clock has to be fed to this primitive.
The MIG tool instantiates the required number of IDELAYCTRLs in the RTL and uses the
LOC constraints in the UCF file to fix their locations. The number of IDELAYCTRLs is
defined by the IDELAYCTRL_NUM parameter in the idelay_ctrl module. In the RTL,
DLY_READY is generated by doing a logical AND of the RDY signals of every
IDELAYCTRL block.
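The DLY_READY generation amounts to a reduction AND over the RDY outputs. A minimal Python sketch of that reduction follows. The function name dly_ready and the list argument are hypothetical, and the empty-list check is an added assumption that at least one IDELAYCTRL is instantiated.

```python
def dly_ready(rdy_signals):
    """Logical AND of the RDY outputs of all IDELAYCTRL instances."""
    rdy = list(rdy_signals)
    if not rdy:
        # Assumption: IDELAYCTRL_NUM is at least 1.
        raise ValueError("at least one IDELAYCTRL RDY signal is required")
    return all(rdy)
```

DLY_READY is therefore held Low until every IDELAYCTRL instance reports ready.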
IDELAYCTRL LOC constraints should be checked in the following cases:
• The MIG design is used with other IP cores or user designs that also require the use of
IDELAYCTRL and IDELAYs.
• ISE® software release 8.2.03i or 9.1i is used. These releases had an issue with
IDELAYCTRL block replication or trimming, so the user must instantiate and constrain
the location of each IDELAYCTRL individually.
See UG070 [Ref 7] for more information on the requirements of IDELAYCTRL placement.
IOBS
All the input and output signals of the QDRII SRAM controller are implemented in the
IOBS module. All address and byte enable signals are registered in the IOBs and driven
out.
The IDELAY elements for the read strobe and data read from memory are implemented in
the IOBS. The IOBS module also implements bidirectional buffers for write and read data.
It registers the output data (ODDR) before driving it out and registers the input data (IDDR).
Test Bench
The MIG tool generates two RTL folders, example_design and user_design. The
example_design folder includes the synthesizable test bench, while user_design does not
include the test bench modules. The MIG test bench performs one write command
followed by one read command in an alternating manner for designs with a burst length of
4. For a burst length of 2, the test bench performs one write command and one read
command in the same clock and repeats one write and one read command continuously.
The number of words in a write command depends on the burst length. For a burst length
of 4, the test bench writes a total of 4 data words for a single write command (2 rise data
words and 2 fall data words). For a burst length of 2, the test bench writes a total of 2 data
words. On every write command, the data pattern is incremented by one, and this is
repeated with each subsequent write command. The initial data pattern for the first write
command is 000. The test bench writes the 000, 001, 002, 003 data pattern in a sequence
in which 000 and 002 are rise data words and 001 and 003 are fall data words for a 9-bit
design. The falling edge data is always rising edge data plus one. For a burst length of 2,
the data sequence for the first write command is 000, 001. The data sequence for the
second write command is 002, 003. The pattern is then incremented for the next write
command. For data widths greater than 9, the same data pattern is concatenated for the
other bits. For a 36-bit design and a burst length of 4, the data pattern for the first write
command is 000000000, 008040201, 010080402, 0180C0603.
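The pattern values quoted above follow from replicating a 9-bit count across the data width. The Python sketch below reproduces them. The function names replicate_pattern and write_burst are hypothetical, and two details are assumptions rather than statements from the text: the count continues from one write command to the next, and it wraps at 9 bits.

```python
def replicate_pattern(value, data_width, lane_width=9):
    """Concatenate a lane_width-bit value across a data_width-bit word."""
    lane = value & ((1 << lane_width) - 1)  # assumed wrap at 9 bits
    word = 0
    for i in range(data_width // lane_width):
        word |= lane << (i * lane_width)
    return word


def write_burst(command_index, burst_length, data_width):
    """Data words for one write command, in rise, fall, rise, fall order."""
    start = command_index * burst_length  # assumed running count
    return [replicate_pattern(start + k, data_width)
            for k in range(burst_length)]


# First write command of a 36-bit, burst-length-4 design.
words = [f"{w:09X}" for w in write_burst(0, 4, 36)]
```

Here words holds the four 36-bit values listed above for the first write command, and write_burst(1, 2, 9) gives the 002, 003 sequence of the second burst-length-2 command.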
Address generation logic generates the address in an incremental pattern for each write
command. The same address location is repeated for the next read command. In Samsung
components, the burst address increments are done by the memory, so the address is
generated by the test bench in a linear incremental pattern. In Cypress parts, the MIG test
bench increments the address for burst operation. After the address reaches the maximum
value, it rolls back to the initial address, i.e., 00000.
During reads, comparison logic compares the data read back with the data written. For
example, for a 9-bit design with a burst length of 4, the data written for a single write
command is 000, 001, 002, and 003, and the read data is compared with this same
000, 001, 002, 003 pattern. Based on the comparison, an error status signal is generated.
If the data read back is the same as the data written, the error signal is 0; otherwise,
it is 1.
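The address rollover and the read comparison can be sketched together. This Python model is illustrative only. The names next_address and compare_error (used here as a function) are hypothetical, and the step of one per command is an assumption, since the text only says the address increments linearly and rolls back to the initial address.

```python
def next_address(addr, addr_bits):
    """Increment the address and roll over to 0 after the maximum value."""
    return (addr + 1) % (1 << addr_bits)


def compare_error(written, read_back):
    """Return 0 when the read data matches the written data, else 1."""
    return 0 if list(written) == list(read_back) else 1
```

For instance, compare_error([0, 1, 2, 3], [0, 1, 2, 3]) models a passing burst, while any differing word models a failure.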
Clocking Scheme
Figure 4-12 shows the clocking scheme for this design. Global and local clock resources are
used.
The global clock resources consist of a DCM, two BUFGs on DCM output clocks, and one
BUFG for clk_200. The local clock resources consist of regional I/O clock networks
(BUFIO). The global clock architecture is discussed in this section.
The MIG tool allows the user to customize the design such that the DCM is not included.
In this case, clk_0 and clk_270 must be supplied by the user.
Notes:
1. See “QDRII Controller System and User Interface Signals,” page 212 for timing requirements and
restrictions on the user interface signals.
[Figure 4-12 clocking scheme diagram: SYSTEM CLK on a GC I/O, DCM (CLKIN, CLK0), BUFG, and clk_0.]
Table 4-5: QDRII SRAM User Interface Signals (without a Testbench) (Cont’d)
Signal Name (1) | Direction | Description
user_qen_n | Input | This active-Low signal is the read enable for the User Read Data FIFOs. The QDRII memory controller captures the data read from memory and stores it in the Read Data FIFOs. The user can access these FIFOs to get the data read from memory.
user_w_n | Input | This active-Low signal is the write enable for the User Write Data and User Write Address FIFOs. The user asserts this signal to write new data to the FIFOs. The QDRII memory controller reads the data from the User Write Data FIFO and writes to memory at the address located in the User Write Address FIFO.
user_r_n | Input | This active-Low signal is the write enable for the User Read Address FIFO. The user asserts this signal to read new data from memory. The QDRII memory controller reads the address from the Read Address FIFO and does a memory read to the corresponding memory address.
Notes:
1. All user interface signal names are prepended with a controller number, for example, cntrl0_QDR_Q. QDRII SRAM devices
currently support only one controller.
2. The user_clk is connected to clk_0 in the controller. If the user clock domain is different from clk_0 / user_clk of MIG, the user
should add FIFOs for all data inputs and outputs of the controller, in order to synchronize them to the user_clk.
3. The number of address bits used depends on the density of the memory part. The controller ignores the unused bits, which can all
be tied High.
Write Interface
Figure 4-13 illustrates the user interface block diagram for write operations.
[Figure 4-13 block diagram: user interface with an Address FIFO, Rise and Fall Data FIFOs, and byte write Data FIFOs (FIFO16, 512 x 36), driven by user_ad_wr, user_w_n, user_dwl, user_dwh, and user_bwl_n, with user_wr_full status; controller side signals fifo_wr_empty, wr_init_n, fifo_ad_wr, fifo_dwl, and fifo_bwh_n.]
The following steps describe the architecture of Address and Write Data FIFOs and how to
perform a write burst operation to QDRII memory from user interface.
1. The user interface consists of an Address FIFO, Data FIFOs, and a Byte Write FIFO.
These FIFOs are built from Virtex-4 FPGA FIFO16 primitives in a 512 x 36 configuration.
2. The Address FIFO stores the QDRII memory address where the data is to be written
from the user interface. A single instantiation of a FIFO16 constitutes the Address
FIFO.
3. Two separate sets of Data FIFOs store the rising-edge and falling-edge data to be
written to QDRII memory from the user interface. For 9-bit, 18-bit, and 36-bit data
widths, two FIFO16s are required for storing rising-edge and falling-edge data. For a
72-bit data width, two FIFO16s are required for storing rising-edge data and two
FIFO16s for storing falling-edge data. MIG instantiates the required number of FIFOs
depending on the memory data width selected. For 9-bit and 18-bit configurations, the
controller pads the extra bits of the Data FIFO with 0s.
4. The Byte Write FIFO stores the Byte Write signals to QDRII memory from the user
interface. Extra bits are padded with zeros.
5. The user can initiate a write command to memory by writing to the Address FIFO,
Data FIFOs, and Byte Write FIFOs when FIFO Full flags are deasserted and after the
calibration done signal dly_cal_done is asserted. Users should not access any of these
FIFOs until dly_cal_done is asserted. The dly_cal_done signal assures that the clocks
are stable, the reset process is completed, and the controller is ready to accept
commands. Status signal user_wr_full is asserted when the Address FIFO, Data FIFOs,
or Byte Write FIFOs are full.
6. When user_w_n is asserted, user_ad_wr is stored in the Address FIFO, user_dwl and
user_dwh are stored in the Data FIFOs, and user_bwl_n and user_bwh_n are stored in the
Byte Write FIFOs. A common write-enable signal is used to store the data into all three
FIFOs.
7. The controller reads the Address, Data, and Byte Write FIFOs when they are not empty
by issuing the wr_init_n signal. A QDRII memory write command is generated from
the wr_init_n signal by properly timing it.
[Figure 4-14 timing diagram: user_clk, dly_cal_done, user_wr_full, user_w_n, user_ad_wr (A0 to A2), and user_wr_err.]
8. Figure 4-14 shows the timing diagram for a write command of BL = 4. The address
must be asserted for one clock cycle as shown. For burst lengths of four, each write to
the Address FIFO must have two writes to the Data FIFO consisting of two rising edge
data and two falling edge data.
9. Figure 4-15 shows the timing diagram for a write command of BL = 2. For a burst
length of two, each write to the Address FIFO is coupled to one write to the Data FIFO,
consisting of one rising edge data and one falling edge data. For BL = 2, commands can
be given in every clock.
[Figure 4-15 timing diagram: user_clk, dly_cal_done, user_wr_full, user_w_n, user_ad_wr (A0 to A4), and user_wr_err.]
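The write interface rules in the steps above can be captured as a few small checks. The following Python sketch is a reading aid under stated assumptions, not part of the MIG output. The function names are hypothetical, and signal values are modeled as plain booleans (True meaning asserted) instead of as active-Low levels.

```python
def write_data_fifo16_count(data_width):
    """Number of FIFO16s that hold write data (rise plus fall)."""
    if data_width in (9, 18, 36):
        return 2  # one rising-edge and one falling-edge FIFO16
    if data_width == 72:
        return 4  # two rising-edge and two falling-edge FIFO16s
    raise ValueError(f"unsupported data width: {data_width}")


def data_fifo_writes_per_address(burst_length):
    """Data FIFO writes that accompany one Address FIFO write."""
    if burst_length == 4:
        return 2  # two rise words and two fall words in total
    if burst_length == 2:
        return 1  # one rise word and one fall word
    raise ValueError(f"unsupported burst length: {burst_length}")


def user_may_write(dly_cal_done, user_wr_full):
    """True when the user is allowed to write to the write-side FIFOs."""
    return bool(dly_cal_done) and not user_wr_full
```

A user-side write sequencer would check user_may_write before each FIFO write and pair each address with the number of data writes given by data_fifo_writes_per_address.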
Read Interface
Figure 4-16 shows a block diagram for the read interface.
[Figure 4-16 block diagram: user interface with an Address FIFO (FIFO16, 512 x 36) driven by user_ad_rd and user_r_n, with user_rd_full status; per-component Rise and Fall Read Data FIFOs (component 0 to component n–1) feeding user_qrh and related outputs, with user_qr_empty status; controller side signals fifo_rd_empty, rd_init_n, fifo_qr_full, fifo_drl, and fifo_drh to and from the IOBS.]
The following steps describe the architecture of the Read Data FIFOs and show how to
perform a QDRII SRAM burst read operation from the user interface.
1. The read user interface consists of an Address FIFO and a Read Data FIFO. The
Address FIFO and Read Data FIFO are built from Virtex-4 FPGA FIFO16s of
configuration 512 x 36.
2. The size of the Address FIFO is always 512 x 16.
3. The number of Read Data FIFOs required depends on the number of QDRII
components being used. Using 9-bit components for 36-bit data width, a total of eight
FIFOs are required, four for rising-edge data and four for falling-edge data. Although
each FIFO can accommodate 36-bit data, the requirement of having one FIFO per
component arises from CQ pattern calibration, where an internal pattern calibration is
done per CQ. The controller generates the Read Data FIFO write-enable signal for each
FIFO separately depending on the CQ pattern calibration.
4. To initiate a QDRII read command, the user must write the Address FIFO when the
FIFO full flag user_rd_full is deasserted and the calibration done signal dly_cal_done
is asserted. Writing to the Address FIFO indicates to the controller that it is a Read
command. The dly_cal_done signal assures that the controller clocks are stable, the
internal reset process is completed, and the controller is ready to accept commands.
5. The user must issue an Address FIFO write-enable signal user_r_n along with the read
address user_ad_rd to write the read address to the Address FIFO.
6. The controller reads the Address FIFO when status signal fifo_rd_empty is deasserted
and generates the appropriate control signals to QDRII memory required for a read
command.
7. Prior to the actual read and write commands, the design calibrates the latency (number
of clock cycles) from when the read command is issued to when the data is received.
Using this precalibrated delay information, the controller generates the write-enable
signals to the Read Data FIFOs. The delay calibration is done per QDRII component.
8. The Low state of user_qr_empty indicates read data is available. Asserting user_qen_n
reads rising-edge data and falling-edge data simultaneously on every rising edge of
the clock.
9. Figure 4-17 and Figure 4-18 show the user interface timing diagrams for BL = 4 and
BL = 2.
10. After the address is loaded into the Address FIFO, it can take 18 clock cycles (worst
case) for the controller to write the Data FIFOs.
[Figure 4-17 timing diagram: user_clk, dly_cal_done, user_rd_full, user_r_n, user_ad_rd (A0 to A4), user_rd_err, user_qen_n, and user_qr_err.]
[Figure 4-18 timing diagram: user_clk, dly_cal_done, user_rd_full, user_r_n, user_ad_rd (A0 to A4), user_rd_err, user_qen_n, and user_qr_err.]
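The read interface rules in the steps above can be sketched the same way. The Python functions below are hypothetical helpers, not MIG-generated code. They model signals as booleans (True meaning asserted or High) and assume one rising-edge and one falling-edge Read Data FIFO per memory component, as the 36-bit example with 9-bit components implies.

```python
def read_data_fifo_count(data_width, component_width):
    """Read Data FIFOs needed: one rise and one fall FIFO per component."""
    if data_width % component_width != 0:
        raise ValueError("data width must be a multiple of component width")
    components = data_width // component_width
    return 2 * components


def user_may_issue_read(dly_cal_done, user_rd_full):
    """True when the user may write a read address to the Address FIFO."""
    return bool(dly_cal_done) and not user_rd_full


def read_data_available(user_qr_empty):
    """Read data is available when user_qr_empty is Low."""
    return not user_qr_empty
```

After a read address is accepted, a user-side reader would wait for read_data_available before asserting user_qen_n, allowing for the worst-case latency noted in step 10.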
When the Address box is checked in a bank, the address, qdr_w_n, qdr_r_n, and
qdr_dll_off_n bits are assigned to that particular bank.
When the Data Write box is checked in a bank, the memory data write and memory byte
write are assigned to that particular bank.
When the Data Read box is checked in a bank, the memory data read, memory read clocks,
memory write clocks, and memory input clock for the output data are assigned to that
particular bank.
When the System Control box is checked in a bank, the sys_rst_n, compare_error, and
dly_cal_done bits are assigned to that particular bank.
When the System Clock box is checked in a bank, the refclk_p, refclk_n, dly_clk_200_p,
and dly_clk_200_n bits are assigned to that particular bank.
For special cases, such as without a testbench and without a DCM, the corresponding
input and output ports are not assigned to any FPGA pins in the design UCF because the
user can connect these ports either to FPGA pins or to logic internal to the same FPGA.
Note: Timing has been verified for most of the MIG generated configurations. For the best timing
results, adjacent banks in the same column of the FPGA should be used. Banks that are separated
by unbonded banks should be avoided because these can cause timing violations.
Supported Devices
The design generated by MIG is independent of the memory package. The package portion
of the memory component name is therefore replaced with X, where X indicates a don't-care
condition. Table 4-9 shows the list of components supported by MIG.
Chapter 5
Feature Summary
This section summarizes the supported and unsupported features of the DDRII SRAM
controller design.
Supported Features
The DDRII SRAM controller design supports:
• A maximum frequency of 250 MHz
• Data widths of 9, 18, 36, and 72 bits
• Burst lengths of two and four
• Implementation using different Virtex-4 devices
• Operation with any 9-bit, 18-bit, and 36-bit memory component
• Verilog and VHDL
• With and without a testbench
• With and without a DCM
Unsupported Features
The DDRII SRAM controller design does not support:
• DDR SIO memory
Architecture
Figure 5-1 shows a top-level block diagram of the DDRII SRAM controller interface. One
side of the DDRII SRAM controller connects to the user interface denoted as Block
Application. The other side of the controller interfaces to DDRII memory. The memory
interface data width is selectable.
[Figure 5-1 block diagram: Block Application (user interface side) connected to the DDRII SRAM controller.]
Data is double-pumped to DDRII memory on both the positive and the negative edges of
the clock. The HSTL_18 Class II I/O standard is used for data, and the HSTL_18 Class I
I/O standard is used for address, control, and memory clock signals.
DDRII memory interfaces are source-synchronous and double data rate like DDR SDRAM
interfaces.
Interface Model
The Memory interface is layered to simplify the design and make the design modular.
Figure 5-2 shows the layered memory interface used in the DDRII SRAM controller. The
three layers are the application layer, the implementation layer, and the physical layer.
[Figure 5-2 layer diagram: User Interface, Implementation Layer, and Physical Layer.]
The application layer comprises the user interface, which initiates memory writes and
reads by writing data and memory addresses to the User Interface FIFOs. The
implementation layer comprises the infrastructure, datapath, and control logic.
• The infrastructure logic consists of the DCM and reset logic generation circuitry.
• The datapath logic consists of the calibration logic by which the data from the
memory component is captured using the FPGA clock.
• The control logic determines the type of data transfer, that is, read/write with the
memory component, depending on the User Interface FIFO’s status signals.
The physical layer comprises the I/O elements of the FPGA. The controller communicates
with the memory component using this layer. I/O elements (such as IDDRs, ODDRs,
IDELAY, and OFLOPs) are associated with this layer.
Hierarchy
Figure 5-3 shows the hierarchical structure of the DDRII SRAM design generated by MIG
with a testbench and a DCM.
[Figure 5-3 hierarchy diagram: <top_module>, infrastructure_top*, idelay_ctrl, main*, top*, test_bench*, data_path_iobs*, clock_forward*, ctrl_iobs*, read_ctrl*, tap_logic*, write_burst*, wr_data_interface*, rd_data_interface*, and rd_wr_addr_interface*. Legend: Design Modules, Test Bench Modules, DCM and Reset Generation Modules.]
Note: A block with a * has a parameter file included.
There is a parameter file generated with the design that has all the user input and design
parameters selected from MIG.
MIG can generate DDRII SRAM designs in four different ways:
• With a testbench and a DCM
• Without a testbench and with a DCM
• With a testbench and without a DCM
• Without a testbench and without a DCM
MIG outputs both an example_design and a user_design. The MIG-generated
example_design includes the entire memory controller design along with a synthesizable
testbench (example user application). This testbench generates sample writes and reads
and then uses comparison logic to verify that the data patterns written are the same as
those received. The example_design can be used to test functionality both in simulation
and in hardware. The user_design includes the memory controller design only. This design
allows users to connect the MIG memory controller design to a user-developed testbench
(user application). Refer to Table 5-5 for user interface signals, “Write Interface,” page 243
and “Read Interface,” page 246 for timing restrictions on user interface signals, and
Figure 5-12, page 244 and Figure 5-13, page 245 for write interface timing.
Design clocks and resets are generated in the infrastructure_top module. When the Use DCM
option is checked in MIG, a DCM primitive and the necessary clock buffers are instantiated
in the infrastructure_top module. The inputs to this module are the differential design
clock and a 200 MHz differential clock required for the IDELAYCTRL module. A user reset
is also input to this module. Using the input clocks and reset signals, the system clocks and
system resets used in the design are generated in this module.
When the Use DCM option is unchecked in MIG, the infrastructure_top module does not
have the DCM and the corresponding clock buffer instantiations. Therefore, the system
operates on the user-provided clocks. The system reset is generated in the
infrastructure_top module using the dcm_lock signal and the ready signal of the
IDELAYCTRL element. For more information on the clocking structure, refer to “Clocking
Scheme,” page 237.
Figure 5-4 shows a top-level block diagram of a DDRII SRAM design with a DCM and a
testbench. refclk_p and refclk_n are differential input reference clocks. The DCM is
instantiated in the infrastructure module that generates the required design clocks.
dly_clk_200_p and dly_clk_200_n are used for the IDELAYCTRL element. sys_rst_n is the
system reset signal. All design resets are generated using the dcm_locked signal, the
sys_rst_n signal, and the dly_ready signal of the IDELAYCTRL element. The
compare_error output signal indicates whether the design passes or fails. The
dly_cal_done signal indicates the completion of initialization and calibration of the design.
Because the DCM is instantiated in the infrastructure module, it generates the required
clock and reset signals for the design.
[Block diagram: reference clocks and reset, IBUFGDS, idelay_ctrl, infrastructure_top, main0, and memory device signals (ddr_*).]
Figure 5-4: Top-Level Block Diagram of the DDRII SRAM Design with a DCM and a Testbench
Figure 5-5 shows a top-level block diagram of a DDRII SRAM design with a testbench but
without a DCM. The user should provide all the clocks and the dcm_locked signal. These
clocks should be single-ended. sys_rst_n is the system reset signal. All design resets are
generated using the dcm_locked signal, the sys_rst_n signal, and the dly_ready signal of
the IDELAYCTRL element. The user application must have a DCM primitive instantiated
in the design, and all user clocks should be driven through BUFGs. The compare_error
output signal indicates whether the design passes or fails. The testbench module does
writes and reads, and also compares the read data with the written data. The
compare_error signal is driven High on data mismatches. The dly_cal_done signal
indicates the completion of initialization and calibration of the design.
[Block diagram. Inputs from the user DCM (clocks and reset): clk_0, clk_270, clk_200,
dcm_locked, sys_rst_n. Blocks: infrastructure_top, idelay_ctrl, main0, memory device.
Internal signals: user_reset, user_reset270, user_reset200, dly_ready. Status signals:
compare_error, dly_cal_done. Memory interface: ddr_dll_off_n, ddr_ld_n, ddr_rw_n, ddr_k,
ddr_k_n, ddr_c, ddr_c_n, ddr_sa, ddr_bw_n, ddr_dq, ddr_cq.]
Figure 5-5: Top-Level Block Diagram of the DDRII SRAM Design without a DCM but with a Testbench
Figure 5-6, page 231 shows a top-level block diagram of a DDRII SRAM design with a
DCM but without a testbench. refclk_p and refclk_n are differential input reference clocks.
The DCM is instantiated in the infrastructure module that generates the required design
clocks. dly_clk_200_p and dly_clk_200_n are used for the IDELAYCTRL element.
sys_rst_n is the system reset signal. All design resets are generated using the dcm_locked
signal, the sys_rst_n signal, and the dly_ready signal of the IDELAYCTRL element. The
user must drive the user application signals. The design provides the user_clk and
user_rst signals so that the user can synchronize the user application signals with the
design. The user_clk signal is connected to the clk_0 clock signal in the controller. If the
user clock domain differs from clk_0/user_clk, the user should add FIFOs on all controller
inputs and outputs (user application signals) to synchronize them to user_clk.
The dly_cal_done signal indicates the completion of initialization and calibration of the
design.
[Block diagram. Inputs (reference clocks and reset): refclk_p, refclk_n, dly_clk_200_p,
dly_clk_200_n, sys_rst_n. Blocks: IBUFGDS, infrastructure_top, idelay_ctrl, main0, memory
device. Internal signals: clk_0, clk_270, clk_200, user_reset, user_reset270, user_reset200,
dly_ready. User application signals: user_clk, user_rst, dly_cal_done, user_qen_n,
wr_data_full, addr_full, rd_data_valid, rd_data_empty, wr_data_wrerr, addr_wrerr,
rd_data_rderr, user_dwl, user_dwh, user_qrl, user_qrh, user_bwl_n, user_bwh_n,
user_addr_cmd, user_data_wr_ena_n, user_addr_wr_ena_n. Memory interface: ddr_dll_off_n,
ddr_ld_n, ddr_rw_n, ddr_k, ddr_k_n, ddr_c, ddr_c_n, ddr_sa, ddr_bw_n, ddr_dq, ddr_cq.]
Figure 5-6: Top-Level Block Diagram of the DDRII SRAM Design with a DCM but without a Testbench
Figure 5-7 shows a top-level block diagram of a DDRII SRAM design without a DCM or a
testbench. The user should provide all the clocks and the dcm_locked signal. These clocks
should be single-ended. sys_rst_n is the system reset signal. All design resets are generated
using the dcm_locked signal, the sys_rst_n signal, and the dly_ready signal of the
IDELAYCTRL element. The user application must have a DCM primitive instantiated in
the design, and all user clocks should be driven through BUFGs. The user must drive the
user application signals. The design provides the user_clk and user_rst signals so that the
user can synchronize the user application signals with the design. The user_clk signal is
connected to the clk_0 clock signal in the controller. If the user clock domain differs from
clk_0/user_clk, the user should add FIFOs on all controller inputs and outputs (user
application signals) to synchronize them to user_clk.
The dly_cal_done signal indicates the completion of initialization and calibration of the
design.
[Block diagram; signal names are capitalized in the figure. Inputs from the user DCM (clocks
and reset): clk_0, clk_270, clk_200, dcm_locked, sys_rst_n. Blocks: infrastructure_top,
idelay_ctrl, main0, memory device. Internal signals: user_reset, user_reset270,
user_reset200, dly_ready. User application signals: user_clk, user_rst, dly_cal_done,
user_qen_n, wr_data_full, addr_full, rd_data_valid, rd_data_empty, wr_data_wrerr,
addr_wrerr, rd_data_rderr, user_dwl, user_dwh, user_qrl, user_qrh, user_bwl_n, user_bwh_n,
user_addr_cmd, user_data_wr_ena_n, user_addr_wr_ena_n. Memory interface: ddr_dll_off_n,
ddr_ld_n, ddr_rw_n, ddr_k, ddr_k_n, ddr_c, ddr_c_n, ddr_sa, ddr_bw_n, ddr_dq, ddr_cq.]
Figure 5-7: Top-Level Block Diagram of the DDRII SRAM Design without a DCM or a Testbench
[Block diagram with labels: user_clk, user_data, infrastructure_top, data_path, IOBS, DDRII
SRAM Interface.]
Figure 5-9 shows the DDRII SRAM controller modules with a 36-bit interface.
[Block diagram with labels: Write Path, Read Path, Delay Calibration, State Machine.
User-side signals: user_bwl_n, user_bwh_n, user_dwl, user_dwh, user_qrl, user_qrh,
wr_data_full, addr_full, rd_data_valid, rd_data_empty, clk_0. Memory-side signals:
ddr_bw_n, ddr_dq, ddr_cq, ddr_k, ddr_k_n.]
Controller
The DDRII SRAM controller initializes the memory, accepts and decodes the user
commands, and generates the READ and WRITE commands. It also generates control
signals for other modules. After power-on, the controller starts calibration; once
calibration is complete, it processes the READ and WRITE commands.
Datapath
The Datapath module transmits and receives data to and from the memories. Its major
functions are listed below:
• Asserts a write-enable signal for memories with burst lengths of two or four
• Asserts a read-enable signal to memory and a write-enable signal to the User Read
Data FIFO
• Generates increment/decrement signals (tap count) for IDELAY elements in the IOBS
• Center-aligns the data window to the FPGA clock
Refer to XAPP703 [Ref 20] for techniques on data writes to memory and data captures from
memory. For burst lengths of four and two, the write-enable signal is asserted one clock
before the write data is driven on the memory bus. The data is driven on both edges of the
clock. The address to memory is driven for one full clock cycle.
Memory read data is edge-aligned with the source-synchronous clock, CQ. The DDRII
clock, CQ, to which read data is synchronized, is a free-running strobe. The free-running
strobe from the memory CQ is captured using the FPGA clock. Thus the relation between
the CQ strobe and FPGA clock is found, and the strobe CQ is center-aligned with the FPGA
clock by delaying the CQ strobe in the IDELAY element. The same logic is applied to the
read data window. The read data window is center-aligned with the same FPGA clock.
Consequently, the same number of tap delays is applied to both the read data window and
the strobe CQ through the IDELAY elements to center-align them with respect to the FPGA
clock. Center-aligning the read data window with respect to the FPGA clock completes the
data capture logic.
The delay calibration circuit generates the delay reset, delay select, and delay increment
values for IDELAY elements used in delaying strobes and data read from memory. The
strobe is center-aligned with the FPGA clock, which results in the data window falling to
the center of the FPGA clock. Refer to XAPP703 [Ref 20] for details about the delay
calibration.
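The centering step can be pictured with a small software model. The sketch below is
illustrative only and is not the MIG calibration logic (see XAPP703 for that). It assumes a
hypothetical list, samples, holding one pass/fail capture result per IDELAY tap setting, and
it picks the tap in the middle of the longest passing run. The function name, the pass/fail
representation, and the 64-entry example size are assumptions made for this example.

```python
def center_tap(samples):
    """Return the tap index at the middle of the longest run of True values.

    samples[i] is True when data captured with i taps of delay was correct.
    Returns None when no tap setting passed.
    """
    best_start, best_len = None, 0
    run_start = None
    for i, ok in enumerate(samples):
        if ok:
            if run_start is None:
                run_start = i          # a new passing run begins here
            run_len = i - run_start + 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_start = None           # the passing run is broken
    if best_start is None:
        return None
    return best_start + (best_len - 1) // 2


# Example: taps 10 through 30 capture correctly, so the center is tap 20.
window = [10 <= tap <= 30 for tap in range(64)]
print(center_tap(window))  # 20
```

Choosing the middle of the passing window leaves roughly equal margin on both sides, which is
the intent of center-aligning the data window to the FPGA clock.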
Infrastructure
The infrastructure (infrastructure_top) module generates the FPGA clock and reset signals.
When differential clocking is used, refclk_p, refclk_n, dly_clk_200_p, and dly_clk_200_n
signals appear. When single-ended clocking is used, refclk and idly_clk_200 signals
appear. In addition, clocks are available for design use and a 200 MHz clock is provided for
the IDELAYCTRL primitive. Differential and single-ended clocks are passed through
global clock buffers before connecting to a DCM. For differential clocking, the output of the
refclk_p/refclk_n buffer is single-ended and is provided to the DCM input. Likewise, for
single-ended clocking, refclk is passed through a buffer and its output is provided to the
DCM input. The outputs of the DCM are 0° and 270° phase-shifted versions of the input
clock. After the DCM is locked, the design is in the reset state for at least 25 clocks. The
infrastructure module also generates all of the reset signals required for the design.
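As a rough behavioral model of the reset generation described above, the following sketch
holds the design in reset while sys_rst_n is Low, the DCM is unlocked, or the IDELAYCTRL is
not ready, and then for 25 more clocks. The class name and the choice to restart the
hold-off counter whenever any input goes bad are assumptions made for illustration; the
generated RTL may differ.

```python
class ResetModel:
    """Behavioral sketch: keep reset asserted until inputs are good for HOLD_CLOCKS."""

    HOLD_CLOCKS = 25  # the text states "at least 25 clocks" after the DCM locks

    def __init__(self):
        self.count = 0

    def clock(self, sys_rst_n, dcm_locked, dly_ready):
        """Advance one clock; return True while the design reset is asserted."""
        if not (sys_rst_n and dcm_locked and dly_ready):
            self.count = 0  # assumption: any bad input restarts the hold-off
            return True
        if self.count < self.HOLD_CLOCKS:
            self.count += 1
            return True
        return False


model = ResetModel()
history = [model.clock(sys_rst_n=True, dcm_locked=True, dly_ready=True)
           for _ in range(30)]
print(history.count(True))  # 25
```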
Idelay_ctrl
This module instantiates the IDELAYCTRL primitive of the Virtex-4 FPGA. The
IDELAYCTRL primitive is used to continuously calibrate the individual delay elements in
its region to reduce the effect of process, temperature, and voltage variations. A 200 MHz
clock has to be fed to this primitive.
The MIG tool instantiates the required number of IDELAYCTRLs in the RTL and uses the
LOC constraints in the UCF file to fix their locations. The number of IDELAYCTRLs is
defined by the IDELAYCTRL_NUM parameter in the idelay_ctrl module. In the RTL,
DLY_READY is generated by doing a logical AND of the RDY signals of every
IDELAYCTRL block.
IDELAYCTRL LOC constraints should be checked in the following cases:
• The MIG design is used with other IP cores or user designs that also require the use of
IDELAYCTRL and IDELAYs.
• Previous ISE® software releases 8.2.03i and 9.1i had an issue with IDELAYCTRL block
replication or trimming. When using these revisions of the ISE software, the user must
instantiate and constrain the location of each IDELAYCTRL individually.
See UG070 [Ref 7] for more information on the requirements of IDELAYCTRL placement.
IOBS
All the input and output signals of the DDRII SRAM controller are implemented in the
IOBS module. All address and byte enable signals are registered in the IOBs and driven
out.
The IDELAY elements for the read strobe and data read from memory are implemented in
the IOBS. The IOBS also implements bidirectional buffers for write and read data. The
IOBS registers the output data (ODDR) before driving it out and also registers the input
data (IDDR).
Test Bench
The MIG tool generates two RTL folders, example_design and user_design. The
example_design folder includes the synthesizable test bench, while user_design does not
include the test bench modules. The MIG test bench performs one write command
followed by one read command in an alternating manner. The number of words in a write
command depends on the burst length. For a burst length of 4, the test bench writes a total
of 4 data words for a single write command (2 rise data words and 2 fall data words). For
a burst length of 2, the test bench writes a total of 2 data words. On every write command,
the data pattern is incremented by one, and this is repeated with each subsequent write
command. The initial data pattern for the first write command is 000. The test bench
writes the 000, 001, 002, 003 data pattern in a sequence in which 000 and 002 are rise
data words and 001 and 003 are fall data words for a 9-bit design. The falling edge data is
always rising edge data plus one. For a burst length of 2, the data sequence for the first
write command is 000, 001. The data sequence for the second write command is 002, 003.
The pattern is then incremented for the next write command. For data widths greater than
9, the same data pattern is concatenated for the other bits. For a 36-bit design and burst
length of 4, the data pattern for the first write command is 000000000, 008040201,
010080402, 0180C0603.
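The data pattern described above can be reproduced with a few lines of Python. This is a
sketch for checking expected values, not the test bench RTL. It assumes that the 9-bit
counter keeps running across write commands and wraps at 9 bits; the function name and
arguments are hypothetical.

```python
def write_pattern(cmd_index, burst_length, data_width):
    """Return the data words of one write command as hex strings.

    A 9-bit counter value is replicated across the data bus in 9-bit lanes.
    Even-indexed words are rise data; odd-indexed words are fall data.
    Assumption: the counter runs across commands and wraps at 9 bits.
    """
    lanes = data_width // 9
    digits = (data_width + 3) // 4          # hex digits needed for the bus
    words = []
    for i in range(burst_length):
        value = (cmd_index * burst_length + i) % 512
        word = 0
        for lane in range(lanes):
            word |= value << (9 * lane)     # copy the value into each lane
        words.append(f"{word:0{digits}X}")
    return words


print(write_pattern(0, 4, 36))
# ['000000000', '008040201', '010080402', '0180C0603']
print(write_pattern(1, 2, 9))
# ['002', '003']
```

The first call reproduces the 36-bit, burst-length-4 sequence quoted in the text; the second
reproduces the second write command for a 9-bit design with a burst length of 2.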
Address generation logic generates the address in an incremental pattern for each write
command. The same address location is repeated for the next read command. In Samsung
components, the burst address increments are done by the memory, so the address is
generated by the test bench in a linear incremental pattern. In Cypress parts, the MIG test
bench increments the address for burst operation. After the address reaches the maximum
value, it rolls back to the initial address, i.e., 00000.
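The rollover behavior can be sketched as follows. The example models only a linear
increment that returns to the initial address after the maximum value; it does not model the
vendor-specific burst address handling mentioned above. The function name and the 2-bit
width used in the example are hypothetical.

```python
def address_sequence(addr_width, count, start=0):
    """Yield `count` addresses that increment by one and wrap after the maximum."""
    address = start
    for _ in range(count):
        yield address
        address = (address + 1) % (1 << addr_width)


print(list(address_sequence(addr_width=2, count=6)))  # [0, 1, 2, 3, 0, 1]
```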
During reads, comparison logic compares the read data with the data that was written. For
example, for a 9-bit design with a burst length of 4, the data written by a single write
command is 000, 001, 002, and 003, and the data read back is compared with that same
000, 001, 002, 003 pattern. Based on the comparison, an error status signal is generated:
it is 0 if the data read back matches the data written, and 1 otherwise.
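A minimal sketch of the comparison, assuming the written and read words are available as
Python sequences (the function name is hypothetical):

```python
def compare_error(written, read_back):
    """Return 0 when the read data equals the written data, otherwise 1."""
    return 0 if list(read_back) == list(written) else 1


expected = [0x000, 0x001, 0x002, 0x003]
print(compare_error(expected, [0x000, 0x001, 0x002, 0x003]))  # 0
print(compare_error(expected, [0x000, 0x001, 0x102, 0x003]))  # 1
```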
Clocking Scheme
2. In the second stage of calibration, the write enable signal for the Read Data FIFO is
determined by delaying the controller-issued read command. This delay is calibrated
based on the delay between the read command and the corresponding read data at the
Read Data FIFO. For this delay calibration, the controller writes a known fixed pattern
of data into a memory location and reads back from the same location. This read data
is compared against the known fixed pattern. The delay between the read command
and the correct pattern read data comparison is the delay calibration.
The final_dly_cal_done port in the data_path module indicates the status of the second
stage calibration. When final_dly_cal_done is asserted High, it indicates the
completion of second stage calibration, which implies the completion of the whole
initialization and calibration process. After the initialization and calibration is done
(i.e., the dly_cal_done signal in design_top is asserted High), the controller can start
issuing user commands to the memory.
In the second stage calibration, when the pattern read data does not match with the
pattern write data, the controller does not issue any further pattern read commands
and the controller gets stuck in the calibration state. The design must be restarted for
the calibration to start from the beginning.
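The second-stage measurement can be illustrated with a short model. This is a sketch under
stated assumptions, not the controller implementation: read_bus is a hypothetical list in
which entry n is the data observed n clocks after the read command, and both the pattern
value and the max_cycles search limit are invented for the example.

```python
def calibrate_read_delay(read_bus, pattern, max_cycles=16):
    """Return the number of clocks between the read command and matching data.

    read_bus[n] is the data seen n clocks after the read command was issued.
    Returns None when the pattern never appears; per the text, the controller
    then remains in the calibration state until the design is restarted.
    """
    for delay, data in enumerate(read_bus[:max_cycles]):
        if data == pattern:
            return delay
    return None


PATTERN = 0x1A5                      # hypothetical fixed pattern
bus = [0, 0, 0, 0, 0, PATTERN, 0]    # data arrives 5 clocks after the command
print(calibrate_read_delay(bus, PATTERN))      # 5
print(calibrate_read_delay([0] * 8, PATTERN))  # None
```

The returned delay corresponds to the amount by which the read command is delayed to form
the Read Data FIFO write enable.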
Clocking Scheme
Figure 5-10, page 238 shows the clocking scheme for this design. Global and local clock
resources are used.
The global clock resources consist of a DCM, two BUFGs on DCM output clocks, and one
BUFG for clk_200. The local clock resources consist of regional I/O clock networks
(BUFIO). The global clock architecture is discussed in this section.
The MIG tool allows the user to customize the design such that the DCM is not included.
In this case, clk_0 and clk_270 must be supplied by the user.
and user design. For designs without DCM instantiation, the user should instantiate a DCM
and the BUFGs to generate the required clocks.
Notes:
1. See “DDRII SRAM Controller Interface Signals,” page 239 for timing requirements and restrictions on
the user interface signals.
[Clocking diagram with labels: SYSTEM CLK, GC I/O, DCM (CLKIN, CLK0), BUFG, clk_0.]
User Interface
The user interface consists of seven FIFOs. The User Write interface has four FIFOs: one
FIFO is used for the memory address, two FIFOs contain positive-edge and negative-edge
data for memory, and the remaining FIFO is used for Byte Writes. The DDRII SRAM
controller checks the not empty status of these FIFOs and initiates a memory write. The
user interface is single data rate (SDR). The controller handles the conversion from the SDR
user interface to the DDR Memory interface and vice versa.
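The SDR-to-DDR conversion can be pictured as interleaving the positive-edge and
negative-edge words presented on the user side. The sketch below is only a data-ordering
model with hypothetical function names; it says nothing about the timing of the actual
interface.

```python
def sdr_to_ddr(rise_words, fall_words):
    """Interleave per-clock rise/fall words into double-data-rate order."""
    if len(rise_words) != len(fall_words):
        raise ValueError("rise and fall streams must have the same length")
    ddr = []
    for rise, fall in zip(rise_words, fall_words):
        ddr.extend((rise, fall))   # rise word first, then fall word
    return ddr


def ddr_to_sdr(ddr_words):
    """Split a double-data-rate stream back into rise and fall words."""
    return ddr_words[0::2], ddr_words[1::2]


print(sdr_to_ddr([0x000, 0x002], [0x001, 0x003]))  # [0, 1, 2, 3]
print(ddr_to_sdr([0, 1, 2, 3]))                    # ([0, 2], [1, 3])
```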
The User Read interface has three FIFOs, where one FIFO is used for the memory address
and the remaining two FIFOs contain positive-edge and negative-edge data read from
memory. The user writes to the User Read Address FIFO the memory address from which
data is to be read. The DDRII SRAM controller checks the status of this FIFO and initiates
a memory read burst. The data read is stored in the User Read Data FIFOs. The user reads
these FIFOs to access the data read from memory. The FIFOs are built using FIFO16
primitives in the rd_data_interface, rd_wr_addr_interface, and wr_data_interface
modules. Each FIFO has a threshold attribute called ALMOST_FULL_OFFSET whose
value is set to F, by default, in the RTL. This value can be changed as needed. For valid
FIFO threshold offset values, refer to UG070 [Ref 7].
Refer to Table 5-3 for how the user can access these FIFOs.
Table 5-5: DDRII SRAM User Interface Signals (without a Testbench) (Cont’d)
Signal Name (1) Direction Description
user_dwh [(data_width–1):0] Input Negative-edge data for memory writes. The data bus is
valid when the WRITE command (DDR_LD_N=0 &&
DDR_RW_N=0) is asserted.
user_qrl [(data_width–1):0] Output Positive-edge data read from memory. This data is output
when user_qen_n is asserted.
user_qrh [(data_width–1):0] Output Negative-edge data read from memory. This data is output
when user_qen_n is asserted.
user_bwl_n [(bw_width–1):0] Input Byte enables for DDRII memory positive-edge write data.
The byte enables are valid when the WRITE command
(DDR_LD_N=0 && DDR_RW_N=0) is asserted.
user_bwh_n[(bw_width–1):0] Input Byte enables for DDRII memory negative-edge write data.
The byte enables are valid when the WRITE command
(DDR_LD_N=0 && DDR_RW_N=0) is asserted.
user_addr_cmd[addr_width:0] (3) Input DDRII memory address for read or write operation. This
address is valid when USER_DATA_WR_ENA_n is
asserted. An extra bit is driven by the user to represent the
command.
user_qen_n Input This active-Low signal is the read enable for the User Read
Data FIFOs. The DDRII memory controller captures the
data read from memory and stores it in the Read Data
FIFOs. The user can access these FIFOs to get the data read
from memory.
user_data_wr_ena_n Input This active-Low signal is the write enable for the User Write
Data FIFOs. The user asserts this signal to write new data to
the FIFOs. The DDRII SRAM controller reads the data from
the User Write Data FIFO and writes to memory.
user_addr_wr_ena_n Input This active-Low signal is the write enable for the User
Read/Write Address FIFO. The user asserts this signal to
write the write/read address and command into the User
Read/Write Address FIFO.
Notes:
1. All user interface signal names are prepended with a controller number, for example, cntrl0_ddr_dq. DDRII SRAM devices
currently support only one controller.
2. The user_clk signal is connected to clk_0 in the controller. If the user clock domain is different from clk_0 / user_clk of the MIG, the
user should add FIFOs for all data inputs and outputs of the controller in order to synchronize them to the user_clk.
3. The number of address bits used depends on the density of the memory part. The controller ignores the unused bits, which can all
be tied High.
Write Interface
Figure 5-11 shows the user interface block diagram for write operations.
[Block diagram. FIFOs between the user interface and the controller/IOBS: Address FIFO, Rise
Data FIFO, Fall Data FIFO, and Byte Write FIFO (each a FIFO16, 512 x 36). User-side signals:
user_addr_cmd, user_addr_wr_ena_n, addr_full, user_dwl, user_dwh, user_bwl_n, user_bwh_n.
Controller-side signals: addr_empty, fifo_addr_rd_ena_n, wr_rd_cmd, wr_init_n. To IOBS:
fifo_dwh, fifo_bwl, fifo_bwh.]
The following steps describe the architecture of the Address and Write Data FIFOs and
show how to perform a write burst operation to DDRII memory from the user interface.
1. The user interface consists of an Address FIFO, Data FIFOs, and a Byte Write FIFO.
These FIFOs are constructed using Virtex-4 FPGA FIFO16 primitives with a 512 x 36
configuration.
2. The common Address FIFO is used for both write and read commands, and comprises
a command part and an address part. The command bit (bit 0 of the Address FIFO)
discriminates between write and read commands; the address starts at bit 1. The
command bit should be set to 0 for writes and to 1 for reads.
3. Two separate sets of Data FIFOs are used for storing the rising-edge and falling-edge
data to be written to DDRII memory from the user interface. For 9-bit, 18-bit, and 36-bit
data widths, two FIFO16s are required for storing rising-edge and falling-edge data.
For 72-bit data width, two FIFO16s are required for rising-edge data and two for
falling-edge data. MIG instantiates the required number of FIFOs to gain the required
data width. For 9-bit and 18-bit configurations, the controller pads the extra bits of the
Data FIFO with 0s.
4. The Byte Write FIFO is used to store the Byte Write signals to DDRII memory from the
user interface. The controller internally pads all zeros for the unused bits.
5. The user can initiate a write to memory by writing to the Address FIFO, Data FIFOs,
and Byte Write FIFO when the FIFO full flags are deasserted and after dly_cal_done is
asserted. The user should not access any of these FIFOs until dly_cal_done is asserted.
The dly_cal_done signal assures that the clocks are stable, the reset process is
completed, and the controller is ready to accept commands. Status signals addr_full
and wr_data_full are asserted when the Address FIFO and Data FIFOs or Byte Write
FIFO are full.
6. When user_addr_wr_ena_n is asserted, the user address is stored in the Address FIFO.
Similarly, when user_data_wr_ena_n is asserted, user_dwl, user_dwh, user_bwl_n, and
user_bwh_n are stored in the corresponding FIFOs. A common write-enable signal is used
to enable both the Data FIFOs and the Byte Write FIFO.
7. The controller reads the address and decodes the command bit. The write command
wr_init_n is issued if the command bit is 0 when the Address FIFO is not empty. This
command acts as a read-enable to the Data and Byte Write FIFOs. The DDRII memory
write command is generated from the wr_init_n signal by properly timing it.
8. Figure 5-12 shows the timing diagram for a write command with BL = 4. The address
should be asserted for one clock cycle as shown. For a burst length of four, each write to
the Address FIFO should be accompanied by two writes to the Data FIFOs, which together
carry two rising-edge data words and two falling-edge data words.
[Timing diagram signals: user_clk, dly_cal_done, addr_full, user_addr_wr_ena_n, addr_wrerr,
user_data_wr_ena_n, wr_data_wrerr.]
Figure 5-12: Write User Interface Timing diagram for BL = 4
9. Figure 5-13 shows the timing diagram for a write command with BL = 2. For a burst length
of two, each write to the Address FIFO is accompanied by one write to the Data FIFOs,
consisting of one rising-edge data word and one falling-edge data word. For a burst length
of two, commands can be issued on every clock.
[Timing diagram signals: user_clk, dly_cal_done, addr_full, user_addr_wr_ena_n, addr_wrerr,
user_data_wr_ena_n, wr_data_wrerr.]
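The bookkeeping in the write steps above can be summarized in a short sketch. It encodes an
Address FIFO word with the command in bit 0 and the address starting at bit 1, and it
computes how many Data FIFO writes accompany each address write and how many FIFO16
primitives hold the write data. All function names are hypothetical, and the sketch is not
taken from the generated RTL.

```python
def encode_addr_cmd(address, is_read):
    """Pack an Address FIFO word: bit 0 is the command, the address starts at bit 1."""
    return (address << 1) | (1 if is_read else 0)   # 0 = write, 1 = read


def data_fifo_writes(burst_length):
    """Data FIFO writes per Address FIFO write (one rise and one fall word each)."""
    if burst_length not in (2, 4):
        raise ValueError("burst length must be 2 or 4")
    return burst_length // 2


def write_data_fifo16_count(data_width):
    """FIFO16 primitives used for write data (rise plus fall), per step 3."""
    if data_width in (9, 18, 36):
        return 2
    if data_width == 72:
        return 4
    raise ValueError("unsupported data width")


print(hex(encode_addr_cmd(0x12, is_read=False)))  # 0x24
print(hex(encode_addr_cmd(0x12, is_read=True)))   # 0x25
print(data_fifo_writes(4))                        # 2
print(write_data_fifo16_count(72))                # 4
```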
Read Interface
Figure 5-14 shows the user interface block diagram for read operations.
[Block diagram. Address FIFO (FIFO16, 512 x 36) and, for each memory component (Component 0
through Component n-1), Data FIFOs including a Rise Data FIFO (FIFO16, 512 x 36). User-side
signals: user_addr_cmd, user_addr_wr_ena_n, addr_full, user_qrh. Controller-side signals:
addr_empty, fifo_addr_rd_ena_n, wr_rd_cmd, rd_data_full, wr_init_n. To/from IOBS: fifo_drl,
fifo_drh.]
The following steps describe the architecture of Read Data FIFOs and show how to
perform a burst read operation from DDRII SRAM from the user interface.
1. The read user interface consists of a common Address FIFO and a Read Data FIFO. The
Address FIFO and Read Data FIFO are constructed using FIFO16s with a 512 x 36
configuration.
2. The number of Read Data FIFOs required depends on the number of DDRII
components used. Using 9-bit components for 36-bit data width, a total of eight FIFOs
are required, four FIFOs for rising-edge data and four FIFOs for falling-edge data.
Though each FIFO can accommodate 36-bit data, the requirement of having one FIFO
per component arises from the CQ pattern calibration. Internal pattern calibration is
done per CQ. The controller generates the Read Data FIFO write-enable signal for each
FIFO separately, depending on the CQ pattern calibration.
3. To initiate a DDRII read command, the user should write the Address FIFO with the
command bit set to logic 1 when the FIFO addr_full flag is deasserted and the
dly_cal_done signal is asserted. The dly_cal_done signal assures the controller clocks
are stable, the internal reset process is completed, and the controller is ready to accept
commands.
4. The user should issue the Address FIFO write enable signal user_addr_wr_ena_n
along with user_addr_cmd to write the address to the Address FIFO.
5. When status signal addr_empty is deasserted, the controller reads the Address FIFO.
If the command bit is 1 when the Read Data FIFO is not full, the appropriate control
signal required for a read command is sent to the DDRII memory.
6. Prior to the actual read and write commands, the design calibrates the latency, in clock
cycles, from the time the read command is issued to the time the data is received. Using
this precalibrated delay between the read command and the read data, the controller
generates the write-enable signals to the Read Data FIFOs. The delay calibration is done
per DDRII component.
7. The Low state of rd_data_empty indicates read data is available. Asserting user_qen_n
reads rising-edge data and falling-edge data simultaneously on every rising edge of
the clock.
8. Figure 5-15 and Figure 5-16 show the user interface timing diagrams for BL = 2 and
BL = 4.
[Read user interface timing diagrams (two figures) with signals: user_clk, dly_cal_done,
user_rd_full, user_addr_wr_ena_n, user_ad_rd (A0 through A4), addr_wrerr, rd_data_valid,
user_qen_n, user_rd_err.]
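The read-side rules in the steps above can be condensed into a small sketch: one rise FIFO
and one fall FIFO per memory component, and a read address may be written only after
calibration completes and while the Address FIFO is not full. The function names are
hypothetical and the code is illustrative rather than a description of the generated RTL.

```python
def read_data_fifo_count(data_width, component_width):
    """One rise FIFO and one fall FIFO per memory component."""
    if data_width % component_width:
        raise ValueError("data width must be a multiple of the component width")
    components = data_width // component_width
    return 2 * components


def can_issue_read(dly_cal_done, addr_full):
    """A read address may be written once calibration is done and the FIFO has room."""
    return dly_cal_done and not addr_full


print(read_data_fifo_count(36, 9))   # 8
print(can_issue_read(True, False))   # True
print(can_issue_read(False, False))  # False
```

The first call matches the example in step 2: a 36-bit interface built from 9-bit components
needs four rise FIFOs and four fall FIFOs.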
Table 5-7 shows the maximum read latency of the controller. Maximum latency occurs
when the read command is given to an empty FIFO.
When the Address box is checked in a bank, the address, ddr_ld_n, ddr_rw_n,
ddr_dll_off_n bits are assigned to that particular bank.
When the Data box is checked in a particular bank, the memory data, the memory byte
write, the memory read clocks, the memory write clocks, and the memory input clock for
the output data are assigned to that particular bank.
When the System Control box is checked in a bank, the sys_rst_n, compare_error, and
dly_cal_done bits are assigned to that particular bank.
When the System Clock box is checked in a bank, the refclk_p, refclk_n, dly_clk_200_p,
and dly_clk_200_n bits are assigned to that particular bank.
For special cases, such as without a testbench and without a DCM, the corresponding
input and output ports are not assigned to any pins of the FPGA in the design UCF because
the user can connect these ports to the FPGA pins or can connect to some logic internal to
the same FPGA.
Note: Timing has been verified for most of the MIG generated configurations. For the best timing
results, adjacent banks in the same column of the FPGA should be used. Banks that are separated
by unbonded banks should be avoided because these can cause timing violations.
Supported Devices
The design generated by MIG is independent of the memory package; hence, the
package part of the memory component is replaced with X, where X indicates a don't-care
condition. Table 5-9 lists the components supported by MIG.
Chapter 6
Feature Summary
This section summarizes the supported and unsupported features of the RLDRAM II
controller design.
Supported Features
The RLDRAM II controller design supports the following:
• A maximum frequency of 250 MHz
• Both SIO and CIO memories
• Multiplexed and non-multiplexed addresses
• All configurations (Config1, Config2, and Config3)
• x9, x18, and x36 components
• Data widths of 9, 18, 36, and 72 bits
• Back-to-back read and write operations
• Write followed by read operations
• Read followed by write operations
• All combinations of the Mode Register
• XST and Synplicity synthesis tools
• Verilog and VHDL
• With and without a testbench
• With or without a DCM
Unsupported Features
The RLDRAM II controller design does not support:
• Commands in successive clocks with a burst length of 2. The controller processes
these commands with one extra clock latency. For example, a READ or WRITE
sequence of commands, BL = 2, Configuration = Any, CIO/SIO.
Architecture
Figure 6-1 shows a top-level block diagram of the RLDRAM II memory controller.
[Block diagram with labels: User Application, Memory Controller, Infrastructure_top, Top,
RLDRAM II CIO/SIO Memory.]
Figure 6-2 shows the hierarchical structure of the RLDRAM II design generated by MIG
with a testbench and a DCM.
[Hierarchy diagram of the top module. Modules legible in the extracted figure include
infrastructure_top, main, top, test_bench, clk_module, and rld_rst. Legend: design modules,
test bench modules, clock module and reset generation module.]
Note: A block with a * has a parameter file included.
Figure 6-3 shows a block diagram representation of an RLDRAM II design with a DCM
and a testbench. The design inputs are the system clocks and the user reset. sysreset_n is
the system reset signal. All design resets are generated using the dcm_locked signal, the
sysreset_n signal, and the idelay_ctrl_rdy signal of the IDELAYCTRL element. The
pass_fail output signal indicates whether the design passes or fails. The init_done signal
indicates the completion of initialization and calibration of the design. Required clocks and
reset signals for the design are generated from the clk_module and the rld_rst modules,
respectively. The clk_module instantiates the DCM primitive. The infrastructure_top
module instantiates the clk_module and the rld_rst modules.
[Block diagram. Inputs (system clocks and reset): sysclk_p, sysclk_n, clk200_p, clk200_n,
sysreset_n. Blocks: infrastructure_top, main_0, memory device. Internal signals: clkglob,
clk90, clk200_in, rsthard, rsthard_180, rsthard_270, rsthard_clk200, rstconfig, rst_init.
Other signals: idelay_ctrl_rdy, calibration_done, pass_fail, init_done. Memory interface:
rld2_we_n, rld2_ref_n, rld2_cs_n, rld2_ba, rld2_a, rld2_ck, rld2_ck_n, rld2_dk, rld2_dk_n,
rld2_dm, rld2_dq, rld2_qvld, rld2_qk, rld2_qk_n.]
Figure 6-3: Top-Level Block Diagram of the RLDRAM II Design with a DCM and a Testbench
Figure 6-4 shows a block diagram representation of the top-level RLDRAM II module
without a DCM but with a testbench. Design inputs are the user clocks and the user reset.
sysreset_n is the system reset signal. All design resets are generated using the dcm_locked
signal, the sysreset_n signal, and the idelay_ctrl_rdy signal of the IDELAYCTRL element.
The design uses the user input clocks. These clocks should be single-ended. The user
application must have a DCM primitive instantiated in the design, and all user clocks
should be driven through BUFGs. The pass_fail output signal indicates whether the design
passes or fails. The init_done signal indicates the completion of initialization and
calibration of the design.
[Block diagram. Inputs from the user DCM (clocks and system reset): clk_200, clkglob, clk90,
sysreset_n, dcm_locked. Blocks: infrastructure_top, main_0, memory device. Internal
signals: rsthard, rsthard_180, rsthard_270, rsthard_clk200, rstconfig, rst_init. Other
signals: idelay_ctrl_rdy, calibration_done, pass_fail, init_done. Memory interface:
rld2_we_n, rld2_ref_n, rld2_cs_n, rld2_ba, rld2_a, rld2_ck, rld2_ck_n, rld2_dk, rld2_dk_n,
rld2_dm, rld2_dq, rld2_qvld, rld2_qk, rld2_qk_n.]
Figure 6-4: Top-Level Block Diagram of the RLDRAM II Design without a DCM but with a Testbench
Figure 6-5, page 259 shows a block diagram representation of the top-level RLDRAM II
module with a DCM but without a testbench. Design inputs are the system clocks and
reset. sysreset_n is the system reset signal. All design resets are generated using the
dcm_locked signal, the sysreset_n signal, and the idelay_ctrl_rdy signal of the
IDELAYCTRL element. The user must drive the user application signals. The design provides
the clkglob_tb and rsthard_tb signals so that the user can synchronize the user application
signals with the design. The clkglob_tb signal is connected to the clkglob clock signal in
the controller. If the user clock domain differs from clkglob/clkglob_tb, the user should
add FIFOs on all controller inputs and outputs (user application signals) to synchronize
them to clkglob_tb. The required design clocks and design reset
signals for the design are generated from the clk_module and the rld_rst modules,
respectively. The clk_module instantiates the DCM primitive. The infrastructure_top
module instantiates the clk_module and rld_rst modules. The init_done signal indicates
the completion of initialization and calibration of the design.
[Block diagram. Inputs (system clocks and reset): sysclk_p, sysclk_n, clk200_p, clk200_n,
sysreset_n. Blocks: infrastructure_top, top_0, memory device. Internal signals: clkglob,
clk90, clk200_in, rsthard, rsthard_180, rsthard_270, rsthard_clk200, rstconfig, rst_init,
idelay_ctrl_rdy, calibration_done, init_done. User application signals: rlwdfull, rlaffull,
rlafempty, rlrdfempty, rlwdfempty, apconfrd, burstlength, rldreaddata, clkglob_tb,
rsthard_tb, init_done_tb, apaddr, apvalid, apwritedvalid, apconfa, apconfwrd, apconfwr,
aprdrden, apwritedata, apwritedm, issuemrs_tb. Memory interface: rld2_we_n, rld2_ref_n,
rld2_cs_n, rld2_ba, rld2_a, rld2_ck, rld2_ck_n, rld2_dk, rld2_dk_n, rld2_dm, rld2_dq,
rld2_qvld, rld2_qk, rld2_qk_n.]
Figure 6-5: Top-Level Block Diagram of the RLDRAM II Design with a DCM but without a Testbench
Figure 6-6 shows a block diagram representation of the top-level RLDRAM II module
without a DCM or a testbench. Design inputs are the user clocks and the user reset.
sysreset_n is the system reset signal. All design resets are generated using the dcm_locked
signal, the sysreset_n signal, and the idelay_ctrl_rdy signal of the IDELAYCTRL. The
design uses the user input clocks, which should be single-ended. The user application
must have a DCM primitive instantiated in the design, and all user clocks should be driven
through BUFGs. The user must drive the user application signals. The design provides the
clkglob_tb and rsthard_tb signals so that the user can synchronize the user application signals
with the design. The clkglob_tb signal is connected to the clkglob clock signal in the controller. If the
user clock domain is different from clkglob/clkglob_tb, the user should add FIFOs for all
the inputs and outputs of the controller (user application signals) to synchronize
them to the clkglob_tb clock. The Init_done signal indicates the completion of initialization
and calibration of the design.
[Block diagram. Blocks: System Reset and User DCM Clocks, infrastructure_top, top_0, User Application, Memory Device.
Clock and reset signals: clk_200, clkglob, clk90, sysreset_n, dcm_locked, rsthard_clk200, rsthard, rsthard_180, rsthard_270, rstconfig, rst_init.
Status signals: idelay_ctrl_rdy, calibration_done, init_done.
User application signals: rlwdfull, rlaffull, rlafempty, rlrdfempty, rlwdfempty, apConfrd, burstlength, rldreaddata, clkglob_tb, rsthard_tb, init_done_tb, apaddr, apvalid, apwritedvalid, apConfa, apConfwrd, apConfwr, aprdrden, apwritedata, apwritedm, issuemrs_tb.
Memory signals: rld2_we_n, rld2_ref_n, rld2_cs_n, rld2_ba, rld2_a, rld2_ck, rld2_ck_n, rld2_dk, rld2_dk_n, rld2_dm, rld2_dq, rld2_qvld, rld2_qk, rld2_qk_n.
UG086_c6_06_071808]
Figure 6-6: Top-Level Block Diagram of the RLDRAM II Design without a DCM or a Testbench
Architecture
The RLDRAM II memory controller processes the user commands to generate the
RLDRAM II interface signals. The RLDRAM II memory controller has a built-in
synthesizable testbench to generate all the RLDRAM commands. The built-in testbench
enables simulation and validation of the design in hardware. To interface with the user
application, the RLDRAM II memory controller must be separated from the built-in
testbench. MIG generates designs with and without a testbench. The following parameters
are selectable through the GUI: the type of the RLDRAM (SIO or CIO), the data width, the
burst length, multiplexed or non-multiplexed address, memory component, and other
configuration values.
The design can use any selected banks of the Virtex-4 FPGAs. It can use different banks or
the same banks for data, address, and control signals.
The HSTL_II_18 I/O standard is used for address, control, and data signals, and the
DIFF_HSTL_II_DCI_18 I/O standard is used for clock signals.
Similar to other DRAM architectures, the RLDRAM II requires its entire content to be
refreshed periodically. The AREF command initiates a refresh for the device and must be
used each time a refresh is required. The RLDRAM II memory controller has an option to
enable the execution of auto-refresh commands periodically. If this option is OFF, the user
has to provide the auto-refresh commands at regular intervals.
Implemented Features
This section provides details on the supported features of the RLDRAM II controller.
Address Multiplexing
The RLDRAM II memory controller supports multiplexed and non-multiplexed address
modes. Bit A5 of the Mode Register determines whether the address mode is multiplexed
(A5 = 1) or non-multiplexed (A5 = 0). In multiplexed address mode, the address is
provided to the RLDRAM II memory over two cycles and is latched into the memory on
two consecutive rising clock edges. The advantage of this approach is that at most 11
address bits are required to control the RLDRAM II memory.
In multiplexed address mode, the controller outputs an 11-bit address. The user has to
properly connect the addresses to the RLDRAM II devices. Table 6-3 provides the address
mapping between the controller and the RLDRAM II devices for the multiplexed address
mode.
CIO/SIO
The RLDRAM II memory controller supports both CIO and SIO memory components. The
GUI provides an option to select the required memory components. The separate I/O (SIO)
RLDRAM interface transfers two 18-bit or 9-bit data words per clock cycle at the I/O
balls. The read port has dedicated data outputs to support read operations, while the write
port has dedicated input balls to support write operations. Output data is referenced to the
free-running output data clock. This architecture eliminates the need for high-speed bus
turnarounds.
• Previous ISE® software releases 8.2.03i and 9.1i had an issue with IDELAYCTRL block
replication or trimming. When using these revisions of the ISE software, the user must
instantiate and constrain the location of each IDELAYCTRL individually.
See UG070 [Ref 7] for more information on the requirements of IDELAYCTRL placement.
Memory Initialization
The RLDRAM II device must be powered up and initialized in a predefined manner. The
controller handles the initialization sequence as described in this section.
After all power supply and reference voltages are stable and the master clock (rld_ck and
rld_ck_n) is stable, the RLDRAM II device requires a 200 μs (minimum) delay prior to
applying an executable command. After the 200 μs (minimum) delay has passed, the
MODE REGISTER SET (MRS) commands are issued. For non-multiplexed addressing, three
MRS commands are issued, consisting of two dummy commands and one valid MRS
command. For multiplexed addressing, four MRS commands are issued, consisting of two
dummy commands and two valid MRS commands.
Six clock cycles (tMRSC) after the valid MRS commands, eight AUTO REFRESH commands
are issued, one on each bank, separated by 2048 cycles.
Initialization is complete tRC after the AUTO REFRESH commands. The tRC value, in clock
cycles, depends on the Mode Register configuration parameter. The RLDRAM II memory
controller applies the appropriate tRC value for each configuration. The device is ready for
normal operation when the init_done output to the application is asserted.
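The initialization order described above can be sketched as a small Python model. This is only an illustration of the sequence in the text, not the controller RTL: the function name, the step labels, and the tuple layout are hypothetical, and the spacing between individual MRS commands is not given in the text, so it is not modeled.

```python
def init_sequence(multiplexed, banks=8):
    """Return the ordered initialization steps described in the text.

    Step names and the (name, value) tuple layout are illustrative only.
    """
    steps = [("WAIT_US", 200)]                   # minimum delay after clocks and supplies are stable
    steps += [("MRS_DUMMY", None)] * 2           # two dummy MRS commands in both address modes
    valid_mrs = 2 if multiplexed else 1          # multiplexed addressing needs two valid MRS commands
    steps += [("MRS_VALID", None)] * valid_mrs
    steps.append(("WAIT_CYCLES", 6))             # tMRSC after the valid MRS commands
    for bank in range(banks):
        if bank > 0:
            steps.append(("WAIT_CYCLES", 2048))  # separation between AUTO REFRESH commands
        steps.append(("AREF", bank))             # one AUTO REFRESH per bank
    steps.append(("WAIT_TRC", None))             # tRC depends on the Mode Register configuration
    return steps


if __name__ == "__main__":
    for step in init_sequence(multiplexed=True):
        print(step)
```

Calling init_sequence(False) yields three MRS steps in total and init_sequence(True) yields four, matching the two addressing modes.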
[Block diagram. Blocks: User Application (Synthesizable Test bench), Address FIFO, Write Data FIFO, Read Data FIFO, Control Logic, Reset Generator, Clock Generator, Control and Data Signals (Physical Layer), RLDRAM II SIO/CIO Memory Device. Address, data, and control paths connect the user application to the FIFOs and the physical layer to the memory device.
UG086_c6_09_092608]
User Interface
The user interface of the RLDRAM II memory controller is a FIFO-based implementation.
Three FIFOs are used: an Address FIFO, a Write Data FIFO, and a Read Data FIFO.
FIFO generator v4.2 is used in this design. It can be generated using XCO files located in
the par folder or by using the FIFO generator tool of CORE Generator™ software. FIFO
generator v4.2 is used in the rld_mergedfifo, rld_rdfifo, and rld_wrfifo modules. The FIFO
has various threshold attributes whose offset values can be changed based on the
requirement in the XCO files provided in the par folder. Alternatively, the FIFO generator
can be regenerated using the FIFO generator tool of CORE Generator software with
various threshold offset values. For valid FIFO threshold offset values, refer to
DS317 [Ref 37].
Test Bench
The MIG tool generates two different RTL folders, example_design and user_design. The
example_design includes the synthesizable test bench, while user_design does not include
the test bench modules. The MIG test bench performs eight write commands and eight
read commands in an alternating fashion. The number of words in a write command
depends on the burst length. For a burst length of 4, the test bench writes a total of 32 data
words for all eight write commands (16 rise data words and 16 fall data words). For a burst
length of 8, the test bench writes a total of 64 data words. It writes the data pattern of 1EE,
011, 1AA, 055 in a sequence in which 1EE and 1AA are rise data words and 011 and 055
are fall data words for a 9-bit design. The falling edge data is the complement of the rising
edge data. For a burst length of 2, the data sequence for the first write command is 1EE,
011 and the data sequence for the second write command is 1AA, 055. For a burst length
of 4, the data pattern for the first write command is 1EE, 011, 1AA, 055, and the same
pattern is repeated for all the remaining write commands. For a burst length of 8, the data
pattern for the first write command is 1EE, 011, 1AA, 055, 1EE, 011, 1AA, 055, and the
same pattern is repeated for all the remaining write commands. This data pattern is
repeated in the same order based on the number of data words written. For data widths
greater than 9, the same data pattern is concatenated for the other bits. For a 36-bit design
and burst length of 4, the data pattern for the first write command is F77BBDDEE,
088442211, D56AB55AA, 2A954AA55.
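The data pattern rules above can be checked with a short Python sketch. The function names and the start parameter are hypothetical; the 9-bit base words, the complement rule for fall data, and the concatenation rule for wider designs come from the text.

```python
BASE_RISE_WORDS = (0x1EE, 0x1AA)  # 9-bit rise data words named in the text


def replicate(word9, width):
    """Concatenate a 9-bit word to fill a data width that is a multiple of 9."""
    if width % 9 != 0:
        raise ValueError("width must be a multiple of 9")
    value = 0
    for _ in range(width // 9):
        value = (value << 9) | word9
    return value


def write_burst(width, burst_length, start=0):
    """Return the rise/fall data words of one write command.

    start selects the first rise word; for a burst length of 2 the second
    write command starts at index 1 (1AA, 055).
    """
    words = []
    for i in range(burst_length // 2):
        rise9 = BASE_RISE_WORDS[(start + i) % 2]
        fall9 = rise9 ^ 0x1FF            # fall data is the 9-bit complement of rise data
        words.append(replicate(rise9, width))
        words.append(replicate(fall9, width))
    return words


def as_hex(words, width):
    """Format words as upper-case hex, zero-padded to the data width."""
    digits = (width + 3) // 4
    return [format(w, "0{}X".format(digits)) for w in words]


if __name__ == "__main__":
    print(as_hex(write_burst(36, 4), 36))
```

For a 36-bit design and a burst length of 4, the sketch reproduces the four words quoted above.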
Address generation logic generates eight different addresses for eight write commands.
The same eight address locations are repeated for the following eight read commands. The
read commands are performed at the same locations where the data is written. There are a
total of 16 different address locations for 16 write commands, and the same address
locations are generated for 16 read commands. Upon completion of a total of 32
commands, including both writes and reads (8 writes, 8 reads, 8 writes, 8 reads), address
generation rolls back to the first address of the first write command, and the same address
locations are repeated. The MIG test bench exercises only a certain memory area. The
address is formed such that all address bits are exercised. During writes, a new address is
generated for every burst operation on the column boundary.
During reads, comparison logic compares the read pattern with the pattern written, i.e., the
1EE, 011, 1AA, 055 pattern. For example, for a 9-bit design of burst length 4, the data
written for a single write command is 1EE, 011, 1AA, 055. During reads, the read pattern
is compared with the 1EE, 011, 1AA, 055 pattern. Based on a comparison of the data, a
PASS_FAIL status signal is generated. If the data read back is the same as the data written,
the PASS_FAIL signal value is 010, otherwise it is 100. A PASS_FAIL signal value of 001
indicates that the calibration process is not yet complete.
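As a minimal sketch of the status encoding described above, the following Python function maps a comparison result to the three PASS_FAIL values. The function and constant names are hypothetical; only the three bit patterns are taken from the text.

```python
CAL_PENDING = 0b001  # calibration not yet complete
PASS = 0b010         # read data matches written data
FAIL = 0b100         # read data differs from written data


def pass_fail(calibration_done, written, read_back):
    """Return the PASS_FAIL value for one compared burst."""
    if not calibration_done:
        return CAL_PENDING
    return PASS if list(read_back) == list(written) else FAIL


if __name__ == "__main__":
    pattern = [0x1EE, 0x011, 0x1AA, 0x055]
    print(format(pass_fail(True, pattern, pattern), "03b"))
```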
Address FIFO
This FIFO serves as the buffer for the user interface to store addresses corresponding to the
read and write data as well as the user-controlled refreshes. All reads, writes, and user
refreshes are scheduled in this FIFO. This synchronous FIFO is 26 bits wide and 16 words
deep. Table 6-4 defines the configuration of the 26 bits.
Table 6-5: Write Data FIFO Bit Configuration for 36-bit Data Width
Bit Configuration    Description
[73:72]              Write Data Mask
[71:0]               Write Data

Table 6-6: Read Data FIFO Bit Configuration for a 36-bit Data Width
Bit Configuration    Description
[71:0]               Read Data
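The 36-bit Write Data FIFO layout in Table 6-5 can be illustrated with a small pack and unpack sketch in Python. The helper names are hypothetical, and the sketch does not say which half of the 72 data bits carries rise data and which carries fall data, because the table does not state it.

```python
DATA_BITS = 72  # bits [71:0], write data for a 36-bit design
MASK_BITS = 2   # bits [73:72], write data mask


def pack_write_word(mask, data):
    """Build the 74-bit Write Data FIFO word from its two fields."""
    if not 0 <= mask < (1 << MASK_BITS):
        raise ValueError("mask does not fit in 2 bits")
    if not 0 <= data < (1 << DATA_BITS):
        raise ValueError("data does not fit in 72 bits")
    return (mask << DATA_BITS) | data


def unpack_write_word(word):
    """Split a 74-bit Write Data FIFO word into (mask, data)."""
    return word >> DATA_BITS, word & ((1 << DATA_BITS) - 1)


if __name__ == "__main__":
    word = pack_write_word(0b01, 0xF77BBDDEE088442211)
    print(hex(word), unpack_write_word(word))
```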
Clock Generator
The clock generator module generates the FPGA clock and reset signals. When differential
clocking is used, sysclk_p, sysclk_n, clk200_p, and clk200_n signals appear. When single-
ended clocking is used, sysclk and idly_clk_200 signals appear. In addition, clocks are
available for design use and a 200 MHz clock is provided for the IDELAYCTRL primitive.
Differential and single-ended clocks are passed through global clock buffers before
connecting to a DCM. For differential clocking, the output of the sysclk_p/sysclk_n buffer
is single-ended and is provided to the DCM input. Likewise, for single-ended clocking,
sysclk is passed through a buffer and its output is provided to the DCM input. The outputs
of the DCM are clkglob (0° phase-shifted version of the input clock) and clk90 (90° phase-
shifted version of the input clock). After the DCM is locked, the design is in the reset state
for at least 25 clocks. The RLDRAM II controller works using these clocks.
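The reset hold behavior after DCM lock can be modeled with a simple counter. The Python sketch below is an assumption about one way to obtain "at least 25 clocks" of reset after lock; the function name and the counter scheme are hypothetical, and the actual design also takes sysreset_n and idelay_ctrl_rdy into account, which this model omits.

```python
def reset_trace(locked_samples, hold_clocks=25):
    """Return the reset value (True = in reset) for each sampled clock.

    locked_samples holds one DCM lock sample per clock edge.
    """
    count = 0
    trace = []
    for locked in locked_samples:
        if not locked:
            count = 0             # losing lock restarts the hold period
            trace.append(True)
        elif count < hold_clocks:
            count += 1            # still inside the hold period after lock
            trace.append(True)
        else:
            trace.append(False)   # hold period elapsed, reset released
    return trace


if __name__ == "__main__":
    trace = reset_trace([False] * 3 + [True] * 30)
    print(trace.index(False))
```

With three unlocked clocks followed by a stable lock, the model keeps reset asserted for the three unlocked clocks plus 25 locked clocks.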
Reset Generator
This block generates different reset signals. It also performs the initialization and
configuration (MRS) of the RLDRAM II memories.
Control Logic
The logic in this block controls NOP, READ, WRITE, and USER REFRESH operations with
the memories. The RLDRAM II memory controller is triggered with data in the Address
FIFO. Bit 24 of the Address FIFO discriminates between read and write commands. Bit 25
is the USER REFRESH command. If the auto refresh bit is ON, the controller generates the
AUTO REFRESH command periodically. The controller issues a read or a write grant only
when there is no user refresh request command or no pending internal refresh request. If
there is a pending refresh request, the RLDRAM II memory controller issues the read or the
write grant after the refresh is done.
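The command decoding and grant rule above can be sketched as follows. This is an illustrative Python model, not the controller logic: the function names are hypothetical, the polarity of bit 24 (which value means read) is not stated in this section and is therefore returned as a raw bit, and treating bits [23:0] as a single field is an assumption because Table 6-4 is not reproduced here.

```python
REFRESH_BIT = 25  # USER REFRESH command bit
RW_BIT = 24       # discriminates between read and write commands


def decode_entry(word):
    """Split a 26-bit Address FIFO word into its command bits and the rest."""
    if not 0 <= word < (1 << 26):
        raise ValueError("Address FIFO words are 26 bits wide")
    return {
        "user_refresh": bool((word >> REFRESH_BIT) & 1),
        "rw_bit": (word >> RW_BIT) & 1,          # polarity not stated in the text
        "low_bits": word & ((1 << RW_BIT) - 1),  # assumed field: bits [23:0]
    }


def grant_allowed(user_refresh_request, internal_refresh_pending):
    """A read or write grant is issued only when no refresh is requested or pending."""
    return not (user_refresh_request or internal_refresh_pending)


if __name__ == "__main__":
    entry = decode_entry((1 << 24) | 0x1234)
    print(entry, grant_allowed(entry["user_refresh"], False))
```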
Clocking Scheme
Figure 6-8, page 267 shows the clocking scheme for this design. Global and local clock
resources are used.
The global clock resources consist of a DCM, two BUFGs on DCM output clocks, and one
BUFG for clk_200. The local clock resources consist of regional I/O clock networks
(BUFIO). The global clock architecture is discussed in this section.
The MIG tool allows the user to customize the design such that the DCM is not included.
In this case, clkglob and clk90 must be supplied by the user.
These clocks can be either single-ended or differential. The user can select the single-ended
or differential clock input option in the MIG GUI. Differential clocks are connected to an
IBUFGDS, and a single-ended clock is connected to an IBUFG.
The system clock from the output of the IBUFGDS or the IBUFG is connected to the DCM
to generate the various clocks used by the memory interface logic.
The clk_200 output of the IBUFGDS or the IBUFG is connected to the BUFG. The output of
the BUFG is used for IDELAY IOB delay blocks for aligning read capture data.
The DCM generates three separate synchronous clocks for use in the design. This is shown
in Table 6-7 and Figure 6-8, page 267. The clock structure is the same for both the example
design and the user design. For designs without DCM instantiation, the user must instantiate
the DCM and the BUFGs to generate the required clocks.
Notes:
1. See “User Interface Accesses,” page 271 for timing requirements and restrictions on the user interface
signals.
[Clocking scheme diagram. Elements: GC I/O, SYSTEM CLK, DCM (CLKIN, CLK0), BUFG, clkglob.
UG086_c6_15_071808]
Virtex-4 FPGA. The FPGA clock samples both the data and clock (for calibration) and the
data itself to capture it in the same clock domain. Refer to XAPP701 [Ref 18] for more
details.
Notes:
1. All user interface signal names are prepended with a controller number, for example, cntrl0_apWriteData. RLDRAM II devices
currently support only one controller. See “User Interface Accesses” for timing requirements and restrictions on the user interface
signals.
2. The number of address bits used depends on the density of the memory part. The controller ignores the unused bits, which can all
be tied High.
Write Interface
Figure 6-9 shows the user interface block diagram for write operations.
[Block diagram. User Interface signals: apvalid, apaddr, rlaffull, rlafempty, apwritevalid, apwritedata, apwritedm, rlwdffull, rlwdfempty.
Address FIFO (Distributed RAM), 16 x 26: output afa to the Controller, read enable ctlafrden, status rlafempty.
Write Data FIFO (Block RAM), 16 x (2 * [Data Width + Data Mask Width]): output wdfd to the Phy Layer, read enable ctlwdfrden.
ug086_c6_13_120407]
The following steps describe the architecture of the Address and Write Data FIFOs and
show how to perform a write burst operation to RLDRAM II from the user interface.
1. The user interface consists of an Address FIFO and a Write Data FIFO. These FIFOs are
constructed using the FIFO generator module of the CORE Generator software.
The Address FIFO is a distributed RAM with a 16 x 26 configuration. The Data FIFO is a
block RAM with a depth of 16 locations and a width equal to two times the sum of the
data width and the data mask width.
2. The Common Address FIFO is used for both write and read commands, and comprises
a command part and an address part. Command bits discriminate between write and
read commands.
3. User interface data width apwritedata is twice that of the memory data width. For
every memory component there is a mask bit. For 9-bit memory width, the user
interface is 20 bits consisting of rising-edge data, falling-edge data, rising-edge mask
bit, and falling-edge mask bit.
4. For a 9-bit memory component with 72-bit data, the user interface data width
apwritedata is 144 bits, and the mask data apwritedm is 8 bits.
5. The user can initiate a write to memory by writing to the Address FIFO and the Write
Data FIFO when the FIFO Full flags are deasserted and after the init_done signal is
asserted. Status signal rlaffull is asserted when Address FIFO is full, and similarly
rlwdffull is asserted when Write Data FIFO is full.
6. Both the Address FIFO and Write Data FIFO Full flags are deasserted at power-on.
7. The user should assert the Address FIFO write-enable signal apvalid along with
address apaddr to store the write address and write command into the Address FIFO.
8. The user should assert the Data FIFO write-enable signal apwritedvalid along with
write data apwritedata and mask data apwritedm to store the write data and mask
data into the Write Data FIFO. The user should provide both rise and fall data together
for each write to the Data FIFO.
9. The controller reads the Address FIFO by issuing the ctlafrden signal. The controller
reads the Write Data FIFO by issuing the ctlwdfrden signal after the Address FIFO is
read. It decodes the command part after the Address FIFO is read.
[Timing diagram. Waveforms: CLK, rlWdfFull, rlafFull, apWriteDValid, apValid, ApAddr (A0, A1, A2, A3).
UG086_c6_10_012807]
Figure 6-10: RLDRAM II Write Burst Timing Diagram (BL = 4), Four Bursts
10. The write command timing diagram in Figure 6-10 is derived from the MIG-generated
testbench. As shown (burst length of 4), each write to the Address FIFO must be
coupled with two writes to the Data FIFO. Similarly, for a burst length of 8, every write
to the Address FIFO must be coupled with four writes to the Data FIFO. Failure to
follow this rule can cause unpredictable behavior.
Note: The user can start filling the Write Data FIFO two clocks after the Address FIFO is
written, because there is a two-clock latency between the command fetch and reading the Data
FIFO. Using the terms shown in Figure 6-9, therefore, the user can assert the A0 address two
clocks before D0D1.
11. The write command timing diagram in Figure 6-11, page 274 is derived from the MIG-
generated testbench. As shown (burst length of 8), each write to the Address FIFO
must be coupled with four writes to the Data FIFO. Because the controller first reads
the address and command together, the address need not coincide with the last data.
After the command is analyzed (nearly two clocks later for a worst-case timing
scenario), the controller sequentially reads the data in four clocks. Thus, there are six
clocks from the time the address is read to the time the last data is read.
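The coupling rule in steps 10 and 11 can be expressed as a small check. In the Python sketch below, the function names are hypothetical. The values for burst lengths of 4 and 8 come from the text; the value for a burst length of 2 is an inference from the fact that each Data FIFO write carries both rise and fall data.

```python
def data_writes_per_address_write(burst_length):
    """Number of Write Data FIFO writes that must accompany one Address FIFO write."""
    if burst_length not in (2, 4, 8):
        raise ValueError("unsupported burst length")
    return burst_length // 2  # each Data FIFO word holds one rise and one fall word


def write_traffic_is_consistent(burst_length, address_writes, data_writes):
    """Check that the Data FIFO write count matches the Address FIFO write count."""
    return data_writes == address_writes * data_writes_per_address_write(burst_length)


if __name__ == "__main__":
    # Four bursts at BL = 4 need eight Data FIFO writes.
    print(write_traffic_is_consistent(4, address_writes=4, data_writes=8))
```

A user-side testbench could run a check of this kind before committing a batch of commands, since a mismatch can cause unpredictable behavior.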
[Timing diagram. Waveforms: CLK, rlWdfFull, rlafFull, apWriteDValid, apValid, ApAddr (A0, A1).
UG086_c6_11_012807]
Figure 6-11: RLDRAM II Write Burst Timing Diagram (BL = 8), Two Bursts
Read Interface
Figure 6-12 shows a block diagram of the read interface.
[Block diagram. User Interface signals: apvalid, apaddr, rlaffull, rlafempty.
Address FIFO (Distributed RAM), 16 x 26: output afa to the Controller, read enable ctlafrden, status rlafempty.]
The following steps describe the architecture of the Read Data FIFOs and show how to
perform a burst read operation from RLDRAM II from the user interface.
1. The read user interface consists of an Address FIFO and a Read Data FIFO. The
Address FIFO is common to both read and write operations. The Read Data FIFOs are
constructed using the CORE Generator FIFO generator module. The Read Data FIFO is
a Distributed RAM with depth of 16 locations and width equal to two times the
memory device width, consisting of rising-edge data and falling-edge data. For
example, for a 9-bit memory component, the Read Data FIFO configuration is 16 x 18.
MIG instantiates a number of Read Data FIFO modules depending on the QK signal
width of the design. For example, for 9-bit memory component and 72-bit data width
designs, MIG instantiates a total of nine Read Data FIFO modules.
2. The user can initiate a read to memory by writing to the Address FIFO when the FIFO
Full flag rlaffull is deasserted and after init_done is asserted.
3. To write the read address and read command into the Address FIFO, the user should
issue the Address FIFO write-enable signal apvalid along with read address apaddr.
4. The controller reads the Address FIFO containing the address and command. After
decoding the command, the controller generates the appropriate control signals to
memory.
5. Prior to the actual read and write commands, the design calibrates the latency (number
of clock cycles) from the time the read command is issued to the time data is received.
Using this pre-calibrated delay information, the controller generates the write-enable
signals to the Read Data FIFOs.
6. The rlrdfempty signal is deasserted when data is available in the Read Data FIFOs.
7. The user can read the data from the Read Data FIFOs by driving aprdfrden
High.
[Timing diagram. Waveforms: CLK, rlRdfEmpty, rlafEmpty, apValid, apAddr (A0, A1), apRdfRdEn.
UG086_c6_12_012807]
Figure 6-13: RLDRAM II Read Burst Timing Diagram (BL = 8), Two Bursts
8. Figure 6-13 shows the user interface timing diagram for a burst length of 8. The read
latency is calculated from the point when the read command is given by the user to the
point when the rlrdfempty signal is deasserted. The minimum latency in this case is
21 clocks. This minimum applies when no auto-refresh request is pending, the user
commands are issued after initialization is completed, and the first command issued is
a Read command. The controller executes the commands only after initialization is
done, as indicated by the init_done signal.
9. After the address and command are loaded into the Address FIFO, it takes a minimum
of 21 clock cycles for the controller to deassert the rlrdfempty signal.
10. Read data is available only when the rlrdfempty signal is deasserted. The user can
access the read data by driving the aprdfrden signal, the read-enable signal of the Read
Data FIFOs, High.
Note: The RLDRAM controller does not check the status of the Read Data FIFO, and can issue
read commands even when the Read Data FIFO is full. The user must make this determination and
ensure that read commands are not issued by the controller when the Read Data FIFO is full.
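Because the controller does not check the Read Data FIFO status, the user logic needs some form of bookkeeping before issuing reads. The Python sketch below shows one possible credit-style scheme. It is only an assumption about how a user might do this: the class and method names are hypothetical, the depth of 16 comes from the text, and the model assumes that each read command of burst length BL occupies BL/2 FIFO locations because each location holds one rise word and one fall word.

```python
class ReadThrottle:
    """Track Read Data FIFO locations reserved by outstanding read commands."""

    def __init__(self, burst_length, fifo_depth=16):
        self.entries_per_read = burst_length // 2  # assumed: one rise/fall pair per location
        self.fifo_depth = fifo_depth
        self.reserved = 0

    def can_issue_read(self):
        """True when another read command cannot overflow the Read Data FIFO."""
        return self.reserved + self.entries_per_read <= self.fifo_depth

    def issue_read(self):
        """Reserve space for one read command written to the Address FIFO."""
        if not self.can_issue_read():
            raise RuntimeError("Read Data FIFO could overflow")
        self.reserved += self.entries_per_read

    def pop_entry(self):
        """Release one location after the user reads a word from the FIFO."""
        if self.reserved == 0:
            raise RuntimeError("no outstanding read data")
        self.reserved -= 1


if __name__ == "__main__":
    throttle = ReadThrottle(burst_length=8)
    while throttle.can_issue_read():
        throttle.issue_read()
    print(throttle.reserved)
```

Reserving space at issue time rather than at data arrival keeps the check conservative, at the cost of possibly stalling reads slightly earlier than strictly necessary.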
When the Address box is checked in a particular bank, the bank address, the address, the
WE_N, the REF_N, and the CS_N bits are assigned to that particular bank.
When the Data box is checked in a particular bank for a CIO design, the memory data, the
memory data mask, the memory data valid (QVLD), the memory read clock, the memory
write clock, the memory address, and the command clock bits are assigned to that
particular bank.
When the Data_Write box is checked in a particular bank for an SIO design, the memory
data write, the memory data mask, and the memory write clock bits are assigned to that
particular bank.
When the Data_Read box is checked in a particular bank for an SIO design, the memory
data read, the memory data valid (QVLD), the memory read clock, the memory address,
and the command clock bits are assigned to that particular bank.
When the System Control box is checked in a particular bank, the sysreset_n, the pass_fail,
and the Init_done bits are assigned to that particular bank.
When the System Clock box is checked in a particular bank, the sysclk_p, sysclk_n,
CLK200_p, and CLK200_n bits are assigned to that particular bank.
For special cases, such as without a testbench and without a DCM, the corresponding
input and output ports are not assigned to any FPGA pins in the design UCF because the
user can connect these ports to the FPGA pins or can connect to some logic internal to the
same FPGA.
Note: Timing has been verified for most of the MIG generated configurations. For the best timing
results, adjacent banks in the same column of the FPGA should be used. Banks that are separated
by unbonded banks should be avoided because these can cause timing violations.
Chapter 7
Feature Summary
This section summarizes the supported and unsupported features of the DDR SDRAM
controller design.
Supported Features
The DDR SDRAM controller design supports the following:
• Burst lengths of two, four, and eight
• CAS latencies of 2, 2.5, and 3
• Sequential and interleaved burst types
• Auto refresh
• Data mask enable/disable option
• System clock, differential and single-ended
• Linear addressing
• Spartan-3 FPGA maximum frequency:
    • 133 MHz with a -4 speed grade device
    • 166 MHz with a -5 speed grade device
• Spartan-3E FPGA maximum frequency:
    • 133 MHz with a -4 speed grade device
    • 166 MHz with a -5 speed grade device
• Spartan-3A, Spartan-3AN, and Spartan-3A DSP FPGA maximum frequency:
    • 133 MHz with a -4 speed grade device
    • 166 MHz with a -5 speed grade device
• Components, unbuffered DIMMs, registered DIMMs, and SODIMMs
• With and without a testbench
• With or without a DCM
• All Spartan-3, Spartan-3E, Spartan-3A, Spartan-3AN, and Spartan-3A DSP FPGAs
• Verilog and VHDL
• XST and Synplicity synthesis tools
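The supported settings listed above can be captured in a small configuration check. The Python sketch below is illustrative only: the function and constant names are hypothetical and are not part of MIG, while the burst lengths, CAS latencies, and frequency limits are the ones in the list.

```python
SUPPORTED_BURST_LENGTHS = {2, 4, 8}
SUPPORTED_CAS_LATENCIES = {2, 2.5, 3}
MAX_FREQ_MHZ = {"-4": 133, "-5": 166}  # same limits for all listed Spartan-3 families


def check_config(burst_length, cas_latency, speed_grade, freq_mhz):
    """Return a list of problems found in a requested DDR SDRAM configuration."""
    problems = []
    if burst_length not in SUPPORTED_BURST_LENGTHS:
        problems.append("unsupported burst length: {}".format(burst_length))
    if cas_latency not in SUPPORTED_CAS_LATENCIES:
        problems.append("unsupported CAS latency: {}".format(cas_latency))
    if speed_grade not in MAX_FREQ_MHZ:
        problems.append("unknown speed grade: {}".format(speed_grade))
    elif freq_mhz > MAX_FREQ_MHZ[speed_grade]:
        problems.append("{} MHz exceeds the {} MHz limit for speed grade {}".format(
            freq_mhz, MAX_FREQ_MHZ[speed_grade], speed_grade))
    return problems


if __name__ == "__main__":
    print(check_config(4, 2.5, "-4", 166))
```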
Unsupported Features
The DDR SDRAM controller design does not support a single burst of burst length two.
Controller Architecture
[Block diagram. A Xilinx FPGA containing the Control Layer and the Physical Layer, connected to the Memories.
ug086_c7_01_060910]
Hierarchy
Figure 7-2 shows the hierarchical structure of the DDR SDRAM design generated by MIG
with a testbench and a DCM. In the figure, the physical and control layers are clearly
separated. MIG generates the entire controller, as shown in this hierarchy, including the
testbench. The user can replace the testbench with a design that makes use of the DDR
SDRAM interface.
[Hierarchy diagram. Modules under <top_module>: infrastructure_top*, main*, top*, test_bench*, clk_dcm, cal_top, data_read*, data_write*, data_read_controller*, data_path_iobs*, infrastructure_iobs*, controller_iobs*, controller*.
Legend: Design Modules; Test Bench Modules; Clocks, Reset Generation, and Calibration Modules.
Note: A block with a * has a parameter file included.
UG086_c7_02_010108]
Figure 7-2: Hierarchical Structure of the DDR SDRAM Design with a Testbench
For designs generated without a testbench, the testbench modules in Figure 7-2 are not
present in the design. In this case, the user interface signals appear in the <top_module>
module. The list of user interface signals is in Table 7-9.
The infrastructure_top module contains the clock and reset generation logic of the
design. It instantiates a DCM when that option is selected in MIG. The differential design
clock is an input to this module. A user reset is also an input to this module. Using the input
clocks and reset signals, this module generates the system clocks and system resets that
are used in the design. The infrastructure_top module also contains calibration logic.
The DCM primitive is not instantiated in the infrastructure_top module if the Use DCM
option is unchecked. In that case, the system operates on the user-provided clocks. The
system reset is generated in the infrastructure module using the dcm_lock input signal.
[Block diagram. Blocks: System Clocks and Reset, infrastructure_top, main_0, Memory Device.
Clock and reset signals: sys_clk, sys_clkb, reset_in_n, clk_0, clk90_0, sys_rst, sys_rst90, sys_rst180.
Status Signals: cntrl0_init_done, cntrl0_led_error_output1, cntrl0_data_valid_out.
Memory signals: cntrl0_ddr_ras_n, cntrl0_ddr_cas_n, cntrl0_ddr_we_n, cntrl0_ddr_cs_n, cntrl0_ddr_cke, cntrl0_ddr_dm, cntrl0_ddr_ba, cntrl0_ddr_a, cntrl0_ddr_ck, cntrl0_ddr_ck_n, cntrl0_ddr_dqs, cntrl0_ddr_dq, cntrl0_ddr_reset_n.
UG086_c7_03_071608]
Figure 7-3: MIG Output of the DDR SDRAM Controller Design with a DCM and a Testbench
Figure 7-4 shows a block diagram representation of the top-level module for a DDR
SDRAM design with a DCM but without a testbench. “Clocking Scheme,” page 294
describes how various clocks are generated using the DCM. The input clocks can be
differential or single-ended, based on the system clock selection in the GUI.
For differential clocking, the sys_clk and sys_clkb clocks appear as input ports, whereas for
single-ended clocking, sys_clk_in appears as the input port.
The DCM is instantiated in the infrastructure_top module, which generates the required
design clocks. reset_in_n is the active-Low system reset signal. All design resets are gated
by the dcm_lock signal.
The user interface signals are listed in Figure 7-4. The design provides the clk_tb, clk90_tb,
sys_rst_tb, sys_rst90_tb, and sys_rst180_tb signals to the user in order to synchronize with
the design. The signals clk_tb, clk90_tb, sys_rst_tb, sys_rst90_tb, and sys_rst180_tb are
connected to clocks clk_0 and clk90_0 and reset signals sys_rst, sys_rst90, and sys_rst180,
respectively, in the controller. If the user clock domain is different from clk_tb/clk90_tb,
then the user should add FIFOs for all the inputs and outputs of the controller (user
application signals) in order to synchronize them to clk_tb/clk90_tb.
[Block diagram. Blocks: System Clocks and Reset, infrastructure_top, top_0, Memory Device.
Clock and reset signals: sys_clk, sys_clkb, reset_in_n, clk_0, clk90_0, sys_rst, sys_rst90, sys_rst180.
User Interface Signals: cntrl0_burst_done, cntrl0_user_command_register, cntrl0_user_data_mask, cntrl0_user_input_data, cntrl0_user_input_address, cntrl0_init_done, cntrl0_ar_done, cntrl0_auto_ref_req, cntrl0_user_cmd_ack, cntrl0_clk_tb, cntrl0_clk90_tb, cntrl0_sys_rst_tb, cntrl0_sys_rst90_tb, cntrl0_sys_rst180_tb, cntrl0_user_data_valid, cntrl0_user_output_data.
Memory signals: cntrl0_ddr_ras_n, cntrl0_ddr_cas_n, cntrl0_ddr_we_n, cntrl0_ddr_cs_n, cntrl0_ddr_cke, cntrl0_ddr_dm, cntrl0_ddr_ba, cntrl0_ddr_a, cntrl0_ddr_ck, cntrl0_ddr_ck_n, cntrl0_ddr_dqs, cntrl0_ddr_dq, cntrl0_ddr_reset_n.
UG086_c7_04_071608]
Figure 7-4: MIG Output of the DDR SDRAM Controller Design with a DCM but without a Testbench
Figure 7-5 shows a block diagram representation of the top-level module for a DDR
SDRAM design without a DCM or a testbench. “Clocking Scheme,” page 294 describes
how various clocks are generated using the DCM. The user should provide all the clocks
and the dcm_lock signal. These clocks should be single-ended. reset_in_n is the active-Low
system reset signal. All design resets are gated by the dcm_lock signal.
The user interface signals are listed in Figure 7-5. The design provides the clk_tb, clk90_tb,
sys_rst_tb, sys_rst90_tb, and sys_rst180_tb signals to the user in order to synchronize with
the design. The signals clk_tb, clk90_tb, sys_rst_tb, sys_rst90_tb, and sys_rst180_tb are
connected to clocks clk_0 and clk90_0 and reset signals sys_rst, sys_rst90, and sys_rst180,
respectively, in the controller. If the user clock domain is different from clk_tb/clk90_tb,
then the user should add FIFOs for all the inputs and outputs of the controller (user
application signals) in order to synchronize them to clk_tb/clk90_tb.
[Block diagram. Blocks: System Reset and User DCM Clocks, infrastructure_top, top_0, Memory Device.
Clock and reset signals: clk_int, clk90_int, dcm_lock, reset_in_n, sys_rst, sys_rst90, sys_rst180.
User Interface Signals: cntrl0_burst_done, cntrl0_user_command_register, cntrl0_user_data_mask, cntrl0_user_input_data, cntrl0_user_input_address, cntrl0_init_done, cntrl0_ar_done, cntrl0_auto_ref_req, cntrl0_user_cmd_ack, cntrl0_clk_tb, cntrl0_clk90_tb, cntrl0_sys_rst_tb, cntrl0_sys_rst90_tb, cntrl0_sys_rst180_tb, cntrl0_user_data_valid, cntrl0_user_output_data.
Memory signals: cntrl0_ddr_ras_n, cntrl0_ddr_cas_n, cntrl0_ddr_we_n, cntrl0_ddr_cs_n, cntrl0_ddr_cke, cntrl0_ddr_dm, cntrl0_ddr_ba, cntrl0_ddr_a, cntrl0_ddr_ck, cntrl0_ddr_ck_n, cntrl0_ddr_dqs, cntrl0_ddr_dq, cntrl0_ddr_reset_n.
UG086_c7_05_071608]
Figure 7-5: MIG Output of the DDR SDRAM Controller Design without a DCM or a Testbench
Figure 7-6 shows a block diagram representation of the top-level module of a DDR
SDRAM design without a DCM but with a testbench. “Clocking Scheme,” page 294
describes how various clocks are generated using the DCM. The user should provide all
the clocks and the dcm_lock signal. These clocks should be single-ended. reset_in_n is the
active-Low system reset signal. All design resets are gated by the dcm_lock signal.
The cntrl0_led_error_output1 output signal indicates whether the test passes or fails. The
testbench module does writes and reads, and also compares the read data with the written
data. The cntrl0_led_error_output1 signal is driven High on data mismatches. The
cntrl0_data_valid_out signal indicates whether the read data is valid or not.
[Figure 7-6 block diagram. Blocks: infrastructure_top, main_0, and the memory device. System reset and user DCM clock signals: clk_int, clk90_int, reset_in_n, dcm_lock, sys_rst, sys_rst90, sys_rst180. Status signals: cntrl0_led_error_output1, cntrl0_data_valid_out, cntrl0_init_done. Memory device signals: cntrl0_ddr_ras_n, cntrl0_ddr_cas_n, cntrl0_ddr_we_n, cntrl0_ddr_cs_n, cntrl0_ddr_cke, cntrl0_ddr_dm, cntrl0_ddr_ba, cntrl0_ddr_a, cntrl0_ddr_ck, cntrl0_ddr_ck_n, cntrl0_ddr_dqs, cntrl0_ddr_dq, cntrl0_ddr_reset_n.]
Figure 7-6: MIG Output of the DDR SDRAM Controller Design without a DCM but with a Testbench
All the memory device interface signals shown in Figure 7-3 through Figure 7-6 might not
necessarily appear for all designs generated from MIG. For example, the
cntrl0_ddr_reset_n port appears in the port list for Registered DIMM designs only.
Similarly, cntrl0_ddr_dm appears only for parts that have data mask signals. A few
RDIMMs do not have data mask, and cntrl0_ddr_dm does not appear in the port list for
these parts.
Figure 7-7 shows a detailed block diagram of the DDR SDRAM controller. All four blocks
shown are subblocks of the ddr1_top module. The functionality of these blocks is
explained in following sections.
[Figure 7-7 block diagram. Blocks: Infrastructure_top, Datapath, Controller, and IOBs. User-side signals: user_clk, user_data, user_command_register, user_address. Memory-side signals: cntrl0_ddr_ck, cntrl0_ddr_ck_n, cntrl0_ddr_cke, cntrl0_ddr_dqs, cntrl0_ddr_dq, cntrl0_ddr_dm, cntrl0_ddr_cs_n, cntrl0_ddr_ba, cntrl0_ddr_a, cntrl0_ddr_ras_n, cntrl0_ddr_cas_n, cntrl0_ddr_we_n, cntrl0_ddr_reset_n.]
Controller
The controller module accepts and decodes user commands and generates read, write, and
memory initialization commands. The controller also generates signals for other modules.
The memory is initialized and powered up using a defined process. The controller state
machine handles the initialization process upon receiving an initialization command.
Datapath
This module transmits and receives data to and from the memories. Major functions
include storing the read data and transferring write data and write enable to the IOBs
module. The data_read, data_write, data_path_iobs, and data_read_controller modules
perform the actual read and write functions. For more information, refer to XAPP768c
[Ref 24].
Data Read
The data_read module contains the read datapaths for the DDR SDRAM interface. Details
for this module are described in XAPP768c [Ref 24].
Data Write
This module contains the write datapath for the DDR SDRAM interface. The write data
and write enable signals are forwarded together to the DDR SDRAM through IOB flip-
flops. The IOBs are implemented in the data_path_iobs module.
infrastructure_top
The infrastructure module generates the FPGA clock and reset signals. For differential
clocking, the sys_clk and sys_clkb ports are used as inputs to the IBUFGDS_LVDS_25 buffer,
and the output of the buffer is driven to the DCM input. For single-ended clocking, the
sys_clk_in port is used as an input to the IBUFG buffer; the output of the buffer is driven to
the DCM input. A DCM generates the clock and its inverted version. The infrastructure
module also generates all of the reset signals required for the design.
The calibration circuit is also implemented in this module. If there is no DCM, the clocks
are driven from the user interface.
IOBs
All input and output signals of the FPGA are implemented in the IOB registers.
Test Bench
MIG generates two different RTL folders, example_design and user_design. The
example_design includes the synthesizable test bench, while user_design does not include
the test bench modules. The MIG test bench performs five write commands and five read
commands in an alternating fashion. The number of words in a write command depends
on the burst length. For a burst length of 4, every write command writes four data words.
For all five write commands, the test bench writes a total of 20 data words (10 rise data
words and 10 fall data words). For a burst length of 8, the test bench writes a total of 40 data
words. The pattern data is shown in Table 7-2, Table 7-3, and Table 7-4 for burst lengths of
2, 4, and 8, respectively.
The falling edge data is the complement of the rising edge data. The data pattern is
repeated for the next set of five burst write commands based on the selected burst length,
as shown in Table 7-2, Table 7-3, and Table 7-4. This data pattern is repeated in the same
order based on the number of data words written. For data widths greater than 8, the same
data pattern is concatenated for the other bits. For a 32-bit design and a burst length of 8,
the data pattern for the first write command is 96969696, 69696969, 2C2C2C2C,
D3D3D3D3, 58585858, A7A7A7A7, B1B1B1B1, 4E4E4E4E.
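The pattern rules described above (falling-edge data is the complement of rising-edge data, and the same byte is repeated across wider buses) can be sketched in a few lines of Python. This is an illustration only, not code from MIG: the function name write_burst_words is hypothetical, and the rising-edge byte values are simply copied from the 32-bit, burst-length-8 example in the text rather than derived from the test bench generator.

```python
# Rising-edge bytes copied from the 32-bit, burst-length-8 example in the text.
# How the test bench derives these values is not modeled here.
RISE_BYTES = [0x96, 0x2C, 0x58, 0xB1]


def write_burst_words(rise_bytes, data_width_bits):
    """Return the data words of one write command as hex strings.

    Words are ordered rise, fall, rise, fall, ... as they appear on the
    memory data bus. (Hypothetical helper, for illustration only.)
    """
    if data_width_bits % 8 != 0:
        raise ValueError("data width must be a multiple of 8 bits")
    lanes = data_width_bits // 8
    words = []
    for rise in rise_bytes:
        fall = ~rise & 0xFF  # falling-edge data is the 8-bit complement
        words.append(f"{rise:02X}" * lanes)  # byte repeated across the bus
        words.append(f"{fall:02X}" * lanes)
    return words


if __name__ == "__main__":
    # 32-bit design, burst length of 8: four rise bytes give eight words.
    print(", ".join(write_burst_words(RISE_BYTES, 32)))
    # 8-bit design, burst length of 4: two rise bytes give four words.
    print(", ".join(write_burst_words(RISE_BYTES[:2], 8)))
```

A burst length of BL uses the first BL/2 rising-edge bytes, so the second call reproduces the 96, 69, 2C, D3 pattern quoted for an 8-bit data width and a burst length of 4.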
For all five write commands, five different address locations are generated, as shown in
Table 7-5. Read commands read the data from the same locations where writes are
performed. The column address is incremented based on the burst length from one write
command to the next write command. The row address is the same for all five write
commands. For the next five write commands, the row address is incremented by 2, and
this continues for each subsequent group of five write commands. Only five bits are used
for row address generation. The row address rolls back to the initial value on reaching the
terminal value. The bank address is the same for all five write commands, but it gets
incremented for the next five write commands. This continues until the terminal count
value is reached, depending on whether the selected memory part has a 4- or 8-bank
architecture. The MIG test bench exercises only a certain memory area. Table 7-5 provides
the details of how the bank, row, and column address are incremented in the test bench.
During reads, the read data is compared with the pattern written. For example, for an 8-bit
data width and a burst length of 4, the write data for a single write command is 96, 69, 2C,
D3. During reads, the read pattern is compared with the 96, 69, 2C, D3 pattern. If the data
read back matches the data written, the led_error_output1 signal is set to 0;
otherwise, it is set to 1 to indicate an error condition.
Clocking Scheme
Figure 7-8, page 295 shows the clocking scheme for this design. Global and local clock
resources are used.
The global clock resources consist of a DCM and two BUFGs. The local clock resources
consist of regional I/O clock networks (BUFIO). The global clock architecture is discussed
in this section.
The MIG tool allows the user to customize the design such that the DCM is not included.
In this case, clk_0 and clk90_0 must be supplied by the user.
[Figure 7-8: clocking scheme diagram. Recoverable labels: DCM ports CLK_FB and CLK90, a BUFG, and the clk_90 net.]
Interface Signals
Table 7-7 lists the DDR SDRAM interface signals, their directions, and their descriptions.
The signal direction is with respect to the DDR SDRAM controller. Active-Low polarity is
indicated with _n appended to the signal name. Table 7-7 is common for designs with and
without testbenches. The signal cntrl0_ddr_reset_n is present only for registered DIMMs.
Table 7-8 lists the DDR SDRAM clock, reset, and status signals for designs with and
without testbenches. Except for the cntrl0_led_error_output1 signal, all signals in
Table 7-8 are present in designs both with and without testbenches. The
cntrl0_led_error_output1 signal is present only in designs with a testbench.
Table 7-9 describes the DDR SDRAM controller user interface signals used between the
ddr1_top (design top-level module) and user application modules in designs without a
testbench. In designs with a testbench, these signals are one level down the hierarchy
from the memory interface top-level module.
Table 7-9: DDR SDRAM Controller User Interface Signals (without a Testbench)
Signal Name: cntrl0_user_input_data[(2n–1):0]
Direction (1): Input
Description: This bus is the write data to the DDR SDRAM from the user interface, where n is the width of the DDR SDRAM data bus. The DDR SDRAM controller converts single data rate to double data rate on the physical layer side. The data is valid on the DDR SDRAM write command. In 2n, the MSB is rising-edge data and the LSB is falling-edge data.

Signal Name: cntrl0_user_data_mask[(2m–1):0]
Direction (1): Input
Description: This bus is the data mask for write data. Like user_input_data, it is twice the size of the data mask bus at the memory, where m is the size of the data mask at the memory interface. In 2m, the MSB applies to rising-edge data and the LSB applies to falling-edge data.

Signal Name: cntrl0_user_input_address[(ROW_ADDRESS + COLUMN_ADDRESS + BANK_ADDRESS –1):0] (2)
Direction (1): Input
Description: This bus is the combination of the row, column, and bank addresses for DDR SDRAM writes and reads. For example, for a memory with row_address = 13, column_address = 11, and bank_address = 2, the user_input_address bus is 26 bits wide, and:
• Bank Address from the user interface = A[1:0]
• Column Address from the user interface = A[12:2]
• Row Address part from the user interface = A[25:13]

Supported user commands for the DDR SDRAM controller:

Signal Name: cntrl0_user_cmd_ack
Direction (1): Output
Description: This is the acknowledgement signal for a user read or write command. It is asserted by the DDR SDRAM controller during a write or read to/from the DDR SDRAM. The user should not issue any new commands to the controller until this signal is deasserted.

Signal Name: cntrl0_init_done
Direction (1): Output
Description: The DDR SDRAM controller asserts this signal to indicate that the DDR SDRAM initialization is complete.

Signal Name: cntrl0_auto_ref_req (3)
Direction (1): Output
Description: This signal is asserted based on the frequency. For example, for a frequency of 166 MHz, the signal is asserted every ~7.6 µs until the controller issues an auto-refresh command to the memory. Upon seeing this signal, the user should terminate any ongoing command after completion of the current burst cycle by asserting the cntrl0_burst_done signal. To ensure reliable operation, users should terminate the current command within 15 to 20 clock cycles after cntrl0_auto_ref_req is asserted. The frequency with which this signal is asserted is determined by the MAX_REF_CNT value in the parameter file. The MAX_REF_CNT value is set in the parameter file based on the frequency selected from the tool.

Signal Name: cntrl0_ar_done (3)
Direction (1): Output
Description: This indicates that the auto-refresh command to the DDR SDRAM was completed. The DDR SDRAM controller asserts this signal for one clock after giving an auto-refresh command to the DDR SDRAM and completion of the TRFC time. The TRFC time is determined by the rfc_count_value in the parameter file. TRFC is the minimum time required for the DDR SDRAM to complete the refresh command. The Refresh command is completed only after the assertion of the cntrl0_ar_done signal. The user can assert the next command any time after the assertion of the cntrl0_ar_done signal.
Notes:
1. All of the signal directions are with respect to the DDR SDRAM controller.
2. Linear addressing is used, i.e., the row address immediately follows the column address bits, and the bank address follows the row
address bits, thus supporting more devices. The number of address bits used depends on the density of the memory part. The
controller ignores the unused bits, which can all be tied High.
3. For more information on auto refresh refer to “Auto Refresh,” page 303.
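The address example in Table 7-9 (13 row bits, 11 column bits, 2 bank bits, with the bank in A[1:0], the column in A[12:2], and the row in A[25:13]) can be expressed as simple bit packing. The sketch below is illustrative only: the function names and default widths are assumptions taken from that one example, and the actual field widths depend on the selected memory part.

```python
def pack_user_address(row, column, bank, row_bits=13, col_bits=11, bank_bits=2):
    """Pack row, column, and bank into one user address word.

    Layout follows the Table 7-9 example: bank in the lowest bits,
    column above it, row in the highest bits. (Hypothetical helper.)
    """
    if not 0 <= row < (1 << row_bits):
        raise ValueError("row out of range")
    if not 0 <= column < (1 << col_bits):
        raise ValueError("column out of range")
    if not 0 <= bank < (1 << bank_bits):
        raise ValueError("bank out of range")
    return (row << (col_bits + bank_bits)) | (column << bank_bits) | bank


def unpack_user_address(address, row_bits=13, col_bits=11, bank_bits=2):
    """Split a packed user address back into (row, column, bank)."""
    bank = address & ((1 << bank_bits) - 1)
    column = (address >> bank_bits) & ((1 << col_bits) - 1)
    row = (address >> (col_bits + bank_bits)) & ((1 << row_bits) - 1)
    return row, column, bank


if __name__ == "__main__":
    addr = pack_user_address(row=1, column=1, bank=1)
    print(hex(addr), unpack_user_address(addr))
```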
Resource Utilization
A local inversion clocking technique is used in this design. The DCM generates only clk0
and clk90. One DCM and two BUFGMUXs are used. The Spartan-3 generation FPGA
designs operate at 166 MHz and below.
[Initialization timing diagram: clk0, sys_rst180, and init_done waveforms with callouts 1 through 3.]
1. When the 200 μs timer expires, all the system resets (sys_rst, sys_rst90, and sys_rst180)
are deasserted. After sys_rst180 is deasserted, the user can place the initialization
command on user_command_register[2:0] on a falling edge of clk0 for one clock cycle.
This starts the initialization sequence.
2. The DDR SDRAM controller indicates that the initialization is complete by asserting
the init_done signal on a falling edge of clk0. Once asserted, the init_done signal
remains asserted.
3. After init_done is asserted, the user can pass the next command at any time.
Write
Figure 7-10 shows the timing diagram for a write to DDR SDRAM with a burst length of
four. The user initiates the write by sending a Write command to the DDR SDRAM
controller. To terminate a write burst, the user asserts the burst_done signal after the
last user_input_address. The burst_done signal should be asserted for one clock for a
burst length of two, two clocks for a burst length of four, and four clocks for a burst
length of eight.
The write command is asserted on the falling edge of clk0. In response to a write
command, the DDR SDRAM controller acknowledges with the user_cmd_ack signal on a
falling edge of clk0. The user_cmd_ack signal is generated in the next clock after the write
command is asserted, if the controller is not busy. If there is an ongoing refresh command,
the user_cmd_ack signal is asserted after completion of the refresh command. The user
asserts the first address (row + column + bank address) with the write command and
keeps it asserted for three clocks after user_cmd_ack assertion. For a burst length of four,
any subsequent write addresses are asserted on alternate falling edges of clk0 after
deasserting the first memory address. For a burst length of two, subsequent addresses are
asserted on each clock cycle, and for a burst length of eight, subsequent addresses are
asserted once every four clock cycles. The first user data is asserted on a rising edge of
clk90 after user_cmd_ack is asserted. As the SDR data is converted to DDR data, the width
of this bus is 2n, where n is the data width of the DDR SDRAM data bus.
For a burst length of four, only two data words (each of 2n bits) are given to the DDR
SDRAM controller for each user address. For a burst length of two, one data word is passed
for each burst. For a burst length of eight, four data words are passed for each burst.
Internally, for a burst length of four, the DDR SDRAM controller converts these into four
data words, each of n bits.
To terminate the write burst for a burst length of four, the user asserts burst_done on a
falling edge of clk0 for two clocks. The burst_done signal is asserted after the last memory
address. Any further commands to the DDR SDRAM controller are given only after the
user_cmd_ack signal is deasserted. After burst_done is asserted, the controller terminates
the burst and issues a precharge to the memory. The user_cmd_ack signal is deasserted
after completion of the precharge.
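The burst-length-dependent numbers in this section all follow the same pattern: the number of clocks for which burst_done is held, the number of 2n-bit user data words per address, and the number of clocks between successive addresses are each half the burst length. The sketch below collects them in one place; the function and field names are hypothetical and are not part of the MIG output.

```python
def user_burst_timing(burst_length):
    """Return the per-burst user interface counts described in the text.

    All three values equal burst_length / 2 for the supported burst
    lengths of 2, 4, and 8. (Hypothetical helper, for illustration only.)
    """
    if burst_length not in (2, 4, 8):
        raise ValueError("burst length must be 2, 4, or 8")
    clocks = burst_length // 2
    return {
        "burst_done_clocks": clocks,       # clocks burst_done is asserted
        "user_words_per_address": clocks,  # 2n-bit words per user address
        "clocks_per_address": clocks,      # clocks between address updates
    }


if __name__ == "__main__":
    for bl in (2, 4, 8):
        print(bl, user_burst_timing(bl))
```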
[Figure 7-10 timing diagram: clk0, clk90, user_cmd_ack, and burst_done waveforms with callouts 1 through 7 and a “3 Clks” interval marked at callout 4.]
Figure 7-10: DDR SDRAM Write Burst, Burst Lengths of Four and Two Bursts
1. A memory write is initiated by issuing a write command to the DDR SDRAM
controller. The write command must be asserted on a falling edge of clk0.
2. The DDR SDRAM controller acknowledges the write command by asserting the
user_cmd_ack signal on a falling edge of clk0. The earliest this signal is asserted is one
clock after the command. The maximum number of clock cycles it takes to assert the
user_cmd_ack signal depends on the refresh period.
3. The first user_input_address must be placed along with the command. The input data
is asserted with the clk90 signal after the user_cmd_ack signal is asserted.
4. The user asserts the first address (row + column + bank address) with the write
command and keeps it asserted for three clocks after user_cmd_ack assertion. The
user_input_address signal is asserted on a falling edge of clk0. All subsequent
addresses are asserted on alternate falling edges of clk0 for burst lengths of four, on
each clock for burst lengths of two, and once in four clocks for burst lengths of eight.
5. To terminate the write burst, burst_done is asserted after the last user_input_address.
The burst_done signal is asserted for two clock cycles with respect to the falling edge
of clk0 for burst lengths of four.
6. The user command is deasserted after burst_done is asserted.
7. The controller deasserts the user_cmd_ack signal after completion of precharge to the
memory. The next command must be given only after user_cmd_ack is deasserted.
Back-to-back write operations are supported only within the same bank and row.
Read
The user initiates a memory read with a read command to the DDR SDRAM controller.
Figure 7-11 shows the memory read timing diagram for a burst length of four.
[Figure 7-11 timing diagram: clk0, clk90, user_cmd_ack, and user_data_valid waveforms with callouts 1 through 8 and a “3 Clks” interval marked.]
Figure 7-11: DDR SDRAM Read, Burst Lengths of Four and Two Bursts
The user provides the first memory address with the read command, and subsequent
memory addresses upon receiving the user_cmd_ack signal. Data is available on the user
data bus with the user_data_valid signal. To terminate the read burst, the user asserts the
burst_done signal on a falling edge of clk0 with the deassertion of the last
user_input_address. The burst_done signal is asserted for one clock for burst lengths of
two, two clocks for burst lengths of four, and four clocks for burst lengths of eight. The
controller does not support a single-burst read operation for a burst length of two.
The read command flow is similar to the write command flow.
1. A memory read is initiated by issuing a read command to the DDR SDRAM controller.
The read command is accepted on a falling edge of clk0.
2. The first read address must be placed along with the read command. In response to the
read command, the DDR SDRAM controller asserts the user_cmd_ack signal on a
falling edge of clk0. The user_cmd_ack signal is asserted a minimum of one clock cycle
after the read command is asserted. This signal is delayed if there is an ongoing refresh
cycle, in which case it is asserted after the current refresh command completes.
3. The user asserts the first address (row + column + bank address) with the read
command and keeps it asserted for three clocks after user_cmd_ack is asserted. The
user_input_address signal is then accepted on the falling edge of clk0. Subsequent
memory read addresses are changed on every clock for burst lengths of two, on
alternate falling edges of clk0 for burst lengths of four, and once in four clocks for
burst lengths of eight.
4. The data on user_output_data is valid only when the user_data_valid signal is
asserted.
5. The data read from the DDR SDRAM is available on user_output_data, which is
asserted with clk90. Because the DDR SDRAM data is converted to SDR data, the
width of this bus is 2n, where n is the data width of the DDR SDRAMs. For a read burst
length of four, the DDR SDRAM controller outputs only two data words with each
user address. For a burst length of two, the controller outputs one data word, and for
a burst length of eight, the controller outputs four data words.
6. To terminate the read burst, burst_done is asserted for two clocks on the falling edge of
clk0. The burst_done signal is asserted after the last memory address.
Auto Refresh
The DDR SDRAM controller refreshes the memory at intervals determined by the
frequency. For example, for a frequency of 166 MHz, an auto-refresh request is raised every
~7.6 µs. When the auto_ref_req flag is asserted, the user must terminate any ongoing
command within 20 clock cycles by asserting the burst_done signal at the end of the
current burst transaction. The auto_ref_req flag remains asserted until the controller
issues a refresh command to the memory. While auto_ref_req is asserted, the user must
wait for completion of the auto-refresh command before giving any commands to the
controller.
The ar_done signal is asserted by the controller on completion of the auto-refresh
command, that is, after the TRFC time. The ar_done signal is asserted with clk180 for one
clock cycle.
The MAX_REF_CNT value in the parameter file is set according to the selected frequency
for a refresh interval of 7.7 µs. The rfc_count_value value in the parameter file defines
TRFC, the minimum time from a refresh command to an Active command or another
refresh command.
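As a rough illustration of how a refresh interval translates into a clock count, the sketch below computes the number of whole clock cycles that fit in a 7.7 µs interval at a given frequency. This is an assumption-laden estimate only: the function name is hypothetical, and the value MIG actually writes to MAX_REF_CNT in the parameter file may differ (for example, through added margin or a different counter encoding).

```python
def refresh_interval_clocks(freq_hz, interval_ns=7700):
    """Whole clock cycles in one refresh interval.

    Integer arithmetic avoids floating-point rounding. This is only an
    estimate; it is not the formula MIG uses for MAX_REF_CNT.
    """
    if freq_hz <= 0 or interval_ns <= 0:
        raise ValueError("frequency and interval must be positive")
    return freq_hz * interval_ns // 1_000_000_000


if __name__ == "__main__":
    for mhz in (133, 166):
        print(mhz, "MHz:", refresh_interval_clocks(mhz * 1_000_000), "clocks")
```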
After completion of the auto-refresh command, the next command can be given any time
after ar_done is asserted.
The current testbench generates five consecutive write bursts followed by five read burst
commands. For every group of five write/read commands, the controller issues an active
command followed by five write/read commands, and then a precharge command to the
memory. All five burst commands take up a maximum of 20 clock cycles. After every
precharge command, the controller state machine goes to an idle state and checks for an
Load Mode
MIG does not support the user LOAD MODE command. The mode register values from
the parameter file are loaded into the Load Mode register during initialization.
UCF Constraints
Some constraints are required to successfully create the design. The following examples
explain the different constraints in the UCF.
The I/O standards for all the memory interface signals are required to be specified.
MAXDELAY Constraints
The MAXDELAY constraints define the maximum allowable delay on a net. The following
MAXDELAY constraints are used on different nets in the UCF for Spartan FPGA designs.
The values provided here vary depending on the FPGA family and the device type. Some
values are dependent on frequency. The constraints shown here are from
example_design. The hierarchy paths of the nets are different between
example_design and user_design.
NET "infrastructure_top0/cal_top0/tap_dly0/tap[7]" MAXDELAY = 350ps;
NET "infrastructure_top0/cal_top0/tap_dly0/tap[15]" MAXDELAY = 350ps;
NET "infrastructure_top0/cal_top0/tap_dly0/tap[23]" MAXDELAY = 350ps;
These constraints are used to minimize the tap delay inverter connection wire length. This
delay should be minimized to calibrate the delay of a tap (LUT element) accurately. These
values are independent of frequency and vary from family to family and device to device.
Without these constraints, the tool might synthesize longer routes between the tap
connections. Inappropriate delays in this circuit could cause the design to fail in hardware.
NET "main_00/top0/dqs_int_delay_in*" MAXDELAY = 675ps;
This constraint is used for the DQS nets from the I/O pad to the input of the LUT delay
chain. Without this constraint, the nets take unpredictable delays that affect the Data Valid
window. In Spartan-3 generation FPGA designs, data is latched using the DQS signal. In
order to latch the correct data, DQS is delayed using LUT delay elements to center-align
with respect to the input read data. Incorrect data could be latched if the delays on this net
are unpredictable. Unpredictable delays might also cause the design to have intermittent
failures, which are difficult to debug in hardware.
NET "main_00/top0/dqs_div_rst" MAXDELAY = 460ps;
The net dqs_div_rst is the loopback signal. This signal is used as an enable for read data
FIFOs and FIFO write pointers after it is delayed using the LUT delay elements. The
overall delay on this net should be comparable with the delay on the DQS signal. This net
is constrained to control the overall delay. Both the dqs_div_rst and DQS signals take
similar paths. If the delay on the dqs_div_rst signal is higher, the first read data from
memory might be missed.
NET "main_00/top0/data_path0/data_read_controller0/gen_delay*dqs_delay_col*/delay*" MAXDELAY = 140ps;
NET "main_00/top0/data_path0/data_read_controller0/rst_dqs_div_delayed/delay*" MAXDELAY = 140 ps;
These constraints are required to minimize the wire delays between the LUT elements of a
LUT delay chain that is used to delay the DQS and rst_dqs_div loopback signal. Higher
wire delays between LUT delay elements can shift the data valid window, which in turn
can cause incorrect data to be latched. Therefore, the MAXDELAY constraint is required for
these nets.
NET "main_00/top0/data_path0/data_read_controller0/rst_dqs_div"
MAXDELAY = 3383 ps;
NET "main_00/top0/data_path0/data_read0/fifo*_wr_en*"
MAXDELAY = 3007ps;
These constraints are required because these paths are not constrained otherwise. The total
delay on the rst_dqs_div and fifo_wr_en nets must not exceed the clock period. The total
delay on both the nets is set to 85% of the clock period, leaving 15% as margin. These
delays vary with frequency.
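The 85% budget can be checked with a little arithmetic. The sketch below computes a percentage of the clock period in picoseconds and compares it with the sum of the two example values above. The 133 MHz frequency is an assumption, chosen only because the example values sum to that budget; the text does not state the frequency for which these constraints were generated, and the function name is hypothetical.

```python
def maxdelay_budget_ps(freq_hz, percent):
    """Return the given percentage of one clock period, in picoseconds.

    Uses integer arithmetic, so the period is truncated to whole
    picoseconds before the percentage is applied.
    """
    if freq_hz <= 0 or not 0 < percent <= 100:
        raise ValueError("invalid frequency or percentage")
    period_ps = 10**12 // freq_hz
    return period_ps * percent // 100


if __name__ == "__main__":
    budget = maxdelay_budget_ps(133_000_000, 85)  # assumed 133 MHz clock
    total = 3383 + 3007  # rst_dqs_div + fifo*_wr_en* example values
    print("budget:", budget, "ps; constrained total:", total, "ps")
    print("within budget:", total <= budget)
```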
NET "main_00/top0/data_path0/data_read0/fifo*_wr_addr[*]"
MAXDELAY = 5610ps;
The MAXDELAY constraint is required on FIFO write address because this path is not
constrained otherwise. This is a single clock cycle path. It is set to 80% of the clock period,
leaving 20% as margin because this net generally meets the required constraint.
Design Notes
Supported Devices
This section provides tables for the memory components supported by Spartan-3,
Spartan-3A, Spartan-3AN, Spartan-3A DSP, and Spartan-3E devices.
The design generated by MIG is independent of the memory speed grade; hence, the
package part of the memory component name is replaced with X, where X indicates a
don't care condition. Pin mapping for x4 RDIMMs is provided in Appendix G, “Low Power
Options.”
The tables below list the components (Table 7-12) and DIMMs (Table 7-13 through
Table 7-15) supported by the tool for Spartan-3 FPGA DDR local clocking designs.
Table 7-12: Supported Components for DDR SDRAM Local Clocking
(Spartan-3 FPGAs)
Components Packages (XX) Components Packages (XX)
MT46V32M4XX-5B - MT46V32M4XX-75 P,TG
MT46V64M4XX-5B BG,FG,P,TG MT46V64M4XX-75 FG,P,TG
MT46V128M4XX-5B BN,FN,P,TG MT46V128M4XX-75 BN,FN,P,TG
MT46V256M4XX-5B P,TG MT46V256M4XX-75 P,TG
MT46V16M8XX-5B TG,P MT46V16M8XX-75 P,TG
MT46V32M8XX-5B BG,FG,P,TG MT46V32M8XX-75 FG,P,TG
MT46V64M8XX-5B BN,FN,P,TG MT46V64M8XX-75 BN,FN,P,TG
MT46V128M8XX-5B - MT46V128M8XX-75 P,TG
MT46V8M16XX-5B TG,P MT46V8M16XX-75 P,TG
MT46V16M16XX-5B BG,FG,P,TG MT46V16M16XX-75 BG,FG,P,TG
MT46V32M16XX-5B BN,FN,P,TG MT46V32M16XX-75 -
MT46V64M16XX-5B - MT46V64M16XX-75 P,TG
Table 7-13: Supported Unbuffered DIMMs for DDR SDRAM Local Clocking
(Spartan-3 FPGAs)
Unbuffered DIMMs Packages (X) Unbuffered DIMMs Packages (X)
MT4VDDT1664AX-40B G,Y MT8VDDT3264AX-40B G,Y
MT4VDDT3264AX-40B G,Y MT9VDDT3272AX-40B Y
Table 7-14: Supported Registered DIMMs for DDR SDRAM Local Clocking
(Spartan-3 FPGAs)
Registered DIMMs Packages (X) Registered DIMMs Packages (X)
MT9VDDF3272X-40B G,Y MT18VDDF3272X-40B G,Y
MT9VDDF3272X-40B G,Y MT18VDDF12872X-40B G,Y
Table 7-15: Supported SODIMMs for DDR SDRAM Local Clocking (Spartan-3 FPGAs)
SODIMMs Packages (X) SODIMMs Packages (X)
MT4VDDT3264HX-40B G,Y MT9VDDT3272HX-40B
MT4VDDT1664HX-40B Y MT9VDDT6472HX-40B G,Y
MT8VDDT3264HX-40B - MT9VDDT12872HX-40B -
MT8VDDT6464HX-40B G,Y
The tables below list the components (Table 7-16) and DIMMs (Table 7-17 through
Table 7-19, page 309) supported by the tool for Spartan-3A/3AN FPGA DDR local clocking
designs.
Table 7-17: Supported Unbuffered DIMMs for DDR SDRAM Local Clocking
(Spartan-3A/3AN FPGAs)
Unbuffered DIMMs Packages (X) Unbuffered DIMMs Packages (X)
MT4VDDT1664AX-40B G,Y MT8VDDT3264AX-40B G,Y
MT4VDDT3264AX-40B G,Y MT9VDDT3272AX-40B Y
Table 7-18: Supported Registered DIMMs for DDR SDRAM Local Clocking
(Spartan-3A/3AN FPGAs)
Registered DIMMs Packages (X) Registered DIMMs Packages (X)
MT9VDDF3272X-40B G,Y MT9VDDF3272X-40B G,Y
The tables below list the components (Table 7-20) and DIMMs (Table 7-21 and Table 7-22)
supported by the tool for Spartan-3A DSP FPGA DDR local clocking designs.
Table 7-20: Supported Components for DDR SDRAM Local Clocking
(Spartan-3A DSP FPGAs)
Components Packages (XX) Components Packages (XX)
MT46V32M4XX-5B - MT46V32M4XX-75 P,TG
MT46V64M4XX-5B BG,FG,P,TG MT46V64M4XX-75 FG,P,TG
Table 7-21: Supported Unbuffered DIMMs for DDR SDRAM Local Clocking
(Spartan-3A DSP FPGAs)
Unbuffered DIMMs Packages (X) Unbuffered DIMMs Packages (X)
MT4VDDT1664AX-40B G,Y MT8VDDT3264AX-40B G,Y
MT4VDDT3264AX-40B G,Y
Table 7-23 lists the components supported by the tool for Spartan-3E FPGA DDR local
clocking designs.
Table 7-23: Supported Components for DDR SDRAM Local Clocking
(Spartan-3E FPGAs)
Components Packages (XX) Components Packages (XX)
MT46V32M4XX-5B - MT46V32M4XX-75 P,TG
MT46V64M4XX-5B BG,FG,P,TG MT46V64M4XX-75 FG,P,TG
MT46V128M4XX-5B BN,FN,P,TG MT46V128M4XX-75 BN,FN,P,TG
MT46V256M4XX-5B P,TG MT46V256M4XX-75 P,TG
MT46V16M8XX-5B TG,P MT46V16M8XX-75 P,TG
MT46V32M8XX-5B BG,FG,P,TG MT46V32M8XX-75 FG,P,TG
MT46V64M8XX-5B BN,FN,P,TG MT46V64M8XX-75 BN,FN,P,TG
MT46V128M8XX-5B - MT46V128M8XX-75 P,TG
MT46V8M16XX-5B TG,P MT46V8M16XX-75 P,TG
MT46V16M16XX-5B BG,FG,P,TG MT46V16M16XX-75 BG,FG,P,TG
Table 7-24: Hardware Tested Configurations for Spartan-3 FPGA DDR SDRAM Designs
Synthesis Tools     XST
HDL                 Verilog and VHDL
FPGA Device         XC3S1500FG676-5
Burst Lengths       2 and 8
CAS Latency (CL)    2 and 2.5
64-bit Design       Tested on 16-bit Component “MT46V16M16XX-75”
                    64-bit DIMM “MT4VDDT3264AX”
Frequency Range     67 MHz to 170 MHz for CL = 2
                    40 MHz to 190 MHz for CL = 2.5
Table 7-25: Hardware Tested Configurations for Spartan-3E FPGA DDR SDRAM Designs
Synthesis Tools     XST
HDL                 Verilog and VHDL
FPGA Device         XC3S500EFG320-4
Burst Lengths       2 and 4
CAS Latency (CL)    2 and 2.5
16-bit Design       Tested on 16-bit Component “MT46V32M16XX-6T”
Frequency Range     80 MHz to 170 MHz for CL = 2
                    80 MHz to 170 MHz for CL = 2.5
Chapter 8
Feature Summary
This section summarizes the supported and unsupported features of the DDR2 SDRAM
controller design.
Supported Features
The DDR2 SDRAM controller design supports the following:
• Burst lengths of four and eight
• Sequential and interleaved burst types
• CAS latency of 3
• Auto refresh
• Data mask enable/disable option
• System clock, differential and single-ended
• Linear addressing
• Spartan-3 FPGA maximum frequency:
• 133 MHz with a -4 speed grade device
• 166 MHz with a -5 speed grade device
• Spartan-3A, Spartan-3AN, and Spartan-3A DSP FPGA maximum frequency:
• 133 MHz with a -4 speed grade device
• 166 MHz with a -5 speed grade device
• Components, unbuffered DIMMs, and registered DIMMs
• Verilog and VHDL
• XST and Synplicity synthesis tools
• With and without a testbench
• With or without a DCM
Unsupported Features
The DDR2 SDRAM controller design does not support:
• CAS Latencies of 4 and 5
• Additive latencies of 1, 2, 3 and 4
• Auto Precharge
• Redundant DQS (RDQS)
• Dual rank DIMMs and Deep design
Controller Architecture
[Block diagram: a Xilinx FPGA containing the Control Layer and the Physical Layer, connected to the Memories.]
Hierarchy
Figure 8-2 shows the hierarchical structure of the DDR2 SDRAM design generated by MIG
with a testbench and a DCM. In the figure, the physical and control layers are clearly
separated. MIG generates the entire controller, as shown in this hierarchy, including the
testbench. The user can replace the testbench with a design that makes use of the DDR2
SDRAM interface.
[Figure 8-2 hierarchy diagram. Modules shown under <top_module>: infrastructure_top*, main*, top*, test_bench*, clk_dcm, cal_top, data_read*, data_write*, data_read_controller*, data_path_iobs*, infrastructure_iobs*, controller_iobs*. Legend: Design Modules; Test Bench Modules; Clocks, Reset Generation, and Calibration Modules.]
Note: A block with a * has a parameter file included.
For a design without a testbench (user_design), the yellow shaded modules in Figure 8-2
are not present in the design. The <top_module> module has the user interface signals for
designs without a testbench. The list of user interface signals is provided in Table 8-8.
The infrastructure_top module contains the clock and reset generation logic of the
design. It instantiates a DCM when that option is selected in MIG. The differential design
clock and a user reset are inputs to this module. From the input clocks and reset signal,
the module generates the system clocks and system resets used in the design. The
infrastructure_top module also contains the calibration logic.
The DCM primitive is not instantiated in this module if the Use DCM option is unchecked.
In that case, the system operates on the user-provided clocks. The system reset is generated
in the infrastructure_top module using the dcm_lock input signal. Figure 8-3 and
Figure 8-4, page 319 show the differential system clock option only.
[Block diagram. System clocks and reset: sys_clk, sys_clkb, and reset_in_n enter
infrastructure_top, which provides clk0_0, clk90_0, sys_rst, sys_rst90, and sys_rst180
to main_0.
Memory device signals on main_0: cntrl0_ddr2_odt, cntrl0_ddr2_ras_n, cntrl0_ddr2_cas_n,
cntrl0_ddr2_we_n, cntrl0_ddr2_cs_n, cntrl0_ddr2_cke, cntrl0_ddr2_dm, cntrl0_ddr2_ba,
cntrl0_ddr2_a, cntrl0_ddr2_ck, cntrl0_ddr2_ck_n, cntrl0_ddr2_dqs, cntrl0_ddr2_dq,
cntrl0_ddr2_dqs_n, cntrl0_ddr2_reset_n.
Status signals: cntrl0_init_done, cntrl0_led_error_output1, cntrl0_data_valid_out.]
Figure 8-3: MIG Output of the DDR2 SDRAM Controller Design with a DCM and a Testbench
Figure 8-4 shows a block diagram representation of the top-level module for a DDR2
SDRAM design with a DCM but without a testbench. “Clocking Scheme,” page 325
describes how various clocks are generated using the DCM. The input clocks can be
differential or single-ended based on the System Clock selection in the GUI.
For differential clocking, sys_clk and sys_clkb appear as input ports, whereas for
single-ended clocking, sys_clk_in appears as the input port. The DCM is instantiated in the
infrastructure_top module, which generates the required design clocks. reset_in_n is the
active-Low system reset signal. All design resets are gated by the dcm_lock signal.
The user interface signals are listed in Figure 8-4. The design provides the clk_tb, clk90_tb,
sys_rst_tb, sys_rst90_tb, and sys_rst180_tb signals to the user in order to synchronize with
the design. The signals clk_tb, clk90_tb, sys_rst_tb, sys_rst90_tb, and sys_rst180_tb are
connected to clocks clk_0 and clk90_0 and reset signals sys_rst, sys_rst90, and sys_rst180,
respectively, in the controller. If the user clock domain is different from clk_tb/clk90_tb,
then the user should add FIFOs for all the inputs and outputs of the controller (user
application signals) in order to synchronize them to clk_tb/clk90_tb.
[Block diagram. System clocks and reset: sys_clk, sys_clkb, and reset_in_n enter
infrastructure_top, which provides clk_0, clk90_0, sys_rst, sys_rst90, and sys_rst180
to top_0.
User interface signals on top_0: cntrl0_burst_done, cntrl0_user_command_register,
cntrl0_user_data_mask, cntrl0_user_input_data, cntrl0_user_input_address,
cntrl0_init_done, cntrl0_ar_done, cntrl0_auto_ref_req, cntrl0_user_cmd_ack,
cntrl0_clk_tb, cntrl0_clk90_tb, cntrl0_sys_rst_tb, cntrl0_sys_rst90_tb,
cntrl0_sys_rst180_tb, cntrl0_user_data_valid, cntrl0_user_output_data.
Memory device signals on top_0: cntrl0_ddr2_ras_n, cntrl0_ddr2_cas_n, cntrl0_ddr2_we_n,
cntrl0_ddr2_cs_n, cntrl0_ddr2_cke, cntrl0_ddr2_dm, cntrl0_ddr2_ba, cntrl0_ddr2_a,
cntrl0_ddr2_ck, cntrl0_ddr2_ck_n, cntrl0_ddr2_dqs, cntrl0_ddr2_dq, cntrl0_ddr2_dqs_n,
cntrl0_ddr2_reset_n.]
Figure 8-4: MIG Output of the DDR2 SDRAM Controller Design with a DCM but without a Testbench
Figure 8-5 shows a block diagram representation of the top-level module for a DDR2
SDRAM design without a DCM or a testbench. “Clocking Scheme,” page 325 describes
how various clocks are generated using the DCM. The user should provide all the clocks
and the dcm_lock signal. These clocks should be single-ended. reset_in_n is the active-Low
system reset signal. All design resets are gated by the dcm_lock signal.
The user interface signals are listed in Figure 8-5. The design provides the clk_tb, clk90_tb,
sys_rst_tb, sys_rst90_tb, and sys_rst180_tb signals to the user in order to synchronize with
the design. The signals clk_tb, clk90_tb, sys_rst_tb, sys_rst90_tb, and sys_rst180_tb are
connected to clocks clk_0 and clk90_0 and reset signals sys_rst, sys_rst90, and sys_rst180,
respectively, in the controller. If the user clock domain is different from clk_tb/clk90_tb,
then the user should add FIFOs for all the inputs and outputs of the controller (user
application signals) in order to synchronize them to clk_tb/clk90_tb.
[Block diagram. System reset and user DCM clocks: clk_int, clk90_int, dcm_lock, and
reset_in_n enter infrastructure_top, which provides sys_rst, sys_rst90, and sys_rst180
to top_0.
User interface signals on top_0: cntrl0_burst_done, cntrl0_user_command_register,
cntrl0_user_data_mask, cntrl0_user_input_data, cntrl0_user_input_address,
cntrl0_init_done, cntrl0_ar_done, cntrl0_auto_ref_req, cntrl0_user_cmd_ack,
cntrl0_clk_tb, cntrl0_clk90_tb, cntrl0_sys_rst_tb, cntrl0_sys_rst90_tb,
cntrl0_sys_rst180_tb, cntrl0_user_data_valid, cntrl0_user_output_data.
Memory device signals on top_0: cntrl0_ddr2_odt, cntrl0_ddr2_ras_n, cntrl0_ddr2_cas_n,
cntrl0_ddr2_we_n, cntrl0_ddr2_cs_n, cntrl0_ddr2_cke, cntrl0_ddr2_dm, cntrl0_ddr2_ba,
cntrl0_ddr2_a, cntrl0_ddr2_ck, cntrl0_ddr2_ck_n, cntrl0_ddr2_dqs, cntrl0_ddr2_dq,
cntrl0_ddr2_dqs_n, cntrl0_ddr2_reset_n.]
Figure 8-5: MIG Output of the DDR2 SDRAM Controller Design without a DCM or a Testbench
Figure 8-6 shows a block diagram representation of the top-level module for a DDR2
SDRAM design without a DCM but with a testbench. “Clocking Scheme,” page 325
describes how various clocks are generated using the DCM. The user should provide all
the clocks and the dcm_lock signal. These clocks should be single-ended. reset_in_n is the
active-Low system reset signal. All design resets are gated by the dcm_lock signal.
The cntrl0_led_error_output1 output signal indicates whether the case passes or fails. The
testbench module does writes and reads, and also compares the read data with the written
data. The cntrl0_led_error_output1 signal is driven High on data mismatches. The
cntrl0_data_valid_out signal indicates whether the read data is valid or not.
[Block diagram. System reset and user DCM clocks: clk_int, clk90_int, reset_in_n, and
dcm_lock enter infrastructure_top, which provides sys_rst, sys_rst90, and sys_rst180
to main_0.
Memory device signals on main_0: cntrl0_ddr2_odt, cntrl0_ddr2_ras_n, cntrl0_ddr2_cas_n,
cntrl0_ddr2_we_n, cntrl0_ddr2_cs_n, cntrl0_ddr2_cke, cntrl0_ddr2_dm, cntrl0_ddr2_ba,
cntrl0_ddr2_a, cntrl0_ddr2_ck, cntrl0_ddr2_ck_n, cntrl0_ddr2_dq, cntrl0_ddr2_dqs,
cntrl0_ddr2_dqs_n, cntrl0_ddr2_reset_n.
Status signals: cntrl0_led_error_output1, cntrl0_data_valid_out, cntrl0_init_done.]
Figure 8-6: MIG Output of the DDR2 SDRAM Controller Design without a DCM but with a Testbench
Not all of the memory device interface signals shown in Figure 8-3 through Figure 8-6
appear in every design generated by MIG. For example, port
cntrl0_ddr2_reset_n appears in the port list only for Registered DIMM designs. Similarly,
cntrl0_ddr2_dqs_n does not appear for single-ended DQS designs. Port cntrl0_ddr2_dm
appears only for the parts that contain a data mask. A few RDIMMs do not have a data
mask, and cntrl0_ddr2_dm does not appear in the port list for these parts.
Figure 8-7 shows a detailed block diagram of the DDR2 SDRAM controller. All four blocks
shown are subblocks of the ddr2_top module. The functionality of these blocks is
explained in the following sections.
[Figure 8-7: block diagram of the DDR2 SDRAM controller. Blocks: infrastructure_top,
Datapath, IOBs, and Controller.
User-side inputs: user_clk, user_data, user_command_register, user_address.
Memory-side signals: cntrl0_ddr2_ck, cntrl0_ddr2_ck_n, cntrl0_ddr2_cke, cntrl0_ddr2_dqs,
cntrl0_ddr2_dq, cntrl0_ddr2_dm, cntrl0_ddr2_cs_n, cntrl0_ddr2_ba, cntrl0_ddr2_a,
cntrl0_ddr2_ras_n, cntrl0_ddr2_cas_n, cntrl0_ddr2_we_n, cntrl0_ddr2_odt,
cntrl0_ddr2_reset_n.]
Controller
The controller module accepts and decodes user commands and generates read, write,
memory initialization, and load mode commands. The controller also generates signals for
other modules.
The memory is initialized and powered up using a defined process. The controller state
machine handles the initialization process upon receiving an initialization command.
Datapath
This module transmits data to and receives data from the memories. Its major functions
are storing the read data and transferring the write data and write enable to the IOBs
module. The data_read, data_write, data_path_IOBs, and data_read_controller modules
perform the actual read and write functions. For more information, refer to XAPP768c
[Ref 24].
Data Read
The data_read module contains the read datapaths for the DDR2 SDRAM interface. Details
for this module are described in XAPP768c [Ref 24].
Data Write
This module contains the write datapath for the DDR2 SDRAM interface. The write data
and write enable signals are forwarded together to the DDR2 SDRAM through IOB
flip-flops. The IOBs are implemented in the data_path_IOBs module.
Infrastructure_top
The infrastructure module generates the FPGA clock and reset signals. For differential
clocking, the sys_clk and sys_clkb ports are used as inputs to the IBUFGDS_LVDS_25 buffer,
and the output of the buffer is driven to the DCM input. For single-ended clocking, the
sys_clk_in port is used as an input to the IBUFG buffer, and the output of the buffer is
driven to the DCM input. A DCM generates the clock and its inverted version. The
infrastructure module also generates all of the reset signals required for the design.
IOBs
All input and output signals of the FPGA are implemented in the IOBs.
Test Bench
MIG generates two different RTL folders, example_design and user_design. The
example_design includes the synthesizable test bench, while user_design does not include
the test bench modules. The MIG test bench performs five write commands and five read
commands in an alternating fashion. The number of words in a write command depends
on the burst length. For a burst length of 4, every write command writes four data words.
For all five write commands, the test bench writes a total of 20 data words (10 rise data
words and 10 fall data words). For a burst length of 8, the test bench writes a total of 40 data
words. The pattern data is shown in Table 8-2 and Table 8-3 for burst lengths of 4 and 8,
respectively.
The falling edge data is the complement of the rising edge data. The data pattern is
repeated for the next set of five burst write commands based on the selected burst length,
as given in Table 8-2 and Table 8-3. This data pattern is repeated in the same order based on
the number of data words written. For data widths greater than 8, the same data pattern is
concatenated for the other bits. For a 32-bit design and a burst length of 8, the data pattern
for the first write command is 96969696, 69696969, 2C2C2C2C, D3D3D3D3, 58585858,
A7A7A7A7, B1B1B1B1, 4E4E4E4E.
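The pattern construction described above can be sketched in Python. This is an illustrative model only, not the MIG test bench source: the names RISE_BYTES and burst_words are hypothetical, the rising-edge bytes are copied from the 32-bit example in the text, each falling-edge byte is computed as the bitwise complement of the rising-edge byte, and the byte is repeated across the data width.

```python
# Rising-edge bytes copied from the 32-bit, burst-length-8 example in the text.
# The list name and the function below are illustrative, not MIG identifiers.
RISE_BYTES = [0x96, 0x2C, 0x58, 0xB1]


def burst_words(data_width_bits, burst_length):
    """Return the hex words of one write command, rise word then fall word."""
    if data_width_bits % 8 != 0:
        raise ValueError("data width must be a multiple of 8 bits")
    if burst_length not in (4, 8):
        raise ValueError("burst length must be 4 or 8")
    lanes = data_width_bits // 8  # the 8-bit pattern is repeated per byte lane
    words = []
    for rise in RISE_BYTES[: burst_length // 2]:
        fall = rise ^ 0xFF  # falling-edge data is the complement
        words.append(f"{rise:02X}" * lanes)
        words.append(f"{fall:02X}" * lanes)
    return words


print(burst_words(32, 8))
print(burst_words(8, 4))
```

Under these assumptions, the 32-bit call reproduces the eight words listed in the text for the first write command.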
For all five write commands, five different address locations are generated, as shown in
Table 8-4. Read commands read the data from the same locations where writes are
performed. The column address is incremented based on the burst length from one write
command to the next write command. The row address is the same for all five write
commands. For the next five write commands, the row address is incremented by 2, and
this continues for each subsequent group of five write commands. Only five bits are used
for row address generation. The row address rolls back to the initial value on reaching the
terminal value. The bank address is the same for all five write commands, but it gets
incremented for the next five write commands. This continues until the terminal count
value is reached, depending on whether the selected memory part has a 4- or 8-bank
architecture. The MIG test bench exercises only a certain memory area. Table 8-4 provides
the details of how the bank, row, and column address are incremented in the test bench.
During reads, the read data is compared with the pattern written. For example, for an 8-bit
data width and a burst length of 4, the write data for a single write command is 96, 69, 2C,
D3. During reads, the read pattern is compared with the 96, 69, 2C, D3 pattern. If the data
read back matches the data written, the led_error_output1 signal is set to 0;
otherwise, it is set to 1 to indicate an error condition.
Clocking Scheme
Figure 8-8 shows the clocking scheme for this design. Global and local clock resources are
used.
The global clock resources consist of a DCM and several BUFGs. The local clock resources
consist of regional I/O clock networks (BUFIO). The global clock architecture is discussed
in this section.
The MIG tool allows the user to customize the design such that the DCM is not included.
In this case, clk_0 and clk90_0 must be supplied by the user.
[Figure 8-8: clocking scheme. The system clock enters through a GC I/O and drives the
DCM CLKIN input; the DCM CLK0 output drives a BUFG to produce clk_0.]
Interface Signals
Table 8-6 shows the DDR2 SDRAM interface signals, directions, and descriptions. The
signal direction is with respect to the DDR2 SDRAM controller. The cntrl0_ddr2_reset_n
signal is present only for registered DIMMs, and the cntrl0_ddr2_dqs_n signal is present
when DQS# Enable is selected in the Extended Mode register.
Table 8-7 describes the DDR2 SDRAM controller system interface signals. Except for the
cntrl0_led_error_output1 signal, all signals in Table 8-7 are present in designs both
with and without testbenches. The cntrl0_led_error_output1 signal is present only in
designs with a testbench.
Table 8-8 describes the DDR2 SDRAM controller user interface signals in designs
without a testbench.
Table 8-8: DDR2 SDRAM Controller User Interface Signals (without a Testbench)
Signal Names | Direction(1) | Description
cntrl0_user_input_data[(2n–1):0] | Input | This bus is the write data to the DDR2 SDRAM from the user interface, where n is the width of the DDR2 SDRAM data bus. The DDR2 SDRAM controller converts single data rate to double data rate on the physical layer side. The data is valid on the DDR2 SDRAM write command. In 2n, the MSB is rising-edge data and the LSB is falling-edge data.
cntrl0_user_data_mask[(2m–1):0] | Input | This bus is the data mask for write data. Like user_input_data, it is twice the size of the data mask bus at memory, where m is the size of the data mask at the memory interface. In 2m, the MSB applies to rising-edge data and the LSB applies to falling-edge data.
cntrl0_user_input_address[(ROW_ADDRESS + COLUMN_ADDRESS + BANK_ADDRESS – 1):0](2) | Input | This bus consists of the row address, the column address, and the bank address for DDR2 SDRAM writes and reads. The address sequence starting from the LSB is bank address, column address, and row address.
Supported user commands for the DDR2 SDRAM controller:
cntrl0_user_cmd_ack | Output | This is the acknowledgement signal for a user read or write command. It is asserted by the DDR2 SDRAM controller during a write or read to/from the DDR2 SDRAM. The user should not issue any new commands to the controller until this signal is deasserted.
cntrl0_init_done | Output | The DDR2 SDRAM controller asserts this signal to indicate that the DDR2 SDRAM initialization is complete.
cntrl0_auto_ref_req (3) | Output | This signal is asserted based on the frequency. For example, for a frequency of 166 MHz, the signal is asserted every ~7.6 µs. It is asserted until the controller issues an auto-refresh command to the memory. Upon seeing this signal, the user should terminate any ongoing command after the current burst transaction by asserting the cntrl0_burst_done signal. The frequency with which this signal is asserted is determined by the MAX_REF_CNT value in the parameter file. cntrl0_auto_ref_req indicates the refresh request to the memory, and cntrl0_ar_done indicates completion of the auto-refresh command.
cntrl0_ar_done (3) | Output | This indicates that the auto-refresh command was completed to DDR2 SDRAM. The DDR2 SDRAM controller asserts this signal for one clock after giving an auto-refresh command to the DDR2 SDRAM and completion of TRFC time. The TRFC time is determined by the rfc_count_value value in the parameter file. The user can assert the next command any time after the assertion of the cntrl0_ar_done signal.
Notes:
1. All of the signal directions are with respect to the DDR2 SDRAM controller.
2. Linear addressing is used, i.e., the row address immediately follows the column address bits, and the bank address follows the row address bits, thus supporting more devices. The number of address bits used depends on the density of the memory part. The controller ignores the unused bits, which can all be tied High.
3. For more information on auto refresh, refer to “Auto Refresh,” page 335.
Resource Utilization
A local inversion clocking technique is used in this design. The DCM generates only clk0
and clk90. One DCM and two BUFGMUXs are used. The Spartan-3 generation FPGA
designs operate at 166 MHz and below.
power supply and reference voltages are stable. The controller asserts clock-enable to
memory after 200 µs.
Load mode parameters are to be selected from the GUI while generating the design. These
parameters are updated by MIG in the parameter file. When the INIT command is
executed, the DDR2 SDRAM controller passes these values to the Memory Load Mode
register. When the DDR2 SDRAM is initialized, the DDR2 SDRAM controller asserts the
init_done signal.
Figure 8-9 shows the timing for the memory initialization command.
[Figure 8-9: timing diagram for the memory initialization command, showing clk0, clk180,
user_command_register (command 010), and init_done, with callouts 1 through 3.]
Write
Figure 8-10 shows the timing diagram for a write to DDR2 SDRAM for a burst length of
four. The user initiates the write command by sending a Write instruction to the DDR2
SDRAM controller. To terminate a write burst, the user asserts the burst_done signal for
two clocks after the last user_input_address. The burst_done signal should be asserted for
two clocks for burst lengths of four and four clocks for burst lengths of eight.
The write command is asserted on the falling edge of clk0. In response to a write
command, the DDR2 SDRAM controller acknowledges with the user_cmd_ack signal on a
falling edge of clk0. If the controller is busy with a refresh, the user_cmd_ack signal is
not asserted until after the refresh command cycle completes. The user asserts the first
address (row + column + bank address) with the write command and keeps it asserted for
three clocks after user_cmd_ack assertion. For a burst length of four, subsequent write
addresses are asserted on alternate falling edges of clk0 after the first memory address is
deasserted; for a burst length of eight, they are asserted once every four clocks. The first
user data is asserted on a rising edge of clk90 after user_cmd_ack is asserted. Because the
SDR data is converted to DDR data, the width of this bus is 2n, where n is the data width
of the DDR2 SDRAM data bus.
For a burst length of four, only two data words (each of 2n bits) are given to the DDR2
SDRAM controller for each user address, and four data words are given for a burst length
of eight. Internally, the DDR2 SDRAM controller converts these into four data words for a
burst length of four and eight data words for a burst length of eight, each of n bits. To
terminate the write burst, the user asserts burst_done on a rising edge of clk180 for two
clocks for a burst length of four and four clocks for a burst length of eight. The
burst_done signal is asserted after the last memory address. Any further commands to the
DDR2 SDRAM controller are given only after the user_cmd_ack signal is deasserted. After
burst_done is asserted, the controller terminates the burst and issues a precharge to the
memory. The user_cmd_ack signal is deasserted after completion of the precharge.
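The relationship between the 2n-bit user words and the n-bit memory words can be sketched as follows. This is a behavioral illustration in Python, not the controller RTL. The function names are hypothetical, and it assumes (following the user_input_data description in Table 8-8) that the upper n bits of each user word carry the rising-edge data and the lower n bits carry the falling-edge data, with the rising-edge word sent first.

```python
def split_user_words(user_words, n):
    """Split 2n-bit user words into n-bit words in rise, fall order (assumed)."""
    mask = (1 << n) - 1
    ddr_words = []
    for word in user_words:
        if word < 0 or word >> (2 * n):
            raise ValueError("user word does not fit in 2n bits")
        ddr_words.append((word >> n) & mask)  # upper half: rising-edge data
        ddr_words.append(word & mask)         # lower half: falling-edge data
    return ddr_words


def burst_done_clocks(burst_length):
    """Clocks for which burst_done is held, as stated in the text."""
    return {4: 2, 8: 4}[burst_length]


# Burst length of four on an 8-bit memory bus: two 16-bit user words
# become four 8-bit memory words.
print([f"{w:02X}" for w in split_user_words([0x9669, 0x2CD3], 8)])
print(burst_done_clocks(4), burst_done_clocks(8))
```

The number of user words per address is half the burst length, which matches the two-word and four-word cases described above.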
[Timing diagram showing clk0, clk90, user_command_register (WRITE command, 3’b100),
user_cmd_ack, user_input_address (Addr1, Addr2), burst_done, and user_input_data
(D0,D1; D2,D3; D4,D5; D6,D7), with interval annotations of 3 CLKs and 1.75 CLKs and
callouts 1 through 7.]
Figure 8-10: DDR2 SDRAM Write Burst, Burst Lengths of Four and Two Bursts
busy with a refresh, the user_cmd_ack signal is not asserted until after the refresh
command cycle completes.
3. The first user_input_address must be placed along with the command. The input data
is asserted with the clk90 signal after the user_cmd_ack signal is asserted.
4. The user asserts the first address (row + column + bank address) with the write
command and keeps it asserted for three clocks after user_cmd_ack assertion. The
user_input_address signal is asserted on a falling edge of clk0. All subsequent
addresses are asserted on alternate falling edges of clk0.
5. To terminate the write burst, burst_done is asserted after the last user_input_address.
The burst_done signal is asserted for two clock cycles.
6. The user command is deasserted after burst_done is asserted.
7. The controller deasserts the user_cmd_ack signal after completion of precharge to the
memory. The next command must be given only after user_cmd_ack is deasserted.
Back-to-back write operations are supported only within the same bank and row.
Read
The user initiates a memory read with a read command to the DDR2 SDRAM controller.
Figure 8-11 shows the memory read timing diagram for a burst length of four.
[Timing diagram showing clk0, clk90, user_command_register (Read Command),
user_cmd_ack, user_input_address (Addr1, Addr2), burst_done, and user_data_valid, with
interval annotations of 3 CLKs and 2 CLKs and callouts 1 through 8.]
Figure 8-11: DDR2 SDRAM Read, Burst Lengths of Four and Two Bursts
The user provides the first memory address with the read command, and subsequent
memory addresses upon receiving the user_cmd_ack signal. Data is available on the user
data bus with the user_data_valid signal. To terminate the read burst, the user asserts the
burst_done signal on a falling edge of clk0 with the deassertion of the last
user_input_address. Subsequent addresses are asserted on alternate clocks for burst
lengths of four and once every four clock cycles for burst lengths of eight.
The burst_done signal is asserted after the last address for two clocks for burst lengths
of four and for four clocks for burst lengths of eight.
Auto Refresh
The DDR2 SDRAM controller does a memory refresh at intervals determined by the
frequency. For example, for a frequency of 166 MHz, an auto-refresh request is raised every
~7.6 µs. When the auto_ref_req flag is asserted, the user must terminate any ongoing
command after the current burst transaction by asserting the burst_done signal. The
auto_ref_req flag remains asserted until the controller issues a refresh command to the
memory. While auto_ref_req is asserted, the user must wait for completion of the
auto-refresh command before giving any commands to the controller.
The ar_done signal is asserted by the DDR2 SDRAM controller upon completion of the
auto-refresh command (i.e., after TRFC time). The ar_done signal is asserted on the falling
edge of clk0 for one clock cycle.
The controller sets the MAX_REF_CNT value in the parameter file according to the
frequency and selected memory component for a refresh interval (7.7 µs). The
rfc_count_value setting in the parameter file defines TRFC, the time from a refresh
command to an Active command or another refresh command.
After completion of the auto-refresh command, the next command can be given any time
after ar_done is asserted.
The current testbench generates five consecutive write burst commands followed by five
read burst commands. For every group of five write or read commands, the controller
issues an active command, then the five write or read commands, and then a precharge
command to the memory. All five burst commands take up a maximum of 20 clock cycles.
After every precharge command, the controller state machine goes to an idle state and
checks for an auto_ref_req. When an auto_ref_req is asserted, the controller issues an auto
refresh command to the memory if it is in an idle state. In the worst case, the controller
takes 20 clocks from burst_done to issuing the auto refresh command to the memory. The
controller issues auto refresh commands to the memory within 40 clock cycles after
auto_ref_req is asserted. Because the delay from auto_ref_req to the refresh command to
the memory is within the specified number of clocks even in the worst case scenario, the
testbench does not need to terminate the write or read transaction on the auto_ref_req
signal.
For example, at 77 MHz, an auto_ref_req is generated every 7.292 µs, and at 166 MHz, it is
generated every 7.572 µs. The MAX_REF_CNT parameter is set to the following values at
77 MHz and 166 MHz frequencies to allow 40 clock cycles of delay from auto_ref_req to
the refresh command:
Average periodic refresh = 7.8125 µs = 7812.5 ns
MAX_REF_CNT = (7812.5 ns – 40 × clk_period)/clk_period
At 77 MHz (13 ns): MAX_REF_CNT = (7812.5 ns – 40 × 13 ns)/13 ns = 7292.5/13 = 560
At 166 MHz (6 ns): MAX_REF_CNT = (7812.5 ns – 40 × 6 ns)/6 ns = 7572.5/6 = 1262
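The arithmetic above can be checked with a short Python sketch. The function and constant names are illustrative and are not MIG parameters; the calculation assumes the result is truncated to an integer, which is consistent with the two values quoted above.

```python
AVG_REFRESH_NS = 7812.5  # average periodic refresh interval from the text
MARGIN_CLOCKS = 40       # allowed delay from auto_ref_req to the refresh command


def max_ref_cnt(clk_period_ns, margin_clocks=MARGIN_CLOCKS):
    """Refresh counter terminal value, truncated to an integer (assumed)."""
    usable_ns = AVG_REFRESH_NS - margin_clocks * clk_period_ns
    return int(usable_ns // clk_period_ns)


for freq_mhz, period_ns in ((77, 13), (166, 6)):
    print(f"{freq_mhz} MHz ({period_ns} ns): MAX_REF_CNT = {max_ref_cnt(period_ns)}")
```

For clock periods of 13 ns and 6 ns, this yields 560 and 1262, matching the figures given for 77 MHz and 166 MHz.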
User transactions should be terminated within 20 clock cycles of the auto_ref_req signal
being asserted. The ar_done signal is asserted for one clock period by the controller on
completion of an auto refresh command (i.e., after TRFC time). Normal read and write
commands can be issued to the controller any time after ar_done is asserted.
clock periods) = (refresh interval) / (clock period). For example, for a refresh rate of 7.7 µs
with a memory bus running at 133 MHz:
MAX_REF_CNT = 7.7 µs / (clock period) = 7.7 µs / 7.5 ns = 1026 (decimal) = 0x402
If the above value exceeds 2^MAX_REF_WIDTH – 1, the value of MAX_REF_WIDTH must be
increased accordingly in parameters_0.v (or .vhd) to increase the width of the counter
used to track the refresh interval.
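A quick way to check whether a given counter width is sufficient is sketched below in Python. The helper names are hypothetical; only MAX_REF_CNT and MAX_REF_WIDTH come from the text, and the count is assumed to be truncated to an integer as in the 133 MHz example.

```python
def refresh_count(refresh_interval_ns, clk_period_ns):
    """Number of clock periods in one refresh interval, truncated (assumed)."""
    return int(refresh_interval_ns // clk_period_ns)


def min_ref_width(count):
    """Smallest counter width W such that count <= 2**W - 1."""
    return max(1, count.bit_length())


def fits(count, width):
    """True if count can be held in a counter that is `width` bits wide."""
    return count <= 2**width - 1


count = refresh_count(7700.0, 7.5)  # 7.7 us refresh interval at 133 MHz
print(count, hex(count), min_ref_width(count))
```

With the 133 MHz example, the count is 1026 (0x402), which needs a counter at least 11 bits wide; a 10-bit counter, with a maximum of 1023, would be too narrow.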
Load Mode
MIG does not support the LOAD MODE command.
UCF Constraints
Some constraints are required to successfully create the design. The following examples
explain the different constraints in the UCF for XST.
MAXDELAY Constraints
The MAXDELAY constraints define the maximum allowable delay on a net. The following
is the list of MAXDELAY constraints applied to different nets in the UCF for Spartan FPGA
designs. The values provided here vary depending on the FPGA family and the device
type. Some values are dependent on frequency. The constraints shown here are from
example_design. The hierarchy paths of the nets are different between
example_design and user_design.
Design Notes
The DDR2 SDRAM design is not validated on hardware. The MAXDELAY constraints in
the UCF are set based on the selected frequency.
Calibration circuit details and data capture techniques are covered in XAPP768c [Ref 24].
Tool Output
When the design is generated from the tool, it outputs docs, example_design, and
user_design folders. The example_design folder contains the design with test_bench,
and the user_design folder contains the design without test_bench. Each folder contains
rtl, par, synth, and sim folders. The sim folder contains the simulation files for the
generated design: the external testbench, the memory model, and the .do file. The memory
model files are currently generated in Verilog only. To learn more details about the files
in the sim folder and to simulate the design, refer to “Simulation Guide,” page 499.
For single-rank DIMMs, MIG outputs only the base part memory model. In the simulation
testbench (sim_tb_top in the sim folder), MIG instantiates the required number of
memory models. For example, for a 1 GB single-rank DIMM whose base part is 1 Gb,
MIG instantiates the base model eight times. If the MIG generated memory model is to be
used with the user’s test bench, multiple instances should be used based on the selected
configuration.
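The instance count in the example above follows from dividing the DIMM capacity by the base part density. The Python sketch below illustrates that arithmetic only. The function name is hypothetical, and it assumes a single-rank DIMM built entirely from identical base parts, ignoring any extra ECC devices.

```python
GBIT = 2**30       # bits in 1 Gb
GBYTE = 8 * 2**30  # bits in 1 GB


def model_instances(dimm_capacity_bits, base_part_bits):
    """Number of base-part memory models needed to cover the DIMM capacity."""
    count, remainder = divmod(dimm_capacity_bits, base_part_bits)
    if count == 0 or remainder != 0:
        raise ValueError("DIMM capacity must be a multiple of the base part density")
    return count


# A 1 GB single-rank DIMM built from 1 Gb base parts.
print(model_instances(1 * GBYTE, 1 * GBIT))
```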
The MIG output memory model considers the MEM_BITS parameter by default for
memory range allocation. This covers only a partial memory range, i.e., 2^MEM_BITS. To
allocate the full memory range, the MAX_MEM parameter should be set in the memory
model, which in turn sets the full_mem_bits parameter for memory allocation. Allocating
the full memory range might exceed the memory of the operating system, thus causing
memory allocation failure in simulations.
Supported Devices
The design generated out of MIG is independent of memory speed grade, hence the
package part of the memory component is replaced with X, where X indicates a don't care
condition. See Appendix G, “Low Power Options.”
The tables below list the components (Table 8-11) and DIMMs (Table 8-12 through
Table 8-14) supported by the tool for Spartan-3 FPGA DDR2 local clocking designs.
Table 8-12: Supported Unbuffered DIMMs for DDR2 SDRAM Local Clocking
(Spartan-3 FPGAs)
Unbuffered DIMMs
MT4HTF1664AY-667 MT8HTF6464AY-40E
MT4HTF1664AY-40E MT8HTF12864AY-667
MT4HTF1664AY-53E MT8HTF12864AY-40E
MT4HTF3264AY-667 MT8HTF12864AY-53E
MT4HTF3264AY-40E MT9HTF3272AY-667
MT4HTF3264AY-53E MT9HTF3272AY-40E
MT4HTF6464AY-667 MT9HTF3272AY-53E
MT4HTF6464AY-40E MT9HTF6472AY-667
MT4HTF6464AY-53E MT9HTF6472AY-53E
MT8HTF6464AY-667 MT9HTF6472AY-40E
MT8HTF6464AY-53E --
Table 8-13: Supported Registered DIMMs for DDR2 SDRAM Local Clocking
(Spartan-3 FPGAs)
Registered DIMMs
MT9HTF3272Y-53E MT18HTF6472Y-53E
MT9HTF3272PY-53E MT18HTF6472PY-53E
MT9HTF3272Y-40E MT18HTF6472Y-40E
MT9HTF3272PY-40E MT18HTF6472PY-40E
MT9HTF6472Y-53E MT18HTF12872Y-53E
MT9HTF6472PY-53E MT18HTF12872PY-53E
MT9HTF6472Y-40E MT18HTF12872Y-40E
MT9HTF6472PY-40E MT18HTF12872PY-40E
MT9HTF12872Y-53E MT18HTF25672Y-53E
MT9HTF12872PY-53E MT18HTF25672PY-53E
MT9HTF12872Y-40E MT18HTF25672Y-40E
MT9HTF12872PY-40E MT18HTF25672PY-40E
MT18HTF6472G-53E --
The tables below list the components (Table 8-15) and DIMMs (Table 8-16 through
Table 8-18) supported by the tool for Spartan-3A/AN FPGA DDR2 local clocking designs.
Table 8-15: Supported Components for DDR2 SDRAM Local Clocking
(Spartan-3A/AN FPGAs)
Components Packages (XX) Components Packages (XX)
MT47H64M4XX-3 BP MT47H16M16XX-3 BG
MT47H64M4XX-37E BP MT47H16M16XX-37E BG
MT47H64M4XX-5E BP MT47H16M16XX-5E BG
MT47H128M4XX-3 B6,CB,GB MT47H32M16XX-3 BN,CC,FN,GC
MT47H128M4XX-37E B6,CB,GB MT47H32M16XX-37E BN,CC,FN,GC
MT47H128M4XX-5E B6,CB,GB MT47H32M16XX-5E BN,CC,FN,GC
MT47H256M4XX-3 BT,HQ MT47H64M16XX-3 BT,HR
MT47H256M4XX-37E BT,HQ MT47H64M16XX-37E BT,HR
MT47H256M4XX-5E BT,HQ MT47H64M16XX-5E BT,HR
MT47H512M4XX-3 HG MT47H128M16XX-3 HG
MT47H512M4XX-37E HG MT47H128M16XX-37E HG
MT47H512M4XX-5E HG MT47H128M16XX-5E --
MT47H32M8XX-3 BP HYB18T1G800XXXX-3S BF,BFL,BC
MT47H32M8XX-37E BP HYB18T1G800XXXX-37 BF,BFL,BC
MT47H32M8XX-5E BP HYB18T1G160XXXX-3S BF,BFV,BFL,BC
MT47H64M8XX-3 B6,CB,F6,GB HYB18T1G160XXXX-37 BF,BFV,BFL,BC
Table 8-17: Supported Registered DIMMs for DDR2 SDRAM Local Clocking
(Spartan-3A/AN FPGAs)
Registered DIMMs
MT9HTF3272Y-53E MT9HTF6472Y-40E
MT9HTF3272PY-53E MT9HTF6472PY-40E
MT9HTF3272Y-40E MT9HTF12872Y-53E
MT9HTF3272PY-40E MT9HTF12872PY-53E
MT9HTF6472Y-53E MT9HTF12872Y-40E
MT9HTF6472PY-53E MT9HTF12872PY-40E
The tables that follow list the components (Table 8-19) and DIMMs (Table 8-20 and
Table 8-21) supported by the tool for Spartan-3A DSP FPGA DDR2 local clocking designs.
Table 8-19: Supported Components for DDR2 SDRAM Local Clocking
(Spartan-3A DSP FPGAs)
Components Packages (XX) Components Packages (XX)
MT47H64M4XX-3 BP MT47H16M16XX-3 BG
MT47H64M4XX-37E BP MT47H16M16XX-37E BG
MT47H64M4XX-5E BP MT47H16M16XX-5E BG
MT47H128M4XX-3 B6,CB,GB MT47H32M16XX-3 BN,CC,FN,GC
MT47H128M4XX-37E B6,CB,GB MT47H32M16XX-37E BN,CC,FN,GC
MT47H128M4XX-5E B6,CB,GB MT47H32M16XX-5E BN,CC,FN,GC
MT47H256M4XX-3 BT,HQ MT47H64M16XX-3 BT,HR
MT47H256M4XX-37E BT,HQ MT47H64M16XX-37E BT,HR
MT47H256M4XX-5E BT,HQ MT47H64M16XX-5E BT,HR
MT47H512M4XX-3 HG MT47H128M16XX-3 HG
MT47H512M4XX-37E HG MT47H128M16XX-37E HG
MT47H512M4XX-5E HG MT47H128M16XX-5E --
MT47H32M8XX-3 BP HYB18T1G800XXXX-3S BF,BFL,BC
MT47H32M8XX-37E BP HYB18T1G800XXXX-37 BF,BFL,BC
MT47H32M8XX-5E BP HYB18T1G160XXXX-3S BF,BFV,BFL,BC
MT47H64M8XX-3 B6,CB,F6,GB HYB18T1G160XXXX-37 BF,BFV,BFL,BC
MT47H64M8XX-37E B6,CB,F6,GB HYB18T1G400XXXX-3S BF,BFL,BC
MT47H64M8XX-5E B6,CB,F6,GB HYB18T1G400XXXX-37 BF,BFL,BC
MT47H128M8XX-3 BT,HQ HYB18T512800XXXX-3S B2F,B2C,B2FL
MT47H128M8XX-37E BT,HQ HYB18T512800XXXX-37 B2F,B2C,B2FL
MT47H128M8XX-5E BT,HQ HYB18T512160XXXX-3S B2F,B2C,B2FL
MT47H256M8XX-3 HG HYB18T512160XXXX-37 B2F,B2C,B2FL
MT47H256M8XX-37E HG HYB18T512400XXXX-3S B2F,B2C,B2FL
MT47H256M8XX-5E HG HYB18T512400XXXX-37 B2F,B2C,B2FL
Table 8-22: Spartan-3 FPGA Maximum Data Width for DDR and DDR2 Memories
Maximum Data Width when Data, Address, and Control are Allocated in...
Serial ...Different Banks ...the Same Bank
FPGA
Number
Bank Bank Bank Bank Bank Bank Banks
Left Right Left Right
2 3 6 7 2 3 6/7
1 XC3S50CP132 0 0 0 0 8 8 0 0 0 0 0
2 XC3S50PQ208 0 0 0 0 8 8 0 0 0 0 0
3 XC3S50TQ144 0 0 0 0 8 8 0 0 0 0 0
4 XC3S200FT256 8 8 8 8 16 16 0 0 0 8 8
5 XC3S200PQ208 0 8 0 0 16 16 0 0 0 0 0
6 XC3S200TQ144 0 0 0 0 8 8 0 0 0 0 0
7 XC3S400FG320 8 8 8 8 24 24 0 0 0 16 16
8 XC3S400FG456 16 8 16 8 32 24 0 0 0 16 16
9 XC3S400FT256 8 8 8 8 16 16 0 0 0 8 8
10 XC3S400PQ208 0 0 0 0 8 8 0 0 0 0 0
11 XC3S400TQ144 0 0 0 0 8 8 0 0 0 0 0
12 XC3S1000FG320 8 8 8 8 24 24 0 0 0 16 16
13 XC3S1000FG456 16 16 16 16 48 48 8 8 8 32 32
14 XC3S1000FG676 24 24 24 24 48 48 8 8 8 32 32
15 XC3S1000FT256 8 8 8 8 16 16 0 0 0 8 8
16 XC3S1500FG320 8 8 8 8 24 24 0 0 0 16 16
17 XC3S1500FG456 16 16 16 16 48 48 8 8 8 40 40
18 XC3S1500FG676 32 32 32 32 72 72 16 16 16 48 48
19 XC3S2000FG456 16 16 16 16 48 48 8 8 8 32 32
20 XC3S2000FG676 32 32 32 32 72 72 16 16 16 56 56
21 XC3S2000FG900 32 32 32 40 72 72 24 24 24 64 64
22 XC3S4000FG676 24 32 32 32 72 72 16 16 16 56 48
23 XC3S4000FG900 40 40 40 40 72 72 32 32 32 72 72
24 XC3S4000FG1156 48 48 48 48 72 72 32 32 32 72 72
25 XC3S5000FG676 24 24 24 32 64 64 16 16 16 48 48
26 XC3S5000FGG676 24 24 24 32 64 64 16 16 16 48 48
27 XC3S5000FG900 40 40 40 40 72 72 32 32 32 72 72
28 XC3S5000FG1156 56 56 48 56 72 72 40 40 40 72 72
Table 8-23: Spartan-3E FPGA Maximum Data Width for DDR SDRAMs
Maximum Data Width when Data, Address, and Control are Allocated in Different Banks or in the Same Bank
Serial Number  FPGA  Different Banks, Left  Different Banks, Right  Same Bank, Left/Right
1 XC3S100ECP132 8 8 0
2 XC3S100ETQ144 8 8 0
3 XC3S250ECP132 8 0 0
4 XC3S250EFT256 16 16 0
5 XC3S250EPQ208 16 16 0
6 XC3S250ETQ144 8 8 0
7 XC3S500ECP132 8 0 0
8 XC3S500EFG320 24 24 8
9 XC3S500EFT256 16 16 8
10 XC3S500EPQ208 8 8 0
11 XC3S1200EFG320 16 16 16
12 XC3S1200EFG400 32 32 16
13 XC3S1200EFT256 16 8 8
14 XC3S1600EFG320 16 16 8
15 XC3S1600EFG400 24 32 16
16 XC3S1600EFG484 48 40 32
Table 8-27: Spartan-3A DSP FPGA DQS Maximum Data Width (Single/Differential
DQS Enabled)
Maximum Data Width when Data, Address, and Control are Allocated in Different Banks or in the Same Bank
Serial Number  FPGA  Different Banks, Left  Different Banks, Right  Same Bank, Left  Same Bank, Right
1 XC3SD1800A-CS484 32 32 16 16
2 XC3SD3400A-CS484 32 32 16 16
3 XC3SD1800A-FG676 64 64 48 48
4 XC3SD3400A-FG676 64 64 48 48
Notes:
NS = not supported.
Supported I/O Standards
Table 8-33 shows the I/O standards supported by MIG-generated designs for the Spartan FPGA families.
Table 8-33: Supported I/O Standards for MIG-Generated Spartan FPGA Families
Standard   Vcco  Drive/Class  Spartan-3A/3AN/3A DSP  Spartan-3E  Spartan-3
LVCMOS     1.8V  8 mA         All Banks              All Banks   All Banks
LVCMOS     1.8V  16 mA        Banks 1/3              All Banks   All Banks
LVCMOS     2.5V  12 mA        All Banks              All Banks   All Banks
LVCMOS     2.5V  24 mA        Banks 1/3              All Banks   All Banks
SSTL       1.8V  I            All Banks              All Banks   All Banks
SSTL       1.8V  II           All Banks (1)          -           All Banks
SSTL       2.5V  I            All Banks              All Banks   All Banks
SSTL       2.5V  II           All Banks (1)          -           All Banks
DIFF_SSTL  1.8V  I            All Banks              All Banks   -
DIFF_SSTL  1.8V  II           All Banks (2)          -           -
DIFF_SSTL  2.5V  I            All Banks              All Banks   -
DIFF_SSTL  2.5V  II           All Banks (2)          -           All Banks
LVDS       2.5V  -            All Banks (3)          All Banks   All Banks
Notes:
1. Outputs are restricted to banks 1 and 3. Inputs are unrestricted.
2. These high-drive outputs are restricted to banks 1 and 3. Inputs are unrestricted.
3. These differential outputs are restricted to banks 0 and 2. Inputs are unrestricted.
Table 8-34: Hardware Tested Configurations for Spartan-3A FPGA DDR2 SDRAM
Designs
Synthesis Tools XST
HDL Verilog and VHDL
FPGA Device XC3S700AFG484-4
Burst Lengths 4 and 8
CAS Latency (CL) 3
16-bit Design Tested on 16-bit Component “MT47H32M16XX-5E”
Frequency Range 25 MHz to 225 MHz
The frequency shown in Table 8-35 was achieved on the Spartan-3A DSP FPGA 3400A
Development Board under nominal conditions. This frequency should not be used to
determine the design frequency. The maximum design frequency supported in the MIG
wizard is based on a combination of the TRCE results for fabric timing on multiple
device/package combinations and I/O timing analysis using FPGA and memory timing
parameters for a 64-bit interface.
Table 8-35: Hardware Tested Configurations for Spartan-3A DSP FPGA DDR2 SDRAM
Designs
Synthesis Tools XST
HDL Verilog and VHDL
FPGA Device XC3SD3400AFG676-4
Burst Lengths 4 and 8
CAS Latency (CL) 3
32-bit Design Tested on 64-bit SO DIMM “MT4HTF6464HY-667”
Frequency 133 MHz
Chapter 9
Interface Model
DDR2 SDRAM interfaces are source-synchronous and double data rate. They transfer data
on both edges of the clock cycle. A memory interface can be modularly represented as
shown in Figure 9-1. A modular interface has many advantages. It allows designs to be
ported easily and also makes it possible to share parts of the design across different types
of memory interfaces.
[Figure 9-1: modular memory interface. The Xilinx FPGA contains the Control Layer and the
Physical Layer, which connect to the Memories.]
Feature Summary
This section summarizes the supported and unsupported features of the DDR2 SDRAM
controller design.
Supported Features
The DDR2 SDRAM controller design supports:
• Burst lengths of four and eight
• Sequential and interleaved burst types
• CAS latencies of 3, 4, and 5
• Additive latencies of 0, 1, 2, 3, and 4
• Differential DQS
• ODT
• Verilog and VHDL
• Byte-wise data masking
• Precharge and auto refresh
• Bank management
• Linear addressing
• ECC
• Different memories (density/speed)
• Memory components, registered DIMMs, unbuffered DIMMs, and SODIMMs
• Deep memory support for dual-rank DIMMs (depth of 2)
• With and without a testbench
• With and without a PLL
• Multicontroller (DDR2) and Multiple Interfaces (DDR2 and QDRII)
• PPC440 pinout
• Data mask
• Two bytes per bank
• System clock, differential and single-ended
The supported features are described in more detail in “Architecture.”
Unsupported Features
The DDR2 SDRAM controller design does not support:
• Single-ended DQS
• Redundant DQS (RDQS)
• Deep memory support for components and single-rank DIMMs, or for a depth of 4
Architecture
Implemented Features
This section provides details on the supported features of the DDR2 SDRAM controller.
Burst Length
The DDR2 SDRAM controller supports burst lengths of four and eight. The burst length can
be selected through the “Set mode register(s)” option. For a design without a testbench
(user_design), the user has to provide the input data in bursts that match the chosen burst
length. Bits M2:M0 of the Mode Register define the burst length, and bit M3 indicates the
burst type (see the Micron data sheet). Read and write accesses to the DDR2 SDRAM are
burst-oriented: the burst length determines the maximum number of column locations
accessed for a given READ or WRITE command.
CAS Latency
The DDR2 SDRAM controller supports CAS latencies of 3, 4, and 5. The CAS latency (CL)
can be selected in the “Set mode register(s)” option. CL is implemented in the phy_write
module. During data write operations, the generation of the dqs_oe_n and dqs_rst_n
signals varies according to the CL in the phy_write module. During read data operations,
the generation of the ctrl_rden signal varies according to the CL in the ctrl module. Bits
M4:M6 of the Mode Register define the CL (see the Micron data sheet). CL is the delay in
clock cycles between the registration of a READ command and the availability of the first
bit of output data.
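The Mode Register fields described above can be sketched as a small packing routine. This is a minimal Python illustration, not part of the MIG output: the function names are hypothetical, and the bit codes are the ones commonly listed in DDR2 data sheets, so they should be checked against the data sheet of the selected part.

```python
# Field encodings as commonly given in DDR2 data sheets (an assumption here);
# verify them against the data sheet of the selected part.
BURST_LENGTH_CODE = {4: 0b010, 8: 0b011}               # bits M2:M0
BURST_TYPE_CODE = {"sequential": 0, "interleaved": 1}  # bit M3
CAS_LATENCY_CODE = {3: 0b011, 4: 0b100, 5: 0b101}      # bits M6:M4


def encode_mode_register(burst_len, burst_type, cas_latency):
    """Pack burst length, burst type, and CL into Mode Register bits M6:M0."""
    try:
        bl = BURST_LENGTH_CODE[burst_len]
        bt = BURST_TYPE_CODE[burst_type]
        cl = CAS_LATENCY_CODE[cas_latency]
    except KeyError as exc:
        raise ValueError(f"unsupported mode register setting: {exc}") from None
    return (cl << 4) | (bt << 3) | bl


def decode_mode_register(value):
    """Recover (burst_len, burst_type, cas_latency) from bits M6:M0."""
    bl = {v: k for k, v in BURST_LENGTH_CODE.items()}[value & 0b111]
    bt = {v: k for k, v in BURST_TYPE_CODE.items()}[(value >> 3) & 0b1]
    cl = {v: k for k, v in CAS_LATENCY_CODE.items()}[(value >> 4) & 0b111]
    return bl, bt, cl
```

For example, a burst length of 4 with sequential bursts and CL = 3 packs into the low seven bits as 0x32 under these encodings.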
Additive Latency
DDR2 SDRAM devices support a feature called posted CAS additive latency (AL). The
DDR2 SDRAM supports ALs of 0, 1, 2, 3, and 4. AL can be selected in the “Set mode
register(s)” option. AL is implemented in the DDR2 SDRAM ctrl module. The ctrl module
issues READ/WRITE commands prior to tRCD (minimum) depending on the user-selected
AL value in the Extended Mode Register. This feature allows the READ command to be
issued prior to tRCD (minimum) by delaying the internal command to the DDR2 SDRAM
by AL clocks. Posted CAS AL makes the command and data bus efficient for sustainable
bandwidths in DDR2 SDRAM. Bits E3:E5 of the Extended Mode Register define the value
of AL (see the Micron data sheet).
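The effect of AL on command timing can be illustrated with a short calculation. The sketch below assumes the usual DDR2 relationships (read latency RL = AL + CL, write latency WL = RL - 1), which are not spelled out in this section; the function names are hypothetical and the model ignores every other timing parameter.

```python
SUPPORTED_CL = (3, 4, 5)
SUPPORTED_AL = (0, 1, 2, 3, 4)


def ddr2_latencies(cas_latency, additive_latency):
    """Return (read_latency, write_latency) in clock cycles."""
    if cas_latency not in SUPPORTED_CL:
        raise ValueError("CL must be 3, 4, or 5")
    if additive_latency not in SUPPORTED_AL:
        raise ValueError("AL must be 0 to 4")
    read_latency = additive_latency + cas_latency  # RL = AL + CL (assumed)
    write_latency = read_latency - 1               # WL = RL - 1 (assumed)
    return read_latency, write_latency


def earliest_cas_after_activate(t_rcd_cycles, additive_latency):
    """Cycles after ACTIVATE at which READ/WRITE may appear on the bus.

    The memory delays the command internally by AL clocks, so the
    controller can issue it AL clocks before tRCD (minimum) has elapsed,
    but never in the same cycle as the ACTIVATE.
    """
    return max(t_rcd_cycles - additive_latency, 1)
```

With tRCD equal to four clocks and AL = 3, the READ can follow the ACTIVATE on the next clock, which is what frees the command bus for other banks.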
Data Masking
The DDR2 SDRAM design supports data masking per byte. Masking per nibble is not
supported because of a limitation of the internal block RAM based FIFOs. The mask data is
stored in the Data FIFO along with the actual data.
MIG supports a data mask option. If this option is checked in the GUI, MIG generates a
design with data mask pins. This option can be chosen if the selected part has data
masking. DDR2 SDRAM designs do not support read-modify-write operations in ECC
mode. The mask bits to the SDRAM should never be asserted while in the ECC mode.
Thus, when ECC is selected, the data masking selection is disabled in the GUI.
Precharge
The PRECHARGE command is used to close the open row in a bank if there is a command
to be issued in the same bank. The Virtex-5 FPGA DDR2 controller issues a PRECHARGE
command only if there is already an open row in the particular bank where a read or write
command is to be issued, thus increasing the efficiency of the design. The auto precharge
function is not supported in this design. The design ties the A10 bit Low during normal
reads and writes.
Auto Refresh
The auto refresh command is issued to the memory at specified intervals of time. The
controller issues this command so that the memory refreshes its stored charge and retains
the data.
Bank Management
A Virtex-5 FPGA DDR2 SDRAM controller design supports bank management that
increases the efficiency of the design. The controller keeps track of whether the bank being
accessed already has an open row or not and also decides whether a PRECHARGE
command should be issued or not to that bank. When bank management is enabled via the
MULTI_BANK_EN parameter, a maximum of four banks/rows can be open at any one time.
A least recently used (LRU) algorithm is employed to keep the three most recently used
banks and to close the least recently used bank when a new bank/row location needs to be
accessed. The bank management feature can also be disabled by clearing
MULTI_BANK_EN. For more information on Bank Management, refer to application note
XAPP858 [Ref 27].
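The bank management behavior described above can be modeled in a few lines. The following Python sketch is only a behavioral illustration under simplifying assumptions (no timing, no refresh scheduling); the class and method names are hypothetical and do not correspond to modules in the generated RTL.

```python
from collections import OrderedDict


class BankTracker:
    """Tracks open rows and lists the commands needed before an access."""

    def __init__(self, multi_bank_en=True, max_open=4):
        # With bank management disabled, only one bank/row stays open.
        self.capacity = max_open if multi_bank_en else 1
        self.open_rows = OrderedDict()  # bank -> open row, least recent first

    def access(self, bank, row):
        commands = []
        if bank in self.open_rows:
            if self.open_rows[bank] == row:
                self.open_rows.move_to_end(bank)  # row hit: no extra command
                return commands
            commands.append(("PRECHARGE", bank))  # different row in same bank
            del self.open_rows[bank]
        elif len(self.open_rows) >= self.capacity:
            victim, _ = self.open_rows.popitem(last=False)  # least recently used
            commands.append(("PRECHARGE", victim))
        commands.append(("ACTIVATE", bank, row))
        self.open_rows[bank] = row
        return commands

    def auto_refresh(self):
        """Model of an auto refresh, which leaves every bank closed."""
        self.open_rows.clear()
```

In this model, an access to a fifth bank while four are open returns a PRECHARGE for the least recently used bank followed by an ACTIVATE for the new bank/row.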
Linear Addressing
The DDR2 SDRAM controller supports linear addressing. Linear addressing refers to the
way the user provides the address of the memory to be accessed. For Virtex-5 FPGA DDR2
SDRAM controllers, the user provides the address information through the app_af_addr
signal. As the densities of the memory devices vary, the number of column address bits
and row address bits also change. In any case, the row address bits in the app_af_addr
signal always start from the next higher bit, where the column address ends. This feature
increases the number of devices that can be supported with the design.
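As an illustration of linear addressing, the sketch below splits a user address into its fields. It is a Python model with hypothetical function names. The column field in the least significant bits with the row field directly above it follows the description above; placing the bank bits directly above the row bits is an assumption made for this example.

```python
def split_linear_address(addr, col_width, row_width, bank_width):
    """Split a linear user address into (bank, row, column) fields."""
    col = addr & ((1 << col_width) - 1)
    # The row field starts at the next higher bit after the column field.
    row = (addr >> col_width) & ((1 << row_width) - 1)
    # Assumption for this sketch: the bank field sits above the row field.
    bank = (addr >> (col_width + row_width)) & ((1 << bank_width) - 1)
    return bank, row, col


def build_linear_address(bank, row, col, col_width, row_width):
    """Inverse of split_linear_address for the same field widths."""
    return (bank << (col_width + row_width)) | (row << col_width) | col
```

Because only the field widths change between memory densities, the same routine covers parts with different numbers of row and column address bits.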
Deep Memories
The MIG DDR2 SDRAM controller supports Dual Rank DIMMs with depth of 2. For deep
memory implementations, MIG generates chip selects, CKE signals, and ODT signals for
each memory. The widths of the clock ports (CK and CK_N) are multiplied by the depth
chosen in MIG.
For deep memories, DDR2 SDRAMs are initialized one after the other to avoid loading the
address and control buses, and the calibration is done on the last memory. Apart from
initialization, the DDR2 SDRAM controller module also demultiplexes the column, row,
and bank addresses from the user address. The module also decodes the chip selects and
rank addresses for DIMMs.
On-Die Termination
The DDR2 SDRAM controller supports on-die termination (ODT). Through the “Set mode
register(s)” option in the GUI, the user can disable ODT or choose a termination value of
75, 150, or 50 ohms. ODT can turn the termination on and off as needed to improve signal
integrity in the system.
ODT is only enabled on writes to DDR2 memory. It is disabled on read operations. One
single dual-rank DIMM is populated in a single slot. Rank 1 and Rank 2 of slot 1 or slot 2
are referred to as CS0 and CS1. ODT0 should be connected to the ODT signal of CS0 and
ODT1 should be connected to the ODT signal of CS1. ODT0 is enabled when writing to
CS0 or CS1. During read operations, the ODT is disabled. In this configuration, ODT for
CS1 is always off. Table 9-2 shows ODT control during write operations, and Table 9-3
shows ODT control during read operations.
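The ODT behavior described above for a single dual-rank DIMM can be summarized as a small lookup. This Python sketch uses a hypothetical function name and only restates the rules in the text: ODT0 is enabled on writes to either rank, ODT1 stays off, and both are off on reads.

```python
def odt_outputs(command, chip_select):
    """Return (odt0, odt1) for one dual-rank DIMM in a single slot."""
    if chip_select not in (0, 1):
        raise ValueError("chip_select must be 0 (CS0) or 1 (CS1)")
    if command == "WRITE":
        return (1, 0)  # ODT0 on for writes to CS0 or CS1, ODT1 always off
    if command == "READ":
        return (0, 0)  # ODT is disabled during reads
    raise ValueError("command must be 'READ' or 'WRITE'")
```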
Note: The Virtex-5 FPGA DDR2 interface requires that if parallel termination is used at the
memory end, it must be ODT rather than external termination resistor(s). This is a requirement of
the read capture scheme used. For more information on the need for ODT, refer to
XAPP858 [Ref 27].
Multicontrollers
MIG supports multicontrollers for DDR2 SDRAMs and multiple interfaces for
DDR2 SDRAMs and QDRII SRAMs. Up to eight controllers are supported. In
multicontroller and multiple interface designs, every controller can have a different
frequency. The number of controllers that can have different frequencies is limited by the
number of PLLs available in the selected FPGA. For example, a total of six PLL resources
are available for the XC5VLX50 device, so a maximum of six controllers can have different
frequencies. Even though the number of controllers selected in the GUI is eight, a
maximum of six controllers can have different frequencies. Thus, for the remaining two
controllers, the user should select one of the already selected frequencies. Refer to the
Virtex-5 FPGA User Guide [Ref 10] for PLL resources available for various devices.
For a single controller design, all memory and user interface signals appear as shown in
Figure 9-7 and Table 9-6 based on the selected part. For a multicontroller design, all
memory and user interface signal names are prepended with the controller number. For
example, for a two controller design (two DDR2 controllers), the ddr2_dq port appears as
c0_ddr2_dq and c1_ddr2_dq. A similar naming convention is followed for the parameters
provided in Table 9-2. Some parameters, such as HIGH_PERFORMANCE_MODE,
CLK_TYPE and RST_ACT_LOW, are common for all the controllers and do not have the
controller number prepended.
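The frequency limit and the naming convention for multicontroller designs can be checked with a short helper. This is a Python sketch with hypothetical function names; the PLL count must be taken from the device documentation.

```python
def frequencies_fit(controller_freqs_mhz, num_plls, max_controllers=8):
    """True if the distinct controller frequencies fit the available PLLs."""
    if len(controller_freqs_mhz) > max_controllers:
        raise ValueError("too many controllers")
    # Controllers that share a frequency do not need an additional PLL.
    return len(set(controller_freqs_mhz)) <= num_plls


def prefixed_port(controller_index, port_name):
    """Port name with the controller number prepended, e.g. c0_ddr2_dq."""
    return f"c{controller_index}_{port_name}"
```

For a device with six PLLs, eight controllers fit only if at least two of them reuse a frequency that is already selected.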
System Clock
MIG supports differential and single-ended system clocks. Based on the selection in the
GUI, input system clocks and IDELAY clocks are differential or single-ended.
When this parameter value is set to TRUE, the IODELAY jitter value per tap is reduced.
This reduction results in a slight increase in power dissipation from the IODELAY element.
When this parameter value is set to FALSE, the IODELAY power dissipation is reduced,
but with an increase in the jitter value per tap.
The value of this parameter can be selected from the MIG FPGA options page. Users can
also manually set this parameter value to TRUE or FALSE in the design top-level block
HDL module.
Refer to Appendix E, “Debug Port” for more information on the IODELAY Performance
Mode.
Generic Parameters
The DDR2 SDRAM design is a generic design that works for most of the features
mentioned above. User input parameters are defined as parameters for Verilog and
generics in VHDL in the design modules and are passed down the hierarchy. For example,
if the user selects a burst length of 4, then it is defined as follows in the <top_module>
module:
parameter BURST_LEN = 4, // burst length (in doublewords)
The user can change this parameter in <top_module> for various burst lengths to get the
desired output. The same concept applies to most of the other parameters listed in the
<top_module> module. REG_ENABLE and CLK_TYPE are exceptions: changing their values
alone is not sufficient, because the user must also manually edit <top_module> for the port
connections and other logic changes. Table 9-4 lists the details of all parameters.
Hierarchy
Figure 9-2 shows the hierarchical structure of the DDR2 SDRAM design generated by MIG
with a testbench and a PLL.
[Figure 9-2: hierarchy diagram. Recoverable module names: <top_module>, ddr2_ctrl,
ddr2_phy_top, ddr2_usr_top, ddr2_tb_test_addr_gen, and ddr2_tb_test_data_gen.]
Constraints
The Virtex-5 FPGA DDR2 design uses a combination of the IOB flop (IDDR) and fabric-
based flops for read data capture. This requires the use of pinout-dependent location
constraints. For more details, see Appendix B, “Pinout-Related UCF Constraints for Virtex-
5 FPGA DDR2 SDRAMs.”
In Virtex-5 FPGA DDR2 designs for devices containing a single PPC440 processor (FX30T-FF665,
FX70T-FF665, and FX70T-FF1136), data cannot be allocated to the non-DCI banks (Bank 1 and
Bank 2). Because the PPC440 processor block is close to these I/O pads, the location
constraints for the DQ read-data capture flip-flops (second-stage capture) cannot find slices
close enough to the I/Os. Therefore, a Virtex-5 FPGA DDR2 design cannot generate squelch
constraints for Bank 1 and Bank 2, and these two non-DCI banks are not selectable in the GUI.
Sig: u_ddr2_top_0/u_mem_if_top/u_phy_top/u_phy_io/gen_dq[0].u_iob_dq/stg1_out_rise_sg3
Sig: u_ddr2_top_0/u_mem_if_top/u_phy_top/u_phy_io/gen_dq[1].u_iob_dq/stg1_out_rise_sg3
and also compares the read data with written data. The error signal is driven High on data
mismatches. The phy_init_done signal indicates the completion of initialization and
calibration of the design.
[Block diagram. Blocks: idelay_ctrl, Infrastructure, ddr2_top, tb_top, and the Memory Device.
System clocks and reset: clk200_p, clk200_n, sys_clk_p, sys_clk_n, sys_rst_n.
Internal clocks and resets: clk200, rst200, idelay_ctrl_rdy, clk0, clk90, clkdiv0, rst0, rst90,
rstdiv0.
Memory interface: ddr2_ras_n, ddr2_cas_n, ddr2_we_n, ddr2_cs_n, ddr2_cke, ddr2_odt, ddr2_dm,
ddr2_ba, ddr2_a, ddr2_ck, ddr2_ck_n, ddr2_dq, ddr2_dqs, ddr2_dqs_n, ddr2_reset_n.
Status signals: phy_init_done, error.]
Figure 9-3: Top-Level Block Diagram of the DDR2 SDRAM Design with a PLL and a Testbench
Figure 9-4 shows a top-level block diagram of a DDR2 SDRAM design with a PLL but
without a testbench. The sys_clk_p and sys_clk_n signals are differential input system
clocks. “Clocking Scheme,” page 377 describes how various clocks are generated using the
PLL. The PLL is instantiated in the infrastructure module that generates the required
design clocks. The clk200_p and clk200_n signals are used for the idelay_ctrl element. The
sys_rst_n signal is the active-Low system reset signal. All design resets are gated by the
locked signal. The user has to drive the user application signals. The design provides the
clk0_tb and rst0_tb signals so that the user logic can synchronize with the design. The clk0_tb
signal is connected to clk0 in the controller. If the user clock domain is different from
clk0/clk0_tb, the user should add FIFOs for all the inputs and outputs of the controller
(user application signals) in order to synchronize them to the clk0_tb clock. The
phy_init_done signal indicates the completion of initialization and calibration of the
design.
[Block diagram. Blocks: idelay_ctrl, Infrastructure, ddr2_top, and the Memory Device.
System clocks and reset: clk200_p, clk200_n, sys_clk_p, sys_clk_n, sys_rst_n.
Internal clocks and resets: clk200, rst200, idelay_ctrl_rdy, clk0, clk90, clkdiv0, rst0, rst90,
rstdiv0.
User application: app_af_addr, app_af_cmd, app_af_wren, app_wdf_data, app_wdf_mask_data,
app_wdf_wren, app_wdf_afull, app_af_afull, rd_data_valid, rd_data_fifo_out, clk0_tb, rst0_tb,
phy_init_done.
Memory interface: ddr2_ras_n, ddr2_cas_n, ddr2_we_n, ddr2_cs_n, ddr2_cke, ddr2_odt, ddr2_dm,
ddr2_ba, ddr2_a, ddr2_ck, ddr2_ck_n, ddr2_dq, ddr2_dqs, ddr2_dqs_n, ddr2_reset_n.]
Figure 9-4: Top-Level Block Diagram of the DDR2 SDRAM Design with a PLL but without a Testbench
Figure 9-5 shows a top-level block diagram of a DDR2 SDRAM design without a PLL or a
testbench. The user should provide all the design clocks and the locked signal. “Clocking
Scheme,” page 377 explains the details of how to generate the design clocks from the user
interface. These clocks should be single-ended. The sys_rst_n signal is the active-Low
system reset signal. All design resets are gated by the locked signal. The user application
must have a PLL/DCM primitive instantiated in the design, and all user clocks should be
driven through BUFGs. The user has to drive the user application signals. The design
provides the clk0_tb and rst0_tb signals so that the user logic can synchronize with the design.
The clk0_tb signal is connected to clk0 in the controller. If the user clock domain is different
from clk0/clk0_tb, the user should add FIFOs for all the inputs and outputs of the
controller (user application signals) in order to synchronize them to the clk0_tb clock. The
phy_init_done signal indicates the completion of initialization and calibration of the
design.
[Block diagram. Blocks: User PLL/DCM, idelay_ctrl, Infrastructure, ddr2_top, and the Memory
Device.
System reset and user PLL/DCM inputs: clk_200, clk_0, clk_90, clkdiv0, sys_rst_n, locked.
Internal resets and status: rst200, idelay_ctrl_rdy, rst0, rst90, rstdiv0.
User application: app_af_addr, app_af_cmd, app_af_wren, app_wdf_data, app_wdf_mask_data,
app_wdf_wren, app_wdf_afull, app_af_afull, rd_data_valid, rd_data_fifo_out, clk0_tb, rst0_tb,
phy_init_done.
Memory interface: ddr2_ras_n, ddr2_cas_n, ddr2_we_n, ddr2_cs_n, ddr2_cke, ddr2_odt, ddr2_dm,
ddr2_ba, ddr2_a, ddr2_ck, ddr2_ck_n, ddr2_dq, ddr2_dqs, ddr2_dqs_n, ddr2_reset_n.]
Figure 9-5: Top-Level Block Diagram of the DDR2 SDRAM Design without a PLL or a Testbench
Figure 9-6 shows a top-level block diagram of a DDR2 SDRAM design without a PLL but
with a testbench. The user should provide all the clocks and the locked signal. “Clocking
Scheme,” page 377 explains the details of how to generate the design clocks from the user
interface. These clocks should be single-ended. sys_rst_n is the active-Low system reset
signal. All design resets are gated by the locked signal. The user application must have a
PLL/DCM primitive instantiated in the design, and all user clocks should be driven
through BUFGs. The error output signal indicates whether the case passes or fails. The
testbench module does writes and reads, and also compares the read data with the written
data. The error signal is driven High on data mismatches. The phy_init_done signal
indicates the completion of initialization and calibration of the design.
[Block diagram. Blocks: idelay_ctrl, Infrastructure, ddr2_top, tb_top, and the Memory Device.
System clocks and reset: clk200, clk0, clk90, clkdiv0, sys_rst_n, locked.
Internal resets and status: rst200, idelay_ctrl_rdy, rst0, rst90, rstdiv0.
Memory interface: ddr2_ras_n, ddr2_cas_n, ddr2_we_n, ddr2_cs_n, ddr2_cke, ddr2_odt, ddr2_dm,
ddr2_ba, ddr2_a, ddr2_ck, ddr2_ck_n, ddr2_dq, ddr2_dqs, ddr2_dqs_n, ddr2_reset_n.
Status signals: phy_init_done, error.]
Figure 9-6: Top-Level Block Diagram of the DDR2 SDRAM Design without a PLL but with a Testbench
[Figure 9-7: block diagram. Blocks: idelay_ctrl, Infrastructure, and ddr2_top/mem_if_top, which
contains usr_top, phy_top, and ctrl, placed between the User Application and the Memory Device.
System clocks and reset: clk200_p, clk200_n, idly_clk_200, sys_clk_p, sys_clk_n, sys_clk,
sys_rst_n.
Internal clocks and resets: clk200, rst200, idelay_ctrl_rdy, clk0, clk90, clkdiv0, rst0, rst90,
rstdiv0.
User application: app_af_addr, app_af_cmd, app_af_wren, app_wdf_data, app_wdf_mask_data,
app_wdf_wren, app_wdf_afull, app_af_afull, rd_data_valid, rd_data_fifo_out, clk0_tb, rst0_tb,
phy_init_done.
Internal paths: write_data, read_data, control signals.
Memory interface: ddr2_ras_n, ddr2_cas_n, ddr2_we_n, ddr2_cs_n, ddr2_cke, ddr2_odt, ddr2_dm,
ddr2_ba, ddr2_a, ddr2_ck, ddr2_ck_n, ddr2_dq, ddr2_dqs, ddr2_dqs_n, ddr2_reset_n.]
Infrastructure
The infrastructure module generates the design clocks and reset signals. When differential
clocking is used, the sys_clk_p, sys_clk_n, clk200_p, and clk200_n signals appear. When
single-ended clocking is used, the sys_clk and idly_clk_200 signals appear. In addition, clocks
are available for design use and a 200 MHz clock is provided for the IDELAYCTRL
primitive. Differential and single-ended clocks are passed through global clock buffers
before connecting to a PLL/DCM. For differential clocking, the output of the
sys_clk_p/sys_clk_n buffer is single-ended and is provided to the PLL/DCM input.
Likewise, for single-ended clocking, sys_clk is passed through a buffer and its output is
provided to the PLL/DCM input. The outputs of the PLL/DCM are clk0 (0° phase-shifted
version of the input clock), clk90 (90° phase-shifted version of the input clock), and clkdiv0
(half the frequency of the input clock and phase aligned with clk0). After the PLL/DCM is
locked, the design is held in the reset state for at least 25 clocks. The infrastructure module
also generates all of the reset signals required for the design.
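The relationship between the three PLL/DCM output clocks can be worked out numerically. The sketch below is a Python illustration with a hypothetical function name; it only restates the frequency and phase relationships given above and says nothing about jitter or skew.

```python
def design_clocks(input_clk_mhz):
    """Frequency and phase offset of clk0, clk90, and clkdiv0."""
    period_ns = 1000.0 / input_clk_mhz
    return {
        # clk0: same frequency as the input clock, 0 degree phase shift
        "clk0": {"freq_mhz": input_clk_mhz, "phase_ns": 0.0},
        # clk90: same frequency, shifted by a quarter period (90 degrees)
        "clk90": {"freq_mhz": input_clk_mhz, "phase_ns": period_ns / 4},
        # clkdiv0: half the frequency, phase aligned with clk0
        "clkdiv0": {"freq_mhz": input_clk_mhz / 2, "phase_ns": 0.0},
    }
```

For a 200 MHz input clock, clk90 lags clk0 by a quarter of the 5 ns period and clkdiv0 runs at 100 MHz.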
PLL/DCM
In MIG 3.0 and later, the DCM is replaced with a PLL for all Virtex-5 FPGA designs. If the
user selects a design with a PLL in the GUI, the infrastructure module contains both PLL
and DCM code. The CLK_GENERATOR parameter enables either a PLL or a DCM in the
infrastructure module. The CLK_GENERATOR parameter is set to PLL by default. To use
a DCM instead, the user should manually change this parameter to DCM.
For designs without a PLL, the user application must have a PLL/DCM primitive
instantiated in the design, and all user clocks should be driven through BUFGs.
Idelay_ctrl
This module instantiates the IDELAYCTRL primitive of the Virtex-5 FPGA. The
IDELAYCTRL primitive is used to continuously calibrate the individual delay elements in
its region to reduce the effect of process, temperature, and voltage variations. A 200 MHz
clock has to be fed to this primitive. For more information on IDELAYCTRLs, refer to
section “Verify IDELAYCTRL Instantiation for Virtex-4 and Virtex-5 FPGA Designs” in
Chapter 14.
Ctrl
The ctrl module is the main controller of the Virtex-5 FPGA DDR2 SDRAM controller
design. It generates all the control signals required for the DDR2 memory interface and the
user interface. During normal operation, this module toggles the memory address and
control signals.
The ctrl module decodes the user command and issues the specified command to the
memory. The app_af_cmd signal is decoded as a write command when it equals 3’b000,
and app_af_cmd is decoded as a read command when it equals 3’b001. The commands and
control signals are generated based on the input burst length and CAS latency. The
controller state machine issues the commands in the correct sequence while determining
the timing requirements of the memory.
In the multi-bank mode (MULTI_BANK_EN = 1), the controller has the ability to keep four
banks open at a time. The banks are opened in the order of the commands that are
presented to the controller. In the event that four banks are already open and an access
arrives for a fifth bank, the least recently used bank is closed and the new bank is opened.
All the banks are closed during auto refresh and are opened as commands are presented to
the controller. Depending on the traffic pattern, the multi-bank enable mode can increase
the efficiency of the design.
In the single-bank mode (MULTI_BANK_EN = 0), the controller keeps one bank open at a
time. When there is an access to a different bank or to a different row in the current bank,
the controller closes the current row and bank and opens the new row and bank.
phy_top
The phy_top module is the top level of the physical interface of the design. The physical
layer includes the input/output blocks (IOBs) and other primitives used to read and write
the double data rate signals to and from the memory, such as IDDR and ODDR. This
module also includes the IODELAY elements of the Virtex-5 FPGA. These IODELAY
elements are used to delay the data signals to capture the valid data into the Read Data
FIFO.
The memory control signals, such as RAS_N, CAS_N, and WE_N, are driven from the
buffers in the IOBs. All the input and output signals to and from the memory are
referenced from the IOB to compensate for the routing delays inside the FPGA.
The phy_init module, which is instantiated in the phy_top module, is used to initialize the
DDR2 memory in a predefined sequence according to the JEDEC standard for DDR2
SDRAM.
The phy_calib module calibrates the design to align the strobe signal such that it always
captures the valid data in the FIFO. This calibration is needed to compensate for the trace
delays between the memory and the FPGA devices.
The phy_write module splits the user data into rise data and fall data to be sent to the
memory as a double data rate signal using ODDR. Similarly, while reading the data from
memory, the data from IDDR is combined to get a single vector that is written into the read
FIFO.
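The split into rise data and fall data can be pictured with plain integer arithmetic. The following Python sketch uses hypothetical function names and assumes that the lower half of the double-width word carries the rising-edge data and the upper half carries the falling-edge data; that ordering is an assumption of this example and should be checked against the generated RTL.

```python
def split_rise_fall(word, dq_width):
    """Split a 2*dq_width-bit user word into (rise, fall) data."""
    mask = (1 << dq_width) - 1
    rise = word & mask                # assumed: lower half is rise data
    fall = (word >> dq_width) & mask  # assumed: upper half is fall data
    return rise, fall


def merge_rise_fall(rise, fall, dq_width):
    """Combine captured rise and fall data back into a single vector."""
    mask = (1 << dq_width) - 1
    return ((fall & mask) << dq_width) | (rise & mask)
```

The two functions are inverses of each other for the same data width, mirroring the write path and the read path.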
usr_top
The usr_top module is the user interface block of the design. It receives and stores the user
data, command, and address information in respective FIFOs. The ctrl module generates
the required control signals for this module. During a write operation, the data stored in
the usr_wr_fifo is read and given to the physical layer to output to the memory. Similarly,
during a read operation, the data from the memory is read via IDDR and written into the
FIFOs. This data is given to the user with a valid signal (rd_data_valid), which indicates
valid data on the rd_data_fifo_out signal. Table 9-6 lists the user interface signals.
The FIFO36 and FIFO36_72 primitives are used for loading address and data from the user
interface. The FIFO36 primitive is used in the ddr2_usr_addr_fifo module. The FIFO36_72
primitive is used in the ddr2_usr_wr and ddr2_usr_rd modules. Every FIFO has two FIFO
threshold attributes, ALMOST_EMPTY_OFFSET and ALMOST_FULL_OFFSET, that are
set to 7 and F, respectively, in the RTL by default. These values can be changed as needed.
For valid FIFO threshold offset values, refer to UG190 [Ref 10].
Test Bench
The MIG tool generates two RTL folders, example_design and user_design. The
example_design folder includes the synthesizable test bench, while user_design does not
include the test bench modules. The MIG test bench performs eight write commands and
eight read commands in an alternating fashion. The number of words in a write command
depends on the burst length. For a burst length of 4, the test bench writes a total of 32 data
words for all eight write commands (16 rise data words and 16 fall data words). For a burst
length of 8, the test bench writes a total of 64 data words. It writes the data pattern FF,
00, AA, 55, 55, AA, 99, 66 in sequence, of which FF, AA, 55, and 99 are rise data words and
00, 55, AA, and 66 are fall data words for an 8-bit design. The falling edge data is the
complement of the rising edge data. For a burst length of 4, the data sequence for the first
write command is FF, 00, AA, 55, and the data sequence for the second write command is
55, AA, 99, 66. For a burst length of 8, the data pattern for the first write command is FF,
00, AA, 55, 55, AA, 99, 66 and the same pattern is repeated for all the remaining write
commands. This data pattern is repeated in the same order based on the number of data
words written. For data widths greater than 8, the same data pattern is concatenated for
the other bits. For a 32-bit design and a burst length of 8, the data pattern for the first write
command is FFFFFFFF, 00000000, AAAAAAAA, 55555555, 55555555, AAAAAAAA,
99999999, 66666666.
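The pattern generation described above can be sketched in Python. This is an illustrative
model only, not the MIG test bench RTL. The names BASE_PATTERN and write_pattern are
hypothetical, and the sketch assumes a non-ECC data width that is a multiple of 8 bits.

```python
# Illustrative model of the test bench data pattern (not the MIG RTL).
# BASE_PATTERN and write_pattern are hypothetical names.
BASE_PATTERN = [0xFF, 0x00, 0xAA, 0x55, 0x55, 0xAA, 0x99, 0x66]


def write_pattern(dq_width, burst_length, command_index):
    """Return the data words for one write command.

    Words alternate rise, fall, rise, fall. For widths above 8 bits the
    8-bit pattern byte is concatenated across every byte lane.
    """
    if dq_width <= 0 or dq_width % 8 != 0:
        raise ValueError("dq_width must be a positive multiple of 8")
    if burst_length not in (4, 8):
        raise ValueError("burst_length must be 4 or 8")

    lanes = dq_width // 8
    # The 8-word pattern repeats in order across successive commands.
    start = (command_index * burst_length) % len(BASE_PATTERN)
    words = []
    for i in range(burst_length):
        byte = BASE_PATTERN[(start + i) % len(BASE_PATTERN)]
        word = 0
        for _ in range(lanes):
            word = (word << 8) | byte
        words.append(word)
    return words
```

For example, write_pattern(8, 4, 1) returns the second half of the pattern, matching the
second write command for a burst length of 4.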
Address generation logic generates eight different addresses for eight write commands.
The same eight address locations are repeated for the following eight read commands. The
read commands are performed at the same locations where the data is written. There are a
total of 32 different address locations for 32 write commands, and the same address
locations are generated for 32 read commands. Upon completion of a total of 64
commands, including both writes and reads (eight writes and eight reads repeated four
times), address generation rolls back to the first address of the first write command and the
same address locations are repeated. The MIG test bench exercises only a certain memory
area. The address is formed such that all address bits are exercised. During writes, a new
address is generated for every burst operation on the column boundary.
During reads, comparison logic compares the read pattern with the pattern written, i.e., the
FF, 00, AA, 55, 55, AA, 99, 66 pattern. For example, for an 8-bit design of burst length 4, the
data written for a single write command is FF, 00, AA, 55. During reads, the read pattern is
compared with the FF, 00, AA, 55 pattern. Based on a comparison of the data, a status
signal error is generated. If the data read back is the same as the data written, the error
signal is 0, otherwise it is 1. Comparison logic only compares the DATA bits and not the
ECC data pattern. For example, for a 72-bit ECC design, comparison logic only compares
64 bits. The 8 MSBs (ECC bits) are not compared.
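The comparison rule above can be expressed as a short Python sketch. The function name
compare_read_data is hypothetical, and the sketch assumes that the ECC bits occupy the
most-significant bits of the word, as stated for the 72-bit ECC design.

```python
def compare_read_data(read_word, expected_word, data_bits=64):
    """Return the error flag: 0 when the data bits match, 1 otherwise.

    Only the low data_bits are compared; any higher bits (the ECC bits
    in a 72-bit ECC design) are ignored.
    """
    mask = (1 << data_bits) - 1
    return 0 if (read_word & mask) == (expected_word & mask) else 1
```

With data_bits left at 64, two 72-bit words that differ only in their upper 8 bits
produce an error value of 0.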
Calibration flow diagram, stages 3 and 4:
Stage 3: Read data valid calibration for all DQS. Continuously read back the stage 3/4
training pattern and adjust the number of clock cycles to wait after issuing a read
command before valid data arrives in the FPGA_CLK domain. Performed once per DQS group.
Stage 4: DQS gate control calibration for all DQS. Adjust the IDELAY for DQS gate
control. Performed once per DQS group.
Calibration Done
The first calibration stage is used to position the DQS in the DQ valid window. This
synchronizes the capture of DQ using DQS in the IDDR flop. A training pattern of 1 for rise
and 0 for fall data is written into the memory and is continuously read back. The DQ and
DQS IDELAYs are adjusted depending upon the DQ to DQS relationship. Per-bit deskew is
performed on the DQ bits.
The second calibration stage is between the DQS and the FPGA clock. This synchronizes
the transfer of data between the IDDR flop and flip-flops located in the FPGA fabric. The
DQ and DQS IDELAY taps are incremented together to align to the FPGA clock domain.
The third calibration stage is the read-enable calibration, which is used to generate a read
valid signal. The memory devices do not provide a signal indicating when the read data is
valid. The read data is delayed by CAS latency, additive latency, the PCB trace, and the I/O
buffer delays. The read-enable calibration is used to determine the delay between issuing
a read command and the arrival of the read data.
The fourth calibration stage is used to align the DQS Gate signal from the controller to the
falling edge of DQS. The DQS Gate controls the clock enable to the DQ IDDRs. It is used to
prevent clocking of invalid data into the IDDR after the read postamble. This can happen
because the DQS is 3-stated by the memory at the end of a read. The DQS can then go into
an indeterminate value, causing false clocking of the IDDR.
After initialization and calibration are done, the controller is signaled to start normal
operation of the design. Now, the controller can start issuing user write and read
commands to the memory.
Clocking Scheme
Figure 9-10, page 378 shows the clocking scheme for this design. Global and local clock
resources are used.
The global clock resources consist of a PLL or a DCM, three BUFGs on PLL/DCM output
clocks, and one BUFG for clk200. The local clock resources consist of regional I/O clock
networks (BUFIO). The global clock architecture is discussed in this section.
The MIG tool allows the user to customize the design such that the PLL/DCM is not
included. In this case, clk0, clk90, clkdiv0 and IDELAYCTRL clock clk200 must be supplied
by the user.
Notes:
1. See “User Interface Accesses,” page 381 for timing requirements and restrictions on the user interface
signals.
Diagram labels: GC I/O, SYSTEM CLK, PLL (CLKIN, CLKOUT0, CLKOUT1, CLKOUT2, CLKFBIN,
CLKFBOUT), BUFG, CLK0, CLK90, CLKDIV0.
Figure 9-9: Clocking Scheme for DDR2 Interface Logic Using PLL
Diagram labels: GC I/O, SYSTEM CLK, DCM (CLKIN, CLK0, CLK90), BUFG, CLK0, CLK90.
Figure 9-10: Clocking Scheme for DDR2 Interface Logic Using DCM
Table 9-6: DDR2 SDRAM Controller System Interface Signals (with a PLL)
Signal Name Direction Description
sys_clk_p, sys_clk_n Input Differential input clock to the PLL/DCM. The DDR2 controller
and memory operate at this frequency. This differential input
clock pair is present only when the DIFFERENTIAL clocks
option is selected in the MIG FPGA options page.
sys_clk Input Single-ended input clock to the PLL/DCM. The DDR2 controller
and memory operate at this frequency. This input clock is
present only when the SINGLE_ENDED clocks option is
selected in the MIG FPGA options page.
clk200_p, clk200_n Input 200 MHz differential input clock for the IDELAYCTRL primitive
of Virtex-5 FPGAs. This differential input clock pair is present
only when the DIFFERENTIAL clocks option is selected in
the MIG FPGA options page.
idly_clk_200 Input 200 MHz single-ended input clock for the IDELAYCTRL
primitive of Virtex-5 FPGAs. This input clock is present only
when the SINGLE_ENDED clocks option is selected in the
MIG FPGA options page.
sys_rst_n Input Active-Low reset to the DDR2 controller.
Table 9-7: DDR2 SDRAM Controller System Interface Signals (without a PLL)
Signal Direction Description
clk0 Input The DDR2 SDRAM controller and memory operate on this clock.
clk90 Input 90° phase-shifted clock with the same frequency as clk0.
clkdiv0 Input Clocks the memory initialization and read capture timing
calibration state machines in the PHY layer.
clk200 Input 200 MHz differential input clock for the IDELAYCTRL primitive
of Virtex-5 FPGAs.
sys_rst_n Input Active-Low reset to the DDR2 SDRAM controller. This signal is
used to generate the synchronous system reset.
locked Input The status signal indicating whether the PLL is locked or not.
This signal is used to generate the synchronous system reset.
app_af_cmd[2:0] Input 3-bit command to the Virtex-5 FPGA DDR2 SDRAM design.
app_af_cmd = 3’b000 for write command
app_af_cmd = 3’b001 for read command
Other combinations are invalid. Functionality of the controller is
unpredictable for unimplemented commands.
app_af_addr[30:0](2) Input Gives information about the address of the memory location to be
accessed. This bus contains the bank address, the row address, and
the column address.
Column address = app_af_addr[COL_WIDTH-1: 0]
Row address = app_af_addr[ROW_WIDTH+COL_WIDTH–1:
COL_WIDTH]
Bank address =
app_af_addr[BANK_WIDTH+ROW_WIDTH+COL_WIDTH–1:
ROW_WIDTH+COL_WIDTH]
Chip select (3)=
app_af_addr[BANK_WIDTH+ROW_WIDTH+COL_WIDTH]
app_af_wren Input Write enable to the User Address FIFO. This signal should be
synchronized with the app_af_addr and app_af_cmd signals.
app_wdf_data[2*DQ_WIDTH–1:0] Input User input data. It should contain the fall data and the rise data.
Rise data = app_wdf_data[DQ_WIDTH–1: 0]
Fall data = app_wdf_data[2*DQ_WIDTH–1: DQ_WIDTH]
app_wdf_mask_data[2*DM_WIDTH–1: 0] Input User mask data. It should contain the masking information for both
rise and fall data.
Rise mask data = app_wdf_mask_data[DM_WIDTH–1: 0]
Fall mask data = app_wdf_mask_data[2*DM_WIDTH–1:
DM_WIDTH]
app_wdf_wren Input Write enable for the User Write FIFO. This signal should be
synchronized with the app_wdf_data and app_wdf_mask_data
signals.
app_af_afull Output Almost Full status of the Address FIFO. When this signal is asserted,
the user can write 12 more locations into the FIFO.
app_wdf_afull Output Almost Full status of the User Write FIFO. When this signal is
asserted, the user can write 12 more locations into the FIFO.
rd_data_valid Output Status signal indicating read data is valid on the read data bus.
clk0_tb Output Clock output to the user. All user interface signals must be
synchronized with this clock. This signal is sourced from clk0 in the
controller.
Notes:
1. Direction indicated in the table is referenced from the design perspective. For example, input here indicates that the signal is input to the
design.
2. Addressing in Virtex-5 FPGAs is linear addressing (i.e., the row address immediately follows the column address bits, and the bank address
follows the row address bits, thus supporting more devices). The number of address bits used depends on the density of the memory part.
The controller ignores the unused bits, which can all be tied High.
3. For single-rank devices, Chip Select is not taken from the user address (i.e., app_af_addr). Hence, the controller always selects all of
the existing devices. For dual-rank devices, the corresponding device is selected based on the Chip Select value. In other words, for
dual-rank devices, the Chip Select is taken from the user address (i.e., app_af_addr). CS0 is selected for a Chip Select value of 0, and
CS1 is selected for a Chip Select value of 1.
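The linear addressing scheme in the table and notes above can be sketched in Python. The
function names and the default COL_WIDTH, ROW_WIDTH, and BANK_WIDTH values are
hypothetical and chosen only for illustration; the actual widths depend on the selected
memory part.

```python
def pack_app_af_addr(col, row, bank, cs=0,
                     col_width=10, row_width=14, bank_width=2):
    """Pack column, row, bank, and chip select into a linear address.

    Layout, from LSB upward: column, row, bank, chip select.
    The default widths are illustrative assumptions.
    """
    fields = (("col", col, col_width), ("row", row, row_width),
              ("bank", bank, bank_width), ("cs", cs, 1))
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
    addr = col
    addr |= row << col_width
    addr |= bank << (col_width + row_width)
    addr |= cs << (col_width + row_width + bank_width)
    return addr


def unpack_app_af_addr(addr, col_width=10, row_width=14, bank_width=2):
    """Split a linear address back into (col, row, bank, cs)."""
    col = addr & ((1 << col_width) - 1)
    row = (addr >> col_width) & ((1 << row_width) - 1)
    bank = (addr >> (col_width + row_width)) & ((1 << bank_width) - 1)
    cs = (addr >> (col_width + row_width + bank_width)) & 1
    return col, row, bank, cs
```

For single-rank designs the chip select field is not taken from the user address, so cs
can be left at its default of 0.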
Table 9-9 lists the signals between the user interface and the controller.
that the controller assumes write data is available when it receives the write command
from the user.
2. The clk0_tb signal is connected to clk0 in the controller. If the user clock domain is
different from clk0 / clk0_tb of MIG, the user should add FIFOs for all data inputs and
outputs of the controller in order to synchronize them to the clk0_tb.
Write Interface
Figure 9-11 shows the user interface block diagram for write operations.
Diagram labels: User Interface, Write Data FIFO (FIFO36_72, 512 x 72), app_af_addr,
app_af_afull, app_wdf_data, app_wdf_mask_data, af_addr, ctrl_af_rden, wdf_rden, wdf_data.
The following steps describe the architecture of the Address and Write Data FIFOs and
show how to perform a write burst operation to DDR2 SDRAM from the user interface.
1. The user interface consists of an Address FIFO and a Write Data FIFO. The Write Data
FIFO is constructed using the Virtex-5 FPGA FIFO36_72 primitive with a 512 x 72
configuration. The 72-bit architecture comprises one 64-bit port and one 8-bit port. For
Write Data FIFOs, the 64-bit port is used for data bits and the 8-bit port is used for
mask bits for ECC-disabled designs. Mask bits are available only when supported by
the memory part and when data mask is enabled in the MIG GUI. Some memory parts,
such as Registered DIMMs of x4 parts, do not support mask bits.
2. In ECC-enabled designs, the 64-bit port is used for data bits and the 8-bit port is used
for ECC data. The attributes passed to the Virtex-5 FPGA FIFO36_72 primitive are
different for ECC-enabled designs; attribute EN_ECC_WRITE is set to TRUE for ECC-
enabled designs to enable the generation of ECC data.
3. The Address FIFO is constructed using the Virtex-5 FPGA FIFO36 primitive with a
1024 x 36 configuration. The 36-bit architecture comprises one 32-bit port and one 4-bit
port. The 32-bit port is used for the address (app_af_addr) and the 4-bit port is used for
the command (app_af_cmd).
4. The Address FIFO is common for both Write and Read commands. It comprises an
address part and a command part. Command bits discriminate between write and
read commands.
5. User interface data width app_wdf_data is twice that of the memory data width. For
an 8-bit memory width, the user interface is 16 bits consisting of rising-edge data and
falling-edge data. There is a mask bit for every 8 bits of data. For 72-bit memory data,
the user interface data width app_wdf_data is 144 bits, and the mask data
app_wdf_mask_data is 18 bits.
6. The minimum configuration of the Write Data FIFO is 512 x 72 for a memory data
width of 8 bits. For an 8-bit memory data width, the least-significant 16 bits of the data
port are used for write data and the least-significant two bits of the 8-bit port are used
for mask bits. The controller internally pads all zeros for the most-significant 48 bits of
the 64-bit port and the most-significant 6 bits of the 8-bit port.
7. Depending on the memory data width, MIG instantiates multiple FIFO36_72s to gain
the required width. For designs using 8-bit to 32-bit data width, one FIFO36_72 is
instantiated; for 72-bit data width, a total of three FIFO36_72s are instantiated. The bit
architecture comprises 32 bits of rising-edge data, 4 bits of rising-edge mask, 32 bits of
falling-edge data, and 4 bits of falling-edge mask, which are all stored in a FIFO36_72.
MIG routes app_wdf_data and app_wdf_mask_data to FIFO36_72s accordingly.
8. The user can initiate a write to memory by writing to the Address FIFO and the Write
Data FIFO when the FIFO full flags are deasserted. Status signal app_af_afull is
asserted when the Address FIFO is almost full; similarly, app_wdf_afull is asserted when
the Write Data FIFO is almost full.
9. At power on, both the Address FIFO and Write Data FIFO full flags are deasserted.
10. The user should assert Address FIFO write-enable signal app_af_wren along with
address app_af_addr and command app_af_cmd to store the address and command
into Address FIFO.
11. The user data should be synchronized to the clk0_tb clock. The user should assert the
Data FIFO write-enable signal app_wdf_wren along with write data app_wdf_data
and mask data app_wdf_mask_data to store the write data and mask data into the
Write Data FIFOs. The user should provide both rising-edge and falling-edge data
together for each write to the Data FIFO. The Virtex-5 FPGA DDR2 SDRAM controller
design supports byte-wise masking of data only.
12. The write command should be given by keeping app_af_cmd = 3'b000 and asserting
app_af_wren. Address information is given on the app_af_addr signal. Address and
command information is written into the User Address FIFO.
13. After the completion of the initialization and calibration process and when the User
Address FIFO empty signal is deasserted, the controller reads the Command and
Address FIFO and issues a write command to the DDR2 SDRAM.
14. The write timing diagram in Figure 9-12 is derived from the MIG-generated testbench
for a burst length of 4. As shown, each write to the Address FIFO should have two
writes to the Data FIFO. The phy_init_done signal indicates memory initialization and
calibration completion.
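The data packing in the steps above can be sketched in Python as follows. The function
names are hypothetical. The sketch assumes the burst is supplied as alternating rise and
fall words, and it drives all mask bits to 0 on the assumption that 0 means the byte is
written.

```python
def pack_wdf_word(rise, fall, rise_mask, fall_mask, dq_width, dm_width):
    """Return (app_wdf_data, app_wdf_mask_data) for one Data FIFO write.

    Rise data and rise mask occupy the low half; fall data and fall
    mask occupy the high half.
    """
    if max(rise, fall) >= (1 << dq_width):
        raise ValueError("data word wider than dq_width")
    if max(rise_mask, fall_mask) >= (1 << dm_width):
        raise ValueError("mask wider than dm_width")
    data = (fall << dq_width) | rise
    mask = (fall_mask << dm_width) | rise_mask
    return data, mask


def build_write_burst(words, dq_width, burst_length):
    """Group one burst of memory words into Data FIFO writes.

    words alternates rise, fall, rise, fall. Each FIFO write carries
    one rise word and one fall word, so a burst needs burst_length / 2
    writes for every Address FIFO write.
    """
    if burst_length not in (4, 8) or len(words) != burst_length:
        raise ValueError("words must hold exactly one burst of 4 or 8")
    dm_width = dq_width // 8  # one mask bit per 8 data bits
    writes = []
    for i in range(0, burst_length, 2):
        writes.append(pack_wdf_word(words[i], words[i + 1], 0, 0,
                                    dq_width, dm_width))
    return writes
```

For an 8-bit design and a burst length of 4, the words FF, 00, AA, 55 yield two Data FIFO
writes, which agrees with the two-writes-per-address rule in step 14.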
Timing diagram signals: clk_tb, reset_tb, app_wdf_afull, app_af_afull, phy_init_done,
app_wdf_wren, app_af_wren, app_af_addr (A0, A1, A2, A3).
Figure 9-12: DDR2 SDRAM Write Burst for Four Bursts (BL = 4)
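The width arithmetic from the write interface steps can be checked with a small Python
sketch. The function names are hypothetical. The FIFO count formula is an inference from
the statement that each FIFO36_72 stores 32 bits of rise data and 32 bits of fall data;
only the 8-bit to 32-bit and 72-bit cases are stated explicitly in the text.

```python
def user_data_width(dq_width):
    """app_wdf_data width: rise plus fall data, twice the memory width."""
    return 2 * dq_width


def user_mask_width(dq_width):
    """app_wdf_mask_data width: one bit per 8 data bits, rise and fall."""
    return 2 * (dq_width // 8)


def fifo36_72_count(dq_width):
    """Number of FIFO36_72 primitives, assuming 32 memory data bits each."""
    return (dq_width + 31) // 32
```

Under this assumption a 64-bit design would use two FIFO36_72 primitives; that value is
derived from the formula, not quoted from the text.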
Read Interface
Figure 9-13 shows the block diagram of the read interface.
Diagram labels: Address FIFO (FIFO16, 1024 x 36), Controller, app_af_cmd, app_af_wren,
app_af_afull, af_empty, ctrl_af_rden, rd_data_fifo_out, wdf_almost_full.
The following steps describe the architecture of the Read Data FIFO and show how to
perform a read burst operation from DDR2 SDRAM from the user interface.
1. The Read Data FIFOs are constructed using the Virtex-5 FPGA FIFO36_72 primitive
with a 512 x 72 configuration for ECC-enabled designs. For non-ECC designs, read
data is latched using the flops.
2. In ECC-enabled designs, the 64-bit port is used for data bits and the 8-bit port is used
for ECC data. The Virtex-5 FPGA FIFO36_72 performs ECC comparison when the
attribute EN_ECC_READ is set during read operation. MIG instantiates the FIFOs
appropriately for ECC or non-ECC designs.
3. The user can initiate a read to memory by writing to the Address FIFO when the FIFO
full flag app_af_afull is deasserted.
4. To write the read address and read command into the Address FIFO, the user should
assert the Address FIFO write-enable signal app_af_wren along with the read address
app_af_addr and the command app_af_cmd (set to 3'b001 for a read command).
5. The controller reads the Address FIFO and generates the appropriate control signals to
memory. After decoding app_af_cmd, the controller issues a read command to the
memory at the specified address.
6. Prior to the actual read and write commands, the design calibrates the latency in
number of clock cycles from the time the read command is issued to the time the data
is received. Using this precalibrated delay information, the controller stores the read
data in the Read Data FIFOs.
7. The rd_data_valid signal is asserted when data is available in the Read Data FIFOs.
8. When the calibration is completed, the controller generates the control signals to
capture the read data from the FIFO according to the CAS latency selected by the user.
The rd_data_valid signal is asserted when the read data is available to the user, and
rd_data_fifo_out is the read data from the memory to the user.
9. Figure 9-14 shows the user interface timing diagram for burst length of four.
Timing diagram signals: clk_tb, app_af_afull, app_af_wren, app_af_addr (A0, A1, A2, A3),
rd_data_valid.
Figure 9-14: DDR2 SDRAM Read Burst (BL = 4) for Four Bursts
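The user-side handshake for the read steps can be modeled in Python. This is a
behavioral sketch with hypothetical function names, not cycle-accurate RTL. It assumes
the user simply holds off a command whenever app_af_afull is asserted.

```python
CMD_WRITE = 0b000  # app_af_cmd value for a write command
CMD_READ = 0b001   # app_af_cmd value for a read command


def drive_address_fifo(cmd, addr, app_af_afull):
    """Return the signal values to drive for one clock cycle.

    The command is issued only when app_af_afull is deasserted;
    otherwise the write enable stays low and the command must be retried.
    """
    if cmd not in (CMD_WRITE, CMD_READ):
        raise ValueError("only write (000) and read (001) are valid")
    if app_af_afull:
        return {"app_af_wren": 0, "app_af_cmd": 0, "app_af_addr": 0}
    return {"app_af_wren": 1, "app_af_cmd": cmd, "app_af_addr": addr}


def collect_read_data(samples):
    """Keep rd_data_fifo_out only on cycles where rd_data_valid is 1.

    samples is an iterable of (rd_data_valid, rd_data_fifo_out) pairs,
    one pair per clock cycle.
    """
    return [data for valid, data in samples if valid]
```

A caller would loop over pending read addresses, call drive_address_fifo once per cycle,
and advance to the next address only when app_af_wren was returned as 1.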
Read latency is defined as the time from when the read command is written to the user
interface bus until the corresponding first piece of data is available on the user
interface bus (see Figure 9-14).
When benchmarking read latencies, it is important to specify the exact conditions under
which the measurement occurs.
Read latency varies based on the following parameters:
• Number of commands already in the FIFO pipeline before the read command is
issued
• Whether an ACTIVATE command needs to be issued to open the new bank/row
• Whether a PRECHARGE command needs to be issued to close a previously opened
bank
• Specific timing parameters for the memory, such as TRAS and TRCD in conjunction
with the bus clock frequency
• Whether commands are interrupted and banks/rows are forcibly closed by a
periodic AUTO REFRESH command
• CAS latency
• Board-level and chip-level (for both memory and FPGA) propagation delays
Table 9-10 and Table 9-11 show read latencies for the Virtex-5 FPGA DDR2 interface for
two different conditions. Table 9-10 shows the case where a row activate is not required
prior to issuing a read command on the DDR bus. This situation is possible, for example,
when bank management is enabled, and the read targets an already opened bank.
Table 9-11 shows the case when a read results in a bank/row conflict. In this case, a
precharge of the previous row must be followed by an activation of the new row, which
increases read latency. Other specific conditions are noted in the footnotes for each table.
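The two access conditions can be illustrated with a Python sketch that lists the DDR2
commands needed before the read data is returned. The function name and the dictionary
used to track open rows are hypothetical; the actual bank management logic in the
controller is not described here.

```python
def commands_before_read(open_rows, bank, row):
    """Return the command sequence for a read and update open_rows.

    open_rows maps a bank number to its currently open row; a bank
    that is absent from the mapping is treated as closed.
    """
    current = open_rows.get(bank)
    if current == row:
        # Row already open: no activate is required.
        commands = ["READ"]
    elif current is None:
        # Bank closed: the row must be activated first.
        commands = ["ACTIVATE", "READ"]
    else:
        # Bank/row conflict: close the old row, then open the new one.
        commands = ["PRECHARGE", "ACTIVATE", "READ"]
    open_rows[bank] = row
    return commands
```

The longer the returned sequence, the higher the read latency, because each extra command
adds its memory timing parameter (for example TRCD after ACTIVATE) to the total.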
Notes:
1. Test conditions: Clock frequency = 333 MHz, CAS latency = 5, DDR2 -3E speed grade device.
2. Access conditions: Read to an already open bank/row is issued to an empty control/address FIFO.
3. Some entries have fractional clock cycles because the inverted version of CLK0 is used to drive the
DDR2 memory.
4. The Virtex-5 FPGA DDR2 interface uses a FIFO36 for the address/control FIFO. It is possible to
shorten the READ command to empty signal deassertion latency by implementing the FIFO as a
distributed RAM FIFO or removing the FIFO altogether, as the application requires.
Notes:
1. Test conditions: Clock frequency = 333 MHz, CAS latency = 5, DDR2 -3E speed grade device.
2. Access conditions: Read that results in a bank/row conflict is issued to an empty control/address
FIFO. This requires that the previous bank/row be closed first.
3. Some entries have fractional clock cycles because the inverted version of CLK0 is used to drive the
DDR2 memory.
4. The Virtex-5 FPGA DDR2 interface uses a FIFO36 for the address/control FIFO. It is possible to
shorten the READ command to empty signal deassertion latency by implementing the FIFO as a
distributed RAM FIFO or removing the FIFO altogether, as the application requires.
Note: Timing has been verified for most of the MIG generated configurations. For the best timing
results, adjacent banks in the same column of the FPGA should be used. Banks that are separated
by unbonded banks should be avoided because these can cause timing violations.
Supported Devices
The design generated by MIG is independent of the memory package; hence, the package
part of the memory component is replaced with XX or XXX, where XX or XXX indicates a
don't care condition. The tables below list the components (Table 9-8) and DIMMs
(Table 9-11) supported by the tool for the DDR2 design. In supported devices, X in the
components column denotes a single alphanumeric character. For example,
MT47H128M4XX-3 can be either MT47H128M4BP-3 or MT47H128M4B6-3. XX for
Registered DIMMs denotes one or two alphanumeric characters. For example,
MT9HTF3272XX-667 can be either MT9HTF3272Y-667 or MT9HTF3272DY-667. See
Appendix G, “Low Power Options.”
Table 9-14: Supported Registered DIMMs for DDR2 SDRAM (Virtex-5 FPGAs)
MT9HTF3272Y-667 MT18HTF12872Y-40E
MT9HTF3272PY-667 MT18HTF12872PY-40E
MT9HTF3272Y-53E MT18HTF25672Y-667
MT9HTF3272PY-53E MT18HTF25672PY-667
MT9HTF3272Y-40E MT18HTF25672Y-53E
MT9HTF3272PY-40E MT18HTF25672PY-53E
MT9HTF6472Y-667 MT18HTF25672Y-40E
MT9HTF6472PY-667 MT18HTF25672PY-40E
MT9HTF6472Y-53E MT18HTF6472DY-667
MT9HTF6472PY-53E MT18HTF6472PDY-667
MT9HTF6472Y-40E MT18HTF6472DY-53E
MT9HTF6472PY-40E MT18HTF6472PDY-53E
MT9HTF12872Y-667 MT18HTF6472DY-40E
MT9HTF12872PY-667 MT18HTF6472PDY-40E
MT9HTF12872Y-53E MT18HTF12872DY-667
MT9HTF12872PY-53E MT18HTF12872PDY-667
MT9HTF12872Y-40E MT18HTF12872DY-53E
MT9HTF12872PY-40E MT18HTF12872PDY-53E
MT18HTF6472G-53E MT18HTF12872DY-40E
MT18HTF6472Y-667 MT18HTF12872PDY-40E
MT18HTF6472PY-667 MT18HTF25672DY-667
MT18HTF6472Y-53E MT18HTF25672PDY-667
MT18HTF6472PY-53E MT18HTF25672DY-53E
MT18HTF6472Y-40E MT18HTF25672PDY-53E
MT18HTF6472PY-40E MT18HTF25672DY-40E
MT18HTF12872Y-667 MT18HTF25672PDY-40E
MT18HTF12872PY-667 MT36HTJ51272Y-667
MT18HTF12872Y-53E MT36HTJ51272Y-53E
MT18HTF12872PY-53E MT36HTJ51272Y-40E
MT36HTF51272Y-667 MT36HTF51272Y-53E
MT36HTF51272Y-40E --
DDR2 PPC440
Supported Features
• Supports a maximum performance of 333 MHz in the fastest speed grade
• Supports 16-bit, 32-bit, and 64-bit data widths, and 72-bit data width with ECC
(DQ:DQS = 8:1)
• Supports DDR2 SDRAM single-rank registered DIMMs, SODIMMs, UDIMMs, and
components. RDIMMs support a maximum of 9 loads. SODIMMs and UDIMMs
support a maximum of 4 loads.
• Supports the following DDR2 SDRAM features:
• CAS latencies (3, 4, 5)
• Additive latencies (0, 1, 2, 3, 4)
• On-die termination (ODT)
• Burst lengths (4, 8)
• Supports bank management (up to four banks open)
Unsupported Features
• GUI options
• Verify UCF and Update Design and UCF
• Data mask
• Two bytes per bank
• System clock type
• Dual Rank DIMMs
• Multi controllers
Introduction
When the PPC440 option is selected, the MIG tool outputs a UCF that is optimal for the
PowerPC440 in the selected Virtex-5 FXT device. This limits the location of the memory
interface to the banks adjacent to the PPC440 hard block. It also limits the supported
memory interface widths to 16-bit, 32-bit, and 64-bit data widths, and 72-bit data width
with ECC. The only supported DQ:DQS ratio is 8:1.
• XC5VFX100T-FF1136
• XC5VFX100T-FF1738
• XC5VFX130T-FF1738
• XC5VFX200T-FF1738
Chapter 10
Feature Summary
This section summarizes the supported and unsupported features of the QDRII controller
design.
Supported Features
The QDRII controller design supports the following:
• A maximum frequency of 300 MHz
• 18-bit, 36-bit, and 72-bit data widths
• Burst lengths of four and two
• Implementation using different Virtex-5 devices
• Support for DCI Cascading
• Operation with 18-bit and 36-bit memory components
• Verilog and VHDL
• With and without a testbench
• With and without a PLL
Unsupported Features
The QDRII controller design does not support:
• 9-bit data widths
• 9-bit memory components
Architecture
Figure 10-1 shows a top-level block diagram of the QDRII memory controller. One side of
the QDRII memory controller connects to the user interface denoted as User Interface. The
other side of the controller interfaces to QDRII memory. The memory interface data width
is selectable from MIG.
Diagram labels: Virtex-5 FPGA, User Interface, QDRII Memory Controller, QDRII Memory.
QDR operation supports double-data-rate read and write operations through separate data
output and input ports within the same clock cycle. Memory bandwidth is maximized because
data can be transferred into SRAM on every edge of the clock and transferred out of SRAM
on every edge of the read clock. Independent read and write ports eliminate the need for
high-speed bus turnaround.
Read and write addresses are latched on positive edges of the input clock K. A common
address bus is used to access the addresses for both read and write operations. The key
advantage of QDRII devices is that they have separate data buses for reads and writes to
SRAM.
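The bandwidth benefit of separate double-data-rate read and write ports can be estimated
with a short Python sketch. The function name is hypothetical, and the result is a raw
peak figure that counts every bit of the data width and ignores calibration and idle
cycles.

```python
def qdrii_peak_bandwidth_bps(clock_hz, data_width_bits):
    """Return raw peak bandwidth in bits per second for each port.

    Each port transfers one data word on every clock edge, that is,
    two words per clock cycle. The read and write ports are
    independent, so the total is the sum of both.
    """
    per_port = clock_hz * 2 * data_width_bits
    return {"read": per_port, "write": per_port, "total": 2 * per_port}
```

At the 300 MHz maximum frequency with a 36-bit data width, each port peaks at 21.6 Gb/s,
for a combined 43.2 Gb/s.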
Interface Model
The QDRII memory interface is layered to simplify the design and make the design
modular. Figure 10-2 shows the layered memory interface in the QDRII memory controller.
The two layers are the application layer and the implementation and physical layer.
Diagram labels: User Interface, Datapath, Control, Clocks and Reset.
The application layer creates the user interface, which initiates memory writes and reads
by writing data and memory addresses to the User Interface FIFOs.
The implementation and physical layer comprises:
Hierarchy
Figure 10-3 shows the hierarchical structure of the QDRII SRAM design generated by MIG
with a testbench and a PLL.
Diagram modules: <top_module>, qdrii_idelay_ctrl, qdrii_top, qdrii_infrastructure,
qdrii_tb_top.
Diagram legend: Design Modules, Test Bench Modules, Clocks and Reset Generation Modules.
Figure 10-3: Hierarchical Structure of the Virtex-5 FPGA QDRII SRAM Design
Figure 10-4 shows a top-level block diagram of a QDRII SRAM design with a PLL and a
testbench. The sys_clk_p and sys_clk_n pair are differential input system clocks. “Clocking
Scheme,” page 410 describes how various clocks are generated using the PLL. The PLL is
instantiated in the infrastructure module that generates the required design clocks.
dly_clk_200_p and dly_clk_200_n are used for the idelay_ctrl element. sys_rst_n is an
active-Low system reset signal. All design resets are generated using the sys_rst_n signal,
the locked signal, and the dly_ready signal of the IDELAYCTRL element. The
compare_error output signal indicates whether the design passes or fails. The testbench
module called “tb_top” generates the user interface data, address, and command signals.
The user data bits and address bits are stored in the corresponding User Interface FIFOs.
The compare_error signal is driven High on data mismatches. The cal_done signal
indicates the completion of initialization and calibration of the design.
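The reset derivation described above can be sketched in Python as a simple combinational
rule. The function name and the exact logic are assumptions made for illustration; the
generated RTL also synchronizes the reset into each clock domain, which this sketch does
not model.

```python
def design_reset(sys_rst_n, locked, dly_ready):
    """Return 1 while the design should be held in reset, else 0.

    Assumed rule: reset is active when the active-Low system reset is
    asserted, the PLL is not locked, or IDELAYCTRL is not ready.
    """
    return int(not (sys_rst_n and locked and dly_ready))
```

Under this assumption the design leaves reset only when sys_rst_n is High and both
locked and dly_ready are asserted.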
Diagram blocks: idelay_ctrl, Infrastructure, qdr2_top, tb_top, Memory Device.
Clock and reset signals: dly_clk_200_p, dly_clk_200_n, sys_clk_p, sys_clk_n, sys_rst_n,
clk0, clk180, clk270, clk200, user_rst_0, user_rst_180, user_rst_270, user_rst_200,
idelay_ctrl_rdy.
Memory interface signals: qdr_r_n, qdr_w_n, qdr_bw_n, qdr_dll_off_n, qdr_sa, qdr_d, qdr_k,
qdr_k_n, qdr_c, qdr_c_n, qdr_cq, qdr_cq_n, qdr_q.
Status signals: cal_done, compare_error.
Figure 10-4: Top-Level Block Diagram of the QDRII SRAM Design with a PLL and a Testbench
Figure 10-5 shows a top-level block diagram of a QDRII SRAM design without a PLL but
with a testbench. The user should provide all the clocks and the locked signal. “Clocking
Scheme,” page 410 explains how to generate the design clocks from the user interface.
These clocks should be single-ended. sys_rst_n is the active-Low system reset signal. All
design resets are generated using the sys_rst_n signal, the locked signal, and the dly_ready
signal of the IDELAYCTRL element. The user application must have a PLL/DCM
primitive instantiated in the design, and all user clocks should be driven through BUFGs.
The compare_error output signal indicates whether the design passes or fails. The testbench
module called “tb_top” generates the user interface data, address, and command signals.
The user data bits and address bits are stored in the corresponding User Interface FIFOs.
The compare_error signal is driven High on data mismatches. The cal_done signal
indicates the completion of initialization and calibration of the design.
Diagram blocks: idelay_ctrl, Infrastructure, qdr2_top, tb_top, Memory Device.
Clock and reset signals: clk0, clk180, clk270, clk200, locked, sys_rst_n, user_rst_0,
user_rst_180, user_rst_270, user_rst_200, idelay_ctrl_rdy.
Memory interface signals: qdr_r_n, qdr_w_n, qdr_bw_n, qdr_dll_off_n, qdr_sa, qdr_d, qdr_k,
qdr_k_n, qdr_c, qdr_c_n, qdr_cq, qdr_cq_n, qdr_q.
Status signals: cal_done, compare_error.
Figure 10-5: Top-Level Block Diagram of the QDRII SRAM Design without a PLL but with a Testbench
Figure 10-6, page 401 shows a top-level block diagram of a QDRII SRAM design with a
PLL but without a testbench. The sys_clk_p and sys_clk_n pair are differential input
system clocks. “Clocking Scheme,” page 410 describes how various clocks are generated
using the PLL. The PLL is instantiated in the infrastructure module that generates the
required design clocks. dly_clk_200_p and dly_clk_200_n are used for the idelay_ctrl
element. sys_rst_n is an active-Low system reset signal, and all design resets are generated
using the sys_rst_n signal, the locked signal, and the dly_ready signal of the IDELAYCTRL
element. The user has to drive the user application signals. The design provides the clk0_tb
and user_rst_0_tb signals to the user in order to synchronize the user application signals
with the design. The signal clk0_tb is connected to clock clk0 in the controller. If the user
clock domain is different from clk0/clk0_tb, the user should add FIFOs for all the inputs
and outputs of the controller (user application signals) in order to synchronize them to
clk0_tb. The cal_done signal indicates the completion of initialization and calibration of the
design.
Diagram blocks: idelay_ctrl, Infrastructure, qdr2_top, Memory Device.
Clock and reset signals: dly_clk_200_p, dly_clk_200_n, sys_clk_p, sys_clk_n, sys_rst_n,
clk0, clk180, clk270, clk200, user_rst_0, user_rst_180, user_rst_270, user_rst_200,
idelay_ctrl_rdy.
Memory interface signals: qdr_r_n, qdr_w_n, qdr_bw_n, qdr_dll_off_n, qdr_sa, qdr_d, qdr_k,
qdr_k_n, qdr_c, qdr_c_n, qdr_cq, qdr_cq_n, qdr_q.
User interface signals: clk0_tb, user_rst_0_tb, user_wr_full, user_rd_full, user_qr_valid,
user_qrl, user_qrh, cal_done, user_ad_w_n, user_d_w_n, user_r_n, user_dwl, user_dwh,
user_bwl_n, user_bwh_n, user_ad_wr, user_ad_rd.
Figure 10-6: Top-Level Block Diagram of the QDRII SRAM Design with a PLL but without a Testbench
Figure 10-7, page 402 shows a top-level block diagram of a QDRII SRAM design without a
PLL or a testbench. The user should provide all the clocks and the locked signal. “Clocking
Scheme,” page 410 explains how to generate the design clocks from the user interface.
These clocks should be single-ended. sys_rst_n is the active-Low system reset signal. All
design resets are generated using the sys_rst_n signal, the locked signal, and the dly_ready
signal of the IDELAYCTRL element. The user application must have a PLL/DCM
primitive instantiated in the design, and all user clocks should be driven through BUFGs.
The user has to drive the user application signals. The design provides the clk0_tb and
user_rst_0_tb signals to the user in order to synchronize the user application signals with
the design. The clk0_tb signal is connected to the clk0 clock in the controller. If the user
clock domain is different from clk0/clk0_tb, the user should add FIFOs for all the inputs
and outputs of the controller (user application signals) in order to synchronize them to
clk0_tb. The cal_done signal indicates the completion of initialization and calibration of the
design.
[Block diagram. Blocks: idelay_ctrl, Infrastructure, qdr2_top, Memory Device.
User clocks and system reset: clk0, clk270, clk200, locked, sys_rst_n.
Infrastructure and idelay_ctrl signals: clk180, user_rst_0, user_rst_180, user_rst_270,
user_rst_200, idelay_ctrl_rdy.
User interface signals: clk0_tb, user_rst_0_tb, user_wr_full, user_rd_full, user_qr_valid,
user_qrl, user_qrh, cal_done, user_ad_w_n, user_d_w_n, user_r_n, user_dwl, user_dwh,
user_bwl_n, user_bwh_n, user_ad_wr, user_ad_rd.
Memory interface signals: qdr_r_n, qdr_w_n, qdr_bw_n, qdr_dll_off_n, qdr_sa, qdr_d, qdr_k,
qdr_k_n, qdr_c, qdr_c_n, qdr_cq, qdr_cq_n, qdr_q.]
Figure 10-7: Top-Level Block Diagram of the QDRII SRAM Design without a PLL or a Testbench
[Block diagram UG086_c10_08_091707. Recoverable labels: Write Path, Delay Calibration State
Machine. Signals: user_bwl_n, user_bwh_n, user_dwl, user_dwh, user_wr_full, user_rd_full,
user_qr_valid, clk0, qdr_bw_n, qdr_d, qdr_k, qdr_k_n, qdr_dll_off_n.]
Controller
The QDRII memory controller initiates alternate WRITE and READ commands to the
memory as long as the User Write Address FIFO and the User Read Address FIFO are not
empty.
The user writes the write data, its corresponding byte write enable, and the Write Address
bits into the User Write Data FIFOs, the User Byte Write FIFO, and the User Write Address
FIFOs, respectively. When the User Write Address FIFO is not empty, the QDRII controller
generates a write-enable signal to the memory. When the write enable is asserted, the write
data, the byte write enable, and the write address bits are transferred to memory from the
User Write Data FIFOs, the User Byte Write FIFO, and the User Write Address FIFO,
respectively.
The user stores the memory read address in the User Read Address FIFO. The QDRII memory
controller generates a read-enable signal to the memory when the User Read Address FIFO is
not empty. When the read enable is asserted, the read address from the User Read Address
FIFO is transferred to memory. When the read data corresponding to the read address is
captured correctly, the user_qr_valid signal is asserted High. The user can access the read
data corresponding to the read address only while user_qr_valid is asserted High.
Figure 10-9 shows a state machine of the QDRII memory controller for burst lengths of
four. When calibration is complete (that is, when the cal_done signal is asserted), the state
machine is in the IDLE state. When the User Write Address FIFO is not empty (that is,
when the user has written the write data, the byte write enable, and the write address bits
into their corresponding FIFOs, respectively), the state machine goes to the WRITE state,
initiating a memory write of one burst.
[State diagram. States: IDLE, READ (R_n=0, W_n=1), WRITE (R_n=1, W_n=0). Transitions are
labeled RD and WR.]
Figure 10-9: QDRII Memory Controller State Machine with Burst Lengths of 4
When the User Read Address FIFO is not empty (that is, the user has written read address
bits into the User Read Address FIFO), the state machine goes to the READ state, initiating
a memory read of one burst.
From the IDLE state, the QDRII memory controller can go to either the WRITE or the
READ state depending on the status of the User FIFOs. Writes are given priority. In the
WRITE state, a memory write is initiated, and the User Read Address Not Empty status is
checked in order to transfer into the READ state. When the User Read Address FIFO is
empty, the state machine goes to the IDLE state.
In the READ state, a memory read is initiated, and the User Write Address FIFO Not
Empty status is checked before going to the WRITE state. If the User Write Address FIFO is
empty, the state machine goes to the IDLE state.
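The transitions described above for a burst length of four can be sketched as a small behavioral model. This is an illustration only, not the MIG RTL: the names next_state and run are hypothetical, the FIFOs are modeled as simple queues, and each loop iteration stands for one state-machine step rather than an exact clock count.

```python
from collections import deque

IDLE, WRITE, READ = "IDLE", "WRITE", "READ"

def next_state(state, wr_not_empty, rd_not_empty):
    """Return the next controller state for a burst length of four."""
    if state == IDLE:
        # Writes are given priority when leaving IDLE.
        if wr_not_empty:
            return WRITE
        return READ if rd_not_empty else IDLE
    if state == WRITE:
        # After a write, only the User Read Address FIFO status is checked.
        return READ if rd_not_empty else IDLE
    if state == READ:
        # After a read, only the User Write Address FIFO status is checked.
        return WRITE if wr_not_empty else IDLE
    raise ValueError(f"unknown state: {state}")

def run(write_addrs, read_addrs):
    """Drain both address FIFOs and return the commands in issue order."""
    wr_fifo, rd_fifo = deque(write_addrs), deque(read_addrs)
    state, issued = IDLE, []
    while wr_fifo or rd_fifo:
        state = next_state(state, bool(wr_fifo), bool(rd_fifo))
        if state == WRITE:
            issued.append(("WRITE", wr_fifo.popleft()))
        elif state == READ:
            issued.append(("READ", rd_fifo.popleft()))
    return issued
```

With both FIFOs loaded, run([0, 1], [0, 1]) alternates WRITE and READ commands, which matches the alternating behavior described at the start of this section. With only write addresses queued, the model passes through IDLE between consecutive writes.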
[State diagram. States: IDLE and WRITE_READ (W_n=0, R_n=0).]
Figure 10-10: QDRII Memory Controller State Machine with Burst Lengths of 2
Figure 10-10 shows a state machine of the QDRII memory controller for burst lengths of
two when the FIFO user interface is used. When calibration is complete, the state machine
is in the IDLE state. When the User Write Address FIFO is not empty (that is, when the user
has written the write data, the byte write enable, and the write address bits into their
corresponding FIFOs), the state machine goes to the WRITE_READ state, initiating a
memory write of one complete burst. When the User Read Address FIFO is not empty (that
is, the user has written read address bits into the User Read Address FIFO), the state
machine goes to the WRITE_READ state, initiating a memory read of one complete burst.
From the IDLE state, the QDRII memory controller goes to the WRITE_READ state if either:
• the User Write Address FIFO is not empty, or
• the User Read Address FIFO is not empty.
In the WRITE_READ state, the User Read Address Not Empty status is checked to initiate
a memory read. To initiate a memory write in the WRITE_READ state, the User Write
Address FIFO not empty status is checked. If both the User Write Address FIFO and the
User Read Address FIFO are empty, the state machine goes to the IDLE state. If either the
User Write Address FIFO or the User Read Address FIFO is not empty, the state machine
remains in the WRITE_READ state to issue memory writes or reads.
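The burst-length-two behavior above reduces to a single decision per clock, sketched below. This is a hedged illustration, not the MIG RTL: the function name step is hypothetical, and returning the active-Low W_n and R_n strobes alongside the state is an assumption of this sketch. Because the next state depends only on the FIFO status, the current state is not an input.

```python
IDLE, WRITE_READ = "IDLE", "WRITE_READ"

def step(wr_not_empty, rd_not_empty):
    """Return (next_state, w_n, r_n) for one clock of the burst-length-2 controller.

    w_n and r_n are active-Low: 0 means the command is issued.
    """
    if wr_not_empty or rd_not_empty:
        w_n = 0 if wr_not_empty else 1   # write issued only if a write address is queued
        r_n = 0 if rd_not_empty else 1   # read issued only if a read address is queued
        return WRITE_READ, w_n, r_n
    # Both address FIFOs are empty: return to IDLE with no command.
    return IDLE, 1, 1
```

When both FIFOs hold addresses, step(True, True) issues a write and a read in the same clock, which is what allows a command on every clock for a burst length of two.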
Refer to XAPP853 [Ref 26] for data capture techniques and timing analysis of the QDRII
memory controller module.
Infrastructure
The infrastructure module generates the design clocks and reset signals. When differential
clocking is used, sys_clk_p, sys_clk_n, clk_200_p, and clk_200_n signals appear. When
single-ended clocking is used, sys_clk and idly_clk_200 signals appear. In addition, clocks
are available for design use and a 200 MHz clock is provided for the IDELAYCTRL
primitive. Differential and single-ended clocks are passed through global clock buffers
before connecting to a PLL/DCM. For differential clocking, the output of the
sys_clk_p/sys_clk_n buffer is single-ended and is provided to the PLL/DCM input.
Likewise, for single-ended clocking, sys_clk is passed through a buffer and its output is
provided to the PLL/DCM input. The outputs of the PLL/DCM are 180° and 270° phase-
shifted versions of the input clock. After the PLL/DCM is locked, the design is in the reset
state for at least 25 clocks. The infrastructure module also generates all of the reset signals
required for the design.
PLL/DCM
In MIG 3.0 and later, the DCM is replaced with a PLL for all Virtex-5 FPGA designs. If the
user selects a design with a PLL in the GUI, the infrastructure module contains both PLL
and DCM code. The CLK_GENERATOR parameter enables either a PLL or a DCM in the
infrastructure module. The CLK_GENERATOR parameter is set to PLL by default. If the
user wants to use a DCM, this parameter must be changed manually to DCM.
Idelay_ctrl
This module instantiates the IDELAYCTRL primitive of the Virtex-5 FPGA. The
IDELAYCTRL primitive is used to continuously calibrate the individual delay elements in
its region to reduce the effect of process, temperature, and voltage variations. A 200 MHz
clock has to be fed to this primitive.
MIG uses the “automatic” method for IDELAYCTRL instantiation in which the MIG HDL
only instantiates a single IDELAYCTRL for the entire design. No location (LOC)
constraints are included in the MIG-generated UCF. This method relies on the ISE® tools to
replicate and place as many IDELAYCTRLs as needed (for example, one per clock region
that uses IDELAYs). Replication and placement are handled automatically by the software
tools if the IDELAYCTRLs have the same refclk, reset, and rdy nets. A new constraint called
IODELAY_GROUP associates a set of IDELAYs with an IDELAYCTRL and allows for
multiple IDELAYCTRLs to be instantiated without LOC constraints specified. ISE software
generates the IDELAY_CTRL_RDY signal by logically ANDing the RDY signals of every
IDELAYCTRL block.
The IODELAY_GROUP name should be checked in the following cases:
• The MIG design is used with other IP cores or user designs that also require the use of
IDELAYCTRL and IDELAYs.
• Previous ISE software releases 8.2.03i and 9.1i had an issue with IDELAYCTRL block
replication and trimming. When using these revisions of the ISE software, the user
must instantiate and constrain the location of each IDELAYCTRL individually.
See UG190 [Ref 10] for more information on the requirements of IDELAYCTRL placement.
top_phy
This module is the interface between the controller and the memory. It consists of the
following:
• Control logic that generates READ/WRITE commands and address signals to the
memory.
• Write Data logic that associates the write data, the byte enable, and the write address
with the WRITE commands and the read address with the READ commands. It also
generates the write data pattern for calibration purposes.
• Read Data logic that comprises the read data capturing scheme and calibration logic.
Multicontrollers
MIG supports multicontrollers for QDRII SRAMs and multiple interfaces for QDRII
SRAMs and DDR2 SDRAMs. Up to eight controllers are supported. In multicontroller and
multiple interface designs, every controller can have a different frequency. The number of
controllers that can have different frequencies is limited by the number of PLLs available in
the selected FPGA. For example, six PLL resources are available in the
XC5VLX50 device, so at most six controllers can have different frequencies. Even
if eight controllers are selected in the GUI, only six distinct frequencies are
possible. For the remaining two controllers, the user should
select one of the already selected frequencies. Refer to the Virtex-5 FPGA User Guide
[Ref 10] for PLL resources available for various devices.
For a single controller design, all memory signals and user interface signals appear as
shown in Figure 10-8, page 403 and Table 10-5, page 414 based on the selected part. For a
multicontroller design, all memory signal names and user interface signal names are
prepended with the controller number. For example, for a two-controller design (two
QDR2 controllers), the qdr_d port appears as c0_qdr_d and c1_qdr_d. A similar naming
convention is followed for the design parameters. Some parameters such as
HIGH_PERFORMANCE_MODE, CLK_TYPE, and RST_ACT_LOW are common for all
the controllers and do not have the controller number prepended.
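The naming convention and the frequency limit can be illustrated with two small helpers. Both function names are hypothetical and exist only for this sketch. The PLL count must be taken from the device documentation, and the limit of eight controllers is the one stated above.

```python
def prefixed_ports(ports, num_controllers):
    """Return port names with the controller number prepended (c0_, c1_, ...)."""
    return [f"c{i}_{port}" for i in range(num_controllers) for port in ports]

def frequencies_fit(freqs_mhz, num_plls):
    """Return True if the distinct controller frequencies fit the available PLLs.

    freqs_mhz holds one entry per controller.
    """
    if len(freqs_mhz) > 8:
        raise ValueError("at most eight controllers are supported")
    return len(set(freqs_mhz)) <= num_plls
```

For example, prefixed_ports(["qdr_d"], 2) returns ["c0_qdr_d", "c1_qdr_d"], and eight controllers that each run at a different frequency do not fit a device with six PLLs.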
DCI Cascading
In Virtex-5 family devices, I/O banks that need DCI reference voltage can be cascaded with
other DCI I/O banks. One set of VRN/VRP pins can be used to provide reference voltage
to several I/O banks. With DCI cascading, one bank (the master bank) must have its
VRN/VRP pins connected to external reference resistors. Other banks in the same column
(slave banks) can use DCI standards with the same impedance as the master bank, without
connecting the VRN/VRP pins on these banks to external resistors. DCI impedance control
in cascaded banks is received from the master bank. This results in more usable pins and in
reduced power usage because fewer VR pins and DCI controllers are used.
The syntax for representing the DCI Cascading in the UCF is:
CONFIG DCI_CASCADE = "<master> <slave1> <slave2> ...";
The following rules must be followed in order to use the DCI Cascade option:
1. The master and slave banks must all reside on the same column (left, center, or right)
on the device.
2. Master and slave banks must have the same VCCO and VREF (if applicable) voltages.
This feature enables placing all 36 bits of read data, as well as the CQ and CQ# clocks, in the
same bank when interfacing with 36-bit QDRII components.
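As an illustration of the UCF syntax and the two rules above, the following sketch builds a DCI_CASCADE line after checking that every slave bank shares the master's column and voltages. The function name, the bank_info table, and the bank numbers in the usage note are hypothetical. MIG writes this constraint itself when the design is generated, so the sketch is only a way to check a hand-edited UCF.

```python
def dci_cascade_constraint(master, slaves, bank_info):
    """Build a UCF DCI_CASCADE line for one master bank and its slave banks.

    bank_info maps a bank number to a (column, vcco, vref) tuple. This table
    is hypothetical input for the sketch, not a MIG data structure.
    """
    ref_column, ref_vcco, ref_vref = bank_info[master]
    for bank in slaves:
        column, vcco, vref = bank_info[bank]
        # Rule 1: master and slave banks must reside on the same column.
        if column != ref_column:
            raise ValueError(f"bank {bank} is not in the same column as master {master}")
        # Rule 2: master and slave banks must have the same VCCO and VREF.
        if (vcco, vref) != (ref_vcco, ref_vref):
            raise ValueError(f"bank {bank} does not match the master VCCO/VREF")
    banks = " ".join(str(b) for b in [master, *slaves])
    return f'CONFIG DCI_CASCADE = "{banks}";'
```

With three banks described as ("left", 1.8, 0.9), the call dci_cascade_constraint(11, [13, 15], bank_info) returns the line CONFIG DCI_CASCADE = "11 13 15"; and a slave bank from another column raises ValueError.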
MIG supports DCI Cascading. Following are the possibilities for generating the designs
with DCI support using the DCI Cascade option.
• For x36 component designs, the DCI Cascade option is always enabled. This feature
cannot be disabled if DCI support is needed.
• For x18 component designs, DCI Cascade is optional. DCI support for these designs
can be selected with or without the DCI Cascade selection.
• For x18 component with 18-bit data width designs, the DCI Cascade option is
disabled and cannot be utilized.
When the DCI Cascade option is selected, MIG displays the master bank selection box for each
column of the FPGA in the bank selection page.
• If an FPGA has no banks or has only non-DCI banks in a particular column, the
master bank selection box for that column is not displayed.
• All the data read banks are treated as slave banks.
• When a data read bank is selected in a particular column, the master bank selection
box for that particular column is activated and the rest of the master bank selection
boxes for other columns are deactivated.
• In a particular column, when a data read bank is selected and there are no DCI banks
left in that column for master banks selection, then the design cannot be generated.
The data read banks must be moved to the other columns in order to select the master
banks.
• The master bank selection box shows all the bank numbers in that particular column
other than the data read banks and non-DCI banks in that column.
• There can be only one master bank selected for each column of banks.
• MIG utilizes VRN/VRP pins in the slave banks for pin allocation.
• For each master, VRN/VRP pins are reserved. When a selected master bank does not
have any data read pins then a dummy input pin called masterbank_sel_pin is
allocated and assigned the HSTL_I_DCI_18 I/O standard.
• The dummy input pin is required to satisfy the requirement of the master bank. Any
master bank should have at least one input pin to program the DCI option, and the
I/O standard of the master and slave banks should be the same.
• When all the banks in a particular column are allocated with data read pins, MIG
chooses only the required banks for data read pins allocation depending upon the
design data width. When there is only one bank allocation for data read pins in a
column of banks of an FPGA, then that particular data read bank should not be
selected as a master bank. Doing so would result in an inappropriate DCI Cascade
syntax in the UCF of the generated design.
The center column banks of all the FPGAs are divided into two sections, top-column banks
and bottom-column banks. Top-column banks are the banks available above the 0th bank,
and the bottom column banks are the banks available below 0th bank. Therefore, there are
two master bank selection boxes for the center column.
The VRN/VRP pins for a master bank do not need to be reserved in the reserve pins page.
Once the design is ready with the valid master and slave bank selection, the same master
and slave bank information (along with the DCI Cascading syntax) is provided in the UCF
when the design is generated.
For more information about DCI Cascade, refer to DCI Cascading in the Virtex-5 FPGA
User Guide [Ref 10] and the Xilinx® Constraints Guide.
CQ/CQ_n Implementation
The controller uses CQ and CQ_n to capture the read data of a 36-bit component. CQ and CQ_n
are placed on the P pins of the clock-capable I/Os. For a 36-bit component, CQ is used to
capture the first 18 bits of the read data, and CQ_n is used to capture the second 18 bits of
the read data. For an 18-bit component, only CQ is used for capturing the read data. CQ_n
is not used and is connected to dummy logic, whose only purpose is to retain the
CQ_n pin during PAR. Users can use the CQ_n pin if needed.
Pinout Considerations
It is recommended to select banks within the same column in MIG. This helps to avoid the
clock tree skew that the design would incur while crossing from one column to another.
When the Data Read, Data Write, Address, and System Control pins are allocated to
individual banks in a column, then the System Control pins must be allocated in a bank
that is central to the rest of banks allocated. This helps reduce datapath and clock path
skew.
For larger FPGAs (for example, FF1738, FF1760, and similar), it is recommended to place
Data Read, Data Write, Address, and System Control pins in the same column to reduce
datapath and clock path skew.
Test Bench
MIG generates two RTL folders, example_design and user_design. The example_design
includes the synthesizable test bench, while user_design does not include the test bench
modules. The MIG test bench performs one write command followed by one read
command in an alternating manner for designs with a burst length of 4. For a burst length
of 2, the test bench performs one write command and one read command in the same clock
and repeats one write and one read command continuously. The number of words in a
write command depends on the burst length. For a burst length of 4, the test bench writes
a total of 4 data words for a single write command (2 rise data words and 2 fall data words).
For a burst length of 2, the test bench writes a total of 2 data words. The data pattern is an
incremental pattern. On every write command, the data pattern is incremented by one, and
this is repeated with each subsequent write command. The initial data pattern for the first
write command is 000. The test bench writes the 000, 001, 002, 003 data pattern in a
sequence in which 000 and 002 are rise data words, and 001 and 003 are fall data words
for a 9-bit design. The falling edge data is always rising edge data plus one. For a burst
length of 2, the data sequence for the first write command is 000, 001. The data sequence
for the second write command is 002, 003. The pattern is then incremented for the next
write command. For data widths greater than 9, the same data pattern is concatenated for
the other bits. For a 36-bit design and a burst length of 4, the data pattern for the first write
command is 000000000, 008040201, 010080402, 0180C0603.
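The data pattern described above can be reproduced with a short sketch. The function name write_pattern is hypothetical. The sketch assumes that the 9-bit value keeps counting across commands and wraps at 512, which the text implies but does not state explicitly.

```python
def write_pattern(data_width, burst_length, command_index=0):
    """Return the data words of one test bench write command as hex strings.

    Words alternate rise, fall, rise, fall within the command.
    """
    if data_width % 9 != 0:
        raise ValueError("data_width is expected to be a multiple of 9")
    lanes = data_width // 9            # number of 9-bit groups in one word
    digits = (data_width + 3) // 4     # hex digits needed to print one word
    words = []
    for i in range(burst_length):
        # 9-bit incremental value; wrapping at 512 is an assumption of this sketch.
        value = (command_index * burst_length + i) % 512
        word = 0
        for lane in range(lanes):
            word |= value << (9 * lane)  # same 9-bit pattern in every group
        words.append(f"{word:0{digits}X}")
    return words
```

For a 36-bit design with a burst length of 4, write_pattern(36, 4) returns 000000000, 008040201, 010080402, 0180C0603, which matches the first write command quoted above. For a 9-bit design with a burst length of 2, the second command, write_pattern(9, 2, 1), returns 002, 003.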
Address generation logic generates the address in an incremental pattern for each write
command. The same address location is repeated for the next read command. In Samsung
components, the burst address increments are done by the memory, so the address is
generated by the test bench in a linear incremental pattern. In Cypress parts, the MIG test
bench increments the address for burst operation. After the address reaches the maximum
value, it rolls back to the initial address, i.e., 00000.
During reads, comparison logic compares the read pattern with the pattern written, i.e., the
000, 001, 002, 003 pattern. For example, for a 9-bit design of burst length 4, the data
written for a single write command is 000, 001, 002, and 003. During reads, the read
pattern is compared with the 000, 001, 002, 003 pattern. Based on a comparison of the
data, a status signal error is generated. If the data read back is the same as the data written,
the error signal is 0, otherwise it is 1.
Clocking Scheme
Figure 10-12, page 412 shows the clocking scheme for this design. Global and local clock
resources are used.
The global clock resources consist of a PLL or a DCM, two BUFGs on PLL/DCM output
clocks, and one BUFG for clk200. The local clock resources consist of regional I/O clock
networks (BUFIO). The global clock architecture is discussed in this section.
The MIG tool allows the user to customize the design such that the PLL/DCM is not
included. In this case, system clocks clk0 and clk270, and IDELAYCTRL clock clk200 must
be supplied by the user.
Notes:
1. See “User Interface Accesses,” page 415 for timing requirements and restrictions on the user interface
signals.
[Clocking diagram. Recoverable PLL labels: CLKOUT1 (clk270), CLKFBIN, CLKFBOUT.]
Figure 10-11: Clocking Scheme for QDRII Interface Logic Using PLL
[Clocking diagram. Recoverable labels: GC I/O, SYSTEM CLK, BUFG, DCM (CLKIN, CLK0 driving clk0).]
Figure 10-12: Clocking Scheme for QDRII Interface Logic Using DCM
Table 10-4: QDRII SRAM System Interface Signals (without a PLL) (Cont’d)
Signal Name     Direction  Description
sys_rst_n       Input      Reset to the QDRII memory controller.
compare_error   Output     Status of the comparison between the read data and the
                           corresponding write data.
cal_done        Output     Asserted when the design initialization and calibration
                           are complete.
Table 10-5: QDRII SRAM User Interface Signals (without a Testbench [user_design])
Signal Name                        Direction  Description
user_wr_full                       Output     Indicates the User Write FIFO status. It is asserted
                                              when either the User Write Address FIFO or the User
                                              Write Data FIFO is full. When this signal is asserted,
                                              any writes to the User Write Address FIFO and the User
                                              Write Data FIFO are invalid, possibly leading to
                                              controller malfunction.
user_rd_full                       Output     Indicates the User Read Address FIFO status. It is
                                              asserted when the User Read Address FIFO is full. When
                                              this signal is asserted, any writes to the User Read
                                              Address FIFO are ignored.
user_qr_valid                      Output     Status signal indicating that data read from the
                                              memory is available to the user.
clk0_tb                            Output     All user interface signals are to be synchronized to
                                              this clock.
user_rst_0_tb                      Output     This reset is active until the PLL/DCM is locked.
user_dwl [(DATA_WIDTH-1):0]        Input      Positive-edge data for memory writes. This data bus is
                                              valid when user_d_w_n is asserted.
user_dwh [(DATA_WIDTH-1):0]        Input      Negative-edge data for memory writes. This data bus is
                                              valid when user_d_w_n is asserted.
user_qrl [(DATA_WIDTH-1):0]        Output     Positive-edge data read from memory. This data is
                                              output when user_qr_valid is asserted.
user_qrh [(DATA_WIDTH-1):0]        Output     Negative-edge data read from memory. This data is
                                              output when user_qr_valid is asserted.
user_bwl_n [(BW_WIDTH-1):0]        Input      Byte enables for QDRII memory positive-edge write data.
                                              The byte enables are valid when user_d_w_n is asserted.
user_bwh_n [(BW_WIDTH-1):0]        Input      Byte enables for QDRII memory negative-edge write data.
                                              The byte enables are valid when user_d_w_n is asserted.
user_ad_wr [(ADDR_WIDTH-1):0] (1)  Input      QDRII memory address for write data. This address is
                                              valid when user_ad_w_n is asserted.
user_ad_rd [(ADDR_WIDTH-1):0] (1)  Input      QDRII memory address for read data. This address is
                                              valid when user_r_n is asserted.
user_ad_w_n                        Input      Active-Low write enable for the User Write Address FIFO.
user_d_w_n                         Input      Active-Low write enable for the User Write Data FIFO
                                              and Byte Write FIFOs.
user_r_n                           Input      Active-Low write enable for the User Read Address FIFO.
Notes:
1. The number of address bits used depends on the density of the memory part. The controller
ignores the unused bits, which can all be tied High.
user_d_w_n, and user_r_n interface signals need to be held High until calibration is
complete.
• For issuing a write command, the memory write address must be written into the
Write Address FIFO. The first write data word must be written to the Write Data FIFO
on the same clock cycle in which the write address is written. In addition, the write
data burst must be written over consecutive clock cycles; there cannot be a break
between bursts of data. These restrictions arise from the fact that the controller
assumes write data is available when it receives the write command from the user.
• The clk0_tb signal is connected to clk0 in the controller. If the user clock domain is
different from clk0 / clk0_tb of MIG, the user should add FIFOs for all data inputs and
outputs of the controller in order to synchronize them to clk0_tb. The timing for
the non-FIFO user interface for controllers with a burst length of two is the same as
that of the FIFO interface. With respect to the user backend, the timing remains the
same for both the FIFO and non-FIFO user interface.
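The write access rules above can be checked against a recorded trace of the user signals. The sketch below is one possible reading of those rules, not a Xilinx tool. The function name and the trace format are hypothetical. It assumes that the address enable is Low in the first cycle of a burst and that the data enable is Low for burst_length/2 consecutive cycles starting in that same cycle.

```python
def check_write_trace(trace, burst_length):
    """Return a list of (cycle, message) rule violations for a user write trace.

    trace is a list of (cal_done, user_ad_w_n, user_d_w_n) tuples, one per
    clk0_tb cycle. The enables are active-Low.
    """
    data_cycles = burst_length // 2   # Data FIFO writes per address write
    remaining = 0                     # data cycles still owed to the open burst
    errors = []
    for cycle, (cal_done, ad_w_n, d_w_n) in enumerate(trace):
        if not cal_done:
            # Enables must be held High until calibration is complete.
            if ad_w_n == 0 or d_w_n == 0:
                errors.append((cycle, "enable asserted before cal_done"))
            continue
        if ad_w_n == 0:
            if remaining:
                errors.append((cycle, "address written inside an open burst"))
            remaining = data_cycles
        if remaining:
            # Data must accompany the address and continue without a break.
            if d_w_n != 0:
                errors.append((cycle, "write data missing in burst cycle"))
            remaining -= 1
        elif d_w_n == 0:
            errors.append((cycle, "write data without a write address"))
    return errors
```

An empty list means the trace satisfies the rules as modeled here. For a burst length of two, the address and data enables coincide in every write cycle, so back-to-back writes on consecutive clocks pass the check.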
User Interface
The user interface has two interfaces: a Read user interface and a Write user interface.
The Read user interface consists of the Read Address interface modules. The Read Address
interface consists of the Read Address FIFO. The user has to write the read address bits of
the memory into this FIFO.
The Write User interface consists of the Write Data interface and the Write Address
interface. The Write Address interface consists of the Write Address FIFO. The user has to
write the write address bits of the memory into this FIFO.
The Write Data interface consists of the Write Data FIFO and the Byte Write FIFO. The
width of the Write Data FIFO depends upon the data width of the controller design. There
are two Write Data FIFOs for every controller: the LSB Write Data FIFO and the MSB Write
Data FIFO. The outputs of these FIFOs are SDR and are later converted to DDR at the
ODDR primitive before transferring to memory.
The Byte Write enable signals are stored in the Byte Write FIFO by the user.
The controller monitors the status signals of these User FIFOs and issues the
READ/WRITE commands to the memory.
The user must wait until the controller asserts the cal_done signal, which indicates
completion of calibration, before writing user data to the Write Data FIFOs, Byte Write
FIFO, and Write Address FIFO. Any data the user attempts to write into these FIFOs
before calibration completes is not written. The write enable signals of the
Write Data FIFOs and Byte Write FIFOs are considered valid only after
calibration is complete.
Refer to the timing diagrams in “QDRII Controller Interface Signals” for how the user can
access these FIFOs.
The FIFO36 and FIFO36_72 primitives are used for loading address and data from the user
interface. The FIFO36 primitive is used in the qdrii_top_wr_addr_interface,
qdrii_top_rd_addr_interface, and qdrii_top_wrdata_bw_fifo modules. The FIFO36_72
primitive is used in the qdrii_top_wrdata_fifo module. Every FIFO has two FIFO threshold
attributes, ALMOST_EMPTY_OFFSET and ALMOST_FULL_OFFSET, that are both set to
128 in the RTL, by default. These values can be changed as needed. For valid FIFO
threshold offset values, refer to UG190 [Ref 10].
Table 10-7 lists the signals between the user interface and the controller.
Write Interface
Figure 10-13 illustrates the user interface block diagram for write operations.
[Block diagram. User Interface: Address FIFO (FIFO36, 1024 x 36) and Byte Write FIFO (FIFO36,
1024 x 36). Recoverable signals: user_ad_wr, user_ad_w_n, user_wr_full, user_bwh_n,
fifo_wr_empty, wr_init_n, fifo_bw_l, fifo_bw_h. The FIFOs connect to the Controller.]
The following steps describe the architecture of Address and Write Data FIFOs and how to
perform a write burst operation to QDRII memory from user interface.
1. The user interface consists of an Address FIFO, Data FIFOs, and a Byte Write FIFO.
These FIFOs are built out of Virtex-5 FPGA FIFO primitives. The Address FIFO is a
FIFO36 primitive with 1K x 36 configuration. The Data FIFO is a FIFO36_72 primitive
with 512 x 72 configuration.
2. The Address FIFO is used to store the memory address where the data is to be written
from the user interface. A single instantiation of a FIFO36 constitutes the Address
FIFO.
3. Two separate sets of Data FIFOs are used for storing the rising-edge and falling-edge
data to be written to QDRII memory from the user interface. For 9-bit, 18-bit, and 36-bit
configurations, the controller pads the extra bits of the Data FIFO with 0s.
4. The Byte Write FIFO is used to store the Byte Write signals to QDRII memory from the
user interface. Extra bits are padded with zeros.
5. The user can initiate a write command to memory by writing to the Write Address
FIFO, Write Data FIFO, and Byte Write FIFOs when the FIFO full flags are deasserted
and after the calibration done signal cal_done is asserted. The user should not access
any of these FIFOs until cal_done is asserted. During the calibration process, the
controller writes pattern data into the Data FIFOs. The cal_done signal assures that the
clocks are stable, the reset process is completed, and the controller is ready to accept
commands. Status signal user_wr_full is asserted when the Address FIFO, Data FIFOs,
or Byte Write FIFOs are full.
6. When user_ad_w_n is asserted, user_ad_wr is stored in the Address FIFO.
When user_d_w_n is asserted, user_dwl and user_dwh are stored in
the Data FIFOs, and user_bwl_n and user_bwh_n are stored in the Byte Write FIFOs. For
proper controller functionality, user_ad_w_n and user_d_w_n must be asserted and
deasserted simultaneously.
7. The controller reads the Address, Data, and Byte Write FIFOs when they are not empty
by issuing the wr_init_n signal. The QDRII memory write command is generated from
the wr_init_n signal by properly timing it.
[Timing diagram UG086_c10_16_122007. Signals: clk0_tb, cal_done, user_wr_full, user_ad_w_n,
user_ad_wr (A0, A1, A2), user_d_w_n.]
8. Figure 10-14 shows the timing diagram for a write command with a burst length of
four. The address should be asserted for one clock cycle as shown. For BL = 4, each
write to the Address FIFO has two writes to the Data FIFO consisting of two rising-
edge and two falling-edge data.
9. Figure 10-15 shows the timing diagram for a write command with a burst length of
two. For BL = 2, each write to the Address FIFO has one write to the Data FIFO, consisting
of one rising-edge and one falling-edge data. Commands can be given in every clock
when BL = 2.
When BURST2_FIFO_INTERFACE is set to FALSE in design_top, the timing shown in
Figure 10-15 must be followed from the user side. Corresponding address and write
data are provided in the same cycle associated with asserting user_ad_w_n and
user_d_w_n. The user_wr_full and user_rd_full signals are tied Low in this case.
[Timing diagram UG086_c10_17_122007. Signals: clk0_tb, cal_done, user_wr_full, user_ad_w_n,
user_ad_wr (A0, A1, A2, A3, A4), user_d_w_n.]
For a design with a burst length of two without FIFOs, the total write latency becomes 4.5
cycles. This is because the four clock cycle latency of the empty signal deassertion of the
Write Address FIFO does not apply.
Read Interface
Figure 10-16 shows a block diagram for the read interface.
[Block diagram. User Interface: Address FIFO (FIFO36, 1024 x 36). Recoverable signals:
user_ad_rd, user_r_n, user_rd_full, fifo_rd_empty, rd_init_n (Controller), fifo_ad_rd (to
top_phy), and user_qrl, user_qrh, user_qr_valid (from top_phy).]
[Timing diagram UG086_c10_19_122007. Signals: clk0_tb, cal_done, user_rd_full, user_r_n,
user_ad_rd (A0, A1, A2), user_qr_valid.]
[Timing diagram UG086_c10_20_122007. Signals: clk0_tb, cal_done, user_rd_full, user_r_n,
user_ad_rd (A0, A1, A2, A3, A4), user_qr_valid.]
The BL = 2 example without the FIFO user interface uses the same input signal timing on
the address, command, and data as the FIFO user interface.
For a design with a burst length of two without FIFOs, the total read latency becomes 4.5
cycles. This is because the four clock cycle latency of the empty signal deassertion of the
Read Address FIFO does not apply.
MIG shows checkboxes for Address, Data_Write, Data_Read, System Control, and
System_Clock when a bank is selected for a QDRII SRAM design.
When the Address box is checked in a bank, the address, qdr_w_n, qdr_r_n, and
qdr_dll_off_n bits are assigned to that particular bank.
When the Data_Write box is checked in a bank, the memory data write, memory byte write
bits, the memory write clocks, and the memory input clock for the output data are assigned
to that particular bank.
When the Data_Read box is checked in a bank, the memory data read and memory read
clocks are assigned to that particular bank.
When the System Control box is checked in a bank, the sys_rst_n, compare_error, and
cal_done bits are assigned to that particular bank.
When the System Clock box is checked in a bank, the sys_clk_p, sys_clk_n, dly_clk_200_p,
and dly_clk_200_n bits are assigned to that particular bank.
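The checkbox-to-signal assignment above can be summarized as a lookup table. This is a reading aid, not MIG code. The memory port names for the Address, Data_Write, and Data_Read groups are inferred from the top-level block diagram port names and are an assumption of this sketch, as are the dictionary keys and the function name.

```python
# Signals assigned to a bank for each checkbox. The qdr_* names are inferred
# from the block diagram ports (assumption); the system signals are as listed.
BANK_GROUPS = {
    "Address": ["qdr_sa", "qdr_w_n", "qdr_r_n", "qdr_dll_off_n"],
    "Data_Write": ["qdr_d", "qdr_bw_n", "qdr_k", "qdr_k_n", "qdr_c", "qdr_c_n"],
    "Data_Read": ["qdr_q", "qdr_cq", "qdr_cq_n"],
    "System_Control": ["sys_rst_n", "compare_error", "cal_done"],
    "System_Clock": ["sys_clk_p", "sys_clk_n", "dly_clk_200_p", "dly_clk_200_n"],
}

def signals_for_bank(checked_boxes):
    """Return the signals assigned to a bank for the given checked boxes."""
    signals = []
    for box in checked_boxes:
        if box not in BANK_GROUPS:
            raise KeyError(f"unknown checkbox: {box}")
        signals.extend(BANK_GROUPS[box])
    return signals
```

For example, a bank with only Data_Read checked receives the read data bus and the two read clocks.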
For special cases, such as designs without a testbench or without a PLL, the corresponding
input and output ports are not assigned to FPGA pins in the design UCF, because the user
can connect these ports either to FPGA pins or to logic internal to the same FPGA.
Note: Timing has been verified for most of the MIG generated configurations. For the best timing
results, adjacent banks in the same column of the FPGA should be used. Banks that are separated
by unbonded banks should be avoided because these can cause timing violations.
Supported Devices
The design generated by MIG is independent of the memory package; hence the package
part of the memory component name is replaced with X, where X indicates any package.
Table 10-11 shows the list of components supported by MIG.
Chapter 11
Interface Model
DDR SDRAM interfaces are source-synchronous and double data rate. They transfer data
on both edges of the clock cycle. A memory interface can be modularly represented as
shown in Figure 11-1. A modular interface has many advantages. It allows designs to be
ported easily and also makes it possible to share parts of the design across different types
of memory interfaces.
[Figure 11-1: block diagram of a Xilinx FPGA containing a Control Layer and a Physical
Layer, connected to Memories]
Feature Summary
This section summarizes the supported and unsupported features of the DDR SDRAM
controller design.
Supported Features
The DDR SDRAM controller design supports the following:
• Burst lengths of two, four, and eight
• Sequential and interleaved burst types
• DDR SDRAM components and DIMMs
• CAS latencies of 2, 2.5, and 3
• Verilog and VHDL
• With and without a testbench
• Bank management
• Bytewise data masking
• Linear addressing
• With and without a PLL
• Registered DIMMs, unbuffered DIMMs and SO-DIMMs
• Data mask
• System clock, differential and single-ended
The supported features are described in more detail in “Architecture.”
Unsupported Features
The DDR SDRAM controller design does not support:
• Deep memories/dual rank DIMMs
• Multicontrollers
Architecture
Implemented Features
This section provides details on the supported features of the DDR SDRAM controller. The
Virtex-5 FPGA DDR SDRAM design is a generic design that works for most of the features
mentioned above. User input parameters are defined as parameters for Verilog and
generics in VHDL in the design modules and are passed down the hierarchy. For example,
if the user selects a burst length of 4, then it is defined as follows in the <top_module>
module:
parameter BURST_LEN = 4, // burst length (in doublewords)
The user can change this parameter for various burst lengths to get the desired output. The
same concept holds for all the other parameters listed in the <top_module> module.
Table 11-2 lists the details of all parameters.
Burst Length
Bits M0:M3 of the Mode Register define the burst length and burst type. Read and write
accesses to the DDR SDRAM are burst-oriented. The burst length is programmable to
either 2, 4, or 8 through the GUI. The burst length determines the maximum number of
column locations accessed for a given READ or WRITE command. The DDR SDRAM ctrl
module implements the programmed burst length.
CAS Latency
Bits M4:M6 of the Mode Register define the CAS latency (CL). CL is the delay in clock
cycles between the registration of a READ command and the availability of the first bit of
output data. CL can be set to 2, 2.5, or 3 clocks through the GUI. CAS latency is
implemented in the ctrl module. For CL = 2.5, the input value is read as “25” in the design.
During read data operations, the generation of the read_en signal varies according to the
CL in the ctrl module.
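As an illustration of how the burst length, burst type, and CAS latency settings map onto
Mode Register bits M0 through M6, the following minimal Python sketch packs the three
fields into one integer. The field encodings follow the JEDEC DDR SDRAM mode register
definition. The function and dictionary names are hypothetical and do not come from the
MIG RTL, and representing CL = 2 and CL = 3 as the integers 2 and 3 (alongside 25 for
CL = 2.5) is an assumption.

```python
# Field encodings follow the JEDEC DDR SDRAM mode register definition.
# All names here are hypothetical; they are not taken from the MIG RTL.
BURST_LEN_CODE = {2: 0b001, 4: 0b010, 8: 0b011}        # bits M2:M0
BURST_TYPE_CODE = {"sequential": 0, "interleaved": 1}  # bit M3
CAS_LATENCY_CODE = {2: 0b010, 25: 0b110, 3: 0b011}     # bits M6:M4 (25 means CL = 2.5)


def mode_register_low_bits(burst_len, burst_type, cas_latency):
    """Return Mode Register bits M6:M0 packed into an integer.

    cas_latency is given as 2, 25 (for 2.5), or 3. Using 2 and 3 directly
    is an assumption; the guide only states that 2.5 is read as 25.
    """
    return (
        (CAS_LATENCY_CODE[cas_latency] << 4)
        | (BURST_TYPE_CODE[burst_type] << 3)
        | BURST_LEN_CODE[burst_len]
    )
```

For example, a burst length of 4 with sequential bursts and CL = 2.5 gives
mode_register_low_bits(4, "sequential", 25), which evaluates to 0b1100010.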
Precharge
The PRECHARGE command closes the open row in a bank when a command must be issued
to a different row in the same bank. The Virtex-5 FPGA DDR controller issues a PRECHARGE
command only if there is already an open row in the bank where a read or write command
is to be issued, which avoids unnecessary precharges and increases the efficiency of the
design. The auto-precharge
function is not supported in this design. This design ties the A10 bit Low during normal
reads and writes.
Data Masking
Virtex-5 FPGA DDR SDRAM controllers support bytewise data masking of the data bits
during a write operation. For x4 components, data masking cannot be done on a per nibble
basis due to an internal block RAM based FIFO limitation. The mask data is stored into the
FIFOs along with the write data. MIG supports a data mask option. If this option is
checked in the GUI, MIG generates a design with data mask pins. This option can be
chosen if the selected part has data masking.
Auto Refresh
An AUTO REFRESH command is issued to the DDR memory at specified intervals of time
to refresh the charge to retain the data.
Bank Management
Bank management is done by the Virtex-5 FPGA DDR SDRAM controller design to
increase the efficiency of the design. The controller keeps track of whether the bank being
accessed already has an open row or not, and also decides whether a PRECHARGE
command should be issued to that bank. When bank management is enabled via the
MULTI_BANK_EN parameter, a maximum of four banks/rows can be open at any one time.
A least-recently-used (LRU) algorithm keeps the three most recently used banks open and
closes the least recently used bank when a new bank/row location needs to be
accessed. The bank management feature can also be disabled by clearing
MULTI_BANK_EN. In this case, only one bank is kept open at any one time. For more
information on Bank Management, refer to application note XAPP858 [Ref 27].
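The bank management policy described above can be sketched as a small behavioral model.
This is only an illustration under stated assumptions, not the controller's RTL: the class
name, method name, and command tuples are hypothetical, and the model tracks one open
row per bank identifier.

```python
from collections import OrderedDict


class BankTracker:
    """Behavioral sketch of LRU bank management (hypothetical names)."""

    def __init__(self, multi_bank_en=True):
        # Four open banks/rows when MULTI_BANK_EN is set, otherwise one.
        self.max_open = 4 if multi_bank_en else 1
        self.open_rows = OrderedDict()  # bank -> open row, least recent first

    def access(self, bank, row):
        """Return the commands needed before a read or write to (bank, row)."""
        commands = []
        if bank in self.open_rows and self.open_rows[bank] == row:
            # Row hit: no PRECHARGE or ACTIVE needed, just refresh recency.
            self.open_rows.move_to_end(bank)
            return commands
        if bank in self.open_rows:
            # Same bank, different row: close the open row first.
            commands.append(("PRECHARGE", bank))
            del self.open_rows[bank]
        elif len(self.open_rows) >= self.max_open:
            # Tracker is full: close the least recently used bank.
            victim, _ = self.open_rows.popitem(last=False)
            commands.append(("PRECHARGE", victim))
        commands.append(("ACTIVE", bank, row))
        self.open_rows[bank] = row
        return commands
```

In this model a repeated access to an open row returns an empty command list, which is
the case where bank management saves a PRECHARGE/ACTIVE pair.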
Linear Addressing
Linear addressing refers to the way the user provides the address of the memory to be
accessed. For Virtex-5 FPGA DDR SDRAM controllers, the user provides the address
information through the app_af_addr signal. As the densities of the memory devices vary,
the number of column address bits and row address bits also changes. In every case, the
row address bits in the app_af_addr signal start at the bit immediately above the last
column address bit. This scheme allows a single design to support more devices.
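A minimal sketch of linear addressing follows, assuming the bank bits sit directly above
the row bits, with the column bits in the least-significant positions. The function names
and the example widths (10 column bits, 13 row bits) are illustrative assumptions; the
actual widths depend on the memory part.

```python
def pack_linear_address(bank, row, col, col_width, row_width):
    """Pack bank, row, and column into one linear address value.

    Layout (LSB first): column bits, then row bits, then bank bits.
    """
    if not 0 <= col < (1 << col_width):
        raise ValueError("column does not fit in col_width bits")
    if not 0 <= row < (1 << row_width):
        raise ValueError("row does not fit in row_width bits")
    return (bank << (col_width + row_width)) | (row << col_width) | col


def unpack_linear_address(addr, col_width, row_width):
    """Split a linear address back into (bank, row, column)."""
    col = addr & ((1 << col_width) - 1)
    row = (addr >> col_width) & ((1 << row_width) - 1)
    bank = addr >> (col_width + row_width)
    return bank, row, col
```

For example, pack_linear_address(2, 0x1A3, 0x0F, col_width=10, row_width=13) yields
0x1068C0F, and unpacking that value with the same widths returns (2, 0x1A3, 0x0F).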
System Clock
MIG supports differential and single-ended system clocks. Based on the selection in the
GUI, input system clocks and IDELAY clocks are differential or single-ended.
Hierarchy
Figure 11-2 shows the hierarchical structure of the design generated by MIG with a PLL
and a testbench.
[Figure 11-2: hierarchy diagram of <top_module> and its submodules, including ddr_phy_top,
ddr_ctrl, ddr_usr_top, and the test bench address and data generators. The legend
distinguishes Design Modules, Test Bench Modules, and Clocks and Reset Generation
Modules.]
module. A user reset is also input to this module. Using the input clocks and reset signals,
the system clocks and the system reset are generated in this module, which is used in the
design.
If the design has no PLL, the PLL primitive is not instantiated in the module. Instead, the
system operates on the user-provided clocks. A system reset is also generated in the
infrastructure module using the input locked signal.
[Block diagram: the infrastructure and idelay_ctrl modules take the system clock and reset
inputs (sys_clk_p, sys_clk_n, clk200_p, clk200_n, sys_rst_n) and provide clk0, clk90,
clk200, rst0, rst90, rst200, and idelay_ctrl_rdy. ddr1_top drives the memory device signals
(ddr_ras_n, ddr_cas_n, ddr_we_n, ddr_cs_n, ddr_cke, ddr_dm, ddr_ba, ddr_a, ddr_ck,
ddr_ck_n, ddr_dq, ddr_dqs, ddr_reset_n). tb_top and the status signals phy_init_done and
error are also shown.]
Figure 11-3: Top-Level Block Diagram of the DDR SDRAM Design with a PLL and a Testbench
Figure 11-4 shows a block diagram representation of the top-level module for a design
with a testbench but without a PLL. “Clocking Scheme,” page 442 explains how to
generate the design clocks from the user interface. The inputs consist of user clocks for the
design and Idelayctrl modules and the user reset. The design uses the user input clocks.
These clocks should be single-ended. The infrastructure module uses the input reset and
locked signals to reset the design. The user application must have a PLL/DCM primitive
instantiated in the design. The error output signal indicates whether the case passes or
fails. The phy_init_done signal indicates the completion of initialization and calibration of
the design.
[Block diagram showing clk200, tb_top, and the error status signal]
Figure 11-4: Top-Level Block Diagram of the DDR SDRAM Design with a Testbench but without a PLL
Figure 11-5 shows a block diagram representation of the top-level module for a design
with a PLL but without a testbench. “Clocking Scheme,” page 442 describes how various
clocks are generated using the PLL. The phy_init_done signal indicates the completion of
initialization and calibration of the design. The user interface signals are also listed in the
<top_module> module. The design provides the clk0_tb and rst0_tb signals to the user to
synchronize with the design. The clk0_tb signal is connected to clk0 in the controller. If the
user clock domain is different from clk0/clk0_tb, the user should add FIFOs for all the
inputs and outputs of the controller (user application signals) in order to synchronize them
to the clk0_tb clock. Because the PLL is instantiated in the infrastructure module, it
generates the required clock and reset signals for the design.
[Block diagram: the infrastructure and idelay_ctrl modules take the system clock and reset
inputs (sys_clk_p, sys_clk_n, clk200_p, clk200_n, sys_rst_n) and provide clk0, clk90,
clk200, rst0, rst90, rst200, and idelay_ctrl_rdy. ddr1_top sits between the user
application and status signals (app_af_addr, app_af_wren, app_af_cmd, app_af_afull,
app_wdf_data, app_wdf_mask_data, app_wdf_wren, app_wdf_afull, rd_data_valid,
rd_data_fifo_out, clk0_tb, rst0_tb, phy_init_done) and the memory device signals
(ddr_ras_n, ddr_cas_n, ddr_we_n, ddr_cs_n, ddr_cke, ddr_dm, ddr_ba, ddr_a, ddr_ck,
ddr_ck_n, ddr_dq, ddr_dqs, ddr_reset_n).]
Figure 11-5: Top-Level Block Diagram of the DDR SDRAM Design with a PLL but without a Testbench
Figure 11-6 shows a block diagram representation of the top-level module for designs
without a PLL or a testbench. The inputs consist of user clocks for the design and Idelayctrl
modules and the user reset. The design uses the user input clocks. “Clocking Scheme,”
page 442 explains how to generate the design clocks from the user interface. These clocks
should be single-ended. To reset the design, the signals are generated using the input reset
and the locked signals in the infrastructure module. The user application must have a
PLL/DCM primitive instantiated in the design. The phy_init_done signal indicates the
completion of initialization and calibration of the design. The user interface signals are also
listed in the <top_module> module. The design provides the clk0_tb and rst0_tb signals to
the user to synchronize with the design. The signal clk0_tb is connected to clock clk0 in the
controller. If the user clock domain is different from clk0/clk0_tb, the user should add
FIFOs for all the inputs and outputs of the controller (user application signals) in order to
synchronize them to clk0_tb clock.
[Block diagram: the user PLL/DCM and system reset supply clk_0, clk_90, clk_200, locked,
and sys_rst_n. The infrastructure module provides rst0, rst90, and rst200, and idelay_ctrl
provides idelay_ctrl_rdy. ddr1_top sits between the user application and status signals
(app_af_addr, app_af_wren, app_af_cmd, app_af_afull, app_wdf_data, app_wdf_mask_data,
app_wdf_wren, app_wdf_afull, rd_data_valid, rd_data_fifo_out, clk0_tb, rst0_tb,
phy_init_done) and the memory device signals (ddr_ras_n, ddr_cas_n, ddr_we_n, ddr_cs_n,
ddr_cke, ddr_dm, ddr_ba, ddr_a, ddr_ck, ddr_ck_n, ddr_dq, ddr_dqs, ddr_reset_n).]
Figure 11-6: Top-Level Block Diagram of the DDR SDRAM Design without a PLL or a Testbench
Figure 11-7 shows an expanded block diagram of the design. The design’s top module is
expanded to show various internal blocks. The functions of these blocks are explained in
the following subsections.
[Block diagram: the infrastructure and idelay_ctrl modules take the system clock and reset
inputs (sys_clk_p, sys_clk_n, sys_clk, clk200_p, clk200_n, idly_clk_200, sys_rst_n) and
provide clk0, clk90, clk200, rst0, rst90, rst200, and idelay_ctrl_rdy. Inside
ddr1_top/mem_if_top, usr_top connects to the user application and status signals
(app_af_addr, app_af_wren, app_af_cmd, app_af_afull, app_wdf_data, app_wdf_mask_data,
app_wdf_wren, app_wdf_afull, rd_data_valid, rd_data_fifo_out, clk0_tb, rst0_tb,
phy_init_done), exchanges write_data and read_data with phy_top, and both receive control
signals from ctrl. phy_top drives the memory device signals (ddr_ras_n, ddr_cas_n,
ddr_we_n, ddr_cs_n, ddr_cke, ddr_dm, ddr_ba, ddr_a, ddr_ck, ddr_ck_n, ddr_dq, ddr_dqs,
ddr_reset_n).]
Figure 11-7: Detailed Block Diagram of the DDR SDRAM Design with a PLL but without a Testbench
Infrastructure
The infrastructure module generates the FPGA clock and reset signals. When differential
clocking is used, sys_clk_p, sys_clk_n, clk_200_p, and clk_200_n signals appear. When
single-ended clocking is used, sys_clk and idly_clk_200 signals appear. In addition, clocks
are available for design use and a 200 MHz clock is provided for the IDELAYCTRL
primitive. Differential and single-ended clocks are passed through global clock buffers
before connecting to a PLL/DCM. For differential clocking, the output of the
sys_clk_p/sys_clk_n buffer is single-ended and is provided to the PLL/DCM input.
Likewise, for single-ended clocking, sys_clk is passed through a buffer and its output is
provided to the PLL/DCM input. The outputs of the PLL/DCM are clk0 (0° phase-shifted
version of the input clock) and clk90 (90° phase-shifted version of the input clock). After
the PLL/DCM is locked, the design is in the reset state for at least 25 clocks. The
infrastructure module also generates all of the reset signals required for the design.
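The reset behavior after lock can be pictured with a cycle-by-cycle model. The sketch
below is an assumption about one way to hold reset for 25 clocks after the locked signal
asserts; the function name and the counter-based approach are hypothetical, and the
generated RTL may implement the hold differently.

```python
def reset_trace(locked_samples, hold_cycles=25):
    """Return the reset value (1 = in reset) for each sampled clock cycle.

    Reset stays asserted while locked is low and for hold_cycles clocks
    after locked goes high. Losing lock restarts the hold period.
    """
    remaining = hold_cycles
    trace = []
    for locked in locked_samples:
        if not locked:
            remaining = hold_cycles   # not locked: stay in reset, rearm hold
            trace.append(1)
        elif remaining > 0:
            remaining -= 1            # locked, but still inside the hold window
            trace.append(1)
        else:
            trace.append(0)           # hold window elapsed: release reset
    return trace
```

With three unlocked cycles followed by 30 locked cycles, the model keeps reset asserted
for 28 cycles and then releases it.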
PLL/DCM
In MIG 3.0 and later, the DCM is replaced with a PLL for all Virtex-5 FPGA designs. If the
user selects a design with a PLL in the GUI, the infrastructure module contains both PLL
and DCM code. The CLK_GENERATOR parameter enables either the PLL or the DCM in the
infrastructure module and is set to PLL by default. To use a DCM, the user must manually
change this parameter to DCM.
When the user chooses the no PLL option in the GUI, the design does not use any
PLL/DCM primitives. Instead it works on the clocks provided by the user. The input
clocks in this case have to be single-ended. The locked status and user input reset signals
are the inputs to the module when there is no PLL. These signals are used to generate the
synchronous system resets for the design.
idelay_ctrl
This module instantiates the IDELAYCTRL primitive of the Virtex-5 FPGA. The
IDELAYCTRL primitive is used to continuously calibrate the individual delay elements in
its region to reduce the effect of process, temperature, and voltage variations. A 200 MHz
clock has to be fed to this primitive.
MIG uses the “automatic” method for IDELAYCTRL instantiation in which the MIG HDL
only instantiates a single IDELAYCTRL for the entire design. No location (LOC)
constraints are included in the MIG-generated UCF. This method relies on the ISE® tools to
replicate and place as many IDELAYCTRLs as needed (for example, one per clock region
that uses IDELAYs). Replication and placement are handled automatically by the software
tools if the IDELAYCTRLs have the same refclk, reset, and rdy nets. A new constraint called
IODELAY_GROUP associates a set of IDELAYs with an IDELAYCTRL and allows for
multiple IDELAYCTRLs to be instantiated without LOC constraints specified. ISE software
generates the IDELAY_CTRL_RDY signal by logically ANDing the RDY signals of every
IDELAYCTRL block.
The IODELAY_GROUP name should be checked in the following cases:
• The MIG design is used with other IP cores or user designs that also require the use of
IDELAYCTRL and IDELAYs.
• Previous ISE software releases 8.2.03i and 9.1i had an issue with IDELAYCTRL block
replication or trimming. When using these revisions of the ISE software, the user must
instantiate and constrain the location of each IDELAYCTRL individually.
See UG190 [Ref 10] for more information on the requirements of IDELAYCTRL placement.
ctrl
The ctrl module is the main controller of the Virtex-5 FPGA DDR SDRAM controller
design. It generates all the control signals required for the DDR memory interface and the
user interface. During a write operation, this module signals the FIFOs instantiated in the
user interface to output their stored data and signals the physical layer to drive that data
onto the IOBs. During a read operation, the data read from the memory is taken from
the physical layer and written into the user interface FIFOs using the control signals
generated by the ctrl module.
The ctrl module decodes the user command and issues the specified command to the
memory. The app_af_cmd signal is decoded as a write command when it equals 3’b000,
and app_af_cmd is decoded as a read command when it equals 3’b001. The commands
and control signals are generated based on the input burst length and CAS latency. If the
multi-bank option is enabled, the ctrl module also takes care of bank management, so as to
increase the efficiency of the design. At any given time, a maximum of four banks can
be open. The controller issues a PRECHARGE command to a bank only if that bank already
has an open row and the next command targets a different row. An ACTIVE command is
then generated to open the new row in that bank. Avoiding unnecessary PRECHARGE and
ACTIVE commands in this way increases efficiency.
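The command decoding described above can be summarized in a short sketch. The function
name is hypothetical, and returning None for encodings other than 3'b000 and 3'b001 is an
assumption, because this section does not describe how other values are treated.

```python
def decode_app_af_cmd(app_af_cmd):
    """Decode the 3-bit app_af_cmd value into a command name.

    3'b000 is a write and 3'b001 is a read. Other encodings are not
    described in this section, so the sketch returns None for them.
    """
    cmd = app_af_cmd & 0b111  # only the three command bits are decoded
    if cmd == 0b000:
        return "WRITE"
    if cmd == 0b001:
        return "READ"
    return None
```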
phy_top
The phy_top module is the top level of the physical interface of the design. The physical
layer includes the input/output blocks (IOBs) and other primitives used to read and write
the double data rate signals to and from the memory, such as IDDR and ODDR. This
module also includes the IODELAY elements of the Virtex-5 FPGA. These IODELAY
elements are used to delay the input strobe and data signals to capture the valid data into
the Read Data FIFO.
The memory control signals, such as RAS_N, CAS_N, and WE_N, are driven from the
buffers in the IOBs. All the input and output signals to and from the memory are
referenced from the IOB to compensate for the routing delays inside the FPGA.
The phy_init module, which is instantiated in the phy_top module, is used to initialize the
DDR memory in a predefined sequence according to the JEDEC standard for DDR
SDRAM.
The phy_calib module calibrates the design to align the strobe signal such that it always
captures the valid data in the FIFO. This calibration is needed to compensate for the trace
delays between the memory and the FPGA devices.
The phy_write module splits the user data into rise data and fall data to be sent to the
memory as a double data rate signal using ODDR. Similarly, while reading the data from
memory, the data from IDDR is combined to get a single vector that is written into the read
FIFO.
usr_top
The usr_top module is the user interface block of the design. It receives and stores the user
data, command, and address information in respective FIFOs. The ctrl module generates
the required control signals for this module. During a write operation, the data stored in
the usr_wr_fifo is read and given to the physical layer to output to the memory. Similarly,
during a read operation, the data from the memory is read via IDDR and written into the
FIFOs. This data is given to the user with a valid signal (rd_data_valid), which indicates
valid data on the rd_data_fifo_out signal. See “User Interface Accesses,” page 449 for
timing requirements and restrictions for user interface signals.
The FIFO36 and FIFO36_72 primitives are used for loading address and data from the user
interface. The FIFO36 primitive is used in the ddr_usr_addr_fifo module. The FIFO36_72
primitive is used in the ddr_usr_wr module. Every FIFO has two FIFO threshold
attributes, ALMOST_EMPTY_OFFSET and ALMOST_FULL_OFFSET, that are set to 7 and
F, respectively, in the RTL by default. These values can be changed as needed. For valid
FIFO threshold offset values, refer to UG190 [Ref 10].
Test Bench
The MIG tool generates two RTL folders, example_design and user_design. The
example_design folder includes the synthesizable test bench, while user_design does not
include the test bench modules. The MIG test bench performs eight write commands and
eight read commands in an alternating fashion. The number of words in a write command
depends on the burst length. For a burst length of 4, the test bench writes a total of 32 data
words for all eight write commands (16 rise data words and 16 fall data words). For a burst
length of 8, the test bench writes a total of 64 data words. It writes the data pattern of FF,
00, AA, 55, 55, AA, 99, 66 in a sequence in which FF, AA, 55, and 99 are rise data words and
00, 55, AA, and 66 are fall data words for an 8-bit design. The falling edge data is the
complement of the rising edge data. For a burst length of 4, the data sequence for the first
write command is FF, 00, AA, 55, and the data sequence for the second write command is
55, AA, 99, 66. For a burst length of 8, the data pattern for the first write command is FF,
00, AA, 55, 55, AA, 99, 66 and the same pattern is repeated for all the remaining write
commands. This data pattern is repeated in the same order based on the number of data
words written. For data widths greater than 8, the same data pattern is concatenated for
the other bits. For a 32-bit design and a burst length of 8, the data pattern for the first write
command is FFFFFFFF, 00000000, AAAAAAAA, 55555555, 55555555, AAAAAAAA,
99999999, 66666666.
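The test bench data pattern can be reproduced with a few lines of Python. This is a
sketch based only on the description above: the function and constant names are
hypothetical, and only burst lengths of 4 and 8 are handled because those are the cases
the text spells out.

```python
BASE_PATTERN = [0xFF, 0x00, 0xAA, 0x55, 0x55, 0xAA, 0x99, 0x66]


def write_data_words(data_width, burst_len, command_index):
    """Return the data words for one write command of the test bench.

    Words alternate rise, fall, rise, fall. For widths above 8 bits the
    8-bit pattern byte is repeated across every byte lane.
    """
    if data_width <= 0 or data_width % 8 != 0:
        raise ValueError("data_width must be a positive multiple of 8")
    if burst_len not in (4, 8):
        raise ValueError("only burst lengths 4 and 8 are described here")
    lanes = data_width // 8
    start = (command_index * burst_len) % len(BASE_PATTERN)
    words = []
    for i in range(burst_len):
        byte = BASE_PATTERN[(start + i) % len(BASE_PATTERN)]
        words.append(int.from_bytes(bytes([byte]) * lanes, "big"))
    return words
```

For an 8-bit design with a burst length of 4, command 0 returns FF, 00, AA, 55 and
command 1 returns 55, AA, 99, 66. For a 32-bit design with a burst length of 8, every
command returns the full concatenated pattern starting with FFFFFFFF.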
Address generation logic generates eight different addresses for eight write commands.
The same eight address locations are repeated for the following eight read commands. The
read commands are performed at the same locations where the data is written. There are
a total of 32 different address locations for 32 write commands, and the same address
locations are generated for 32 read commands. Upon completion of a total of 64
commands, including both writes and reads (eight writes and eight reads repeated four
times), address generation rolls back to the first address of the first write command and the
same address locations are repeated. The MIG test bench exercises only a certain memory
area. The address is formed such that all address bits are exercised. During writes, a new
address is generated for every burst operation on the column boundary.
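The ordering of test bench commands can be modeled as follows. The sketch uses abstract
address indices 0 to 31 in place of real address values, which this section does not
specify; the function names are hypothetical.

```python
from itertools import islice


def command_sequence():
    """Yield (command, address_index) pairs in test bench order, forever.

    Each group is eight writes followed by eight reads of the same
    locations. Four groups cover 32 locations, then the sequence wraps.
    """
    while True:
        for group in range(4):
            indices = [group * 8 + i for i in range(8)]
            for index in indices:
                yield ("WRITE", index)
            for index in indices:
                yield ("READ", index)


def first_commands(count):
    """Return the first count commands as a list."""
    return list(islice(command_sequence(), count))
```

The first 64 commands contain 32 writes and 32 reads over 32 distinct indices, and the
65th command is again a write to index 0.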
During reads, comparison logic compares the read pattern with the pattern written, i.e., the
FF, 00, AA, 55, 55, AA, 99, 66 pattern. For example, for an 8-bit design of burst length 4, the
data written for a single write command is FF, 00, AA, 55. During reads, the read pattern is
compared with the FF, 00, AA, 55 pattern. Based on a comparison of the data, a status
signal error is generated. If the data read back is the same as the data written, the error
signal is 0, otherwise it is 1.
Clocking Scheme
Figure 11-9, page 444 shows the clocking scheme for this design. Global and local clock
resources are used.
The global clock resources consist of a PLL or a DCM, two BUFGs on PLL/DCM output
clocks, and one BUFG for clk200. The local clock resources consist of regional I/O clock
networks (BUFIO). The global clock architecture is discussed in this section.
The MIG tool allows the user to customize the design such that the PLL/DCM is not
included. In this case, system clocks clk0 and clk90, and IDELAYCTRL clock clk200 must
be supplied by the user.
The system clock from the output of the IBUFGDS or the IBUFG is connected to the
PLL/DCM to generate the various clocks used by the memory interface logic.
The clk200 output of the IBUFGDS or the IBUFG is connected to the BUFG. The output of
the BUFG is used for IDELAY IOB delay blocks for aligning read capture data.
The PLL/DCM generates two separate synchronous clocks for use in the design. This is
shown in Table 11-3, Figure 11-8, and Figure 11-9, page 444. The clock structure is the same
for both the example design and the user design. For designs without PLL/DCM
instantiation, PLL/DCM and the BUFGs should be instantiated at the user end to generate
the required clocks.
Table 11-3: DDR Interface Design Clocks
Clock | Description | Logic Domain
clk0 | Skew-compensated replica of the input system clock. | The clock for the controller and the user interface logic, most of the DDR bus-related I/O flip-flops (e.g., memory clock, control/address, output DQS strobe, and DQ input capture). This clock is used to register the data, address, and command signals, and the address and data enables for the user interface logic (1). This clock is also used to generate the FIFO status signals.
clk90 | 90° phase-shifted version of clk0. | Used in the write data path section of the physical layer. Clocks write path control logic, DDR side of the Write Data FIFO, and output flip-flops for DQ. This clock is also used to generate the read data and read data valid signals for the user interface logic (1).
Notes:
1. See “User Interface Accesses,” page 449 for timing requirements and restrictions on the user interface
signals.
[Diagram: PLL with CLKOUT1 driving clk90 and the CLKFBOUT/CLKFBIN feedback pins]
Figure 11-8: Clocking Scheme for QDRII Interface Logic Using PLL
[Diagram: SYSTEM CLK enters on a GC I/O and feeds the DCM CLKIN input; the DCM CLK0
output drives a BUFG to produce clk0]
Notes:
1. The direction indicated in this table is referenced from the design perspective. For example, input indicates that the signal is input to the
design and output for the user.
2. See “User Interface Accesses,” page 449 for timing requirements and restrictions for the user interface signals.
3. Addressing in the Virtex-5 FPGA is linear. That is, the row address bits immediately follow the column address bits, and the bank address
bits follow the row address bits, thus supporting more devices. The number of address bits used depends on the density of the memory part.
The controller ignores the unused bits, which can all be tied High.
Table 11-5 lists the signals between the User interface and the controller.
[Initialization flowchart. Steps shown include: System Reset, Precharge all banks, Auto
Refresh, Wait > 200 clock cycles, Auto Refresh, Wait 45 clock cycles, and Initialization
complete (continue calibration).]
[Calibration flowchart]
Stage 1: DQ-DQS per-bit calibration. Continuously read back the stage 1 training pattern
and adjust the DQ IDELAY. Perform once per DQ bit until all DQ bits are calibrated.
Stage 3: Read data valid calibration. Continuously read back the stage 3/4 training
pattern and adjust the number of clock cycles to wait after issuing a read command before
valid data arrives in the FPGA_CLK domain. Perform once per DQS group.
Stage 4: DQS gate control calibration. Adjust the IDELAY for DQS gate control. Perform
once per DQS group.
Calibration Done
The first calibration stage sets the IDELAY value for each DQ (IDELAY for DQS remains at
0 during this time), and is performed even before a phase relationship between DQS and
FPGA_CLK has been established. A training pattern of “10” (1 = rising, 0 = falling) is used
to calibrate DQ.
The second calibration stage includes calibration between the DQS and the FPGA clock.
The third calibration stage is read-enable calibration, which compensates for the round-trip
delay between the time the controller issues the read command and the time the captured
read data is valid at the outputs of the ISERDES.
The fourth stage includes calibration of a squelch circuit that gates the input DQS to avoid
the glitch that propagates to the second rank of flops in the ISERDES. The glitch occurs
when DQS goes from the Low state to the 3-state level after the last edge of the DQS, which
might cause a “false” rising and/or falling edge on the DQS input to the FPGA. Unless the
DQS glitch is gated after the last DQS falling edge of a read burst, the data registered in the
ISERDES might change prematurely. During calibration, an auto-refresh command is
issued to memory at intervals depending on the stage of calibration.
After initialization and calibration are complete, the controller is signaled to start normal
operation of the design and can begin issuing user write and read commands to the
memory.
Write Interface
Figure 11-12 shows the user interface block diagram for write operations.
[Figure 11-12: user interface block diagram for writes. Labels shown: app_af_addr,
app_af_afull, af_addr, ctrl_af_rden, Write Data FIFO (FIFO36_72, 512 x 72), app_wdf_data,
app_wdf_mask_data, wdf_rden, wdf_data]
The following steps describe the architecture of the Address and Write Data FIFOs and
show how to perform a write burst operation to DDR SDRAM from the user interface.
1. The user interface consists of an Address FIFO and a Write Data FIFO. The Write Data
FIFO is constructed using Virtex-5 FPGA FIFO36_72 primitive with a 512 x 72
configuration. The 72-bit architecture comprises one 64-bit port and one 8-bit port. For
Write Data FIFOs, the 64-bit port is used for data bits and the 8-bit port is used for
mask bits. Mask bits are available only when supported by the memory part and when
the Data Mask is enabled in the MIG GUI. Some memory parts, such as Registered
DIMMs of x4 parts, do not support mask bits.
2. The Address FIFO is constructed using Virtex-5 FPGA FIFO36 primitive with a
1024 x 36 configuration. The 36-bit architecture comprises one 32-bit port and one 4-bit
port. The 32-bit port is used for addresses (app_af_addr), and the 4-bit port is used for
commands (app_af_cmd).
3. The Address FIFO is common for both Write and Read commands. It comprises an
address part and the command part. Command bits discriminate between write and
read commands.
4. The user interface data width app_wdf_data is twice that of the memory data width.
For an 8-bit memory width, the user interface is 16 bits consisting of rising edge data
and falling edge data. For every 8 bits of data, there is a mask bit. For 72-bit memory
data, the user interface data width app_wdf_data is 144 bits, and the mask data
app_wdf_mask_data is 18 bits.
5. The minimum configuration of the Write Data FIFO is 512 x 72 for a memory data
width of 8 bits. For an 8-bit memory data width, the least-significant 16 bits of the data
port are used for write data and the least-significant two bits of the 8-bit port are used
for mask bits. The controller internally pads all zeros for the most-significant 48 bits of
the 64-bit port and the most-significant six bits of the 8-bit port.
6. Depending on the memory data width, MIG instantiates multiple FIFO36_72s to gain
the required width. For designs using 8-bit to 32-bit data width, one FIFO36_72 is
instantiated; for 72-bit data width, a total of three FIFO36_72s are instantiated. The bit
architecture comprises 32 bits of rising-edge data, 4 bits of rising-edge mask, 32 bits of
falling-edge data, and 4 bits of falling-edge mask, which are all stored in a FIFO36_72.
MIG routes the app_wdf_data and app_wdf_mask_data to FIFO36_72s accordingly.
7. The user can initiate a write to memory by writing to the Address FIFO and the Write
Data FIFO when the FIFO full flags are deasserted. Status signal app_af_afull is asserted
when the Address FIFO is full; similarly, app_wdf_afull is asserted when the Write Data
FIFO is full.
8. At power-on, both the Address FIFO and Write Data FIFO full flags are deasserted.
9. The user should assert the Address FIFO write enable signal app_af_wren along with
address app_af_addr and command app_af_cmd to store the address and command
into the Address FIFO.
10. The user data should be synchronized to the clk_tb clock. Data FIFO write-enable
signal app_wdf_wren should be asserted to store write data app_wdf_data and mask
data app_wdf_mask_data into the Write Data FIFOs. Rising-edge and falling-edge
data should be provided together for each write to the Data FIFO. The Virtex-5 FPGA
DDR SDRAM controller design supports byte-wise masking of data only.
11. The write command should be given by keeping app_af_cmd = 3'b000 and asserting
app_af_wren. Address information is given on the app_af_addr signal. Address and
command information is written into the User Address FIFO.
12. After the completion of the initialization and calibration process and when the User
Address FIFO empty signal is deasserted, the controller reads the command and
address FIFO and issues a write command to the DDR SDRAM.
[Timing diagram; signals shown: clk_tb, reset_tb, app_wdf_afull, app_af_afull, phy_init_done, app_wdf_wren, app_af_wren, app_af_addr (A0 A1 A2 A3)]
Figure 11-13: DDR SDRAM Write Burst for Four Bursts (BL = 4)
13. The write timing diagram in Figure 11-13 is derived from the MIG-generated testbench
for a burst length of four (BL = 4). As shown, each write to Address FIFO should have
two writes to the Data FIFO. The phy_init_done signal indicates memory initialization
and calibration completion.
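The sizing rules in steps 6 and 13 reduce to simple arithmetic. The following Python sketch is a behavioral model only, not MIG-generated code. The function names are hypothetical, and the rounding rule (one FIFO36_72 per 32 bits of memory data width) is an assumption inferred from the two data points given in step 6.

```python
import math

# Assumption: each FIFO36_72 stores 32 bits of rising-edge data and 32 bits of
# falling-edge data (plus mask bits), so one FIFO serves 32 bits of memory width.
BITS_PER_FIFO = 32


def fifo36_72_count(data_width: int) -> int:
    """Number of FIFO36_72 primitives for a given memory data width."""
    if data_width <= 0:
        raise ValueError("data_width must be positive")
    return math.ceil(data_width / BITS_PER_FIFO)


def data_fifo_writes_per_command(burst_length: int) -> int:
    """Write Data FIFO writes needed for each Address FIFO write.

    Each Data FIFO write carries one rising-edge word and one falling-edge
    word, so it covers two beats of the burst.
    """
    if burst_length <= 0 or burst_length % 2:
        raise ValueError("burst_length must be a positive even number")
    return burst_length // 2
```

With these assumptions, an 8-bit or 32-bit interface needs one FIFO36_72, a 72-bit interface needs three, and a burst length of four needs two Data FIFO writes per Address FIFO write, which matches the text above.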
Read Interface
Figure 11-14 shows the block diagram of the read interface.
[Figure 11-14: read interface block diagram; labels shown: app_af_afull, Read Data FIFO (RAM 16 x 1D), rd_data_fifo_out, rd_data_out_rise]
The following steps describe the architecture of the Read Data FIFO and show how to
perform a read burst operation from DDR SDRAM from the user interface.
1. The read user interface consists of an Address FIFO and a Read Data FIFO. The
Address FIFO is common between reads and writes. The Read Data FIFO is built out of
distributed RAMs in a 16 x 1 configuration. The number of RAM16X1Ds that MIG instantiates
depends on the data width. For example, for 8-bit data width, MIG instantiates a
total of 16 RAM16X1Ds, 8 for rising-edge data and 8 for falling-edge data. Similarly, for
72-bit data width, MIG instantiates a total of 144 RAM16X1Ds, 72 for rising-edge data
and 72 for falling-edge data.
2. The user can initiate a read to memory by writing to the Address FIFO when the FIFO
full flag app_af_afull is deasserted.
3. To write the read address and read command into the Address FIFO, the Address FIFO
write enable signal app_af_wren should be issued, along with the memory read
address app_af_addr and app_af_cmd commands (set to 001 for a read command).
4. The controller reads the Address FIFO and generates the appropriate control signals to
memory. After decoding app_af_cmd, the controller issues a read command to the
memory at the specified address.
5. Prior to the actual read and write commands, the design calibrates the latency in
number of clock cycles from the time the read command is issued to the time the data
is received. Using this precalibrated delay information, the controller stores the read
data in Read Data FIFOs.
6. The rd_data_valid signal is asserted when data is available in the Read Data FIFOs.
7. When calibration is completed, the controller generates the control signals to capture
the read data from the FIFO according to the CAS latency selected by the user. The
rd_data_valid signal is asserted when the read data is available to the user, and
rd_data_fifo_out is the read data from the memory to the user.
[Timing diagram; signals shown: clk_tb, app_af_afull, app_af_wren, app_af_addr (A0 A1 A2 A3), rd_data_valid; the interval marked is Read Latency]
Figure 11-15: DDR SDRAM Read Burst for Four Bursts (BL = 4)
8. Figure 11-15 shows the user interface timing diagram for a read command, burst
length of four.
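The read steps above can be summarized in a short Python model. This is a sketch under stated assumptions, not controller code: the function names and the dictionary representation of the user signals are hypothetical, while the command encodings and the RAM16X1D counts come from the text.

```python
# Command encodings on app_af_cmd, as given in the write and read steps.
CMD_WRITE = 0b000
CMD_READ = 0b001


def ram16x1d_count(data_width: int) -> int:
    """RAM16X1D primitives in the Read Data FIFO: one per bit, per clock edge."""
    if data_width <= 0:
        raise ValueError("data_width must be positive")
    return 2 * data_width


def issue_read(addr: int, app_af_afull: bool):
    """Return the Address FIFO signals to drive for one read request.

    Returns None when the Address FIFO full flag is asserted, because the
    request must wait until the flag is deasserted.
    """
    if app_af_afull:
        return None
    return {"app_af_wren": 1, "app_af_cmd": CMD_READ, "app_af_addr": addr}
```

For example, `ram16x1d_count(72)` gives 144, and `issue_read(0x10, app_af_afull=True)` returns `None` to indicate that the user logic should hold the request.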
Read latency is defined as the time from when the read command is written to the user
interface bus until the corresponding first piece of data is available on the user
interface bus (see Figure 11-15).
When benchmarking read latencies, it is important to specify the exact conditions under
which the measurement occurs.
Read latency varies based on the following parameters:
• Number of commands already in the FIFO pipeline before the read command is
issued
• Whether an ACTIVATE command needs to be issued to open the new bank/row
• Whether a PRECHARGE command needs to be issued to close a previously opened
bank
• Specific timing parameters for the memory, such as TRAS and TRCD in conjunction
with the bus clock frequency
• Commands can be interrupted, and banks/rows can forcibly be closed when the
periodic AUTO REFRESH command is issued
• CAS latency
• Board-level and chip-level (for both memory and FPGA) propagation delays
Table 11-9 and Table 11-10 show read latencies for the Virtex-5 FPGA DDR interface for two
different conditions. Table 11-9 shows the case where a row activate is not required prior to
issuing a read command on the DDR bus. This situation is possible, for example, when
bank management is enabled, and the read targets an already opened bank. Table 11-10
shows the case when a read results in a bank/row conflict. In this case, a precharge of the
previous row must be followed by an activation of the new row, which increases read
latency. Other specific conditions are noted in the footnotes for each table.
Notes:
1. Test conditions: Clock frequency = 200 MHz, CAS latency = 3, DDR -5 speed grade device.
2. Access conditions: Read to an already open bank/row is issued to an empty control/address FIFO.
3. Some entries have fractional clock cycles because the inverted version of CLK0 is used to drive the
DDR memory.
4. The Virtex-5 FPGA DDR interface uses a FIFO36 for the address/control FIFO. It is possible to shorten
the READ command to empty signal deassertion latency by implementing the FIFO as a distributed
RAM FIFO or removing the FIFO altogether, as the application requires.
Notes:
1. Test conditions: Clock frequency = 200 MHz, CAS latency = 3, DDR -5 speed grade device.
2. Access conditions: Read that results in a bank/row conflict is issued to an empty control/address
FIFO. This requires that the previous bank/row be closed first.
3. Some entries have fractional clock cycles because the inverted version of CLK0 is used to drive the
DDR memory.
4. The Virtex-5 DDR interface uses a FIFO36 for the address/control FIFO. It is possible to shorten the
READ command to empty signal deassertion latency by implementing the FIFO as a distributed RAM
FIFO or removing the FIFO altogether, as the application requires.
Note: Timing has been verified for most of the MIG generated configurations. For the best timing
results, adjacent banks in the same column of the FPGA should be used. Banks that are separated
by unbonded banks should be avoided because these can cause timing violations.
Supported Devices
The design generated by MIG is independent of the memory package; therefore, the
package part of the memory component is replaced with XX, where XX indicates a “don't
care” condition. The tables below list the components (Table 11-12) and DIMMs
(Table 11-13 through Table 11-15) supported by MIG for DDR SDRAM. See Appendix G,
“Low Power Options.”
Table 11-12: Supported Components for DDR SDRAM (Virtex-5 FPGAs)
Components Packages (XX) Components Packages (XX)
MT46V32M4XX-75 P,TG MT46V32M4XX-5B -
MT46V64M4XX-75 FG,P,TG MT46V64M4XX-5B BG,FG,P,TG
MT46V128M4XX-75 BN,FN,P,TG MT46V128M4XX-5B BN,FN,P,TG
MT46V256M4XX-75 P,TG MT46V256M4XX-5B P,TG
MT46V16M8XX-75 P,TG MT46V16M8XX-5B TG,P
MT46V32M8XX-75 FG,P,TG MT46V32M8XX-5B BG,FG,P,TG
MT46V64M8XX-75 BN,FN,P,TG MT46V64M8XX-5B BN,FN,P,TG
MT46V128M8XX-75 P,TG MT46V128M8XX-5B -
MT46V8M16XX-75 P,TG MT46V8M16XX-5B TG,P
MT46V16M16XX-75 BG,FG,P,TG MT46V16M16XX-5B BG,FG,P,TG
MT46V32M16XX-75 - MT46V32M16XX-5B BN,FN,P,TG
MT46V64M16XX-75 P,TG MT46V64M16XX-5B -
Table 11-13: Supported Unbuffered DIMMs for DDR SDRAM (Virtex-5 FPGAs)
Unbuffered DIMMs Packages (X) Unbuffered DIMMs Packages (X)
MT4VDDT1664AX-40B G,Y MT8VDDT3264AX-40B G,Y
MT4VDDT3264AX-40B G,Y MT9VDDT3272AX-40B Y
Table 11-14: Supported Registered DIMMs for DDR SDRAM (Virtex-5 FPGAs)
Registered DIMMs Packages (X) Registered DIMMs Packages (X)
MT9VDDF3272X-40B G,Y MT18VDDF6472X-40B G,Y
MT9VDDF6472X-40B G,Y MT18VDDF12872X-40B G,Y
on multiple device/package combinations and I/O timing analysis using FPGA and
memory timing parameters for a 64-bit wide interface.
Chapter 12
Feature Summary
This section summarizes the supported and unsupported features of the DDRII SRAM
controller design.
Supported Features
The DDRII SRAM controller design supports the following:
• A maximum frequency of 300 MHz
• 9-bit, 18-bit, 36-bit, and 72-bit data widths
• CIO and SIO controller designs
• Burst lengths of two and four
• Programmable read-followed-by-write latency
• Linear/burst increment of address bits
• Implemented using different Virtex-5 devices
• Support for DCI cascading
• Support for debug signals
• Operating with 9-bit, 18-bit and 36-bit memory parts
• Verilog and VHDL
• With and without a testbench
• With and without a PLL
Architecture
Figure 12-1 shows a top-level block diagram of the DDRII SRAM controller. One side of the
DDRII SRAM memory controller connects to the user interface denoted as User Interface.
The other side of the controller interfaces to DDRII SRAM memory. The memory interface
data width is selectable from MIG.
[Figure 12-1: top-level block diagram; blocks shown: User Interface, DDRII SRAM Memory Controller, Virtex-5 FPGA, DDRII SRAM Memory]
Both common I/O (CIO) and separate I/O (SIO) DDRII SRAM designs are supported by
MIG. SIO designs having independent read and write ports eliminate the need for high-
speed bus turnaround.
Read and write addresses are latched on positive edges of the input clock K. A common
address bus is used to access the addresses for both read and write operations.
Interface Model
DDRII SRAM interfaces are source-synchronous and double data rate. They transfer data
on both edges of the clock cycle. Modeling the memory interface in layers has several
advantages: it allows designs to be ported easily and makes it possible to share parts of
the design across different types of memory interfaces.
[Figure 12-2: modular memory interface representation; labels shown: Xilinx FPGA, Control Layer, Physical Layer, Memories]
Figure 12-2 shows the modular memory interface representation diagram. The application
interface layer creates the user interface, which initiates memory writes and reads by
writing data and memory addresses to the User Interface FIFOs.
The control layer comprises:
• Clocks and reset generation logic
• Datapath logic
• Control logic
The clocks and reset generation logic consists of a PLL/DCM primitive, which derives
different phase-shifted versions of the user-supplied differential clocks (sys_clk_p and
sys_clk_n). These phase-shifted clocks run throughout the controller design. A
200 MHz user-supplied differential clock is used for the IDELAYCTRL elements. Reset
signals are generated for the different clock domains using the user-supplied reset signal
(sys_rst_n), the locked signal, and the IDELAYCTRL ready signal (idelay_ctrl_ready).
The datapath logic consists of the memory write clocks, the read clocks, and the data write
generation logic.
The control logic consists of read/write command generation logic, which depends on the
status signals of the User Interface FIFO.
The previously mentioned logic interfaces with memory through IDDRs, ODDRs,
OFLOPS, ISERDES elements, and so on, which are associated with the physical layer. The
read data capturing logic is also associated with the physical layer.
Hierarchy
Figure 12-3 shows the hierarchical structure of the DDRII SRAM design generated by MIG
with a testbench and a PLL.
[Figure 12-3: hierarchy diagram. Modules recoverable from the figure: ip_top, ddrii_phy_cq_io, ddrii_phy_dq_io, ddrii_phy_bw_io, ddrii_phy_init_sm, ddrii_phy_dly_cal_sm, ddrii_phy_en. Each box lists the module name, its Verilog instance name (u_<module>), and its VHDL instance name (arch_<module>). Legend categories: Testbench, User Interface, Physical (PHY) Layer.]
[Block diagram. Reference clocks and reset: sys_clk_n, sys_rst_n, clk_200_p, clk_200_n. Internal clocks and resets: clk_0, clk_90, clk_200, reset_clk_0, reset_clk_270, reset_clk_200. Blocks: idelay_ctrl (idelay_ctrl_ready), ddrii_top, tb_top. Memory device signals: ddrii_dll_off_n, ddrii_ld_n, ddrii_rw_n, ddrii_k, ddrii_k_n, ddrii_c, ddrii_c_n, ddrii_sa, ddrii_bw_n, ddrii_dq, ddrii_cq. Status signals: compare_error, cal_done.]
Figure 12-4: Top-Level Block Diagram of the DDRII SRAM Design with a PLL and a
Testbench
Figure 12-5 shows a top-level block diagram of a DDRII SRAM design with a PLL but
without a testbench. The sys_clk_p and sys_clk_n pair are differential input system clocks.
“Clocking Scheme,” page 480 describes how various clocks are generated using the PLL.
The PLL/DCM is instantiated in the infrastructure module that generates the required
design clocks. dly_clk_200_p and dly_clk_200_n are used for the IDELAYCTRL element.
The sys_rst_n signal is an active-Low system reset. All design resets are generated using this
system reset signal, the locked signal, and the IDELAYCTRL ready signal
(idelay_ctrl_ready). The user must drive the user application signals. The design provides the
clk_0_tb and reset_clk_0_tb signals so that user logic can be synchronized with the design. The
signal clk_0_tb is connected to clock clk_0 in the controller. If the user clock domain is
different from clk_0/clk_0_tb, the user should add FIFOs on all the inputs and outputs of
the controller (user application signals) to synchronize them to the clk_0_tb clock.
The cal_done signal indicates the completion of initialization and calibration of the design.
[Block diagram. Blocks: idelay_ctrl (clk_200, reset_clk_200, idelay_ctrl_ready), ddrii_top. User application signals: cal_done, clk_0_tb, reset_clk_0_tb, wrdata_fifo_full, addr_fifo_full, rd_data_valid, user_rd_data_rise, user_rd_data_fall, user_wrdata_wr_en, user_addr_wr_en, user_bw_n_rise, user_bw_n_fall, user_addr_cmd, user_wr_data_rise, user_wr_data_fall. Memory device signals: ddrii_dll_off_n, ddrii_ld_n, ddrii_rw_n, ddrii_k, ddrii_k_n, ddrii_c, ddrii_c_n, ddrii_sa, ddrii_bw_n, ddrii_dq, ddrii_cq.]
Figure 12-5: Top-Level Block Diagram of the DDRII SRAM Design with a PLL and
without a Testbench
Figure 12-6 shows a top-level block diagram of a DDRII SRAM design without a PLL but
with a testbench. The user should provide all the clocks and the locked signal. “Clocking
Scheme,” page 480 describes how to generate the design clocks from the user interface.
These clocks should be single-ended. The sys_rst_n signal is an active-Low system reset.
All design resets are generated using this system reset signal, the locked signal, and the
IDELAYCTRL ready signal (idelay_ctrl_ready). The user application must have a
PLL/DCM primitive instantiated in the design, and all user clocks should be driven
through BUFGs. The error output signal compare_error indicates whether the test passes
or fails. The testbench module generates the write and read addresses, the write and read
commands, and the write data for the controller. It also compares the read data with the
written data. The compare_error signal is driven High on a data mismatch. The cal_done
signal indicates the completion of initialization and calibration of the design.
[Block diagram. User PLL/DCM clocks and reset: clk_0, clk_90, clk_270, clk_200, locked, sys_rst_n. Blocks: infrastructure (reset_clk_0, reset_clk_270, reset_clk_200), idelay_ctrl (idelay_ctrl_ready), ddrii_top. Memory device signals: ddrii_dll_off_n, ddrii_ld_n, ddrii_rw_n, ddrii_k, ddrii_k_n, ddrii_c, ddrii_c_n, ddrii_sa, ddrii_bw_n, ddrii_dq, ddrii_cq. Status signals: compare_error, cal_done.]
Figure 12-6: Top-Level Block Diagram of the DDRII SRAM Design without a PLL but
with a Testbench
Figure 12-7 shows a top-level block diagram of a DDRII SRAM design without a PLL or a
testbench. Users should provide all the clocks and the locked signal. “Clocking Scheme,”
page 480 describes how to generate the design clocks from the user interface. These clocks
should be single-ended. The sys_rst_n signal is an active-Low system reset. All design
resets are generated using this system reset signal, the locked signal, and the IDELAYCTRL
ready signal (idelay_ctrl_ready). The user application must have a PLL/DCM primitive
instantiated in the design, and all user clocks should be driven through BUFGs. The user
must drive the user application signals. The design provides the clk_0_tb and
reset_clk_0_tb signals so that user logic can be synchronized with the design. The signal
clk_0_tb is connected to clock clk_0 in the controller. If the user clock domain is different
from clk_0/clk_0_tb, the user should add FIFOs on all the inputs and outputs of the
controller (user application signals) to synchronize them to the clk_0_tb clock. The
cal_done signal indicates the completion of initialization and calibration of the design.
[Block diagram. User PLL/DCM clocks and reset: clk_0, clk_90, clk_270, clk_200, locked, sys_rst_n. Blocks: infrastructure (reset_clk_0, reset_clk_270, reset_clk_200), idelay_ctrl (idelay_ctrl_ready), ddrii_top. User application signals: cal_done, clk_0_tb, reset_clk_0_tb, wrdata_fifo_full, addr_fifo_full, rd_data_valid, user_rd_data_rise, user_rd_data_fall, user_wrdata_wr_en, user_addr_wr_en, user_bw_n_rise, user_bw_n_fall, user_addr_cmd, user_wr_data_rise, user_wr_data_fall. Memory device signals: ddrii_dll_off_n, ddrii_ld_n, ddrii_rw_n, ddrii_k, ddrii_k_n, ddrii_c, ddrii_c_n, ddrii_sa, ddrii_bw_n, ddrii_dq, ddrii_cq.]
Figure 12-7: Top-Level Block Diagram of the DDRII SRAM Design without a PLL or
a Testbench
Implemented Features
This section provides details on the supported features of the DDRII SRAM controller.
CIO/SIO
The DDRII SRAM memory controller supports both Common I/O (CIO) and Separate I/O
(SIO) memory parts. MIG provides an option to select the required memory parts. CIO
memory parts have support for Burst Lengths 2 and 4, whereas SIO memory parts have
support only for Burst Length 2.
The memory type of the design generated using MIG is represented by a parameter
IO_TYPE in the design top RTL module. This parameter value can be either SIO or CIO in
the design top RTL module depending on the type of memory selected in MIG memory
controller options.
The DDRII SRAM memory controller RTL modules are generic: all the ports and logic
for both memory types, SIO and CIO (namely the ddrii_dq CIO port and the ddrii_d and
ddrii_q SIO ports), are present in all RTL modules all the way up to the design top RTL
module. When a design is generated using MIG, the design top RTL module contains the
IO_TYPE parameter value and the ports for the memory type selected in the memory
controller options.
Example: If the selected memory is a CIO part, then the design top RTL module has the
parameter IO_TYPE = CIO and the ddrii_dq port.
The user can change the memory type from SIO to CIO and vice versa, but this requires
considerable modification of the design top RTL module and the UCF. In addition to
changing the IO_TYPE parameter value, the appropriate memory ports must be added, and
the unnecessary ports should either be connected to ground or left unconnected.
Example: If the parameter IO_TYPE value is changed from CIO to SIO in the design top
RTL module, then the design top RTL module port list must have the ports ddrii_d and
ddrii_q. The port ddrii_dq should be removed. The ddrii_top module instantiation in the
design top RTL module must have the ddrii_d and ddrii_q ports mapped. The user
must also make sure that the UCF is compatible with the modified design
top RTL module.
The parameter IO_TYPE can only have the values CIO or SIO; any other value results in
controller misbehavior. Instead of modifying the RTL module manually, it is
recommended to generate the appropriate design using MIG. The custom memory part
feature can be used if required.
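The IO_TYPE rules in this section can be captured as a small consistency check. The Python sketch below is illustrative only: the function names are hypothetical, and it models just the rules stated above (legal IO_TYPE values, the data ports each type exposes at the design top, and the burst lengths each type supports).

```python
# Data ports present in the design top RTL module for each memory type.
DATA_PORTS = {"CIO": ["ddrii_dq"], "SIO": ["ddrii_d", "ddrii_q"]}

# CIO parts support burst lengths 2 and 4; SIO parts support only 2.
BURST_LENGTHS = {"CIO": (2, 4), "SIO": (2,)}


def top_level_data_ports(io_type: str) -> list:
    """Return the data ports expected on the design top for an IO_TYPE."""
    if io_type not in DATA_PORTS:
        raise ValueError("IO_TYPE must be 'CIO' or 'SIO'")
    return list(DATA_PORTS[io_type])


def check_config(io_type: str, burst_length: int) -> bool:
    """True when the burst length is supported for the given IO_TYPE."""
    if io_type not in BURST_LENGTHS:
        raise ValueError("IO_TYPE must be 'CIO' or 'SIO'")
    return burst_length in BURST_LENGTHS[io_type]
```

A check like this could run in a build script before synthesis to catch a hand-edited top module whose IO_TYPE and BURST_LENGTH no longer agree.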
adds the delays (in terms of number of clock cycles) to the existing single clock delay
between read and write command.
The RD_TO_WR_LATENCY parameter must be an integer between 0 and 3.
Any value outside this range is treated as 3.
This parameter is used only for CIO designs; the controller ignores it for SIO designs.
SIO designs have separate data buses for read data and write data,
so no data bus turnaround period is needed.
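The handling of RD_TO_WR_LATENCY described above can be modeled in a few lines of Python. This is a sketch of the stated rules, not the controller RTL; the function name is hypothetical, and returning None for SIO designs is simply this model's way of saying that the parameter is ignored.

```python
def effective_rd_to_wr_latency(value, io_type: str):
    """Extra read-to-write delay cycles the controller applies.

    CIO designs: integers 0 to 3 are used as given; anything else is
    treated as 3. SIO designs: the parameter is ignored (None here).
    """
    if io_type == "SIO":
        return None
    if io_type != "CIO":
        raise ValueError("io_type must be 'CIO' or 'SIO'")
    if isinstance(value, int) and 0 <= value <= 3:
        return value
    return 3
```

For example, a CIO design configured with 7 behaves as if the parameter were 3.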
Address Increment
The address generation logic generates an incremental address pattern. The address
pattern can be generated as a linear incremental pattern or as burst incremental pattern.
This depends on the parameter BURST_INC.
Some memory models handle the address bits for the data bursts internally,
so a linear incremental address pattern works. Other memory models do not handle the
burst address bits internally; these bits are included in the
address given to the memory. In that case the address cannot be incremented linearly,
and only a burst increment of the address bits should be given to the memory model.
The address generation logic generates a linear incremental pattern of address bits if the
parameter BURST_INC is 0 and a burst incremental pattern of address bits if the
parameter BURST_INC is 1. BURST_INC is an integer parameter that must be
either 0 or 1.
MIG sets this parameter value depending on the type of memory selected. The user
can also edit this value manually in the generated design top RTL module.
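The difference between the two address patterns can be illustrated with a short Python model. This is a sketch only: the function name is hypothetical, and the step size used for the burst increment (equal to the burst length) is an assumption, since this section does not state the exact increment.

```python
def next_address(addr: int, burst_inc: int, burst_length: int, addr_width: int) -> int:
    """Next address in the incremental pattern.

    BURST_INC = 0: linear pattern, address advances by 1.
    BURST_INC = 1: burst pattern, address advances by burst_length
                   (assumed step size, not confirmed by the text).
    The address wraps to 0 after reaching the maximum value.
    """
    if burst_inc not in (0, 1):
        raise ValueError("BURST_INC must be 0 or 1")
    step = burst_length if burst_inc == 1 else 1
    return (addr + step) % (1 << addr_width)
```

With a burst length of 4, the linear pattern visits 0, 1, 2, ... while the assumed burst pattern visits 0, 4, 8, ... .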
Reset-Active Low
The design reset signal sys_rst_n is an active-Low signal. This active-Low reset input pin is
used to generate the design reset signals which run throughout the design. A parameter
RST_ACT_LOW is provided in the design top module. This parameter indicates whether
the input reset signal is an active-Low or active-High signal.
The user can also drive an active-High reset signal as the input reset signal. In that case,
the parameter RST_ACT_LOW must be set to 0, which indicates that the input reset signal
is active-High.
The default value of this parameter is 1. The user must modify this parameter manually
in the design top module as required. The value of the
parameter RST_ACT_LOW can be either 0 or 1.
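The effect of RST_ACT_LOW on the input reset can be sketched as a Python model. The function name is hypothetical. The way the three inputs are combined (hold the design in reset while the reset input is active, the PLL/DCM is not locked, or IDELAYCTRL is not ready) is an assumption: the guide says only that the design resets are generated from these three signals.

```python
def internal_reset(sys_rst_n: int, locked: bool, idelay_ctrl_ready: bool,
                   rst_act_low: int = 1) -> bool:
    """True while the design should be held in reset (assumed combination)."""
    if rst_act_low not in (0, 1):
        raise ValueError("RST_ACT_LOW must be 0 or 1")
    # Normalize the input polarity: with RST_ACT_LOW = 1 a 0 on the pin
    # requests reset; with RST_ACT_LOW = 0 a 1 on the pin requests reset.
    reset_requested = (sys_rst_n == 0) if rst_act_low == 1 else (sys_rst_n == 1)
    return reset_requested or not locked or not idelay_ctrl_ready
```

For instance, driving the pin High releases reset when RST_ACT_LOW is 1 but asserts reset when RST_ACT_LOW is 0.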
Debug Port
The debug port allows debugging and monitoring of physical layer read timing calibration
logic and timing. This port consists of signals brought to the design top level HDL from the
read calibration module (where the read timing calibration logic resides). These signals
provide information for debugging hardware issues when calibration does not complete or
read data errors are observed in the system even after calibration completes.
The debug port option can be enabled from MIG. By default, the option is disabled. Enabling
the option from MIG sets the design top-level block parameter DEBUG_EN to 1. When
the option is disabled, the parameter value is 0. The user can also change this
parameter manually in the design top-level block HDL module.
DCI Cascading
In Virtex-5 family devices, I/O banks that need DCI reference voltage can be cascaded with
other DCI I/O banks. One set of VRN/VRP pins can be used to provide reference voltage
to several I/O banks in the same column. With DCI cascading, one bank (the master bank)
must have its VRN/VRP pins connected to external reference resistors. Other banks in the
same column (slave banks) can use DCI standards with the same impedance as the master
bank, without connecting the VRN/VRP pins on these banks to external resistors. DCI
impedance control in cascaded banks is received from the master bank. This results in
more usable pins and in reduced power usage because fewer VR pins and DCI controllers
are used.
The syntax for representing the DCI Cascading in the UCF is:
CONFIG DCI_CASCADE = "<master> <slave1> <slave2> . . .";
The following rules must be followed in order to use the DCI Cascade option:
1. The master and slave banks must all reside in the same column (left, center, or right)
on the device.
2. Master and slave banks must have the same VCCO and VREF (if applicable) voltages.
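As an illustration of the UCF syntax and of rule 1, the following Python sketch builds a DCI_CASCADE constraint line from a master bank and a list of slave banks. The helper name, the bank numbers, and the bank-to-column mapping are hypothetical and supplied by the caller; rule 2 (matching VCCO and VREF voltages) is not checked here.

```python
def dci_cascade_constraint(master: int, slaves: list, column_of: dict) -> str:
    """Format a UCF DCI_CASCADE constraint: master bank first, then slaves."""
    if not slaves:
        raise ValueError("at least one slave bank is required")
    if master in slaves:
        raise ValueError("the master bank cannot also be a slave bank")
    banks = [master, *slaves]
    # Rule 1: every bank must be in the same column of the device.
    columns = {column_of[bank] for bank in banks}
    if len(columns) != 1:
        raise ValueError("master and slave banks must be in the same column")
    return 'CONFIG DCI_CASCADE = "{}";'.format(" ".join(str(b) for b in banks))


# Hypothetical bank-to-column map for illustration only.
EXAMPLE_COLUMNS = {11: "left", 13: "left", 15: "left", 12: "right"}
EXAMPLE_LINE = dci_cascade_constraint(11, [13, 15], EXAMPLE_COLUMNS)
```

With the example map, `EXAMPLE_LINE` is `CONFIG DCI_CASCADE = "11 13 15";`, and mixing in bank 12 raises an error because it sits in a different column.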
MIG supports DCI Cascading. This feature enables placing all 36 bits of read data, as well
as the CQ and CQ# clocks, in the same bank when interfacing with 36-bit DDRII SRAM
SIO memory parts. When interfacing with a 36-bit DDRII SRAM CIO
memory part, the first 18 bits of data and the corresponding CQ are placed in one bank, and
the remaining 18 bits of data and the corresponding CQ# are placed in another bank. This is
done so that the WASSO limit is not exceeded in any given bank.
The following are the possibilities for generating the design with DCI support using the DCI
Cascade option.
• For x36 SIO memory part designs, the DCI Cascade option is always enabled. This
feature cannot be disabled if DCI support is needed.
• For x36 CIO memory part designs, the DCI Cascade is optional. DCI Support for these
designs can be selected with or without the DCI Cascade selection. By default, the
DCI Cascade option is disabled for these designs.
• For x18 memory part designs, DCI Cascade is optional. DCI support for these designs
can be selected with or without the DCI Cascade selection. By default, the DCI Cascade
option is disabled for these designs.
• For x18 memory part designs with an 18-bit data width, the DCI Cascade option is
disabled and cannot be used.
When the DCI Cascade option is selected, MIG displays a master bank selection box for each
column of the FPGA on the bank selection page.
• If an FPGA has no banks or has only non-DCI banks in a particular column, the
master bank selection box for that column is not displayed.
• All the data read banks are treated as slave banks.
• When a data read bank is selected in a particular column, the master bank selection
box for that particular column is activated and the master bank selection
boxes for the other columns are deactivated.
• In a particular column, when a data read bank is selected and there are no DCI banks
left in that column for master banks selection, then the design cannot be generated.
The data read banks must be moved to the other columns in order to select the master
banks.
• The master bank selection box shows all the bank numbers in that particular column
other than the data read banks and non-DCI banks in that column.
• There can be only one master bank selected for each column of banks.
• MIG utilizes VRN/VRP pins in the slave banks for pin allocation.
• For each master bank, VRN/VRP pins are reserved. When the selected master bank
does not have at least one input or bidirectional pin of the HSTL_I_DCI_18 I/O
standard, MIG allocates a dummy input pin, masterbank_sel_pin, and sets the I/O
standard of this dummy pin to HSTL_I_DCI_18. For example, in an
x18 SIO memory part design where the data read bank is selected as the master bank,
MIG reserves the VRN/VRP pins of the bank, and the dummy input pin is not
required.
• The dummy input pin is required to satisfy the requirement of the master bank. Any
master bank should have at least one input or bidirectional pin of HSTL_I_DCI_18
I/O standard to program the DCI option.
• When all the banks in a particular column are allocated with data or data read pins,
MIG chooses only the required banks for data or data read pin allocation, depending
upon the design data width. When there is only one bank allocated for data/data read
pins in a column of banks of an FPGA, then that particular data/data read bank
should not be selected as a master bank. Doing so would result in an inappropriate
DCI_Cascade syntax in the UCF of the generated design.
The center column banks of all the FPGAs are divided into two sections, top-column banks
and bottom-column banks. Top-column banks are the banks available above the 0th bank,
and the bottom column banks are the banks available below 0th bank. Therefore, there are
two master bank selection boxes for the center column.
The VRN/VRP pins for a master bank do not need to be reserved in the reserve pins page.
Once the design is ready with the valid master and slave bank selection, the same master
and slave bank information (along with the DCI Cascading syntax) is provided in the UCF
when the design is generated.
For more information about DCI Cascade, refer to DCI Cascading in the Virtex-5 FPGA
User Guide [Ref 10] and the Xilinx® Constraints Guide.
CQ/CQ_n Implementation
For x36 memory parts, the controller design uses both CQ and CQ_n to capture the read
data. MIG allocates the CQ and CQ_n pins to P pins of the FPGA. In x36 memory part
controller designs, the first 18 bits of the read data are captured using CQ and the second
18 bits are captured using CQ_n.
For x18 memory part controller designs, only CQ is used to capture the read data. CQ_n
is not used and is connected to dummy logic, which serves only to retain
the CQ_n pin during place and route of the design.
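The capture-clock assignment described above can be written as a small lookup. The Python sketch below is illustrative: the function name is hypothetical, and numbering the read data bits from 0 (so that the "first 18 bits" are bits 0 to 17) is an assumption.

```python
def capture_strobe(part_width: int, bit_index: int) -> str:
    """Name of the echo clock that captures a given read data bit."""
    if not 0 <= bit_index < part_width:
        raise ValueError("bit_index out of range for this part width")
    if part_width == 36:
        # First 18 bits use CQ, second 18 bits use CQ_n.
        return "CQ" if bit_index < 18 else "CQ_n"
    if part_width == 18:
        # Only CQ is used; CQ_n is tied to dummy logic.
        return "CQ"
    raise ValueError("only x18 and x36 parts are described in this section")
```

For example, bit 17 of a x36 part is captured with CQ and bit 18 with CQ_n.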
Generic Parameters
The DDRII SRAM design is a generic design that works for all the features that are
mentioned previously. User input parameters are defined as parameters for Verilog and
generics in VHDL in the design modules and are passed down the hierarchy. For example,
if the user selects a burst length of 4, then it is defined as follows in the <top_module>
module:
parameter BURST_LENGTH = 4, // Burst Length
The user can change this parameter in <top_module> to select a different burst length.
The same concept holds for all the other parameters listed in the
<top_module> module. Table 12-2 lists the details of all parameters.
Memory Parameters
BURST_LENGTH | Burst length of the design | For SIO designs, the value is only 2. For CIO designs, the value can be 2 or 4 | Integer. 2 or 4
CLK_WIDTH | Number of input clock pairs. One input clock pair for every memory part | Number of K/K_n and C/C_n | Integer. 1,2,3,4,5,6,7,8
RD_TO_WR_LATENCY | Number of clock cycle delays the controller must insert between a read command and an immediate write command | The value selected can be only 0, 1, 2, or 3. Any other value is treated as 3 | Integer. 0,1,2,3
DEBUG_EN | Enables the debug logic so that the debug signals can be viewed on the ChipScope™ analyzer | See Appendix E, “Debug Port” for details | Integer. 1,0
[Block diagram: Memory Controller connected to a DDRII SRAM SIO Memory Device. Blocks: User Interface (Read/Write control FIFOs), Read/Write State Machine, Physical Interface (address path, write path, read path), Delay Calibration State Machine. Clocks and resets: clk_0, clk_90, clk_270, reset_clk_0, reset_clk_270. User-side signals: user_addr_wr_en, user_wrdata_wr_en, user_addr_cmd, user_wr_data_rise, user_wr_data_fall, user_bw_n_rise, user_bw_n_fall, user_rd_data_rise, user_rd_data_fall, rd_data_valid, wrdata_fifo_full, addr_fifo_full, cal_done. Memory-side signals: ddrii_ld_n, ddrii_rw_n, ddrii_sa, ddrii_d, ddrii_bw_n, ddrii_cq, ddrii_cq_n, ddrii_k, ddrii_k_n, ddrii_dll_off_n.]
[Block diagram: Memory Controller connected to a DDRII SRAM memory device through the bidirectional ddrii_dq bus. Blocks: User Interface (Read/Write control FIFOs), Read/Write State Machine, Physical Interface (address path, write path, read path), Delay Calibration State Machine. Clocks and resets: clk_0, clk_90, clk_270, reset_clk_0, reset_clk_270. User-side signals: user_addr_wr_en, user_wrdata_wr_en, user_addr_cmd, user_wr_data_rise, user_wr_data_fall, user_bw_n_rise, user_bw_n_fall, user_rd_data_rise, user_rd_data_fall, rd_data_valid, wrdata_fifo_full, addr_fifo_full, cal_done. Memory-side signals: ddrii_ld_n, ddrii_rw_n, ddrii_sa, ddrii_bw_n, ddrii_dq, ddrii_cq, ddrii_cq_n, ddrii_k, ddrii_k_n, ddrii_dll_off_n.]
User Interface
The user interface module receives the user data, command, and address information and
stores it in the respective FIFOs. The control module generates the required control signals
for this module. During a write operation, the data stored in wr_data_interface is read and
given to the physical layer to output to the memory. Similarly, during a read operation, the
data from the memory is read via IDDR and is given to user with a valid signal
(rd_data_valid). This valid signal indicates valid data on the user_rd_data_rise and
user_rd_data_fall signals. Table 12-4 lists the user interface signals.
The FIFO36, FIFO36_72, and FIFO18 primitives are used for loading address and data from
the user interface. The FIFO36 primitive is used in the ddrii_top_addr_cmd_interface
module, the FIFO36_72 primitive is used in the ddrii_top_wr_data_interface module, and
the FIFO18 primitive is used in the ddrii_top_wr_data_interface module. Every FIFO has
two FIFO threshold attributes, ALMOST_EMPTY_OFFSET and ALMOST_FULL_OFFSET,
that are set to 128 in the RTL. These values can be changed as needed. For valid FIFO
threshold offset values, refer to UG190 [Ref 10].
Test Bench
MIG generates two RTL folders, example_design and user_design. The example_design
includes the synthesizable test bench, while user_design does not include the test bench
modules. The MIG test bench performs one write command followed by one read
command in an alternating manner. The number of words in a write command depends on
the burst length. For a burst length of 4, the test bench writes a total of 4 data words for a
single write command (2 rise data words and 2 fall data words). For a burst length of 2, the
test bench writes a total of 2 data words. The data pattern is an incremental pattern. On
every write command, the data pattern is incremented by one, and this is repeated with
each subsequent write command. The initial data pattern for the first write command is
000. The test bench writes the 000, 001, 002, 003 data pattern in a sequence in which 000
and 002 are rise data words, and 001 and 003 are fall data words for a 9-bit design. The
falling edge data is always rising edge data plus one. For a burst length of 2, the data
sequence for the first write command is 000, 001. The data sequence for the second write
command is 002, 003. The pattern is then incremented for the next write command. For
data widths greater than 9, the same data pattern is concatenated for the other bits. For a
36-bit design and a burst length of 4, the data pattern for the first write command is
000000000, 008040201, 010080402, 0180C0603.
Address generation logic generates the address in an incremental pattern for each write
command. The same address location is repeated for the next read command. In Samsung
components, the burst address increments are done by the memory, so the address is
generated by the test bench in a linear incremental pattern. In Cypress parts, the MIG test
bench increments the address for burst operation. After the address reaches the maximum
value, it rolls back to the initial address, i.e., 00000.
During reads, comparison logic compares the read pattern with the pattern written. For
example, for a 9-bit design with a burst length of 4, the data written for a single write
command is 000, 001, 002, and 003, and the data read back is compared with the same
000, 001, 002, 003 pattern. Based on this comparison, the status signal error is generated.
If the data read back is the same as the data written, the error signal is 0; otherwise it is 1.
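The data pattern described above can be reproduced with a short behavioral model. The following Python sketch is illustrative only and is not part of the MIG output; the function names are hypothetical, and it assumes that "concatenated for the other bits" means the same 9-bit value is repeated in every 9-bit lane of the data bus, which matches the 36-bit example values given in the text.

```python
def replicate_9bit(value, data_width):
    """Repeat one 9-bit pattern in every 9-bit lane of a wider data bus."""
    assert data_width % 9 == 0, "data width is assumed to be a multiple of 9"
    word = 0
    for lane in range(data_width // 9):
        word |= (value & 0x1FF) << (9 * lane)
    return word


def write_burst(first_value, burst_length, data_width=9):
    """Data words of one write command, in rise/fall/rise/fall order."""
    return [replicate_9bit(first_value + i, data_width)
            for i in range(burst_length)]


def compare_error(written, read_back):
    """Model of the error status: 0 when the data matches, 1 otherwise."""
    return 0 if written == read_back else 1


# First write command of a 36-bit, burst length 4 design.
words = write_burst(0, 4, data_width=36)
print([format(w, "09X") for w in words])
# ['000000000', '008040201', '010080402', '0180C0603']
```

For a burst length of 2, write_burst(0, 2) gives 000, 001 and write_burst(2, 2) gives 002, 003, matching the sequence described for the first two write commands.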
Memory Controller
The DDRII SRAM memory controller can initiate write/read commands for both CIO and
SIO memory parts. These commands are issued as long as the user address-command FIFO
is not empty. CIO designs support burst lengths of 4 and 2, whereas SIO designs support
only a burst length of 2.
The DDRII SRAM memory controller module (ddrii_top_ctrl_sm) is completely generic:
passing the correct parameters to this module is enough for it to generate the read/write
command signals for CIO/SIO and BL2/BL4 designs.
For CIO designs, the controller handles the data bus turnaround condition. Whenever a
write command must be issued immediately after a read command, one extra clock cycle of
delay is introduced before the write command is issued. According to memory vendor
specifications, this accommodates the data bus turnaround period (read data to write data).
The RD_TO_WR_LATENCY parameter adds further delay (in clock cycles) on top of this
single clock delay between the read and write commands.
The RD_TO_WR_LATENCY parameter should be an integer between 0 and 3. Any other
value is treated as 3.
Separate I/O (SIO) designs have separate data buses for read data and write data, so no
data bus turnaround is needed, and the controller ignores the RD_TO_WR_LATENCY
parameter.
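As a quick check of these rules, the total read-to-write delay can be expressed as a small function. This Python sketch is a behavioral illustration under stated assumptions, not the controller RTL: the function name is hypothetical, and the strings "CIO" and "SIO" are used here only as stand-ins for the IO_TYPE parameter values.

```python
def read_to_write_delay(io_type, rd_to_wr_latency):
    """Extra clock cycles inserted before a write that follows a read."""
    if io_type == "SIO":
        # Separate read and write buses: no turnaround, parameter ignored.
        return 0
    if rd_to_wr_latency not in (0, 1, 2, 3):
        # Any value outside 0..3 is treated as 3.
        rd_to_wr_latency = 3
    # One fixed turnaround cycle plus the programmable latency.
    return 1 + rd_to_wr_latency


print(read_to_write_delay("CIO", 0))  # 1
print(read_to_write_delay("CIO", 7))  # 4
print(read_to_write_delay("SIO", 2))  # 0
```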
The controller module decodes the user command and issues the specified command to the
memory. The command_bit signal is decoded as a write command when it equals logic 0
and as a read command when it equals logic 1. The read/write command signals are
generated based on the BURST_LENGTH, IO_TYPE, and RD_TO_WR_LATENCY
parameters. The controller state machine issues the commands in the correct sequence
while meeting the timing requirements of the memory.
Once calibration is complete, the controller issues a read enable to the address-command
FIFO (ddrii_top_addr_cmd_interface module). The command bit is extracted from the
output of the address-command FIFO and then decoded to issue read/write commands.
Figure 12-10 shows the controller state machine flow chart.
[Figure 12-10 (ug086_c12_10_082908): controller state machine flow chart. Decision nodes: cal_done, Address FIFO = EMPTY, Programmable Latency = 0, Burst Length (BL2 or BL4). Action node: Read from Address FIFO.]
For Burst Length 4 controller designs, commands (read/write) to the memory are issued
on every alternate clock. In this scenario, the controller issues the read enable to the
address-command FIFO on every alternate clock.
For Burst Length 2 controller designs, commands (read/write) to the memory are issued
on every clock. In this scenario, the controller issues the read enable to the
address-command FIFO on every clock.
Whenever the previously decoded command is a read command and the presently decoded
command is a write command, a single clock delay must be introduced before the write
command is issued to the memory. This delay covers the data bus turnaround period. It is
applied to the decoded write command immediately, and the same single clock delay is
applied to the address-command FIFO read enable.
When the RD_TO_WR_LATENCY parameter is nonzero (an integer from 1 to 3), the delay
specified by this parameter (in clock cycles) is applied to the decoded write command, in
addition to the single clock cycle delay, before the command is presented on the command
bus of the memory. The same delay is applied to the address-command FIFO read enable.
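The command cadence described above can be modeled as a simple schedule. The following Python sketch is a simplified illustration and not the ddrii_top_ctrl_sm implementation: the function name is hypothetical, commands are written as "R" and "W", RD_TO_WR_LATENCY is assumed to be already within 0 to 3, and the turnaround delay is assumed to add to the normal command spacing.

```python
def issue_cycles(commands, burst_length, rd_to_wr_latency=0, io_type="CIO"):
    """Clock cycle at which each command is issued to the memory."""
    # BL2 issues a command on every clock, BL4 on every alternate clock.
    spacing = 1 if burst_length == 2 else 2
    cycles = []
    cycle = 0
    previous = None
    for cmd in commands:
        if previous is not None:
            cycle += spacing
            # CIO only: a write right after a read waits one turnaround
            # cycle plus the programmable latency.
            if io_type == "CIO" and previous == "R" and cmd == "W":
                cycle += 1 + rd_to_wr_latency
        cycles.append(cycle)
        previous = cmd
    return cycles


print(issue_cycles("WRWR", burst_length=2))  # [0, 1, 3, 4]
print(issue_cycles("WRWR", burst_length=4))  # [0, 2, 5, 7]
```

In this model, the same cycle numbers also describe when the address-command FIFO read enable would be issued, because the text applies the same delays to both.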
[Figure ug086_c12_11_071808: controller state diagram. States: CAL_WAIT_ST, CMD_WAIT_ST, RD_TO_WR_LATENCY_ST. Transition labels: cal_done=0, Address FIFO Empty, (latency=0)and(Burst Length=2), (latency=0)and(Burst Length=4), Non-zero latency.]
Physical Interface
The physical interface is the interface between the controller and the memory. It includes
the input/output blocks (IOBs) and other primitives used to read and write the double data
rate signals to and from the memory, such as IDDR and ODDR. This module also includes
the IODELAY elements of the Virtex-5 FPGA, which delay the data signals so that the read
data can be captured.
The memory control signals, such as ld_n, rw_n, and DLLoff_n, are driven from the buffers
in the IOBs. All the input and output signals to and from the memory are referenced from
the IOB to compensate for the routing delays inside the FPGA.
Infrastructure
The infrastructure module generates the design clocks and reset signals. When differential
clocking is used, sys_clk_p, sys_clk_n, clk_200_p, and clk_200_n signals appear. When
single-ended clocking is used, sys_clk and idly_clk_200 signals appear. In addition, clocks
are available for design use and a 200 MHz clock is provided for the IDELAYCTRL
primitive. Differential and single-ended clocks are passed through global clock buffers
before connecting to a PLL/DCM. For differential clocking, the output of the
sys_clk_p/sys_clk_n buffer is single-ended and is provided to the PLL/DCM input.
Likewise, for single-ended clocking, sys_clk is passed through a buffer and its output is
provided to the PLL/DCM input. The outputs of the PLL/DCM are 0° and 270°
phase-shifted versions of the input clock. After the PLL/DCM is locked, the design is held in
the reset state for at least 25 clocks. The infrastructure module also generates all of the reset
signals required for the design.
PLL/DCM
In MIG 3.0 and later, the DCM is replaced with a PLL for all Virtex-5 FPGA designs. If the
user selects a design with a PLL in the GUI, the infrastructure module will have both PLL
and DCM codes. The CLK_GENERATOR parameter enables either a PLL or a DCM in the
infrastructure module. The CLK_GENERATOR parameter is set to PLL by default. If the
user wants to use DCM, this parameter should be changed manually to DCM.
For designs without a PLL, the user application must have a PLL/DCM primitive
instantiated in the design, and all user clocks should be driven through BUFGs.
Idelay_ctrl
This module instantiates the IDELAYCTRL primitive of the Virtex-5 FPGA. The
IDELAYCTRL primitive is used to continuously calibrate the individual delay elements in
its region to reduce the effect of process, temperature, and voltage variations. A 200 MHz
clock has to be fed to this primitive.
MIG uses the “automatic” method for IDELAYCTRL instantiation in which the MIG HDL
only instantiates a single IDELAYCTRL for the entire design. No location (LOC)
constraints are included in the MIG-generated UCF. This method relies on the ISE® tools to
replicate and place as many IDELAYCTRLs as needed (for example, one per clock region
that uses IDELAYs). Replication and placement are handled automatically by the software
tools if IDELAYCTRLs have the same refclk, reset, and rdy nets. A new constraint called
IODELAY_GROUP associates a set of IDELAYs with an IDELAYCTRL and allows for
multiple IDELAYCTRLs to be instantiated without LOC constraints specified. ISE software
generates the IDELAY_CTRL_RDY signal by logically ANDing the RDY signals of every
IDELAYCTRL block.
The IODELAY_GROUP name should be checked in the following cases:
• The MIG design is used with other IP cores or user designs that also require the use of
IDELAYCTRL and IDELAYs.
• Previous ISE software releases 8.2.03i and 9.1i had an issue with IDELAYCTRL block
replication or trimming. When using these revisions of the ISE software, the user must
instantiate and constrain the location of each IDELAYCTRL individually.
See UG190 [Ref 10] for more information on the requirements of IDELAYCTRL placement.
Clocking Scheme
Figure 12-13 shows the clocking scheme for this design. Global and local clock resources
are used. The global clock resources consist of a PLL or a DCM, two BUFGs on PLL/DCM
output clocks, and one BUFG for clk_200. The local clock resources consist of regional I/O
clock networks (BUFIO). The global clock architecture is discussed in this section.
The MIG tool allows the user to customize the design such that the PLL/DCM is not
included. In this case, system clocks clk_0 and clk_270, and IDELAYCTRL clock clk_200
must be supplied by the user.
Notes:
1. See “User Interface Accesses,” page 489 for timing requirements and restrictions on the user interface
signals.
Figure 12-12: Clocking Scheme for QDRII Interface Logic Using PLL
Figure 12-13: Clocking Scheme for DDRII SRAM Memory Interface Logic Using DCM
Initialization
The DDRII SRAM memory is initialized through a specified sequence:
1. The DDRII SRAM controller initiates a 200 µs wait period to achieve a stable power
condition for the DDRII SRAM memory part.
2. After power and the clocks (K, K#) are stable, ddrii_dll_off_n is set High to enable the
DLL in the DDRII SRAM memory part.
3. An additional wait period of 2048 clock cycles is applied to allow the DLL to lock.
4. After this initialization sequence, the DDRII SRAM memory part is ready for
calibration.
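The two wait periods above translate into clock-cycle counts that depend on the design frequency. The following Python sketch shows the arithmetic only; the function name is hypothetical, the 250 MHz figure is an arbitrary example rather than a value from this guide, and the actual counters in the generated RTL may be sized differently.

```python
import math

POWER_UP_WAIT_US = 200   # step 1: stable-power wait, in microseconds
DLL_LOCK_CYCLES = 2048   # step 3: DLL lock wait, in clock cycles


def init_wait_cycles(clk_mhz):
    """Return (power-up cycles, DLL lock cycles, total) for a given clock."""
    # microseconds x MHz = clock cycles; round up to cover the full period.
    power_up = math.ceil(POWER_UP_WAIT_US * clk_mhz)
    return power_up, DLL_LOCK_CYCLES, power_up + DLL_LOCK_CYCLES


print(init_wait_cycles(250))  # (50000, 2048, 52048)
```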
Delay Calibration
The delay calibration logic is responsible for providing the required amount of delay on
the read data and the input clocks (CQ/CQ#) to align the FPGA clock in the data valid
window.
Delay calibration is made possible by the IODELAY elements available in all the I/Os of
the Virtex-5 device. The IODELAY elements delay the input read data in increments of
75 ps, up to a maximum delay of 5 ns. IDELAYCTRLs, available in every bank in
Virtex-5 devices, help to maintain the resolution of the IODELAY elements.
Calibration begins when the IDELAYCTRL ready signal has been asserted. Calibration is
done in three stages:
1. First Stage Calibration: Calibration of input clocks (CQ/CQ#) with respect to read data.
2. Second Stage Calibration: Calibration of input clocks (CQ/CQ#) and read data with
respect to the FPGA clock.
3. Third Stage Calibration: Read enable calibration that determines when the read data is
valid. This helps to generate the data valid signal rd_data_valid.
First Stage Calibration: Calibration of input clocks (CQ/CQ#) and read data
This stage of calibration helps to align CQ/CQ# inside the data valid window. CQ/CQ# is
delayed more than the read data by the delay on the BUFIO and the route delay of the
CQ/CQ# before it clocks the read data in the ISERDES. In a case where the data valid
window is considerably reduced, this delay on the BUFIO can move the edge of the CQ or
the CQ# clock outside of the valid window. This calibration stage helps to avoid the de-
synchronization of the clock and data. The calibration stage includes a dummy write to the
memory with a constant rise data pattern of 1s and a constant fall data pattern of 0s
followed by constant read to the same location until the first calibration is complete. The
non-transitioning rise and fall data pattern helps to avoid any metastability caused by the
FPGA clock in the second and third register stages in the ISERDES.
The steps involved in this stage include:
1. Increment CQ/CQ# delay taps to see if CQ/CQ# is within the valid window. If it is,
continue to increment CQ/CQ# delay taps until the hold window range is measured.
2. Reset CQ/CQ# delay taps.
3. Increment read data delay taps to determine the read data setup window with respect
to CQ/CQ#.
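The tap sweeps in the steps above can be pictured with a toy model. This Python sketch is a rough illustration and not the MIG calibration logic: the function names are hypothetical, the 64-tap limit is an assumption about the IODELAY chain length, and the "window" is modeled as a simple margin in taps rather than as sampled read data.

```python
MAX_TAPS = 64  # assumption: number of tap settings in one IODELAY chain


def count_valid_taps(is_valid, max_taps=MAX_TAPS):
    """Increment the tap count while capture stays valid; return the count."""
    taps = 0
    while taps < max_taps and is_valid(taps):
        taps += 1
    return taps


def first_stage_windows(hold_margin_taps, setup_margin_taps):
    """Measure the hold and setup windows of the toy model, in taps."""
    # Step 1: sweep the CQ/CQ# delay taps to measure the hold window.
    hold = count_valid_taps(lambda tap: tap < hold_margin_taps)
    # Step 2: reset the CQ/CQ# taps (nothing to undo in this model).
    # Step 3: sweep the read data delay taps to measure the setup window.
    setup = count_valid_taps(lambda tap: tap < setup_margin_taps)
    return hold, setup


print(first_stage_windows(12, 9))  # (12, 9)
```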
Table 12-4: DDRII SRAM System Interface Signals (with a PLL) (Cont’d)
Signal Name | Direction | Description
compare_error | Output | This signal represents the status of the comparison of read data when compared to the corresponding write data.
cal_done | Output | This signal is asserted when the design initialization and calibration is complete.
Notes:
1. The number of address bits used depends on the density of the memory part. The controller ignores the
unused bits, which can all be tied to High.
Table 12-8 lists the signals between the user interface and the controller.
Write Interface
Figure 12-14 illustrates the user interface block diagram for write operations.
[Figure 12-14 (ug086_c12_13_071508): write user interface block diagram. Labels shown: User Interface, Controller, Address FIFO (FIFO36, 1024 x 36), user_addr_cmd, user_addr_wr_en, addr_fifo_empty, addr_fifo_full, addr_fifo_rd_en, user_bw_n_rise, bw_n_rise.]
The following steps describe the architecture of the Address and Write Data FIFOs and how
to perform a write burst operation to the DDRII SRAM memory from the user interface.
1. The user interface consists of an Address FIFO, Data FIFOs, and a Byte Write FIFO.
These FIFOs are built out of Virtex-5 FPGA FIFO primitives. The Address FIFO is a FIFO36
primitive in a 1K x 36 configuration, the Data FIFO is a FIFO36_72 primitive in a 512 x 72
configuration, and the Byte Write FIFO is a FIFO18 primitive in a 1024 x 18 configuration.
2. The Address FIFO is common to both write and read commands. It comprises an address
part and a command part. The command bit distinguishes between write and read
commands. A single instantiation of FIFO36 constitutes the Address FIFO.
3. Two separate sets of Data FIFOs store the rising-edge and falling-edge data to be
written to the DDRII SRAM memory from the user interface. For 9-bit, 18-bit, and 36-bit
configurations, the controller pads the unused bits of the Data FIFO with 0s.
4. The Byte Write FIFO stores the byte write signals to the DDRII SRAM memory from
the user interface. Unused bits are padded with zeros.
5. The user can initiate a write to memory by writing to the Address FIFO and the write
data FIFO only when calibration is complete (the cal_done signal is asserted High) and
the FIFO full flags are Low. Users should not access any of these FIFOs until the
cal_done signal is asserted. During the calibration process, the controller writes pattern
data into the Data FIFOs. The cal_done signal indicates that the clocks are stable, the
reset process is completed, calibration is complete, and the controller is ready to accept
user data and commands. The addr_fifo_full status signal is asserted High when the
Address FIFO is full, and the wrdata_fifo_full status signal is asserted High when the
Data FIFOs or the Byte Write FIFO are full.
6. Both the Address FIFO and write data FIFO full flags are deasserted at power-on.
7. The user should assert the Address FIFO write enable signal user_addr_wr_en along with
the address bus user_addr_cmd to store the write address and write command into the
Address FIFO.
8. The write command is given by setting the user_addr_cmd[0] bit to logic 0.
9. The user should assert the Data FIFO write enable signal user_wrdata_wr_en along with
the write data (user_wr_data_rise, user_wr_data_fall) and the byte write enables
(user_bw_n_rise, user_bw_n_fall). This stores the rise data and fall data into the rise data
FIFO and fall data FIFO, and the byte write enables for the rise data and fall data into the
Byte Write FIFO.
10. The controller reads the Address, Data, and Byte Write FIFOs when they are not empty.
The controller reads the Address FIFO by issuing the addr_fifo_rd_en signal. It reads the
write data FIFO and Byte Write FIFO by issuing the wrdata_fifo_rd_en signal after the
Address FIFO is read. The controller decodes the command part after the Address FIFO
is read.
[Timing diagram ug086_c12_14_071508. Signals shown: clk_0_tb, cal_done, addr_fifo_full, user_addr_wr_en, wrdata_fifo_full, user_wrdata_wr_en.]
Figure 12-15: Write User Interface Timing Diagram for Burst Length 4
11. Figure 12-15 shows the timing diagram for a write command with a burst length of
four. As shown in the figure, the command bit (user_addr_cmd[0]) indicates a write
command (marked as 'W'), which is identified by a logic 0 on that bit. The address
should be asserted for one clock cycle as shown. For a burst length of four, each write
to the Address FIFO is accompanied by two writes to the Data FIFO, consisting of two
rising-edge and two falling-edge data words.
12. Figure 12-16 shows the timing diagram for a write command with a burst length of
two. As shown in the figure, the command bit (user_addr_cmd[0]) indicates a write
command (marked as 'W'), which is identified by a logic 0 on that bit. The address
should be asserted for one clock cycle as shown. For a burst length of two, each write
to the Address FIFO is accompanied by a single write to the Data FIFO, consisting of a
single rising-edge and a single falling-edge data word.
[Timing diagram ug086_c12_15_071508. Signals shown: clk_0_tb, cal_done, addr_fifo_full, user_addr_wr_en, wrdata_fifo_full, user_wrdata_wr_en.]
Figure 12-16: Write User Interface Timing Diagram for Burst Length 2
13. As the previous user interface timing diagrams show, writing addresses and command
bits into the Address FIFO and writing data into the Data FIFOs are two different and
independent actions. Users must take complete responsibility for writing the bursts of
data corresponding to a particular address into the respective FIFOs at the same time.
Otherwise, the controller outputs undesired data bits, which are written into the
memory.
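The write steps above can be summarized from the user's side with a small behavioral model. This Python sketch is illustrative only and does not represent the generated RTL: the class and method names are hypothetical, the packing of the address above the command bit is an assumption (the text only states that user_addr_cmd[0] is the command bit), and FIFO depth and full flags are not modeled.

```python
from collections import deque

WRITE_CMD = 0  # user_addr_cmd[0] = 0 selects a write


class WriteInterfaceModel:
    """Toy model of the user side of the Address and Data FIFOs."""

    def __init__(self, burst_length):
        self.burst_length = burst_length
        self.cal_done = False
        self.addr_fifo = deque()   # entries model user_addr_cmd
        self.data_fifo = deque()   # entries: (rise, fall, bw_n_rise, bw_n_fall)

    def write(self, address, beats):
        """Queue one write command together with its data beats."""
        if not self.cal_done:
            raise RuntimeError("FIFOs must not be accessed before cal_done")
        # BL4 needs two Data FIFO writes per address, BL2 needs one.
        if len(beats) != self.burst_length // 2:
            raise ValueError("wrong number of data beats for this burst length")
        # Assumed layout: address in the upper bits, command in bit 0.
        self.addr_fifo.append((address << 1) | WRITE_CMD)
        self.data_fifo.extend(beats)


model = WriteInterfaceModel(burst_length=4)
model.cal_done = True
model.write(0x5, [(0, 1, 0, 0), (2, 3, 0, 0)])
print(len(model.addr_fifo), len(model.data_fifo))  # 1 2
```

Keeping the address write and its data beats in one call mirrors the requirement in step 13 that the user keep the two independent FIFOs consistent.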
Read Interface
Figure 12-17 illustrates the user interface block diagram for read operations.
[Figure 12-17 (ug086_c12_16_071508): read user interface block diagram. Labels shown: User Interface, user_addr_cmd, addr_fifo_empty, user_rd_data_rise, user_rd_data_fall, rd_data_valid, From phy_top.]
The following steps describe the architecture of the read user interface and how to perform
a DDRII SRAM burst read operation.
1. The read user interface consists of an Address FIFO built out of a Virtex-5 FPGA
FIFO18 in a 1024 x 18 configuration.
2. The user can initiate a read from the memory by writing to the Address FIFO only when
calibration is complete (the cal_done signal is asserted High) and the FIFO full flags are
Low.
3. The user should assert the Address FIFO write enable signal user_addr_wr_en along with
the address bus user_addr_cmd to store the read address and read command into the
Address FIFO.
4. The read command is given by setting the user_addr_cmd[0] bit to logic 1.
5. The controller reads the Address FIFO when it is not empty by issuing the read enable
signal addr_fifo_rd_en. After decoding the user_addr_cmd[0] bit, the controller issues a
read command to the memory at the specified address.
6. Prior to the actual read and write commands, the design calibrates the latency, in
number of clock cycles, from the time the read command is issued to the time the data
is received. Using this pre-calibrated delay information, the controller delays the read
data by the required number of clocks.
7. The rd_data_valid signal is asserted High when data is available in the read data
FIFOs.
8. The user must capture the read data as soon as the rd_data_valid signal is asserted High.
[Timing diagrams ug086_c12_17_071508 and ug086_c12_18_071508. Signals shown in each: clk_0_tb, cal_done, addr_fifo_full, user_addr_wr_en, rd_data_valid.]
Figure 12-18: Read User Interface Timing Diagram for Burst Length 4
Figure 12-19: Read User Interface Timing Diagram for Burst Length 2
9. Figure 12-18 shows the timing diagram for a read command with a burst length of
four. As shown in the figure, the command bit (user_addr_cmd[0]) indicates a read
command (marked as 'R'), which is identified by a logic 1 on that bit. The address
should be asserted for one clock cycle as shown. For a burst length of four, each read
command is associated with four read data words: two rising-edge and two falling-edge
data words. The rd_data_valid signal is asserted High for two clocks for each read
command issued to the memory. The captured read data is valid as long as the
rd_data_valid signal is asserted High, as shown in the figure.
10. Figure 12-19 shows the timing diagram for a read command with a burst length of two.
As shown in the figure, the command bit (user_addr_cmd[0]) indicates a read
command (marked as 'R'), which is identified by a logic 1 on that bit. The address
should be asserted for one clock cycle as shown. For a burst length of two, each read
command is associated with two read data words: one rising-edge and one falling-edge
data word. The rd_data_valid signal is asserted High for one clock for each read
command issued to the memory. The captured read data is valid as long as the
rd_data_valid signal is asserted High, as shown in the figure.
11. After the read command and the corresponding address bits are loaded into the
Address FIFO, it takes a minimum of 14 clock cycles for the controller to assert
rd_data_valid High.
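The relationship between rd_data_valid and the read data words can be illustrated with a short model. This Python sketch is an illustration under stated assumptions, not the user-logic implementation: the function names are hypothetical, and the trace is a made-up list of per-clock samples of (rd_data_valid, user_rd_data_rise, user_rd_data_fall).

```python
def valid_clocks_per_read(burst_length):
    """Clocks for which rd_data_valid is High per read command."""
    # Each valid clock carries one rising-edge and one falling-edge word.
    return burst_length // 2


def collect_read_data(trace):
    """Capture rise and fall words on every clock where rd_data_valid is High."""
    words = []
    for valid, rise, fall in trace:
        if valid:
            words.extend([rise, fall])
    return words


# One burst length 4 read: rd_data_valid is High for two consecutive clocks.
trace = [(0, None, None), (1, 0x000, 0x001), (1, 0x002, 0x003), (0, None, None)]
print(valid_clocks_per_read(4), collect_read_data(trace))  # 2 [0, 1, 2, 3]
```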
Table 12-9 shows the read latency of the controller.
MIG shows checkboxes for Data, Address, System_Control and System_Clock in the bank
selection for DDRII SRAM CIO designs.
Pinout Considerations
It is recommended to select banks within the same column in MIG. This helps to avoid the
clock tree skew that the design would incur when crossing from one column to another.
When the data read, data write, address and system control pins are allocated to individual
banks in a column, then the system control pins must be allocated in a bank that is central
to the rest of banks allocated. This helps reduce datapath and clock path skew.
For larger FPGAs (for example, FF1738, FF1760, and similar), it is recommended to place
data read, data write, address and system control pins in the same column to reduce
datapath and clock path skew.
Supported Devices
Table 12-11 lists the memory parts supported by MIG for DDRII SRAM design. In the
supported devices, X in the memory part column denotes a single alphanumeric character.
For example, K7I321884X can be either K7I321884C or K7I321884M.
Table 12-11: Supported memory parts for DDRII SRAM (Virtex-5 FPGAs)
Memory Part Speed Grade Density in Mb Memory Width Burst Length IO Type Vendor
Chapter 13
Introduction
The sim folder provides a simulation environment, for the ModelSim simulator, for the
generated design. This folder includes the simulation testbench module (sim_tb_top)
and various other modules to simulate the design properly. The simulation testbench
module also generates system input signals, clocks, and resets to the design.
Figure 13-1 depicts a block diagram of the simulation environment.
[Figure 13-1 (UG086_c13_12_020609): simulation environment block diagram. The Simulation Testbench contains three blocks: Optional Instances or Modules (such as Testbench, DCM Logic, etc.), Design Top, and Memory Model.]
The simulation testbench module integrates the complete system through port maps, a
design clock, a clock for the IDELAYCTRL module, and reset generation logic. With clocks
and system reset signals as inputs, the system can be divided into these blocks:
• Optional instances: These function as testbench modules for designs without
testbenches and as DCM/PLL logic for designs without DCMs or PLLs. The testbench
here refers to a synthesizable test module that provides test inputs such as data,
address, and commands to the design. In designs with a testbench, the testbench is
part of the design top module. Similarly, DCM/PLL logic is part of design top for
designs generated with DCMs or PLLs.
• Design top: In the example design, the design top module connects with the clocks,
reset, memory interface signals, and status signals. In the user design, design top
connects with the user interface signals, clocks, reset, memory interface signals, and
status signals. Design top includes the controller part, an optional testbench, and
DCM/PLL logic.
• Memory model: This is provided with the memory core of the component selected.
MIG provides a memory model in Verilog only. VHDL memory models are not
provided.
The simulation testbench module can be in Verilog or VHDL, depending on the HDL used
in the design.
Supported Features
The MIG simulation environment supports:
• All component widths
• Designs with or without a testbench
• Designs with or without DCM/PLL
• All supported components and DIMMs (UDIMMs, SODIMMs and RDIMMs)
• Deep memories and ECC for Virtex®-4 FPGA DDR2 direct-clocking designs
• Differential and single-ended DQS for Virtex-4 FPGA DDR2 direct-clocking designs
• CIO and SIO for RLDRAM II designs
• CIO and SIO for Virtex-5 FPGA DDRII SRAM designs
• Multicontroller simulation testbenches for the Virtex-5 FPGA (QDRII SRAM and
DDR2 SDRAM)
Unsupported Features
The MIG simulation environment does not support:
• Multicontroller simulation testbenches for Virtex-4 FPGA DDR2 direct-clocking
designs
• VHDL memory models
• Cypress components for all SRAM designs
Note: The simulation testbench is specific to each design that is generated from MIG. Design
parameters should not be changed after generating the design, except for the RESET_ACTIVE_LOW
or RST_ACT_LOW parameters.
DDR2 SDRAM
Table 13-1 lists the files generated in the sim folder for Virtex-5 FPGA DDR2 SDRAM
designs.
DDR SDRAM
Table 13-2 lists the files generated in the sim folder for Virtex-5 FPGA DDR SDRAM
designs.
QDRII SRAM
Table 13-3 lists the files generated in the sim folder for Virtex-5 FPGA QDRII SRAM
designs. Samsung and Cypress memory models are not available in the sim folder for
simulations when the design is generated by the MIG tool. The appropriate memory
model must be downloaded from the vendor’s web site. MIG designs are functionally
verified with the R12 version of the Samsung memory models.
DDRII SRAM
Table 13-4 lists the files generated in the sim folder for Virtex-5 FPGA DDRII SRAM
designs. Samsung and Cypress memory models are not available in the sim folder for
simulations when the design is generated by the MIG tool. The appropriate memory
model must be downloaded from the vendor’s web site. MIG designs are functionally
verified with the R12 version of the Samsung memory models.
Multicontroller
Table 13-5 lists the files generated in the sim folder for Virtex-5 FPGA Multicontroller
designs.
DDR SDRAM
Table 13-8 lists the files generated in the sim folder for Virtex-4 FPGA DDR SDRAM
designs.
RLDRAM II
Table 13-9 lists the files generated in the sim folder for Virtex-4 FPGA RLDRAM II designs.
QDRII SRAM
Table 13-10 lists the files generated in the sim folder for Virtex-4 FPGA QDRII SRAM
designs. Samsung and Cypress memory models are not available in the sim folder for
simulations when the design is generated by the MIG tool. The appropriate memory
model must be downloaded from the vendor’s web site. MIG designs are functionally
verified with the R12 version of the Samsung memory models.
DDRII SRAM
Table 13-11 lists the files generated in the sim folder for Virtex-4 FPGA DDRII SRAM
designs. Samsung and Cypress memory models are not available in the sim folder for
simulations when the design is generated by the MIG tool. The appropriate memory
model must be downloaded from the vendor’s web site. The MIG designs are functionally
verified with the R12 version of the Samsung memory models.
DDR2 SDRAM
Table 13-12 lists the files generated in the sim folder for Spartan-3 FPGA DDR2 SDRAM
designs.
DDR SDRAM
Table 13-13 lists the files generated in the sim folder for Spartan-3 FPGA DDR SDRAM
designs.
The when command checks for completion of calibration and then runs for an additional
50 μs. To increase the run time after completion of calibration, 50 μs should be changed to
some other value, such as 100 μs. There should be a space between the value and the unit:
when {$now = @800 us} {stop}
In the above example, this when command assumes importance if the previous condition
({/sim_tb_top/phy_init_done = 1}) does not become valid up to 800 μs. ModelSim
then pauses in this case and exits from the simulation.
Note: The run time value of the second when command (800 μs) should always be greater than that
of the first when command (50 μs). Otherwise, the simulation result display at the end erroneously
shows that the calibration failed for Virtex-4 and Virtex-5 designs, and initialization failed for
Spartan-3/3E/3A/3AN/3A DSP designs.
Design Notes
This section provides notes on the various designs discussed in this chapter:
• The sim.do file contains commands that suppress Numeric Std package and
Arithmetic operation warnings.
• At the end of simulation, a test result is displayed depending on whether or not the
design generates an error signal. The displayed result does not consider the error or
violations generated by the memory models or the simulator. The transcript file
should be reviewed for any errors or warnings generated.
• If the license agreement is not accepted when generating the design, the memory
model is not generated in the sim folder. In such a case, the memory model might
have to be downloaded from a memory vendor site and then placed in the sim folder.
The files should be renamed accordingly, as described in “Files in sim Folder,” page
504. According to the design generated, the memory model parameters are passed
from the sim.do file. For example, the following command is used for a Micron
DDR2 SDRAM design:
vlog +incdir+. +define+x256Mb +define+sg3 +define+x8 ddr2_model.v
In this case, +define+x256Mb shows the device density. This parameter is not present
in the downloaded memory model and should be ignored. The +define+sg3
segment shows the memory speed grade and +define+x8 shows the device data
width.
• For DIMM designs, MIG uses instantiations of component models.
• In Qimonda parts, DDR2 SDRAM design simulations undergo a memory allocation
failure and the ModelSim GUI closes automatically. This occurs only on certain
systems (based on swap memory in Linux and cache memory in Windows). To avoid
this, the address mapping for the NO_SPARSE_MEM and SPARSE_MEM Qimonda
memory models are modified. The SPARSE_ROW_BITS and SPARSE_COL_BITS
parameter values are modified. Address mapping for the SPARSE_ROW_MAP and
SPARSE_COL_MAP parameters are modified based on the SPARSE_ROW_BITS and
Known Issues
This section discusses some known issues that can occur during simulation.
DDR2 SDRAM
• For VHDL designs, these warning messages might be displayed due to metastable
values during power on:
#Warning:NUMERIC_STD.TO_INTEGER: metavalue detected, returning 0
#Time: 0 ps
Iteration:0
Instance:/ddr2_test_tb/u_mem_controller/u_ddr2_top_0/u_mem_if_top_0/u_phy_top_0/u_phy_io_0/gen_phy_calib/u_phy_calib_0/gen_chk_cnt__3
These messages are suppressed in the sim.do file and appear only if the design is
simulated without using the sim.do file generated by MIG. These messages can be
ignored.
• In x16 or x8 components with a data width of eight, this warning appears while
compiling the design:
DDR SDRAM
• For VHDL designs, these warning messages might be displayed due to metastable
values during power on:
#Warning: NUMERIC_STD.TO_INTEGER: metavalue detected, returning 0
#Time: 0 ps
Iteration: 0
Instance:/ddr1_tb/u_mem_controller/u_ddr1_top_0/u_mem_if_top/u_phy_top/u_phy_io/gen_phy_calib_gate/u_phy_calib
These messages are suppressed in the sim.do file and appear only if the design is
simulated without using the sim.do file generated by MIG. These messages can be
ignored.
• In x16 or x8 components with a data width of eight, this warning appears while
compiling the design:
# ** Warning: [3] ../sim/sim_tb_top.vhd(316): Range 0 to -1 is null.
This warning only appears for VHDL designs and can be ignored.
• If the design is rerun without deleting old files that were generated during simulation,
this warning might be displayed:
# ** Warning: (vlib-34) Library already exists at "work".
This warning can be ignored.
QDRII SRAM
• Due to metastable values during power on, warning messages might be displayed.
These messages are suppressed in the sim.do file and appear only if the design is
simulated without using the sim.do file generated by MIG. These messages can be
ignored.
DDRII SRAM
• Due to metastable values during power on, warning messages might be displayed.
These messages are suppressed in the sim.do file and appear only if the design is
simulated without using the sim.do file generated by MIG. These messages can be
ignored.
Multicontroller
• While simulating multicontroller designs involving DDR2 SDRAM and QDRII SRAM
controllers, the simulator displays these warning messages:
# ** Warning: ../sim/k7rxxxx84x_r12_c1.v(336): [TMREN] - Redefinition of macro: NUM_DATA.
# ** Warning: ../sim/k7rxxxx84x_r12_c1.v(337): [TMREN] - Redefinition of macro: NUM_BW.
# ** Warning: ../sim/k7rxxxx84x_r12_c1.v(338): [TMREN] - Redefinition of macro: SIZE_MEM.
These warnings arise because all the memory models are compiled first, and then each
memory model is recompiled with the parameters set in MIG by the user. These
messages can be ignored.
DDR SDRAM
• Due to metastable values during power on, warning messages might be displayed.
These messages are suppressed in the sim.do file and appear only if the design is
simulated without using the sim.do file generated by MIG. These messages can be
ignored.
• In x16 or x8 components with a data width of eight, this warning appears while
compiling the design:
# ** Warning: [3] ../sim/sim_tb_top.vhd(316): Range 0 to -1 is null.
This warning only appears for VHDL designs and can be ignored.
• If the design is rerun without deleting old files that were generated during simulation,
this warning might be displayed:
# ** Warning: (vlib-34) Library already exists at "work".
This warning can be ignored.
RLDRAMII
• Although the DLL is not used, the memory model displays this warning message:
"Read prior to DLL locked. Failing to wait for synchronization to occur
may result in violation of tAC or tdkCk parameters"
• For multiplexed addressing mode, the memory model displays this error
message:
"Load mode reserved bits must be set to zero"
• Due to metastable values during power on, warning messages might be displayed.
These messages are suppressed in the sim.do file and appear only if the design is
simulated without using the sim.do file generated by MIG. These messages can be
ignored.
QDRII SRAM
• Due to metastable values during power on, warning messages might be displayed.
These messages are suppressed in the sim.do file and appear only if the design is
simulated without using the sim.do file generated by MIG. These messages can be
ignored.
DDRII SRAM
• Due to metastable values during power on, warning messages might be displayed.
These messages are suppressed in the sim.do file and appear only if the design is
simulated without using the sim.do file generated by MIG. These messages can be
ignored.
DDR2 SDRAM
• Due to metastable values during power on, warning messages might be displayed.
These messages are suppressed in the sim.do file and appear only if the design is
simulated without using the sim.do file generated by MIG. These messages can be
ignored.
• In x16 or x8 components with a data width of eight, this warning appears while
compiling the design:
# ** Warning: [3] ../sim/sim_tb_top.vhd(316): Range 0 to -1 is null.
This warning only appears for VHDL designs and can be ignored.
• If the design is rerun without deleting old files that were generated during simulation,
this warning might be displayed:
# ** Warning: (vlib-34) Library already exists at "work".
This warning can be ignored.
• When simulating a design for Qimonda parts, the simulator displays this error
message:
"# QI ERR: Illegal command".
This error arises before memory initialization and can be ignored.
• While generating parts using Create Custom Part, proper address values must be
entered. For Qimonda parts, 512 MB and 1 GB models are supported. For custom
parts of bank address value 3, a 1 GB model is output. If the column address entered is
less than or equal to 11, this error message is displayed:
"Address is Reversed"
Thus, the column address entered should be greater than 11.
DDR SDRAM
• Due to metastable values during power on, warning messages might be displayed.
These messages are suppressed in the sim.do file and appear only if the design is
simulated without using the sim.do file generated by MIG. These messages can be
ignored.
• In x16 or x8 components with a data width of eight, this warning appears while
compiling the design:
# ** Warning: [3] ../sim/sim_tb_top.vhd(316): Range 0 to -1 is null.
This warning only appears for VHDL designs and can be ignored.
• If the design is rerun without deleting old files that were generated during simulation,
this warning might be displayed:
# ** Warning: (vlib-34) Library already exists at "work".
This warning can be ignored.
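As a rough aid for reviewing a transcript, the messages that this section describes as ignorable can be filtered out so that only unexpected lines remain. The following Python sketch assumes the transcript is available as plain text with one message per line. The function name, the regular expression used to flag lines, and the substring list are illustrative assumptions and are not part of the MIG output.

```python
import re

# Substrings of messages that this section says can be ignored.
# Assumption: each message appears on a single transcript line.
KNOWN_IGNORABLE = [
    "NUMERIC_STD.TO_INTEGER: metavalue detected",
    "Range 0 to -1 is null",
    "(vlib-34) Library already exists",
    "[TMREN]",
    "QI ERR: Illegal command",
]

# Assumed markers of a warning or error line; adjust for the simulator in use.
FLAG = re.compile(r"warning|error|\bERR\b", re.IGNORECASE)


def unexpected_messages(transcript_text):
    """Return flagged transcript lines that match none of the known messages."""
    remaining = []
    for line in transcript_text.splitlines():
        if not FLAG.search(line):
            continue  # not a warning or error line
        if any(known in line for known in KNOWN_IGNORABLE):
            continue  # documented as ignorable in this section
        remaining.append(line.strip())
    return remaining
```

Any line returned by unexpected_messages still needs manual review; an empty list only means that no flagged line fell outside the documented set.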
Chapter 14
Symptoms in Hardware
- Calibration Failure
- Data Bit/Byte Corruption/Errors
The following sections go into detail on each of these important debugging steps to aid in
providing resolution to calibration failures and data corruptions or errors.
Introduction
There are three main steps in verifying the board layout for a memory interface, as shown
in Figure 14-2.
Figure 14-2 (flow diagram labels): Symptoms in Hardware (Calibration Failure, Data Bit/Byte Corruption/Errors); Verify Board Layout Guidelines; Verify Memory Implementation Guidelines Such as Pin-out, Termination, and Trace Matching
Verify Board Layout Guidelines are Properly Followed
Calculate WASSO
It is important to take WASSO limits into consideration when generating a MIG pinout.
The FPGA data sheets define the SSO limits for each bank. WASSO calculations take these
limits into account along with design-specific parameters, such as board-level inductance,
input logic-low threshold, input undershoot voltage, and output loading capacitance.
WASSO analysis ensures that fast/strong drivers are evenly distributed across the package,
that the number of simultaneously switching outputs does not exceed the per-bank limit,
and that the chip does not generate excessive ground bounce.
WASSO Calculators for Virtex®-4 devices [Ref 34] or Virtex-5 devices [Ref 35] should be
used to find WASSO limits based on board-specific parameters.
These calculations should be run during both pre-board layout and post-board layout. The
results found can then be entered in the Bank Selection page of the MIG GUI. (Refer to
“Bank Selection,” page 54.) MIG follows these WASSO Limits when generating the pinout.
Please see Appendix C, “WASSO Limit Implementation Guidelines” for further
information.
Introduction
There are four main steps in verifying the design implementation of a MIG output as
shown in Figure 14-3:
Behavioral Simulation
Running behavioral simulation verifies the functionality of the design. Both the
example_design and user_design provided with the MIG DDR2 controllers include a
complete environment which allows the user to simulate the reference design and view the
outputs. Scripts are provided to run behavioral simulation.
• For Virtex-4 family designs, see “Simulating the DDR2 SDRAM Design” in Chapter 3.
• For Spartan®-3/3E/3A/3AN/3A DSP family designs, see “Tool Output” in Chapter 8.
• For Virtex-5 family designs, see “Simulating the DDR2 SDRAM Design” in Chapter 9.
The Xilinx® UNISIM libraries must be mapped into the simulator. If the UNISIM libraries
are not set up for your environment, see the COMPXLIB chapter of the Development
Systems Reference Guide for assistance with compiling Xilinx simulation models and
setting up the simulator environment. This guide can be found in the ISE® Software
Manuals.
Introduction
For a detailed discussion of the Spartan-3 FPGA DDR2 interface design, see application
notes XAPP454 [Ref 15] and XAPP768c [Ref 24].
Symptoms in Hardware
Spartan-3 FPGA
- Calibration Failure Physical Layer Debug
- Data Bit/Byte Corruption/Errors
Verify Placement and Routing
DQ Routing
The template router, which is enabled through an environment variable, routes each data
bit from a PAD to a Distributed Memory, where the data is captured in an Asynchronous
FIFO that is written with the Local Clock and read with a Global Clock. These routes
require a template to guarantee that the delay remains constant between all data bits.
Once the design is implemented, load the resultant .ncd and .pcf files into FPGA Editor
to visually verify the template routes for the data bits, as follows:
1. Open the design in FPGA Editor by selecting Start → Programs → Xilinx ISE 10.1i
→ Accessories → FPGA Editor, or load through the View/Edit Routed Design
(FPGA Editor) option in the Processes tab of an ISE project.
2. In some cases, turning Stub Trimming off provides a better picture of the route. To do
this, select File → Main Properties and turn off Stub Trimming in the General tab.
When Stub Trimming is enabled, FPGA Editor does not display the entire route. If Stub
Trimming is disabled, you can see the entire length of the routing segment. Stub
Trimming is enabled in Figure 14-5 and Figure 14-6.
3. Search within the List1 window for *dq* under the All Nets pull-down. Select all of the
DQ data bit nets (e.g., main_00/top0/dq(0)) within the window and highlight these
nets by clicking the Hilite button in the right-hand column. This allows for visual
inspection of the delay routes. Zoom into the area with the highlighted nets and verify
that the placement looks like Figure 14-5 or Figure 14-6.
4. Next, verify that the delays on the nets are consistent. Again, select all of the DQ data
bit nets in the List1 window. This time click on the Delay button located in the right-
hand column. This lists the worst-case delay for the DQ bits. Using this delay
information, inconsistent routing can be quickly identified. There should be less than
75 ps of skew (ideally less than 50 ps) between the data nets. The delay values depend
on the device speed grade and Top/Bottom versus Left/Right implementation but
have been observed to range between 300–700 ps.
If preferred, export the delay information to view the report in an Excel spreadsheet. Select
File → Export to export the delay information to a .csv file.
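Once the delays are exported, the skew check from step 4 can be automated. The following Python sketch assumes the exported .csv file has a header row with one column for the net name and one for the worst-case delay in picoseconds. The column names net and delay_ps are hypothetical, because the layout of the exported file is not described here, and must be adapted to the actual file.

```python
import csv
import io

SKEW_LIMIT_PS = 75.0   # upper bound quoted in step 4
SKEW_TARGET_PS = 50.0  # "ideally less than" value quoted in step 4


def dq_skew_report(csv_text, net_col="net", delay_col="delay_ps"):
    """Compute the spread of worst-case delays across the DQ nets.

    The column names are hypothetical; change them to match the header
    of the file that FPGA Editor actually exports.
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    delays = {row[net_col]: float(row[delay_col]) for row in rows}
    if not delays:
        raise ValueError("no nets found in the exported data")
    fastest = min(delays, key=delays.get)
    slowest = max(delays, key=delays.get)
    skew = delays[slowest] - delays[fastest]
    return {
        "skew_ps": skew,
        "fastest": fastest,
        "slowest": slowest,
        "within_limit": skew < SKEW_LIMIT_PS,
        "within_target": skew < SKEW_TARGET_PS,
    }
```

The fastest and slowest entries identify the two nets to inspect first in FPGA Editor when the skew exceeds the limit.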
DQS Routing
The delayed strobes (dqs*_delayed_col*) need to use the local clocking resources available
in the device for the clock routing. The local routing resources used depend on the pin
placement specified during generation in the MIG tool. Full hex lines that have low skew
are located throughout the device. Left and right implementations use Vertical Full Hex
(VFULLHEX) lines for local clock routing. Top and bottom implementations use VLONG,
VFULLHEX, and HFULLHEX lines for local clock routing.
PAR routes from the Local Clock PAD to a series of LUTs to implement the scheme
explained in detail in XAPP768c. From the output of the final LUT delay, the delayed
strobe/Local Clock (dqs*_delayed_col*) routes to all of the FIFO bits.
To verify the pinout and usage of the template router, the net skew and max delay on the
local clock (dqs*_delayed_col*) must be within spec. To verify these values, open the PAR
report (.par file) and scroll to the Clock Report section. For most Spartan-3 platform
devices, the Net Skew is less than 40 ps, and the Max Delay is approximately 550 ps. For
Spartan-3A and Spartan-3A DSP devices, the Net Skew is less than 65 ps, and the Max
Delay is approximately 400 ps.
The FPGA Editor can then be used to view the local clock placement. To view the template
routes for the delayed strobes, search in the List1 window for *dqs*_delayed_col* in the All
Nets pull-down. Select all the nets (e.g., main_00/top0/data_path0/dqs0_delayed_col0)
and select Hilite from the right-hand column. This command highlights the nets of interest.
Then zoom into this range of highlighted signals to view the placement. If local clocking is
used, one of the two structures shown in Figure 14-7 and Figure 14-8 is seen.
Figure 14-7: Local Clock (Top/Bottom) for dqs*_delayed_col* LUT Delay Elements
Figure 14-8: Local Clock (Left/Right) for dqs*_delayed_col* LUT Delay Elements
If the skew and delays are within spec and the layout for the Local Clock and Data bits
match the previous figures, the template routes for DQS have been properly implemented.
If the DQ or delayed DQS signals do not verify properly, ensure that the UCF follows the
guidelines specified in Appendix A.
Loopback Timing
The timing on the loopback signal is critical to the proper implementation of the data
capture algorithm because the delayed loopback signal generates the write enable for the
read data FIFOs. The causes for incorrect loopback timing are:
• Incorrect route delay on the loopback signal
• The loopback signal must be delayed by the sum of the FPGA forward clock and
the DQS trace length. This is most commonly implemented through a physical
board trace.
• Changes to the MIG pinout after generation
The symptoms of incorrect loopback timing are:
• The first data in a burst is usually corrupted
• Depending on trace delays, only certain bits in the bus exhibit the problem
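The route delay requirement listed under the first cause can be turned into a board trace length. The following Python sketch assumes that the forwarded clock trace, the DQS trace, and the loopback trace all share the same propagation delay per unit length, so the lengths simply add. The function name and the ps_per_mm parameter are illustrative; the actual propagation delay must come from the PCB stack-up data.

```python
def loopback_trace(ck_len_mm, dqs_len_mm, ps_per_mm):
    """Return (length_mm, delay_ps) for the loopback trace.

    The loopback delay is the sum of the forwarded clock delay and the DQS
    trace delay. ps_per_mm is an assumed, board-specific propagation delay
    that is taken to be identical for all three traces.
    """
    length_mm = ck_len_mm + dqs_len_mm
    return length_mm, length_mm * ps_per_mm


# Hypothetical board: 60 mm clock trace, 58 mm DQS trace, 7 ps/mm stack-up.
length_mm, delay_ps = loopback_trace(60.0, 58.0, 7.0)
```

If the pinout changes after generation, the trace lengths change as well, so the calculation should be repeated with the final routed lengths.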
Calibration flowchart (labels recovered from the figure): Start Calibration; tap_sel_done = 1; Calibration Done; init_done = 1
Signals of Interest
The status signals shown in Table 14-1 can be used to help determine where the failure
occurs:
Introduction
This section discusses internal signals to observe in order to assist in isolating problems
that could occur during read data timing calibration in the Virtex-4 FPGA DDR2 SerDes
design. For more information on the calibration algorithm used in this design, refer to
application note XAPP721. [Ref 23]
Calibration flowchart (labels recovered from the figure): Start Calibration; Calibration Done; dp_dly_slct_done = 1
Signals of Interest
The status signals shown in Table 14-2 can be used to help determine where the failure
occurs:
Introduction
This section discusses internal signals to observe in order to assist in isolating problems
that could occur during read data timing calibration in the Virtex-5 FPGA DDR2 design.
Additional UCF and other parameter requirements of this design are also discussed. For
more information on this design, refer to application note XAPP858 [Ref 27].
By triggering on the end of each stage of calibration using the calib_done signals and
looking at the read data prior to moving to the next stage, it is possible to determine what
caused calibration to fail. The expected data patterns for each stage can be seen in
Table 14-4.
For further debugging, the calibration state machine can be found in the
PHY_CALIB.V/VHD module and is documented in XAPP858 [Ref 27]. The correct
behavior can be confirmed by running the example simulation provided by the MIG tool.
If calibration fails, it is important to verify that the board layout guidelines have been
followed (see “Verifying Board Layout,” page 524) and to proceed to the “General Board-
Level Debug,” page 540 section.
Board-level debug flow (labels recovered from the figure): Calibration Failure; Data Bit Corruption/Errors; Verify Board Layout Guidelines; Board Measurements (Measure Signal Integrity, Measure Supply & Vref Voltages, Measure Bus Timing)
• All MIG designs that support multiple DIMM sockets (“deep” configurations)
calibrate only on the first DIMM socket, and the maximum frequency is reduced
from the maximum achievable if only one rank of memory is used. This was done
to account for both the additional loading and the fact that there are no inherent,
process-related timing differences between the DIMM sockets. Factors that cause
the timing to be different between the DIMMs—for example, PCB trace routing
differences between the FPGA and each of the DIMMs—can result in read failures
on all but the very first DIMM.
It might also be necessary to determine whether the data corruption is due to writes or
reads. This can be difficult to determine because, if the writes are the issue, readback of the
data appears corrupted as well. In addition, issues with control/address timing affect both
writes and reads. The following experiments can help isolate the issue:
• If the errors are intermittent, have the controller issue a small initial number of writes,
followed by continuous reads from those locations. Do the reads intermittently yield
bad data? If so, this might point to a read problem.
• Check/vary the control and address timing:
• For a heavily loaded control/address bus (as is the case for an unregistered or
SO-DIMM), it might be necessary to use 2T timing to allow for more setup and
hold time for the control/address signals.
• Note that the chip select (CS_N) signal to the memory remains a 1T signal, even
though it can also have a heavy load. In this case, it might be necessary to advance
the assertion of CS_N by a quarter of a clock cycle. This requires changing the
code for the CS_N output flop to use CLK90 instead of CLK0.
• Check/Vary only write timing:
• If on-die termination is used, check that the correct value is enabled in the DDR2
device and that the timing on the ODT signal relative to the write burst is correct.
• For Virtex-5 FPGA designs, it is possible to use ODELAY to vary the phase of DQ
relative to DQS. In addition, a PLL (rather than a DCM) can be used to generate
CLK0 and CLK90 used for the write output timing. The phase outputs of a PLL
can be fine-tuned, and in this way the phase between DQ and DQS can be varied.
• Vary only read timing:
• Vary the LUT or IDELAY taps after calibration for the bits that are returning bad
data. This affects only the read capture timing.
• For Virtex-4 and Virtex-5 FPGA designs, check the IDELAY values after
calibration. (For the Virtex-5 FPGA DDR2 design, the PHY layer debug port can
be used.) Look for variations between IDELAY values. IDELAY values should be
very similar for DQs in the same DQS group.
Board Measurements
Refer to the HW-Simulation Correlation Section in the ML561 User Guide [Ref 14] as a
guide for expected bus signal integrity.
• 0.9V: VREF
• 0.9V: VTT Termination
• Internal:
• 1.8V: DDR2 VDD, DDR2 VDDL
• 2.5V: FPGA VCCAUX
• 1.0V or 1.2V: VCCINT
Make sure to check these levels when the bus is active. It is possible these levels are correct
when the bus is idle but droop when the bus is active.
Clocking
If the memory interface is having issues running at the target speed, try running the
interface at a lower speed.
• Unfortunately, not all designs can accommodate this, as it is dependent on the clock
generation scheme used.
• Running at a lower speed increases setup and/or hold margins that have become
marginal because of PCB trace mismatch, poor SI, or excessive loading.
If excessive input/system clock jitter might be an issue, the onboard PLL can be used in
Virtex-5 FPGA designs to filter input clock jitter.
Synthesizable Testbench
MIG provides a “synthesizable testbench” containing a simple state machine. The state
machine takes the place of the user-specific backend logic and issues a simple repeating
write-read memory test. This can be used as an alternative to the user's backend logic to
provide a test of the memory interface during initial hardware bring-up. The advantage of
using the synthesizable testbench is that it rules out any issues with the user's backend
logic interfacing with the MIG User Interface block.
The testbench has limitations. It only checks a limited number of memory locations, and
the data pattern is a repeating pattern. The user can change the testbench code to expand
its capabilities.
Appendix A
(Figure labels: FPGA Bank; rst_dqs_div_out I/O and rst_dqs_div_in I/O connected by PCB Loopback(1); CC I/O P: Strobe_P; CC I/O N: Strobe_N; I/O: Data associated with strobe)
Notes:
1. Only Spartan FPGA designs require the loopback signal.
Figure A-1: FPGA Bank with Data, Strobes, and PCB Loopback
Timing Analysis
MIG generates timing analysis spreadsheets for all designs of the Virtex®-5 and Virtex-4
families, and Spartan® series under the docs folder. Each design has different timing
analysis spreadsheets for read_data_timing, write_data_timing, and addr_cntrl_timing.
Evaluation of the PERIOD constraint by the static timing analyzer is not sufficient to
determine if the memory interface is functional at a particular frequency. The PERIOD
constraint covers the internal timing between synchronous elements. These spreadsheets
cover the concept of timing budgets for the interface between the FPGA and memory
device.
The spreadsheets provide information about the data valid window and the margins
available at the selected frequency. They also provide information about different
uncertainty parameters that are to be considered for timing analysis.
Pin Assignments
MIG generates pin assignments for a memory interface based on certain rules depending
on the design technique, but does not provide the best possible pin assignment for every
board implementation. During layout it might be necessary to swap pin locations
depending on the number of layers available and the interface topology. The best way to
change the pin assignment is to first apply changes on a byte basis then swap bits within a
byte. Calculate the PCB loopback length, if required, after pin swapping and trace
matching. The following rules of thumb are provided to help designers determine how
pins can be swapped.
Any changes to the pin assignments require modifications to the UCF provided by MIG
and might require changes to the source code depending on the changes made.
For all MIG Virtex and Spartan FPGA designs, the address and control pins can be
swapped with each other as needed to avoid crossing of the nets on the printed circuit
board.
(even though the package is pinout compatible). MIG can be used to generate a design
with the same pinout for multiple devices of a single package. This results in separate
UCF generation for each device because for the same pinout, the associated SLICE
location constraints are different for different FPGA devices.
If a DQS is placed on either the W3 or W4 pins (the two IOBs share a tile) in the
XC3S1500-FG676 device, these +5 tiles can be used for the DQ placement:
W1/W2
U7/V7
V4/V5
V2/V3
U5/U6
These -6 tiles can also be used for the DQ placement:
W5/V6
W6/W7
Y1/Y2
AA1/AA2
Y4/Y5
AA3/AA4
2. MIG designs use two columns of CLBs because the Spartan-3 FPGA architectures have
only two FIFOs per CLB, and because each bit of data requires two FIFOs, one for
rising edge data and the other for falling edge data. Therefore, each bit of data uses one
CLB, and the two pads in an I/O tile use two side-by-side CLBs. Due to restricted
Spartan-3 FPGA routing, the top pad must always be assigned to the first column and
the bottom pad to the second column of CLBs. With this routing implementation, the
DQ lines from both pads have nearly the same route delay. For better convention, one
of the CLB columns is dedicated for the odd-numbered dq bits (dq[1], dq[3]) and the
other is dedicated to the even numbered dq bits (dq[0], dq[2]), depending on the FPGA
family and the side (left or right) on which the data banks reside.
When the data bits are assigned on the left side of a Spartan-3 generation FPGA, the
rule for assigning dq bits differs for the Spartan-3E FPGA. However, on the right side,
it is the same for all Spartan-3 generation FPGAs.
Left side:
• Spartan-3E FPGA: All the even dq signals should be allocated to the bottom pad
and odd dq signals to the top pad in an I/O.
• Others: All the even dq signals should be allocated to the top pad and odd dq
signals to the bottom pad in an I/O.
Right side:
• All the even dq signals should be allocated to the top pad and odd dq signals to
the bottom pad in an I/O.
Figure A-2 shows the DQ bit allocation in an I/O tile of bank 3 (left side) of the
XC3S250E-FT256 Spartan-3E FPGA in accordance with the above rule.
3. A byte can be swapped with another byte as long as the pin names of all the signals
associated with the byte (that is, DQS, all DQ bits, and DM) and the corresponding
slice locations of those signals are exchanged.
4. Within a byte, all the even-numbered DQ bits can be swapped only with the other
even-numbered bits, and odd-numbered bits only with the other odd bits because two
copies of delayed DQS strobes are generated internally for the two columns of CLBs.
One of the DQS strobes is used for the even-numbered bits and the other for the
odd-numbered bits. Each copy is delayed a specific amount of time relative to the
placement of the even (or odd) Read Data FIFOs.
5. The DQ and DQS signals should not be allocated to the IOBs of the same tile.
6. DQS and DQS# should be allocated to a differential pair, such as the pair of IOBs of a
single tile, when differential DQS is selected.
7. Memory clock signals ck and ck# should be allocated to a differential pair.
8. DQ, DQS, DM, the memory clocks, and the loopback signals (rst_dqs_div_in and
rst_dqs_div_out) should be on the same side of the FPGA.
9. The signals generated in different phases of the clocks should not be allocated in the
same I/O tile. The DM and DQ signals are generated on clk90. Thus, these signals
cannot be allocated in the same I/O tile where the address, DQS, or control signals that
are generated on clk are allocated.
10. For memory interfaces that do not provide a signal to indicate when the read data is
valid, a data-valid signal must be provided on the PCB. This loopback is used as a
write-enable signal for the Read Data FIFOs. A strobe is used to latch the data. Two
pins are needed per design: one to output the signal, and one to input the return signal.
The length of the loopback is defined as:
PCB loopback = CLK delay to memory + strobe delay
11. The rst_dqs_div_in and rst_dqs_div_out signals constitute the loopback pair. To meet
the loopback delay requirement, the rst_dqs_div_in and rst_dqs_div_out signals can
be allocated in the same I/O tile.
12. The rst_dqs_div_in and rst_dqs_div_out signals should be placed at the center of the
data bits. If the number of data bytes is even, allocate them at the center of the data bits.
If it is odd, allocate them immediately either before or after the centered data byte. If
this is not done, the data capture might not be reliable. This is necessary because the
MIG design uses the RST_DQS_DIV feedback loop to generate a write enable to all the
data capture FIFOs.
If the data width is 32, allocate the rst_dqs_div pins between the second and the third
bytes. However, if the data width is 40, the rst_dqs_div pins can be allocated between
either the second and third bytes or the third and fourth bytes. Refer to “Verify Placement and
Routing,” page 528 for details on these verification steps. For more information on the
Spartan-3 FPGA data capture technique, see XAPP768c [Ref 24].
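Rules 2, 4, and 12 above lend themselves to a quick scripted check when pins are swapped by hand. The following Python sketch is one possible encoding of those three rules. The function names, the family strings, and the assumption that a byte is eight consecutive DQ bits are illustrative and do not come from MIG.

```python
def dq_pad(bit, side, family):
    """Pad ('top' or 'bottom') for a DQ bit in an I/O tile, per rule 2.

    side is 'left' or 'right'. family is a hypothetical label; only
    'spartan3e' changes the result, and only on the left side.
    """
    if side not in ("left", "right"):
        raise ValueError("rule 2 covers only the left and right sides")
    even = bit % 2 == 0
    if side == "left" and family == "spartan3e":
        return "bottom" if even else "top"
    return "top" if even else "bottom"


def can_swap_within_byte(bit_a, bit_b):
    """Rule 4: bits swap only within a byte and only with the same parity.

    Assumes byte groups of eight consecutive DQ bits.
    """
    return bit_a // 8 == bit_b // 8 and bit_a % 2 == bit_b % 2


def loopback_positions(num_bytes):
    """Rule 12: 1-based (byte, next_byte) gaps for the rst_dqs_div pins.

    Byte 0 and byte num_bytes + 1 denote the positions outside the first
    and last bytes.
    """
    mid = num_bytes // 2
    if num_bytes % 2 == 0:
        return [(mid, mid + 1)]
    return [(mid, mid + 1), (mid + 1, mid + 2)]
```

For a 32-bit interface, loopback_positions(4) gives the single gap between bytes 2 and 3; for a 40-bit interface, loopback_positions(5) gives the two allowed gaps, which matches the examples in rule 12.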
data, strobe, and data mask) can be swapped with any other DQS group in same bank. The
initial pinout that MIG selects also affects the amount of calibration logic MIG generates.
MIG generates one calibration unit for each bank that contains data bits. Therefore, a DQS
group cannot be swapped with other byte groups on different banks without appropriate
modification to the source code. Within a DQS group, data bits can be swapped with other
data bits, and the data signals should be placed on pins near the associated DQS strobe.
Termination
These rules apply to termination:
1. IBIS simulation is highly recommended for all high-speed interfaces.
2. Single-ended signals are to be terminated with a pull-up of 50Ω to VTT at the load (see
Figure A-4). A split 100Ω termination to VCCO and 100Ω termination to GND can be
used (see Figure A-5), but takes more power. For bidirectional signals, the termination
is needed at both ends of the signal (DCI/ODT or external termination).
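Figure A-4 (diagram labels): Source and Load joined by a ZQ = 50Ω line; RT = 50Ω from the load to VTT.
Figure A-5 (diagram labels): Source and Load joined by a ZQ = 50Ω line; 2 * ZQ = 100Ω from the load to VCCO and 2 * ZQ = 100Ω from the load to GND.
Differential termination diagram (labels): Source_P/Load_P and Source_N/Load_N lines, each ZQ = 50Ω, with a 2 * ZQ = 100Ω resistor at the load.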
4. All termination must be placed as close to the load as possible. The termination can be
placed before or after the load provided that the termination is placed within one inch
of the load pin.
5. DCI can be used at the FPGA as long as the DCI rules are followed (such as
VRN/VRP).
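The equivalence between the two single-ended schemes in rule 2, and the reason the split scheme takes more power, can be checked with a Thevenin calculation. The following Python sketch is a worked example under stated assumptions: the function name is illustrative, and the 1.8 V VCCO value is an assumed example that corresponds to an SSTL18 bank.

```python
def split_termination(r_up, r_down, vcco):
    """Thevenin equivalent and idle power of a split termination.

    r_up goes from the load to VCCO and r_down from the load to GND.
    Returns (r_th_ohm, v_th_volt, static_mw), where static_mw is the power
    dissipated in the divider while the line is not driven.
    """
    r_th = r_up * r_down / (r_up + r_down)        # parallel combination
    v_th = vcco * r_down / (r_up + r_down)        # divider voltage
    static_mw = vcco ** 2 / (r_up + r_down) * 1000.0
    return r_th, v_th, static_mw


# Assumed example: 100 ohm / 100 ohm split on a 1.8 V bank.
r_th, v_th, static_mw = split_termination(100.0, 100.0, 1.8)
```

With these values the split network looks like 50Ω to 0.9 V, the same load as the single pull-up to VTT, but the divider conducts from VCCO to GND on every terminated signal even when the bus is idle.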
I/O Standards
These rules apply to the I/O standard selection for DDR SDRAMs:
• MIG-generated designs use the SSTL2 CLASS I I/O standard by default for memory
address and control signals, and use the SSTL2 CLASS II I/O standard for memory
data, data-mask, and data-strobe signals. When DCI is selected in MIG, DCI for SSTL2
CLASS I can be applied only to memory interface signals that are inputs to the FPGA.
• The user can select CLASS II or CLASS I I/O standards from MIG. When SSTL2
CLASS II is selected in MIG, it is applied to all the memory interface signals.
• When DCI is selected in MIG, the DCI I/O standard is applied to all the memory
interface signals.
These rules apply to the I/O standard selection for DDR2 SDRAMs:
• MIG-generated designs use the SSTL18 CLASS II I/O standard by default for all
memory interface signals. When DCI is selected in MIG, DCI for SSTL18 CLASS II is
applied on input, output, and in-out memory interface signals.
• CLASS II is recommended for all SSTL signals in memory interfaces. However, better
signal integrity can sometimes be achieved with CLASS I for the address/control
group. The user can select between CLASS II or CLASS I I/O standards from the MIG
tool. Based on the IBIS simulation results, the CLASS should be selected in the GUI.
• Selection of the CLASS option is allowed only for the address/control group in
Virtex-4 and Virtex-5 FPGA designs. Data group signals always use the SSTL18
CLASS II I/O standard. When SSTL18 CLASS I is selected in the MIG tool, the
I/O standard for differential signals (such as memory clocks) remains SSTL18
CLASS II.
• CLASS can be selected for both data and address/control groups in
Spartan-3 FPGA designs. When SSTL18 CLASS I is selected in the MIG tool, the
I/O standard for all signals of the selected group is SSTL18 CLASS I.
• When DCI is selected in MIG for SSTL18 CLASS I, the DCI I/O standard is applied
only to memory interface signals that are outputs to the FPGA.
Trace Lengths
Trace length matching must also include the package delay information. The PARTGen
utility [Ref 33] generates a .pkg file that contains the package trace length in microns for
every pin of the device under consideration.
For example, to obtain the package delay information for a Virtex-5 LX50T-FF1136 device
used on an ML561 board, issue the following command within a DOS command shell:
partgen -v xc5vlx50tff1136
This generates an xc5vlx50tff1136.pkg file in the current directory with package trace
length information for each pin (unit: micron or µm). Use the typical 6.5 fs per micron
(6.5 ps per millimeter) conversion formula to obtain the corresponding electrical
propagation delay. While applying specific trace-matching guidelines for each of the
memory interfaces as described below, consider this additional package delay term for the
overall electrical propagation delay. Note that different die in the same package may have
different delays for the same package pin. If this case is expected, average the values
appropriately.
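As a rough illustration of the conversion described above, the following sketch turns a package trace length in microns into a delay in picoseconds using the typical 6.5 fs per micron factor. The function names are hypothetical, and the plain arithmetic mean used for the multi-die case is an assumption; the text only says to average the values appropriately.

```python
FS_PER_MICRON = 6.5  # typical conversion factor quoted in the text


def package_delay_ps(trace_length_um):
    """Convert a package trace length in microns to a delay in ps."""
    return trace_length_um * FS_PER_MICRON / 1000.0  # 1000 fs per ps


def averaged_package_delay_ps(trace_lengths_um):
    """Average the delay of one package pin over several die.

    Assumption: a simple arithmetic mean is an acceptable way to
    average the per-die values.
    """
    delays = [package_delay_ps(length) for length in trace_lengths_um]
    return sum(delays) / len(delays)


# A 10,000 micron (10 mm) package trace corresponds to about 65 ps.
print(package_delay_ps(10000))
```

The resulting value is added to the PCB trace delay of the same net before the trace-matching guidelines are applied.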
Calibration factors out PCB trace mismatches during reads, but the trace-matching
requirements must still be met for writes.
Memory-Specific Guidelines
Each memory interface has three sections:
• Pin assignments
• Termination
• Trace lengths
Trace lengths given are for high-speed operation and can be relaxed depending on the
application's target bandwidth requirements. Be sure to include the package delay when
determining the effective trace length. These internal delays can be found through the
PACE tool.
DDR/DDR2 SDRAM
Pin Assignments
These rules apply to pin assignments for DDR and DDR2 SDRAM:
1. The DQ and DM bits of a byte are to be placed in the same bank as the associated DQS.
The DQ bits must be kept close together for better routing.
2. Address and control signals are to be placed in the same bank or placed in banks near
each other.
If all control signals cannot fit in one bank, CK, ODT, and CKE should be selected first
for placement in another bank.
3. Spartan FPGA designs require a loopback signal. The loopback signal should be
placed at the center of the DQ bits.
If a bank is pin-limited and there is a need to free up a few pins, the following actions are
to be considered:
1. The loopback signals can be eliminated in Virtex-4 FPGA MIG designs because they
are no longer required.
2. The CKE signals can be tied together for multiple devices.
3. For DIMMs, non-critical features need not be implemented, such as
PAR_IN/PAR_OUT and the SPD interface (SA, SPD, SCL).
The loading of address (A, BA), command (RAS_N, CAS_N, WE_N), and control (CS_N,
ODT) signals depends on various factors, such as speed requirements, termination
topology, use of unbuffered DIMMs, and multiple rank DIMMs.
The address and command signals should be implemented with 2T clocking, i.e., asserted
for two cycles, so these signals can handle higher loading without impacting the timing
budget. Virtex-4 FPGA SerDes designs and Virtex-5 FPGA DDR2 designs are implemented
with 2T clocking of address and command signals.
The control signals (CS_N and ODT) use 1T clocking, so replicating them is
recommended when the loading is higher. If the application is too pin-limited to
lighten the loading on critical clock signals going to memory, it might be necessary to
use an external PLL to generate multiple copies of the clock signals.
For descriptions of 1T and 2T clocking, refer to Micron technical note TN-47-01 [Ref 36].
Termination
These rules apply to termination for DDR/DDR2 SDRAM:
1. For DIMMs, the CK signals are to be terminated by a 5 pF capacitor between the two
legs of the differential signal instead of the 100Ω resistor termination, because these
signals are already terminated on each DIMM.
[Figure: CK_P and CK_N each routed over a 50Ω trace to Load_P and Load_N, with a
5 pF capacitor between the two legs of the differential pair]
2. The ODT and CKE signals are not terminated. These signals are required to be pulled
down during memory initialization with a 4.7 kΩ resistor connected to GND.
3. ODT, which terminates a signal at the memory, applies to the DQ/DQS/DM signals
only. If ODT is used, the Mode register must be set appropriately in the RTL design.
4. The Virtex-5 FPGA DDR2 interface requires that if parallel termination is used at the
memory end, it must be ODT rather than external termination resistor(s). This is a
requirement of the read capture scheme used.
To save board space, DCI at the FPGA and ODT at the memory can be used to minimize the
number of external resistors on the board.
Trace Lengths
These rules indicate the maximum electrical delays between DDR/DDR2 SDRAM signals
at 333 MHz:
1. ± 25 ps maximum electrical delay between any DQ and its associated DQS/DQS#
2. ± 50 ps maximum electrical delay between any address and control signals and the
corresponding CK/CK#
3. ± 100 ps maximum electrical delay between any DQS/DQS# and CK/CK#
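The three limits above can be checked mechanically once the total electrical delay of each net (PCB trace plus package delay) is known. The following is a minimal sketch of such a check; the function name and the dictionary layout are hypothetical, and the delay values in the usage example are made up for illustration.

```python
# Limits in picoseconds from the three DDR/DDR2 rules (333 MHz operation).
DQ_TO_DQS_LIMIT_PS = 25.0
ADDR_CTRL_TO_CK_LIMIT_PS = 50.0
DQS_TO_CK_LIMIT_PS = 100.0


def skew_violations(dq, dqs, addr_ctrl, ck_delay_ps):
    """Return (signal, skew_ps, limit_ps) for every signal outside its limit.

    All delays are total electrical delays in ps.
      dq:        {dq_name: (delay_ps, associated_dqs_name)}
      dqs:       {dqs_name: delay_ps}
      addr_ctrl: {signal_name: delay_ps}
    """
    violations = []
    # Rule 1: each DQ against its associated DQS.
    for name, (delay, strobe) in dq.items():
        skew = delay - dqs[strobe]
        if abs(skew) > DQ_TO_DQS_LIMIT_PS:
            violations.append((name, skew, DQ_TO_DQS_LIMIT_PS))
    # Rule 2: each address/control signal against CK.
    for name, delay in addr_ctrl.items():
        skew = delay - ck_delay_ps
        if abs(skew) > ADDR_CTRL_TO_CK_LIMIT_PS:
            violations.append((name, skew, ADDR_CTRL_TO_CK_LIMIT_PS))
    # Rule 3: each DQS against CK.
    for name, delay in dqs.items():
        skew = delay - ck_delay_ps
        if abs(skew) > DQS_TO_CK_LIMIT_PS:
            violations.append((name, skew, DQS_TO_CK_LIMIT_PS))
    return violations


# Made-up delays: DQ1 is 40 ps away from DQS0, which exceeds the 25 ps limit.
print(skew_violations(
    {"DQ0": (510.0, "DQS0"), "DQ1": (540.0, "DQS0")},
    {"DQS0": 500.0},
    {"A0": 560.0},
    520.0,
))
```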
QDRII SRAM
Pin Assignments
These rules apply to pin assignments for Virtex-4 FPGA QDRII SRAM:
1. All CQ signals are placed on clock-capable pins, if the Use CC option is selected;
otherwise any I/O pin is used. CQ is only connected to the P side of the P-N pair.
2. The Q bits of a byte are placed in the same bank as its associated CQ.
The Q bits must be kept close together for optimal routing.
These rules apply to pin assignments for Virtex-5 FPGA QDRII SRAM:
1. All CQ/CQ# signals are placed on clock-capable pins. CQ and CQ# are connected only
to the P side of the CC P-N pair.
2. The Q bits of a byte are placed in the same bank as its associated CQ/CQ#.
The Q bits must be kept close together for optimal routing.
Termination
These rules apply to termination of QDRII SRAM signals:
1. Termination of the qdr_dll_off_n signal should be done based on the recommendation
of the memory vendor. If the vendor requires this signal to be driven by the FPGA, this
signal should be pulled down with a 4.7 kΩ resistor connected to GND. However, if
the vendor requires this signal not to be driven by the FPGA, this signal should be
connected to a pull-up resistor with a value recommended by the SRAM vendor.
2. DCI can also be used on CK for QDRII+ support (QVLD signal from memory to
FPGA).
To save board space, DCI is to be used at the FPGA to minimize the number of external
resistors on the board.
I/O Standards
These rules apply to the I/O Standard selection for QDRII SRAM:
• MIG-generated designs use the HSTL CLASS I I/O standard by default for all
memory interface signals.
• When DCI is selected in MIG, the DCI standard for HSTL CLASS I is applied only to
memory interface signals that are inputs to the FPGA.
Trace Lengths
These rules provide the maximum electrical delays between QDRII SRAM signals:
1. ± 25 ps maximum electrical delay between data and its associated CQ.
2. ± 50 ps maximum electrical delay between address and control signals.
3. ± 100 ps maximum electrical delay between address/control and data.
4. ± 100 ps maximum electrical delay between address/control and K/K# clocks.
5. ± 25 ps maximum electrical delay between data (write port) and K/K# clocks.
6. There is no relation between CQ and the K clocks. K should be matched with D, and
CQ should be matched with Q (read data).
DDRII SRAM
Pin Assignments
These rules apply to pin assignments for Virtex-4 FPGA DDRII SRAM:
1. All CQ signals are placed on clock-capable pins, if the Use CC option is selected;
otherwise, any I/O pin is used. CQ is only connected to the P side of the P-N pair.
2. The Q bits of a byte are placed in the same bank as its associated CQ. The Q bits must
be kept close together for optimal routing.
These rules apply to pin assignments for Virtex-5 FPGA DDRII SRAM:
1. All CQ/CQ# signals are placed on clock-capable pins. CQ and CQ# are connected only
to the P side of the CC P-N pair.
2. The Q bits (SIO designs) or DQ bits (CIO designs) of a byte are placed in the same bank
as its associated CQ/CQ#. The Q bits or DQ bits must be kept close together for
optimal routing.
Termination
These rules apply to termination of DDRII SRAM signals:
1. Termination of the ddr_dll_off_n signal should be done based on the recommendation
of the memory vendor. If the vendor requires this signal to be driven by the FPGA, this
signal should be pulled down with a 4.7 kΩ resistor connected to GND. If the vendor
requires this signal not to be driven by the FPGA, this signal should be connected to a
pull-up resistor with a value recommended by the SRAM vendor.
To save board space, DCI is to be used at the FPGA to minimize the number of external
resistors on the board.
I/O Standards
These rules apply to the I/O Standard selection for DDRII SRAM:
• MIG-generated designs use the HSTL CLASS I I/O standard by default for all
memory interface signals except the DQ/Q and CQ/CQ# signals.
• When DCI is selected in the MIG tool, the DCI standard for HSTL CLASS I is applied
only to memory interface signals that are inputs to the FPGA, except the CQ/CQ# and
DQ/Q signals where the HSTL CLASS II DCI standard is applied.
Trace Lengths
These rules provide the maximum electrical delays between DDRII SRAM signals:
1. ± 25 ps maximum electrical delay between data and its associated CQ.
2. ± 50 ps maximum electrical delay between address and control signals.
3. ± 100 ps maximum electrical delay between address/control and data.
4. ± 100 ps maximum electrical delay between address/control and K/K# clocks.
5. ± 25 ps maximum electrical delay between data (write port) and K/K# clocks.
6. There is no relation between CQ and the K clocks. K should be matched with D, and
CQ should be matched with Q (read data).
RLDRAM II
Pin Assignments
These rules apply to pin assignments for RLDRAM II:
1. All QK signals are to be placed on Clock-Capable I/O pairs if the Use CC option is
selected in the tool; otherwise normal I/O pins are used. P is connected to the P side
and N is connected to the N side of the pair.
2. The DQ bits of a byte are placed in the same bank as the associated QK.
The DQ bits must be kept as close as possible for optimal routing.
3. The loopback signal is not required because RLDRAM II provides a data valid signal
for capturing the read data.
If the design is pin constrained, only common I/O (CIO) can use a bidirectional DQ data
bus.
Termination
This rule applies to termination of RLDRAM II signals:
1. DCI can be used on DQ/QK at the FPGA provided that DCI rules are followed (such
as VRN/VRP).
To save board space, use DCI at the FPGA and ODT at the memory to minimize the
number of external resistors on the board.
I/O Standards
These rules apply to the I/O Standard selection for RLDRAM II:
• MIG-generated designs use the HSTL CLASS II I/O standard by default for all
memory interface signals. When DCI is selected in MIG, DCI for HSTL CLASS II is
applied on input, output, and in-out memory interface signals.
• The user can change the I/O standard to HSTL CLASS I. When DCI is selected in
MIG, DCI for HSTL CLASS I is applied only to the memory interface signals that are
inputs to the FPGA.
• To have HSTL CLASS I on the required pins, the user must manually edit the UCF
constraint file for the corresponding design generated.
Trace Lengths
These rules provide the maximum electrical delays between RLDRAM II signals:
1. ± 25 ps maximum electrical delay between data and its associated QK.
2. ± 50 ps maximum electrical delay between address and control signals.
3. ± 100 ps maximum electrical delay between address/control and data.
Appendix B
XIL_PAR_DELAY are embedded in the RTL to allow the ISE® tools to place and route
this circuit and meet the required net delay and intra-net skew without the use of the
RPM and directed routing constraints.
• For each DQS, a circuit is added to disable the clock enable (CE) pin to each of the
corresponding DQ capture IDDRs at the end of a read burst (“DQS Gate”). The
placement and routing of this circuit is also critical and is defined by a combination of
LOC and MAXDELAY constraints in the UCF.
Figure B-1 shows the read capture path architecture for the MIG Virtex-5 FPGA DDR2
design, as well as the various portions of the capture path that are affected by the
additional UCF constraints.
[Figure: block diagram showing the DQ path (IDELAY, IDDR with CE, Q1, and Q2 outputs),
the DQS path (IDELAY, BUFIO), the DQS Gate path (fabric flip-flops, IDELAY, IOB
flip-flop), the read data transfer logic, the PHY control logic, and the FPGA clock.
Annotation: the ISE software places and routes to the fabric flip-flops based on
XIL_PAR_SKEW and XIL_PAR_DELAY attributes embedded in the RTL.]
Figure B-1: Virtex-5 FPGA DDR2 Read Capture Path, MIG 3.1 or Later
fluctuations on this line due to voltage/temperature. The rules for determining this
value are outlined in “Setting UCF Constraints.”
The fabric flip-flop driving the IDELAY with the DQS Gate control pulse must also be
location constrained to be near the corresponding IDELAY/IOB flip-flop. The rules for
determining this are:
• Locate the IOB where the corresponding IDELAY and IOB flip-flop are location
constrained.
• Use the appropriate package file to find the “nearest CLB,” and constrain this
flip-flop to that location.
For example, on an XC5VLX50T-FF1136 device, if DQS_N[0] is placed on pin N30, the
location constraint for the corresponding DQS Gate fabric flip-flop is:
INST "*/gen_dqs[0].u_iob_dqs/u_iodelay_dq_ce" LOC = "IODELAY_X2Y218";
The reason for this requirement is to minimize the net delay from the output of the DQS
Gate fabric flip-flop to the synchronizing IDELAY (see the discussion of why MAXDELAY
constraints are used in this design in “Setting UCF Constraints,” page 563). It is possible to
not constrain this flip-flop to a specific location (or constrain it to a different location) as
long as the corresponding MAXDELAY for this net can be met (i.e., by allowing MAP to
place this flip-flop).
Appendix C
Appendix D
Requirements
SSO is not required to be considered in these conditions:
• The SSO value is greater than the available I/O pin count in a bank
• The simultaneously switching signal count is less than the SSO value of the bank for
the given I/O standard
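The two conditions above can be expressed as a simple predicate. This is a sketch only: the function and parameter names are hypothetical. The SSO limit and pin count used in the example are the Table D-1 values, while the switching-signal count is a made-up illustration.

```python
def sso_must_be_considered(sso_limit, bank_io_pins, switching_signals):
    """Return True when SSO has to be considered for a bank.

    sso_limit:         SSO value of the bank for the given I/O standard
    bank_io_pins:      available I/O pin count in the bank
    switching_signals: number of simultaneously switching signals
    """
    # Condition 1: the SSO value exceeds the available I/O pin count.
    if sso_limit > bank_io_pins:
        return False
    # Condition 2: fewer signals switch simultaneously than the SSO value.
    if switching_signals < sso_limit:
        return False
    return True


# Bank 3 of the XC3S1400A-FG676 with SSTL18_II: SSO limit 81, 103 I/O pins.
# The count of 40 simultaneously switching signals is hypothetical.
print(sso_must_be_considered(81, 103, 40))
```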
The applicable I/O standards for DDR/DDR2 SDRAM address, control, and data are
SSTL2_I, SSTL2_II, SSTL18_I, and SSTL18_II. For the Spartan®-3 and Spartan-3E
platforms, the SSO limit for these standards is greater than the available pin count. For
Spartan-3A and Spartan-3A DSP devices, although the SSO limit is less than the I/O pin
count, the output signals that switch simultaneously are always less than the SSO count
value. Thus, for all the Spartan devices described here, the SSO limit does not have to be
considered for pin allocation. The DQ, DQS, address, and control signals never switch
simultaneously. During write operations, DQS is center aligned with DQ, and therefore,
they do not switch together. The address and control signals are asserted before enabling
the data.
Table D-1 shows how the number of pins allocated is less than the SSO limit for Bank 3 of
the XC3S1400A-FG676 device in a DDR2 SDRAM design.
Table D-1: XC3S1400A-FG676 Device in DDR2 SDRAM Design
Parameter                                              Value
Device                                                 XC3S1400A-FG676
SSO limit for the SSTL18_II I/O standard for Bank 3    81
Available I/O pin count for Bank 3                     103
Memory type                                            DDR2 SDRAM
Memory part                                            MT47H256M8HG-3
Appendix E
Debug Port
Overview
Starting with MIG 2.2, the memory controller interface design HDL for Virtex®-5, Virtex-4,
and Spartan®-3 FPGAs adds ports to the top-level design file to allow debugging and
monitoring of the physical layer read timing calibration logic and timing. This port
consists of signals brought to the top-level HDL from the Read Calibration module (where
the read timing calibration logic resides). These signals provide information for debugging
hardware issues when calibration does not complete or read timing errors are observed in
the system even after calibration completes. For Virtex FPGA designs, these signals also
allow the user to adjust the read capture timing by adjusting the various IDELAY elements
used for data synchronization. For Spartan-3 FPGA designs, these signals instead allow
the user to adjust the read capture timing by adjusting the delays on the data_strobe and
rst_dqs_div signals.
Specifically, the Debug port allows the user to:
• Observe calibration status signals.
• Observe current tap values for IDELAYs used for read data synchronization for Virtex
FPGA designs.
• Observe current tap_delay values for Spartan-3 FPGA designs.
• Dynamically vary these tap values. Possible uses of this functionality include:
• Debug read data corruption issues
• Support periodic readjustment of the read data capture timing by adjusting the
tap values
• Use as a tool during product margining to determine actual timing margin
available on read data captures
ChipScope tool debug EDIF or NGC files should be manually generated. Refer to
readme.txt located in the par folder for a set of commands to be executed before
starting synthesis and PAR to generate the EDIF or NGC files.
Signal Descriptions
The tables in this section provide the Debug port signal descriptions for the various
memory and FPGA combinations. All the signal directions are with respect to the RTL
design.
• Table E-1, “DDR2 SDRAM Signal Descriptions (Virtex-5 FPGAs)”
• Table E-2, “DDR SDRAM Signal Descriptions (Virtex-5 FPGAs)”
• Table E-3, “QDRII SRAM Signal Descriptions (Virtex-5 FPGAs)”
• Table E-4, “DDR2 SDRAM Signal Descriptions (Virtex-4 FPGAs)”
• Table E-5, “DDR SDRAM Signal Descriptions (Virtex-4 FPGAs)”
• Table E-6, “DDRII SRAM Signal Descriptions (Virtex-4 FPGAs)”
• Table E-7, “QDRII SRAM Signal Descriptions (Virtex-4 FPGAs)”
• Table E-8, “RLDRAM II Signal Descriptions (Virtex-4 FPGAs)”
• Table E-9, “DDR/DDR2 SDRAM Signal Descriptions (Spartan-3 FPGAs)”
3. Adjust the tap delay values for all the strobes (DQS) and rst_dqs_div:
a. Set vio_out_dqs_en = 1.
b. Set vio_out_rst_dqs_div_en = 1.
c. Set the tap values for rst_dqs_div and all the strobes from Table E-10 and
Table E-11 by changing vio_out_dqs[4:0] and vio_out_rst_dqs_div[4:0].
Appendix F
Appendix G
Notes:
1. Maximum design frequencies for HIGH_PERFORMANCE_MODE using a -3 FPGA speed grade.
Appendix H
Table H-2 is an example showing the pin mapping for x4 DDR registered DIMMs between
the memory data sheet and the UCF.
Table H-2: Pin Mapping for x4 DDR DIMMs
Memory Data Sheet    MIG UCF
DQ[63:0]             DQ[63:0]
CB3 - CB0            DQ[67:64]
CB7 - CB4            DQ[71:68]
DQS0                 DQS[0]
DQS1                 DQS[2]
DQS2                 DQS[4]
DQS3                 DQS[6]
DQS4                 DQS[8]
DQS5                 DQS[10]
DQS6                 DQS[12]
DQS7                 DQS[14]
DQS8                 DQS[16]
DQS9                 DQS[1]
DQS10                DQS[3]
DQS11                DQS[5]
DQS12                DQS[7]
DQS13                DQS[9]
DQS14                DQS[11]
DQS15                DQS[13]
DQS16                DQS[15]
DQS17                DQS[17]
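The DQS rows of Table H-2 follow a regular pattern: data sheet strobes DQS0 through DQS8 map to the even UCF indices, and DQS9 through DQS17 map to the odd ones. The sketch below reproduces that pattern for scripting purposes. The function name is hypothetical, and the formula is inferred from the table rows rather than taken from MIG.

```python
def ucf_dqs_index(datasheet_dqs):
    """Map a data sheet DQS number (0..17) to the MIG UCF DQS index.

    Inferred from Table H-2 for x4 DDR registered DIMMs:
      DQS0..DQS8  -> DQS[0], DQS[2], ..., DQS[16]
      DQS9..DQS17 -> DQS[1], DQS[3], ..., DQS[17]
    """
    if not 0 <= datasheet_dqs <= 17:
        raise ValueError("data sheet DQS number must be in the range 0..17")
    if datasheet_dqs <= 8:
        return 2 * datasheet_dqs
    return 2 * (datasheet_dqs - 9) + 1


# Print the full mapping so it can be compared against Table H-2.
for n in range(18):
    print(f"DQS{n} -> DQS[{ucf_dqs_index(n)}]")
```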