Vivado Design Suite User Guide: Implementation (UG904, v2024.1, June 5, 2024)
Chapter 1
• RTL designs
• Netlist designs
• IP-centric design flows
Figure 1: Vivado Design Suite High-Level Design Flow shows the Vivado tools flow.
Vivado implementation includes all steps necessary to place and route the netlist onto device
resources, within the logical, physical, and timing constraints of the design.
For more information about the design flows supported by the Vivado tools, see the Vivado
Design Suite User Guide: Design Flows Overview (UG892).
Figure 1: Vivado Design Suite High-Level Design Flow. The figure depicts IP integration (High-Level Synthesis from C sources, DSP design with System Generator, custom IP, and IP packaging feeding an IP catalog of Xilinx, third-party, and user IP), RTL, netlist, and constraint sources, system-level integration, synthesis, implementation, and design analysis (constraints, simulation, debugging, cross probing, ECO, and programming and debug).
1. Opt Design: Optimizes the logical design to make it easier to fit onto the target AMD device.
2. Power Opt Design (optional): Optimizes design elements to reduce the power demands of
the target AMD device.
3. Place Design: Places the design onto the target AMD device and performs fanout replication
to improve timing.
4. Post-Place Power Opt Design (optional): Additional optimization to reduce power after
placement.
5. Post-Place Phys Opt Design (optional): Optimizes logic and placement using estimated timing
based on placement. Includes replication of high fanout drivers.
6. Route Design: Routes the design onto the target AMD device.
7. Post-Route Phys Opt Design (optional): Optimizes logic, placement, and routing using actual
routed delays.
8. Write Bitstream: Generates a bitstream for AMD device configuration. Typically, bitstream
generation follows implementation.
For more information about writing the bitstream, see section Generating the Bitstream or
Device Image in the Vivado Design Suite User Guide: Programming and Debugging (UG908).
The maximum number of simultaneous threads varies, depending on the task:
• DRC reporting: 8
• Static timing analysis: 8
• Placement: 8
• Routing: 8
• Physical optimization: 8
The default number of maximum simultaneous threads is based on the OS. For Windows
systems, the default is 2; for Linux systems, the default is 8. The limit can be changed using a
parameter called general.maxThreads. To change the limit use the following Tcl command:
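set_param general.maxThreads <value>

For example, the following command sets the limit to two:

set_param general.maxThreads 2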
This means all tasks are limited to two threads regardless of the number of processors or the task
being executed. If the system has at least eight processors, you can set the limit to 8 and allow
each task to use the maximum number of threads.
To summarize, the number of simultaneous threads is the smallest of the following values:
Parallel Runs
Vivado supports launching design runs in parallel by providing the launch_runs -jobs option to
specify the number of simultaneous runs. Each simultaneous run is an independent process,
requiring its own CPU and memory resources.
It is important to allocate sufficient resources to handle the total peak computing requirements.
For example, consider a design run that typically reports a peak usage of 20 GB RAM with
general.maxThreads set to 8. Launching 4 similar runs in parallel would require 32 processor
cores and roughly 80 GB RAM to avoid performance degradation due to competition for
computing resources by the 4 processes.
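For example, the following launches four implementation runs with up to four jobs in parallel (the run names are placeholders):

launch_runs impl_1 impl_2 impl_3 impl_4 -jobs 4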
Note: For more information about Tcl commands, see the Vivado Design Suite Tcl Command Reference Guide
(UG835) or type <command> -help.
• Hardware, IP, and Platform Development: Creating the PL IP blocks for the hardware
platform, creating PL kernels, functional simulation, and evaluating the AMD Vivado™ timing,
resource use, and power closure. Also involves developing the hardware platform for system
integration. Topics in this document that apply to this design process include:
Managing Implementation
The Vivado Design Suite includes a variety of design flows and supports an array of design
sources. To generate a bitstream that can be downloaded onto an AMD device, the design must
pass through implementation.
Implementation is a series of steps that takes the logical netlist and maps it into the physical
array of the target AMD device. Implementation comprises:
• Logic optimization
• Placement of logic cells
• Routing of connections between cells
Project Mode
The Vivado Design Suite lets you create a project file (.xpr) and directory structure that allows
you to:
The automated management of the design data, process, and status requires a project
infrastructure that is stored in the Vivado project file (.xpr).
In Project Mode, the Vivado tools automatically write checkpoint files into the local project
directory at key points in the design flow.
To run implementation in Project Mode, you click the Run Implementation button in the IDE or
use the launch_runs Tcl command. See section Using Project Mode in the Vivado Design Suite
User Guide: Design Flows Overview (UG892) for more information about using projects in the
Vivado Design Suite.
Flow Navigator
The complete design flow is integrated in the Vivado Integrated Design Environment (IDE). The
Vivado IDE includes a standardized interface called the Flow Navigator.
The Flow Navigator appears in the left pane of the Vivado Design Suite main window. From the
Flow Navigator you can assemble, implement, and validate the design and IP. It features a
pushbutton interface to the entire implementation process to simplify the design flow. The
following figure shows the Implementation section of the Flow Navigator.
IMPORTANT! This guide does not give a detailed explanation of the Vivado IDE, except as it applies to
implementation. For more information about the Vivado IDE as it relates to the entire design flow, see the
Vivado Design Suite User Guide: Using the Vivado IDE (UG893).
Non-Project Mode
The Vivado tools also let you work with the design in memory, without the need for a project file
and local directory. Working without a project file in the compilation style flow is called Non-
Project Mode. Source files and design constraints are read into memory from their current
locations. The in-memory design is stepped through the design flow without being written to
intermediate files.
In Non-Project Mode, you must run each design step individually, with the appropriate options
for each implementation Tcl command.
Non-Project Mode allows you to apply design changes and proceed through the design flow
without needing to save changes and rerun steps. You can run reports and save design
checkpoints (.dcp) at any stage of the design flow.
IMPORTANT! In Non-Project Mode, when you exit the Vivado design tools, the in-memory design is lost.
For this reason, AMD recommends that you write design checkpoints after major steps such as synthesis,
placement, and routing.
You can save design checkpoints in both Project Mode and Non-Project Mode. You can only open
design checkpoints in Non-Project Mode.
There are many differences between Project Mode and Non-Project Mode. Features not available
in Non-Project Mode include:
• Flow Navigator
• Design status indicators
• IP catalog
• Implementation runs and run strategies
• Design Runs window
• Messages window
• Reports window
Note: This list illustrates features that are not supported in Non-Project Mode. It is not exhaustive.
You must implement the non-project based design by running the individual Tcl commands:
• opt_design
• power_opt_design (optional)
• place_design
• phys_opt_design (optional)
• route_design
• phys_opt_design (optional)
• write_bitstream
You can run implementation steps interactively in the Tcl Console, in the Vivado IDE, or by using
a custom Tcl script. You can customize the design flow as needed to include reporting commands
and additional optimizations. For more information, see Running Implementation in Non-Project
Mode.
The details of running implementation in Project Mode and Non-Project Mode are described in
this guide.
For more information on running the Vivado Design Suite using either Project Mode or Non-
Project Mode, see:
There are two ways to begin the implementation flow with a synthesized design:
• Run Vivado synthesis. In Project Mode, the synthesis run contains the synthesis results and
those results are automatically used as the input for implementation run. In Non-Project
Mode, the synthesis results are in memory after synth_design completes, and implementation
can continue from that point.
• Load a synthesized netlist. Synthesized netlists can be used as the input design source, for
example when using a third-party tool for synthesis.
To initiate implementation:
To analyze and refine constraints, the synthesized design is loaded without running
implementation.
• In Project Mode, you accomplish this by opening the Synthesized Design, which is the result
of the synthesis run.
• In Non-Project Mode, you use the link_design command to load the design.
You can also drive the implementation flow using design checkpoints (.dcp) in Non-Project Mode.
Opening a checkpoint loads the design and restores it to its original state, which might include
placement and routing data. This enables re-entrant implementation flows, such as loading a
routed design and editing the routing, or loading a placed design and running multiple routes
with different options.
• Structural Verilog
• Structural SystemVerilog
• EDIF
• AMD NGC
• Synthesized Design Checkpoint (DCP)
IMPORTANT! NGC format files are not supported in the Vivado Design Suite for UltraScale and later
devices. It is recommended that you regenerate the IP using the Vivado Design Suite IP customization tools
with native output products. Alternatively, you can use the convert_ngc Tcl utility to convert NGC files to EDIF or
Verilog formats. However, AMD recommends using native Vivado IP rather than XST-generated NGC
format files going forward.
IMPORTANT! When using IP in Project Mode or Non-Project Mode, always use the XCI file and not the
DCP file. This ensures that IP output products are used consistently during all stages of the design flow. If
the IP was synthesized out-of-context and already has an associated DCP file, the DCP file is automatically
used and the IP is not re-synthesized. For more information, see section Adding Existing IP to a Project in
the Vivado Design Suite User Guide: Designing with IP (UG896).
For more information on the source files and project types supported by the Vivado Design Suite,
see the Vivado Design Suite User Guide: System-Level Design Entry (UG895).
IMPORTANT! If you start from RTL sources, you must first run Vivado synthesis before implementation
can begin. The Vivado IDE manages this automatically if you attempt to run implementation on an un-synthesized design: the tools run synthesis first.
For information on running Vivado synthesis, see the Vivado Design Suite User Guide: Synthesis
(UG901).
Without timing constraints, the Vivado Design Suite optimizes the design solely for wire length
and routing congestion, and makes no effort to assess or improve design performance.
• Operating conditions such as voltage settings, power and current budgets, and operating
environment details.
• Switching activity rates for:
○ Design objects: individual nets and pins.
Vivado power analysis uses timing constraints to determine switching rates and applies
vectorless propagation to determine toggle rates throughout the design. Without power
constraints, a default 12.5% toggle rate is used. However, applying accurate switching activity to
override defaults is essential for accurate power calculations.
For further information see the Vivado Design Suite User Guide: Power Analysis and Optimization
(UG907).
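For example, the following overrides the default switching activity on a group of nets (the net pattern and values are placeholders):

set_switching_activity -toggle_rate 25 -static_probability 0.5 [get_nets -hierarchical {data_bus*}]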
For information on migrating UCF constraints to XDC commands, see section Migrating UCF
Constraints to XDC in the ISE to Vivado Design Suite Migration Guide (UG911).
TIP: Separate constraints by function into different constraint files to (a) make your constraint strategy clearer, and (b) facilitate targeting timing and implementation changes.
You can have multiple constraint sets for a project. Multiple constraint sets allow you to use
different implementation runs to test different approaches.
For example, you can have one constraint set for synthesis, and a second constraint set for
implementation. Having two constraint sets allows you to experiment by applying different
constraints during synthesis, simulation, and implementation.
Organizing design constraints into multiple constraint sets can help you:
• Target various AMD devices for the same project. Different physical and timing constraints
might be needed for different target devices.
• Perform what-if design exploration. Use constraint sets to explore various scenarios for
floorplanning and over-constraining the design.
• Manage constraint changes. Override master constraints with local changes in a separate
constraint file.
For more information on defining and working with constraints that affect placement and
routing, see section Physical Constraint in the Vivado Design Suite User Guide: Using Constraints
(UG903).
In some cases, constraints are available only as HDL attributes, and are not available in XDC. In
those cases, the constraint must be specified as an attribute in the HDL source file. For example,
Relatively Placed Macros (RPMs) must be defined using HDL attributes. An RPM is a set of logic
elements (such as FF, LUT, DSP, and RAM) with relative placements.
You can define RPMs using U_SET and HU_SET attributes and define relative placements using
Relative Location Attributes.
For more information about Relative Location Constraints, see section Migrating UCF
Constraints to XDC in the Vivado Design Suite User Guide: Using Constraints (UG903).
For more information on constraints that are not supported in XDC, see the ISE to Vivado Design
Suite Migration Guide (UG911).
Checkpoint designs can be run through the remainder of the design flow using Tcl commands.
They cannot be modified with new design sources.
IMPORTANT! In Project Mode, the Vivado design tools automatically save and restore checkpoints as the
design progresses. In Non-Project Mode, you must save checkpoints at appropriate stages of the design
flow, otherwise, progress is lost.
Chapter 2
• Opt Design (opt_design): Optimizes the logical design to make it easier to fit onto the
target AMD device.
• Place Design (place_design): Places the design onto the target AMD device and replicates
logic to improve timing.
• Post-Place Phys Opt Design (phys_opt_design) (optional): Optimizes logic and placement
using estimated timing based on placement. Includes replication of high fanout drivers.
• Route Design (route_design): Routes the design onto the target AMD device.
For more information about writing the bitstream or creating a device image, see section
Generating Bitstream in the Vivado Design Suite User Guide: Programming and Debugging (UG908).
These steps are collectively known as implementation. Enter the commands in any of the
following ways:
Note: The read_xdc step reads XDC constraints from the XDC files and applies constraints to design
objects. Therefore all netlist files must be read into Vivado and link_design should be run before
read_xdc to ensure that the XDC constraints can be applied to their intended design objects.
source run.tcl
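A minimal sketch of what such a run.tcl script might contain for a netlist-based flow (the file names, top module, and part are placeholders):

read_edif top.edf
link_design -top top -part <part>
read_xdc top.xdc
opt_design
place_design
phys_opt_design
route_design
write_checkpoint -force post_route.dcp
report_timing_summary -file post_route_timing_summary.rpt
write_bitstream -force top.bit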
Use the read_checkpoint command to add synthesized design checkpoint files as sources.
The read_* Tcl commands are designed for use with Non-Project Mode. The read_* Tcl
commands allow the Vivado tools to read a file on the disk and build the in-memory design
without copying the file or creating a dependency on the file.
This approach makes Non-Project Mode highly flexible with regard to design.
IMPORTANT! You must monitor any changes to the source design files, and update the design as needed.
• The -top option specifies the top design for implementation. If the top-level netlist is EDIF
and the -top option is not specified, the Vivado tools will use the top design embedded in
the EDIF netlist. If the top-level netlist is not EDIF but structural Verilog, the -top option is
required. The -top option can also be used to specify a submodule as the top, for example
when running the Module Analysis flow to estimate performance and utilization.
All actions taken in Non-Project Mode are directed at the in-memory database within the Vivado tools, whether running in batch mode, Tcl shell mode for interactive Tcl commands, or in the Vivado IDE. When the design is opened in the Vivado IDE, you can interact with the design data in a graphical form.
The read_xdc command reads an XDC constraint file, then applies it to the in-memory design.
TIP: Although Project Mode supports the definition of constraint sets, containing multiple constraint files
for different purposes, Non-Project Mode uses multiple read_xdc commands to achieve the same effect.
The Vivado netlist optimizer includes many different types of optimizations to meet varying
design requirements. For more information, see Logic Optimization.
In Non-Project Mode, you must use the appropriate Tcl command to specify each report that you
want to create. Each reporting command supports the -file option to direct output to a file.
See Vivado Design Suite Tcl Command Reference Guide (UG835) for further information on the
report_timing_summary command and on the report_utilization command.
You can output reports to files for later review, or you can send the reports directly to the Vivado
IDE to review now. For more information, see Viewing Implementation Reports.
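For example (the file names are arbitrary):

report_timing_summary -file timing_summary.rpt
report_utilization -file utilization.rpt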
• Logical netlist
• Physical and timing related constraints
• AMD part data
• Placement and routing information
In Non-Project Mode, the design checkpoint file saves the design and allows it to be reloaded for
further analysis and modification.
For more information, see Using Checkpoints to Save and Restore Design Snapshots.
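For example, a checkpoint written with write_checkpoint can later be reloaded with open_checkpoint (the file name is a placeholder):

write_checkpoint -force post_place.dcp
open_checkpoint post_place.dcp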
• Define implementation runs that are configured to use specific synthesis results and design
constraints.
• Run multiple strategies on a single design.
• Customize implementation strategies to meet specific design requirements.
• Save customized implementation strategies to use in other designs.
IMPORTANT! Non-Project Mode does not support predefined implementation runs and strategies. Non-
project based designs must be manually moved through each step of the implementation process using Tcl
commands. For more information, see Running Implementation in Non-Project Mode.
On Linux systems, you can launch runs on remote servers. For more information, see Appendix A:
Using Remote Hosts and Compute Clusters.
a. In the Name column, enter a name for the run or accept the default name.
b. Select a Synth Name to choose the synthesis run that will generate (or that has already
generated) the synthesized netlist to be implemented. The default is the currently active
synthesis run in the Design Runs window. For more information, see Appendix B:
Implementation Categories, Strategy Descriptions, and Directive Mapping.
Note: In the case of a netlist-driven project, the Create Run command does not require the name
of the synthesis run.
Alternatively, you can select a synthesized netlist that was imported into the project from
a third-party synthesis tool. For more information, see the Vivado Design Suite User Guide:
Synthesis (UG901).
c. Select a Constraints Set to apply during implementation. The optimization, placement,
and routing are largely directed by the physical and timing constraints in the specified
constraint set.
For more information on constraint sets, see the Vivado Design Suite User Guide: Using
Constraints (UG903).
d. Select a target Part.
The default values for Constraints Set and Part are defined by the Project Settings when
the Create New Runs command is executed.
For more information on the Project Settings, see section Configuring Project Settings in
the Vivado Design Suite User Guide: System-Level Design Entry (UG895).
TIP: To create runs with different constraint sets or target parts, use the Create New Runs
command. To change these values on existing runs, select the run in the Design Runs window and
edit the Run Properties.
e. Select a Strategy.
Strategies are a defined set of Vivado implementation feature options that control the
implementation results. Vivado Design Suite includes a set of pre-defined strategies. You
can also create your own implementation strategies.
Select from among the strategies shown in Appendix B: Implementation Categories,
Strategy Descriptions, and Directive Mapping. The strategies are broken into categories
according to their purposes, with the category name as a prefix. The categories are shown
in Appendix B: Implementation Categories, Strategy Descriptions, and Directive Mapping.
For more information see Defining Implementation Strategies.
TIP: The optimal strategy can change between designs and software releases.
IMPORTANT! Strategies containing the terms SLL or SLR are for use with SSI devices only.
TIP: Before launching a run, you can change the settings for each step in the implementation
process, overriding the default settings for the selected strategy. You can also save those new
settings as a new strategy. For more information, see Changing Implementation Run Settings.
f. Click More to define additional runs. By default, the next strategy in the sequence is
automatically chosen. Specify names and strategies for the added runs.
g. Use the Make Active check box to select the runs you wish to initiate.
h. Click Next.
4. The Launch Options page appears, as shown in the following figure. Specify options as
described in the steps below the figure.
Note: The Launch runs on remote hosts and Launch runs on Cluster options shown in the previous
figure are Linux-only. They are not visible on Windows machines.
a. Specify the Launch directory, the location at which implementation run data is created
and stored.
The default directory is located in the local project directory structure. Files for
implementation runs are stored by default at: <project_name>/
<project_name>.runs/<run_name>.
TIP: Defining a directory location outside the project directory structure makes the project non-
portable, because absolute paths are written into the project files.
b. Use the radio buttons and drop-down options to specify settings appropriate to your
project. Choose from the following:
• Select the Launch runs on local host option if you want to launch the run on the local
machine.
• Use the Number of jobs drop-down menu to define the number of local processors to
use when launching multiple runs simultaneously.
• Select Launch runs on remote hosts (Linux only) if you want to use remote hosts to
launch one or more jobs.
• Use the Configure Hosts button to configure remote hosts. For more information, see
Appendix A: Using Remote Hosts and Compute Clusters.
• Select Launch runs on Cluster (Linux only) if you want to use a compute cluster
command to launch one or more jobs. Use the drop down menu to select one of the
natively supported Vivado Clusters (lsf, sge, or slurm) or a User Defined Cluster that has been added previously.
• Select the Generate scripts only option if you want to export and create the run
directory and run script but do not want the run script to launch at this time. The
script can be run later outside the Vivado IDE tools.
• Select Do not launch now if you want to save the new runs, but you do not want to
launch or create run scripts at this time.
5. Click Next to review the Create New Runs Summary.
6. Click Finish to create the defined runs and execute the specified launch options.
New runs are added to the Design Runs window. See Using the Design Runs Window.
For more information on working with the columns to sort the data in this window, see section
Using Data Table Windows in the Vivado Design Suite User Guide: Using the Vivado IDE (UG893).
Run Status
The Design Runs window reports the run status, including when:
Run Times
The Design Runs window reports start and elapsed run times.
The Design Runs window reports timing results for implementation runs including WNS, TNS,
WHS, THS, and TPWS.
Out-of-Date Runs
Runs can become out-of-date when source files, constraints, or project settings are modified. You
can reset and delete stale run data in the Design Runs window.
Active Run
All views in the Vivado IDE reference the active run. The Log window, Report window, Status
Bar, and Project Summary display information for the active run. The Project Summary window
displays only compilation, resource, and summary information for the active run.
TIP: Only one synthesis run and one implementation run can be active in the Vivado IDE at any time.
For more information on the Run Properties window, see section Using the Run Properties
Window in the Vivado Design Suite User Guide: Using the Vivado IDE (UG893).
TIP: You can change the settings only for a run that has a Not Started status. Use Reset Run to return
a run to the Not Started status. See Resetting Runs.
Strategy
Selects the strategy to use for the implementation run. Vivado Design Suite includes a set of pre-
defined implementation strategies, or you can create your own.
Description
Describes the selected implementation strategy.
Options
When you select a strategy, each step of the Vivado implementation process displays in a table in
the lower part of the dialog box:
Click the command option to view a brief description of the option at the bottom of the Design
Run Settings dialog box.
• Select options with predefined settings from the pull down menu.
• Select or deselect a check box to enable or disable options.
Note: The most common options for each implementation command are available through the check
boxes. Add other supported command options using the More Options field. Syntax: precede option
names with a hyphen and separate options from each other with a space.
TIP: Relative paths in the tcl.pre and tcl.post scripts are relative to the appropriate run directory
of the project they are applied to: <project>/<project.runs>/<run_name>.
Use the DIRECTORY property of the current project or current run to define the relative paths in
your Tcl scripts:
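get_property DIRECTORY [current_project]
get_property DIRECTORY [current_run]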
Save Strategy As
Select the Save Strategy As icon next to the Strategy field to save any changes to the strategy as
a new strategy for future use.
CAUTION! If you do not select Save Strategy As , changes are saved to the current implementation run,
but are not preserved for future use.
• If the status of the run is Not Started, the run begins immediately.
• If the status of the run is Error, the tools reset the run to remove any incomplete run data,
then restart the run.
• If the status of the run is Complete (or Out-of-Date), the tools prompt you to confirm that the
run should be reset before proceeding with the run.
Resetting Runs
To reset a run:
Resetting an implementation run returns it to the first step of implementation (opt_design) for
the selected run.
As shown in the following figure, the Vivado tools prompt you to confirm the Reset Runs
command, and optionally delete the generated files from the run directory.
TIP: The default setting is to delete the generated files. Disable this check box to preserve the generated
run files.
Deleting Runs
To delete runs from the Design Runs window:
As shown in the following figure, the Vivado tools prompt you to confirm the Delete Runs
command.
Figure 6: Implementation Settings shows the Implementation page in the Settings dialog box. To
open this dialog box from the Vivado IDE, select Tools → Settings from the main menu.
TIP: The Settings command is not available in the Vivado IDE when running in non-project mode. In this
case, you can define and preserve implementation strategies as Tcl scripts that can be used in batch mode,
or interactively in the Vivado IDE.
• Default constraint set: Select the constraint set to be used by default for the implementation
run.
• Report Settings: Use this menu to select the report strategy. You can choose from a preset
report strategy or define your own strategy to choose which reports to run at each design
step.
• Strategy: Select the strategy to use for the implementation run. The Vivado Design Suite
includes a set of pre-defined strategies. You can also create your own implementation
strategies and save changes as new strategies for future use. For more information see
Defining Implementation Strategies.
• Strategies are defined as pre-configured sets of options for the Vivado implementation features.
• Strategies are tool and version specific.
• Each major release of the Vivado Design Suite includes version-specific strategies.
Vivado implementation includes several commonly used strategies that are tested against
internal benchmarks.
TIP: You cannot save changes to the predefined implementation strategies. However, you can copy,
modify, and save the predefined strategies to create your own.
4. In the Flow pull-down menu, select the appropriate Vivado Implementation version for the
available strategies. A list of included strategies is displayed.
5. Create a new strategy or copy an existing strategy:
• To create a new strategy, click the Create Strategy button on the toolbar or select it
from the right-click menu.
• To copy an existing strategy, select Copy Strategy from the toolbar or from the popup
menu. The Vivado design tools create a copy of the currently selected strategy and add it
to the User Defined Strategies list. Vivado then displays the strategy options on the right
side of the dialog box for you to modify.
6. Provide a name and description for the new strategy as follows:
• Name: Enter a strategy name to assign to a run.
• Type: Specify Synthesis or Implementation.
• Tool Version: Specify the tool version.
• Description: Enter the strategy description displayed in the Design Run results table.
7. Edit the Options for the various implementation steps:
• Design Initialization (init_design)
• Opt Design (opt_design)
• Power Opt Design (power_opt_design) (optional)
• Place Design (place_design)
• Post-Place Power Opt Design (power_opt_design) (optional)
• Post-Place Phys Opt Design (phys_opt_design) (optional)
• Route Design (route_design)
• Post-Route Phys Opt Design (phys_opt_design) (optional)
• Write Bitstream (write_bitstream) (all devices except Versal)
TIP: Select an option to view a brief description of the option at the bottom of the Design Run Settings
dialog box.
8. Click the right-side column of a specific option to modify command options. See the previous
figure for an example.
You can then:
• Select predefined options from the pull down menu.
• Enable or disable some options with a check box.
• Type a user-defined value for options with a text entry field.
• Use the file browser to specify a file for options accepting a file name and path.
• Insert a custom Tcl script (called a hook script) before and after each step in the
implementation process (tcl.pre and tcl.post). This lets you perform specific tasks
either before or after each implementation step (for example, generating a timing report
before and after Place Design to compare timing results).
For more information on defining Tcl hook scripts, see Vivado Design Suite User Guide: Using
Tcl Scripting (UG894).
Relative paths in the tcl.pre and tcl.post scripts are relative to the appropriate run
directory of the project they are applied to: <project>/<project.runs>/
<run_name>.
You can use the DIRECTORY property of the current project or current run to define the
relative paths in your scripts:
get_property DIRECTORY [current_project]
get_property DIRECTORY [current_run]
The new strategy is listed under User Defined Strategy. The Vivado tools save user-defined
strategies to the following locations:
Launching a single implementation run initiates a separate process for the implementation.
TIP: Select a run in the Design Runs window to launch a run other than the active run.
2. Select Launch Runs to open the Launch Runs dialog box, shown in the following figure.
Note: You can select Launch Runs from the popup menu, or from the Design Runs window toolbar
menu.
TIP: Defining any non-default location outside the project directory structure makes the project non-
portable because absolute paths are written into the project files.
4. Specify Options.
• Select Launch runs on local host if you want to launch the run on the local machine.
• Use the Number of jobs drop-down menu to define the number of local processors to use
when launching multiple runs simultaneously.
• Select Launch runs on remote hosts (Linux only) if you want to use remote hosts to launch
one or more jobs.
• Use the Configure Hosts button to configure remote hosts. For more information, see
Appendix A: Using Remote Hosts and Compute Clusters.
• Select Launch runs using LSF (Linux only) if you want to use LSF (Load Sharing Facility)
bsub command to launch one or more jobs. Use the Configure LSF button to set up the
bsub command options and test your LSF connection.
TIP: LSF, the Load Sharing Facility, is a subsystem for submitting, scheduling, executing, monitoring,
and controlling a workload of batch jobs across compute servers in a cluster.
• Select the Generate scripts only option if you want to export and create the run directory
and run script but do not want the run script to launch at this time. The script can be run
later outside the Vivado IDE tools.
Putting this process into the background releases the Vivado IDE to perform other functions, such as viewing reports and opening design files, while it completes the background task. You can use this time, for example, to review previous runs or to examine reports.
CAUTION! When you put this process into the background, the Tcl Console is blocked. You cannot
execute Tcl commands, or perform tasks that require Tcl commands, such as switching to another open
design.
The Vivado tools let you run implementation as a series of steps, rather than as a single process.
1. Right-click a run in the Design Runs window and select Launch Next Step: <Step> or Launch
Step To from the popup menu shown in the following figure.
Valid <Step> values depend on which run steps have been enabled in the Run Settings. The
steps that are available in an implementation run are:
• Opt Design: Optimizes the logical design to make it easier to fit onto the target AMD device.
• Power Opt Design: Optimizes elements of the design to reduce power demands of the
implemented device.
• Place Design: Places the design onto the target AMD device.
• Post-Place Power Opt Design: Additional optimization to reduce power after placement.
• Post-Place Phys Opt Design: Performs timing-driven optimization on the negative-slack
paths of a design.
• Route Design: Routes the design onto the target AMD device.
• Post-Route Phys Opt Design: Optimizes logic, placement, and routing, using actual routed
delays.
• Write Bitstream (all devices except Versal devices): Generates a bitstream for AMD device
configuration. Although not technically part of an implementation run, bitstream
generation is available as an incremental step.
• Write Device Image (Versal devices): Generates a programmable device image for
programming a Versal device.
2. Repeat Launch Next Step: <Step> or Launch Step To as needed to move the design through
implementation.
3. To back up from a completed step, select Reset to Previous Step: <Step> from the Design
Runs window popup menu.
Select Reset to Previous Step to reset the selected run from its current state to the prior
incremental step. This allows you to:
• Step backward through a run.
• Make any needed changes.
• Step forward again to incrementally complete the run.
Non-Project based designs must be manually taken through each step of the implementation
process using Tcl commands or Tcl scripts.
Note: For more information about Tcl commands, see the Vivado Design Suite Tcl Command Reference Guide
(UG835), or type <command> -help.
Implementation Sub-Processes
In project mode, the implementation commands are run in a fixed order. In non-project mode the
commands can be run in a similar order, but can also be run repeatedly, iteratively, and in a
different sequence than in project mode.
Implementation commands are re-entrant, which means that when an implementation command
is called in non-project mode, it reads the design in memory, performs its tasks, and writes the
resulting design back into memory. This provides more flexibility when running in non-project
mode.
Examples:
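For instance (a sketch, with a routed design already in memory), you can iterate optimization and routing:

# run post-route physical optimization on the in-memory routed design
phys_opt_design
# re-run the router to complete any connections changed by phys_opt_design
route_design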
Putting a design through the Vivado implementation process, whether in project mode or non-
project mode, consists of several sub-processes:
• Open Synthesized Design: Combines the netlist, the design constraints, and AMD target part
data, to build the in-memory design to drive implementation.
• Opt Design: Optimizes the logical design to make it easier to fit onto the target AMD device.
• Power Opt Design (optional): Optimizes design elements to reduce the power demands of the
target AMD device.
• Place Design: Places the design onto the target AMD device.
• Post-Place Power Opt Design (optional): Additional optimization to reduce power after
placement.
• Post-Place Phys Opt Design (optional): Optimizes logic and placement using estimated timing
based on placement. Includes replication of high fanout drivers.
• Route Design: Routes the design onto the target AMD device.
• Post-Route Phys Opt Design: Optimizes logic, placement, and routing using actual routed
delays (optional).
• Write Bitstream: Generates a bitstream for AMD device configuration (except Versal device).
• Write Device Image: Generates a programmable device image for programming a Versal
device.
Note: Although not technically part of an implementation run, Write Bitstream and Write Device Image are
available as a separate step.
To provide a better understanding of the individual steps in the implementation process, the
details of each step, and the associated Tcl commands, are documented in this chapter. The
following table provides a list of sub-processes and their associated Tcl commands.
For a complete description of the Tcl reporting commands and their options, see the Vivado
Design Suite Tcl Command Reference Guide (UG835).
IMPORTANT! NGC format files are not supported in the Vivado Design Suite for AMD UltraScale™
devices. It is recommended that you regenerate the IP using the Vivado Design Suite IP customization
tools with native output products. Alternatively, you can use the convert_ngc Tcl utility to convert
NGC files to EDIF or Verilog formats. However, AMD recommends using native Vivado IP rather than
XST-generated NGC format files going forward.
2. Transforms legacy netlist primitives to the currently supported subset of Unisim primitives.
IMPORTANT! Review critical warnings that identify failed constraints. Constraints might be placed
on design objects that have been optimized or no longer exist. The Tcl command 'write_xdc -
constraints INVALID' also captures invalid XDC constraints.
Tcl Commands
The Tcl commands shown in the following table can be used to read the synthesized design into
memory, depending on the source files in the design, and the state of the design.
synth_design
The synth_design command can be used in both Project Mode and Non-Project Mode. It runs
Vivado synthesis on RTL sources with the specified options, and reads the design into memory
after synthesis.
synth_design Syntax
The following is an excerpt from the create_bft_batch.tcl script found in the examples/
Vivado_Tutorials directory of the software installation.
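A sketch along those lines (the file names and part are placeholders, not the exact tutorial contents):

read_vhdl [ glob ./Sources/hdl/*.vhdl ]
read_verilog [ glob ./Sources/hdl/*.v ]
read_xdc ./Sources/top.xdc
synth_design -top top -part <part>
write_checkpoint -force post_synth.dcp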
For more information on using the synth_design example script, see the Vivado Design Suite
Tutorial: Design Flows Overview (UG888) and the Vivado Design Suite User Guide: Synthesis
(UG901).
The synth_design example script reads VHDL and Verilog files, reads a constraint file, and
synthesizes the design on the specified part. The design is opened by the Vivado tools into
memory when synth_design completes. A design checkpoint is written after completing
synthesis.
For more information on the synth_design Tcl command, see Vivado Design Suite Tcl Command
Reference Guide (UG835). This reference guide also provides a complete description of the Tcl
commands and their options.
open_checkpoint
The open_checkpoint command opens a design checkpoint file (DCP), creates a new in-
memory project and initializes a design immediately in the new project with the contents of the
checkpoint. This command can be used to open a top-level design checkpoint, or the checkpoint
created for an out-of-context module.
Note: In previous releases, the read_checkpoint command was used to read and initialize checkpoint
designs. Beginning in version 2014.1, this function is provided by the open_checkpoint command. The
behavior of read_checkpoint has been changed such that it only adds the checkpoint file to the list of
source files. This is consistent with other read commands such as read_verilog, read_vhdl, and
read_xdc. A separate link_design command is required to initialize the design and load it into
memory when using read_checkpoint.
When opening a checkpoint, there is no need to create a project first. The open_checkpoint
command reads the design data into memory, opening the design in Non-Project Mode. Refer to
section Understanding Project Mode and Non-Project Mode in the Vivado Design Suite User
Guide: Design Flows Overview (UG892) for more information on Project Mode and Non-Project
Mode.
IMPORTANT! In the incremental compile flow, the read_checkpoint command is still used to specify
the reference design checkpoint.
open_checkpoint Syntax
The open_checkpoint example script opens the post synthesis design checkpoint file.
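For example (the checkpoint file name is a placeholder):

open_checkpoint post_synth.dcp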
open_run
The open_run command opens a previously completed synthesis or implementation run, then
loads the design into memory in the Vivado tools.
IMPORTANT! The open_run command works in Project Mode only. Design runs are not supported in
Non-Project Mode.
Use open_run before implementation on an RTL design to open a previously completed Vivado
synthesis run then load the synthesized netlist into memory.
TIP: Because the in-memory design is updated automatically, you do not need to use open_run after
synth_design . You need to use open_run only to open a previously completed synthesis run from an
earlier design session.
The open_run command is for use with RTL designs only. To open a netlist-based design, use
link_design.
open_run Syntax
The open_run example script opens a design (synth_1) into the Vivado tools memory from the
completed synthesis run (also named synth_1).
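A sketch consistent with that description:

open_run synth_1 -name synth_1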
If you use open_run while a design is already in memory, the Vivado tools prompt you to save
any changes to the current design before opening the new design.
link_design
The link_design command creates an in-memory design from netlist sources (such as from a
third-party synthesis tool), and links the netlists and design constraints with the target part.
TIP: The link_design command supports both Project Mode and Non-Project Mode to create the
netlist design. Use link_design -part <arg> without a netlist loaded, to open a blank design for
device exploration.
link_design Syntax
If you use link_design while a design is already in memory, the Vivado tools prompt you to
save any changes to the current design before opening the new design.
RECOMMENDED: After creating the in-memory synthesized design in the Vivado tools, review Errors and
Critical Warnings for missing or incorrect constraints. After the design is successfully created, you can
begin running analysis, generating reports, applying new constraints, or running implementation.
Note: For more information on the Partial Reconfiguration options of link_design, see section Reading
Design Modules in the Vivado Design Suite User Guide: Dynamic Function eXchange (UG909).
BUFG Optimization
Mandatory logic optimization (MLO), which occurs at the beginning of link_design, supports the
use of the CLOCK_BUFFER_TYPE property to insert global clock buffers. Supported values are
BUFG for 7 series, and BUFG and BUFGCE for UltraScale, AMD UltraScale+™, and Versal
devices. The value NONE can be used for all architectures to suppress global clock buffer
insertion through MLO and opt_design. For BUFG and BUFGCE, MLO inserts the
corresponding buffer type to drive the specified net.
Use of CLOCK_BUFFER_TYPE provides the advantage of controlling buffer insertion using XDC
constraints so that no design source or netlist modifications are required. Buffers inserted using
CLOCK_BUFFER_TYPE are not subject to any limits, so the property must be used cautiously to
avoid introducing too many global clocks into the design, which may result in placement failures.
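For example (the net names are placeholders):

set_property CLOCK_BUFFER_TYPE BUFGCE [get_nets clk_int]
set_property CLOCK_BUFFER_TYPE NONE [get_nets clk_bypass]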
For more information, see the Vivado Design Suite Properties Reference Guide (UG912).
Logic Optimization
Logic optimization ensures the most efficient logic design before attempting placement. It
performs a netlist connectivity check to warn of potential design problems such as nets with
multiple drivers and un-driven inputs. Logic optimization also performs block RAM power
optimization.
Often design connectivity errors are propagated to the logic optimization step where the flow
fails. It is important to ensure valid connectivity using DRC Reports before running
implementation.
Logic optimization skips optimization of cells and nets that have DONT_TOUCH properties set
to a value of TRUE. Logic optimization also skips optimization of design objects that have directly
applied timing constraints and exceptions. This prevents constraints from being lost when their
target objects are optimized away from the design. Logic optimization also skips optimization of
design objects that have physical constraints such as LOC, BEL, RLOC, LUTNM, HLUTNM,
ASYNC_REG, and LOCK_PINS. An Info message at the end of each optimization stage provides a
summary of the number of optimizations prevented due to constraints. Specific messages about
which constraint prevented which optimizations can be generated with the -debug_log switch.
This error often occurs when the connection was omitted while assembling logic from multiple
sources. Logic optimization identifies both the cell name and the pin, so that it can be traced back
to its source definition.
IMPORTANT! Logic optimization can be limited to specific optimizations by choosing the corresponding
command options. Only those specified optimizations are run, while all others are disabled, even those
normally performed by default.
The following table describes the order in which the optimizations are performed when more
than one option is selected. This ordering ensures that the most efficient optimization is
performed.
For example, Remap (-remap) is run 14th in this ordering.
When an optimization is performed on a primitive cell, the OPT_MODIFIED property of the cell
is updated to reflect the optimizations performed on the cell. When multiple optimizations are
performed on the same cell, the OPT_MODIFIED value contains a list of optimizations in the
order they occurred. The following table lists the OPT_MODIFIED property value for the various
opt_design options:
Retargeting (Default)
When retargeting the design from one device family to another, this optimization retargets one type of block to another. For example, it retargets instantiated MUXCY or XORCY components into a CARRY4 block, or retargets a DCM to an MMCM. In addition, simple cells such as inverters are absorbed into
downstream logic. When the downstream logic cannot absorb the inverter, the inversion is
pushed in front of the driver, eliminating the extra level of logic between the driver and its loads.
After the transformation, the driver’s INIT value is inverted and set/reset logic is transformed to
ensure equivalent functionality.
• Eliminated logic:
For example, an AND with a constant 0 input.
• Reduced logic:
For example, a 3-input AND with a constant 1 input is reduced to a 2-input AND.
• Redundant logic:
For example, a 2-input OR with a logic 0 input is reduced to a wire.
Sweep (Default)
Sweep removes load-less cells and unconnected nets and does other optimizations, such as the
following:
Mux Optimization
Remaps MUXF7, MUXF8, and MUXF9 primitives to LUT3 to improve routability. You can limit
the scope of mux remapping by using the MUXF_REMAP cell property instead of the -
muxf_remap option. Set the MUXF_REMAP property to TRUE on individual MUXF primitives.
TIP: To further optimize the netlist after the mux optimization is performed, combine the mux optimization
with remap (opt_design -muxf_remap -remap).
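For example (the cell name is a placeholder):

set_property MUXF_REMAP TRUE [get_cells u_mux/MUXF7_inst]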
Carry Optimization
Remaps CARRY4 and CARRY8 primitives of carry chains to LUTs to improve routability. When
running with the -carry_remap option, only single-stage carry chains are converted to LUTs. You
can control the conversion of individual carry chains of any length by using the CARRY_REMAP
cell property. The CARRY_REMAP property is an integer that specifies the maximum carry chain
length to be mapped to LUTs. The CARRY_REMAP property is applied to CARRY4 and CARRY8
primitives and each CARRY primitive within a chain must have the same value to convert to
LUTs. The minimum supported value is 1.
Example: A design contains multiple carry chains of lengths 1, 2, 3, and 4 CARRY8 primitives. The
following assigns a CARRY_REMAP property on all CARRY8 primitives:
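# sketch: set CARRY_REMAP to 2 on all CARRY8 primitives so that chains of length 1 and 2 are remapped to LUTs
set_property CARRY_REMAP 2 [get_cells -hierarchical -filter {REF_NAME == CARRY8}]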
After opt_design, only carry chains of length 3 or greater CARRY8 primitives remain mapped
to CARRY8. Chains with a length of 1 and 2 are mapped to LUTs.
TIP: Remapping long carry chains to LUTs may significantly increase delay even with further optimization
by adding the remap option. AMD recommends only remapping smaller carry chains, those consisting of
one or two cascaded CARRY primitives.
You can limit the scope of equivalent driver and control set merging by using the
EQUIVALENT_DRIVER_OPT cell property. Setting the EQUIVALENT_DRIVER_OPT property to
MERGE on the original driver and its replicas triggers the merge equivalent driver phase during
opt_design and merges the drivers with that property. Setting the
EQUIVALENT_DRIVER_OPT property to KEEP on the original driver and its replicas prevents the
merging of the drivers with that property during the equivalent driver merging and the control
set merging phase.
Note: Some interfaces require a one to one mapping from FF driver to interface pin and merging these
logically-equivalent signals to a single driver can result in unroutable nets. In that case set a
DONT_TOUCH property to TRUE or set the EQUIVALENT_DRIVER_OPT property to KEEP on those
registers.
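For example (the cell names are placeholders):

set_property EQUIVALENT_DRIVER_OPT KEEP [get_cells {ctrl/rst_reg ctrl/rst_reg_replica}]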
For 7 series designs, clock buffers are inserted as long as 12 total global clock buffers are not
exceeded.
For UltraScale, UltraScale+, and Versal designs, clock buffers are inserted as long as 24 total
global clock buffers are not exceeded, not including BUFG_GT buffers.
Note: To prevent BUFG Optimization on a net, assign the value NONE to the CLOCK_BUFFER_TYPE
property of the net. Some clock buffer insertion that is required to legalize the design can also occur in
mandatory logic optimization.
MBUFG Optimization
For Versal devices, a new Multi-Clock Buffer (MBUFG) provides divide by 1, 2, 4, 8 clocks of the
clock input on its O1, O2, O3, O4 outputs. The MBUFG clock outputs are all routed on the same
global clock routing resources and only divided once they reach the BUFDIV_LEAF route-thru
Bels. MBUFG driven clocks consume less routing resources and clock skew is minimized for
synchronous CDC paths between clocks driven by the same MBUFG because the common node
is closer to the source and destination.
The MBUFG optimization transforms parallel clock buffers driven by a common driver or clock
modifying block (CMB), such as MMCM, DPLL, or XPLL, to MBUFG. The transformation occurs if
the divide factors of the parallel clocks are divide by 1, 2, 4, 8 of a common clock. For CMB
driven clocks, the phase shift has to be 0 and the duty cycle 50%. If the clock nets driven by the
BUFGs have conflicting constraints, such as CLOCK_DELAY_GROUP or USER_CLOCK_ROOT, the transformation is also prevented. The transformation occurs only when it is safe to do so without corrupting timing constraints. The following transformations are supported:
In addition to the global optimization using the -mbufg_opt option, you can control the
conversion of selected BUFGs to MBUFG using the MBUFG_GROUP property. You must set the
MBUFG_GROUP constraint on the net segment directly connected to the clock buffer. The
following example shows the property applied to two clock nets, which are directly driven by the
clock buffers:
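# sketch: group the two clock nets driven by the clock buffers (net and group names are placeholders)
set_property MBUFG_GROUP grp0 [get_nets {clk_out1_net clk_out2_net}]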
The following figure shows an MMCM driving several BUFGCE buffers. The
CLKOUTn driven clocks are integer divides of 1, 2, 4, 8 of the CLKOUT1 driven clock. After the
MBUFG optimization the four BUFGCEs are transformed to a single MBUFGCE and the
CLKOUT1 driven clock is connected to the MBUFGCE I pin. The loads that were driven by the
BUFGCEs are connected to the MBUFGCE O1, O2, O3, O4 pins.
• SRL fanout optimization: if an SRL (LUT-based shift register) primitive drives a fanout of 100
or greater, a register stage is taken from the end of the SRL chain and transformed into a
register primitive. This enables more flexible downstream replication if the net becomes
timing-critical. In general it is easier to replicate high-fanout register drivers compared to high
fanout SRL drivers.
Note: All transforms from registers to SRLs are only possible if control sets are compatible.
The optimizations are accessed using the -srl_remap_modes option, which takes a Tcl list of lists as an argument to define the mode. The different types of optimizations are as follows.
• Converting small SRLs to registers: For this optimization use the max_depth_srl_to_ffs mode:
○ opt_design -srl_remap_modes {{max_depth_srl_to_ffs <depth>}}
○ Here all SRLs of depth <depth> and smaller are remapped to register chains.
• Converting large shift register chains to SRLs: For this optimization use the
min_depth_ffs_to_srl mode:
○ opt_design -srl_remap_modes {{min_depth_ffs_to_srl <depth>}}
○ Here all register chains greater than depth <depth> are remapped to SRL primitives.
• Automatic target utilization optimizations: This mode uses the following syntax:
○ -srl_remap_modes {{target_ff_util <ff_util> target_lutram_util <lutram_util>}}
Here you specify percent utilization targets (0 to 100) for both registers and LUTRAMs. If the
current utilization exceeds a target, Vivado will convert from the overutilized resource type to
the other until the utilization target is met. When converting from SRLs to registers, Vivado
begins with the smallest SRLs. When converting from registers to SRLs, Vivado begins with the
largest register chains.
Note: The max_depth_srl_to_ffs and min_depth_ffs_to_srl modes can be used simultaneously, but cannot be used with the target utilization settings.
The following DSP register optimizations are applied, depending on the original register configuration and the timing slack criteria:
• AREG and BREG to MREG: AREG=1/2, BREG=1/2, MREG=0 becomes AREG=0/1, BREG=0/1, MREG=1 when timing from AREG/BREG is critical (slack less than 0.5 ns) and timing to AREG/BREG is not critical (slack greater than 1 ns).
• MREG to AREG and BREG: AREG=0, BREG=0, MREG=1 becomes AREG=1, BREG=1, MREG=0 when timing to MREG is critical (slack less than 0.5 ns) and timing from MREG is not critical (slack greater than 1 ns).
• PREG push out to fabric: PREG=1 becomes PREG=0 with an FDRE in fabric when timing from PREG is critical (slack less than 0.5 ns) and timing to PREG is not critical (slack greater than 1 ns).
• PREG pull in from fabric: PREG=0 with an FDRE in fabric becomes PREG=1 when timing from the DSP output is critical (slack less than 0.5 ns).
Note: This optimization is automatically triggered when the CONTROL_SET_REMAP property is detected
on any register.
For each hierarchical instance driven by the high-fanout net, if the fanout within the hierarchy is
greater than the specified limit, then the net within the hierarchy is driven by a replica of the
driver of the high-fanout net.
IMPORTANT! Each use of logic optimization affects the in-memory design, not the synthesized design
that was originally opened.
Remap
Remap combines multiple LUTs into a single LUT to reduce the depth of the logic. Selective
remap can be triggered by applying the LUT_REMAP property to a group of LUTs. Chains of LUTs
with LUT_REMAP values of TRUE are collapsed into fewer logic levels where possible. Remap
optimization can combine LUTs that belong to different levels of logical hierarchy into a single
LUT to reduce logic levels. Remapped logic is combined into the LUT that is furthest downstream
in the logic cone.
This optimization also replicates LUTs with the LUT_REMAP property that have fanout greater
than one before the transformation.
Note: Setting the LUT_REMAP property to FALSE does not prevent LUTs from getting remapped when
running opt_design with the -remap option.
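For example (the cell names are placeholders):

set_property LUT_REMAP TRUE [get_cells {u0/out_lut1 u0/out_lut2}]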
Aggressive Remap
Similar to Remap, Aggressive Remap combines multiple LUTs into a single LUT to reduce logic
depth. Aggressive Remap is a more exhaustive optimization than Remap, and may achieve further
logic level reduction than Remap at the expense of longer runtime.
Resynth Area
Resynth Area performs re-synthesis in area mode to reduce the number of LUTs.
Property-Only Optimization
This is a non-default option where opt_design runs only those phases that are triggered by
opt_design properties. If no such properties are found, opt_design exits and leaves the
design unchanged.
The opt_design cell properties that trigger optimizations when using this option are listed
later in this section.
Resynth Remap
Remaps the design to improve the critical paths in timing-driven mode by performing re-
synthesis to reduce the depth of logic. This timing-based approach will replicate LUTs with fanout
and collapse smaller LUTs into bigger functions at the expense of longer optimization runtime.
Note: LUTs with BEL constraints will still be optimized by Resynth Remap. To prevent optimization on LUTs
with BEL constraints, add a DONT_TOUCH property with value TRUE to the LUT.
opt_design
The opt_design command runs Logic Optimization.
opt_design Syntax
# Run logic optimization with the remap optimization enabled, save results
# in a checkpoint, report timing estimates
opt_design -directive ExploreWithRemap
write_checkpoint -force $outputDir/post_opt
report_timing_summary -file $outputDir/post_opt_timing_summary.rpt
The opt_design example script performs logic optimization on the in-memory design, rewriting
it in the process. It also writes a design checkpoint after completing optimization, and generates
a timing summary report and writes the report to the specified file.
Use command line options to restrict optimization to one or more of the listed types. For
example, the following is another method for skipping the block RAM power optimization that is
run by default:
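A sketch along these lines, which runs only the explicitly listed phases (the exact set of phases shown here is a choice, not the documented example):
opt_design -retarget -propconst -sweep -bufg_opt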
Using Directives
Directives provide different modes of behavior for the opt_design command. Only one
directive can be specified at a time. The directive option is incompatible with other options. The
following directives are available:
• ExploreWithRemap: Same as the Explore directive but includes the Remap optimization.
The following summarizes the optimization phases run for each directive:
• Default: Retarget, Constant propagation, Sweep, BUFG optimization, Shift Register Optimization, Block RAM Power Opt (1), MBUFG optimization (2)
• Explore: Retarget, Constant propagation, Sweep, BUFG optimization, Shift Register Optimization, Control Set Optimization (2), MBUFG optimization (2)
• ExploreArea: Retarget, Constant propagation, Sweep, BUFG optimization, Shift Register Optimization, Control Set Optimization (2), Resynthesis, MBUFG optimization (2)
• ExploreWithRemap: Retarget, Constant propagation, Sweep, BUFG optimization, Shift Register Optimization, Control Set Optimization (2), Remap, MBUFG optimization (2)
• ExploreSequentialArea: Retarget, Constant propagation, Sweep, BUFG optimization, Shift Register Optimization, Control Set Optimization (2), Resynthesis, MBUFG optimization (2)
• RuntimeOptimized: Retarget, Constant propagation, Sweep, BUFG optimization, Shift Register Optimization
Notes:
1. Phase run in UltraScale/UltraScale+ designs.
2. Phase run in Versal designs.
The log also displays detailed messages about optimizations that are prevented due to
constraints. Use the -verbose option to see full details of all logic optimization performed by
opt_design. The -verbose option is off by default due to the potential for a large volume of
additional messages. Use the -verbose option if you believe it might be helpful.
RECOMMENDED: To improve tool run time for large designs, use the -verbose option only in shell or
batch mode and not in the GUI mode.
IMPORTANT! The opt_design command operates on the in-memory design. If run multiple times, the
subsequent run optimizes the results of the previous run. Therefore you must reload the synthesized design
before adding either the -debug_log or -verbose options.
You would typically apply the DONT_TOUCH property to leaf cells to prevent them from being
optimized. DONT_TOUCH on a hierarchical cell preserves the cell boundary, but optimization
might still occur within the cell and constants can still be propagated across the boundary. To
preserve a hierarchical net, apply the DONT_TOUCH property to all net segments using the
-segments option of get_nets.
The tools automatically add DONT_TOUCH properties of value TRUE to nets that have
MARK_DEBUG properties of value TRUE. This is done to keep the nets intact throughout the
implementation flow so that they can be probed at any design stage. This is the recommended
use of MARK_DEBUG. However, on rare occasions DONT_TOUCH might be too restrictive and
could prevent optimization such as constant propagation, sweep, or remap, leading to more
difficult timing closure. In such cases, you can set DONT_TOUCH to a value of FALSE, while
keeping MARK_DEBUG TRUE. The risk in doing this is that nets with MARK_DEBUG can be
optimized away and no longer probed.
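For example, a minimal sketch with a hypothetical net name:
# Keep MARK_DEBUG on the net but allow optimization by overriding the implied
# DONT_TOUCH restriction
set_property MARK_DEBUG TRUE [get_nets u_ctrl/status_net]
set_property DONT_TOUCH FALSE [get_nets u_ctrl/status_net]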
The following opt_design cell properties trigger optimizations:
• MUXF_REMAP: Set to TRUE on MUXF primitives to convert them to LUTs.
• CARRY_REMAP: Set the threshold on CARRY primitives to convert to LUTs.
• SRL_TO_REG: Set to TRUE on SRL primitives to convert them to register chains.
• REG_TO_SRL: Set to TRUE on register chains to convert them to SRL primitives.
• SRL_STAGES_TO_REG_INPUT: Set to the appropriate value on an SRL primitive to move a register across its input.
• SRL_STAGES_TO_REG_OUTPUT: Set to the appropriate value on an SRL primitive to move a register across its output.
• LUT_REMAP: Set to TRUE on cascaded LUTs to reduce LUT levels.
• CONTROL_SET_REMAP: Set on registers to specify the type of control signal to remap to LUTs.
• EQUIVALENT_DRIVER_OPT: Set on logically-equivalent drivers to force or prevent merging.
• CLOCK_BUFFER_TYPE: Set on nets to insert corresponding Global Clock buffers.
• LUT_DECOMPOSE: Set on LUTs (LUT5, LUT6) for decomposition to reduce congestion.
Power Optimization
Power optimization is an optional step that optimizes dynamic power using clock gating. It can be
used in both Project Mode and Non-Project Mode, and can be run after logic optimization or
after placement to reduce power demand in the design. Power optimization includes AMD
intelligent clock gating solutions that can reduce dynamic power in your design, without altering
functionality.
For more information, see the Vivado Design Suite User Guide: Power Analysis and Optimization
(UG907).
Note that in actual silicon, CEs gate the clock rather than selecting between the D input and
the feedback Q output of the flip-flop. This improves the performance of the CE input and also
reduces clock power.
Figure: Before and after comparison of flip-flop clock-enable gating, showing reduced power consumption.
Intelligent clock gating also reduces power for dedicated block RAMs in either simple dual-port
or true dual-port mode, as shown in the following figure.
• Array enable
• Write enable
• Output register clock enable
Most of the power savings comes from using the array enable. The Vivado power optimizer
implements functionality to reduce power when no data is being written and when the output is
not being used.
Figure: Before and after comparison of block RAM intelligent clock gating on the address, data, and clock enable inputs.
power_opt_design
The power_opt_design command analyzes and optimizes the design. It analyzes and
optimizes the entire design as a default. The command also performs intelligent clock gating to
optimize power.
power_opt_design Syntax
If you do not want to analyze and optimize the entire design, configure the optimizer with
set_power_opt. This lets you specify the appropriate cell types or hierarchy to include or
exclude in the optimization. You can also use set_power_opt to specify the specific Block
RAM cells for optimization in opt_design.
RECOMMENDED: If you want to prevent block RAM Power Optimization on specific block RAMs during
opt_design, use set_power_opt -exclude_cells [get_cells <bram_insts>].
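For example, a minimal sketch with hypothetical cell names:
# Exclude two specific block RAM instances from power optimization, then run
# the optimizer on the rest of the design
set_power_opt -exclude_cells [get_cells {u_mem/ram_buf0 u_mem/ram_buf1}]
power_opt_design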
Placement
The Vivado Design Suite placer places cells from the netlist onto specific sites in the target AMD
device. Like the other implementation commands, the Vivado placer works from, and updates,
the in-memory design.
• Timing slack: Placement of cells in timing-critical paths is chosen to minimize negative slack.
• Congestion: The Vivado placer monitors pin density and spreads cells to reduce potential
routing congestion.
Placer Targets
When placing unfixed logic during this stage of placement, the placer adheres to physical
constraints, such as LOC properties and Pblock assignments. It also validates existing LOC
constraints against the netlist connectivity and device sites. Certain IP (such as Memory IP and
GTs) are generated with device-specific placement constraints.
IMPORTANT! Due to the device I/O architecture, a LOC property often constrains cells other than the cell
to which LOC has been applied. A LOC on an input port also fixes the location of its related I/O buffer,
IDELAY, and ILOGIC. Conflicting LOC constraints cannot be applied to individual cells in the input path.
The same applies for outputs and GT-related cells.
Clock resources must follow the placement rules described in the 7 Series FPGAs Clocking
Resources User Guide (UG472), UltraScale Architecture Clocking Resources User Guide (UG572) and
Versal Adaptive SoC Clocking Resources Architecture Manual (AM003). For example, an input that
drives a global clock buffer must be located at a clock-capable I/O site, must be located in the
same upper or lower half of the device for 7 series devices, and in the same clock region for
UltraScale devices. These clock placement rules are also validated against the logical netlist
connectivity and device sites.
If the Vivado placer fails to find a solution for the clock and I/O placement, the placer reports the
placement rules that were violated, and briefly describes the affected cells.
In some cases, the Vivado placer provisionally places cells at sites, and attempts to place other
cells as it tries to solve the placement problem. The provisional placements often pinpoint the
source of clock and I/O placement failure. Manually placing a cell that failed provisional
placement might help placement converge.
TIP: Use place_ports to run the clock and I/O placement step first. Then run place_design . If
port placement fails, the placement is saved to memory to allow failure analysis. For more information, run
place_ports -help from the Vivado Tcl command prompt.
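A minimal sketch of this sequence:
# Place clocks and I/O first so any placement failure can be analyzed in isolation
place_ports
# Then place the remainder of the design
place_design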
For more information about UltraScale clock tree placement and routing, see the UltraFast Design
Methodology Guide for FPGAs and SoCs (UG949).
Figure: NoC compiler interaction with design creation, synthesis, placement, routing, and performance analysis.
Following are implementation requirements that might cause the NoC Compiler to be invoked
during design placement:
• Physical location or Pblock constraints applied to the Programmable Logic (PL) that influence
NoC Master Unit (NMU) / NoC Slave Unit (NSU) placement
• Resolution of the NoC interface between CIPS and NoC for proper assignment to the targeted
device
• Top-level port assignment of DDR memory controller interfaces that results in a change in
DDR memory controller assignment
• Global placement of programmable logic that would influence NoC NMU/NSU placement
TIP: In the IP integrator, you can constrain the location of the DDR memory controller to the
appropriate site in the NoC View to reflect the assignment performed during design placement. This
improves the NoC QoS results correlation between the IP integrator and a fully implemented design.
• The NoC compiler runs in pre-place mode, so fabric placement is driven by the placement of
NoC instances, which results in better NoC QoS.
Global Placement
Global placement consists of two major phases: floorplanning and physical synthesis.
Floorplanning Phase
During floorplanning, the design is partitioned into clusters of related logic and initial locations
are chosen based on placement of I/O and clocking resources. Pblock constraints are treated as
hard during this phase, even if they have the IS_SOFT property set to True. When targeting SSI
devices, the design is also partitioned into different SLRs to minimize SLR crossings and their
associated delay penalties. Soft SLR floorplan constraints can be applied to guide the logic
partitioning during this phase. For more information about Using Soft SLR Floorplan Constraints,
see the UltraFast Design Methodology Guide for FPGAs and SoCs (UG949).
• Control Set Optimization: Performs control set reduction using more accurate placement
location information. With a meaningful initial placement result, the flip-flops are distributed
across the placement area, and the minimum resource usage for the flip-flops in each small
region is calculated. For hotspot regions, an optimal solution is found to reduce resource usage,
which reduces the downstream legalization effort and leaves more room for other optimizations.
This phase is activated only with the placer Explore directive.
• LUT Decomposition and Combining: LUT Decomposition breaks LUT shapes if it improves
timing (only LUTs with SOFT_HLUTNM property are considered). LUT combining combines
LUTs if it improves utilization.
• Critical Cell Optimization: Critical-Cell Optimization replicates cells in failing paths. If the
loads on a specific cell are placed far apart, the cell might be replicated with new drivers
placed closer to load clusters. This optimization often applies to nets driving large block RAM
or URAM arrays or large numbers of DSPs, because the sites for these blocks are spread over a
wider area of the device. High fanout is not a requirement for this optimization to occur
(slack < 0.5 ns).
• Fanout Optimization: Nets with a MAX_FANOUT property value that is less than the actual
fanout of the net are considered for fanout optimization. You can force the replication of a
register or a LUT driving a net by adding the FORCE_MAX_FANOUT property to the net. The
value of FORCE_MAX_FANOUT specifies the maximum physical fanout the net should have
after the replication optimization. The physical fanout in this case refers to the actual site pin
loads, not the logical loads. For example, if a replica drives multiple LUTRAM loads that are all
grouped in the same slice, the combined fanout is 1 for all of the LUTRAMs in that slice.
FORCE_MAX_FANOUT forces the replication during physical synthesis regardless of the slack
of the signal. You can also force replication based on physical device attributes with the
MAX_FANOUT_MODE property, which can take the value CLOCK_REGION, SLR, or MACRO.
For example, the MAX_FANOUT_MODE property with a value of CLOCK_REGION replicates
the driver based on the physical clock region, and the loads placed in the same clock region are
clustered together. The MAX_FANOUT_MODE property takes precedence over the
FORCE_MAX_FANOUT property, and physical synthesis tries to honor both by applying the
MAX_FANOUT_MODE based optimization first; all of its replicated drivers then inherit the
FORCE_MAX_FANOUT property to perform further replication within a clock region. This is
illustrated in the following figure example, where a register drives four loads: two registers and
two MACRO loads (block RAM, UltraRAM, or DSP). Replication provides separate drivers for
the register loads and the MACRO loads, and then the driver for the MACRO loads is replicated
until the FORCE_MAX_FANOUT property value is satisfied. A minimal property usage sketch is
shown after this list of optimizations.
Note: This optimization happens early in the placer. In the later stages of the placer as the timing
accuracy improves, both the replicated source and/or load registers may be moved to different clock
regions or SLRs if the timing estimate improves.
• DSP Register Optimization: DSP Register Optimization can move registers out of the DSP cell
into the logic array or from logic to DSP cells if it improves the delay on the critical path.
• Shift Register to Pipeline Optimization: Shift Register to Pipeline Optimization converts a shift
register of fixed length into a dynamically adjusted register pipeline and places the pipeline
optimally to improve timing. Only SRLs with the PHYS_SRL2PIPELINE attribute set to TRUE
are considered for this optimization. The pull/push of FFs happens on the SRL's Q-pin. The
SRL length must be fixed; dynamic SRLs are not supported for this optimization.
• Shift Register Optimization: The shift register optimization improves timing on negative slack
paths between shift register cells (SRLs) and other logic cells.
• Block RAM Register Optimization: Block RAM Register Optimization can move registers out
of the block RAM cell into the logic array or from logic to block RAM cells if it improves the
delay on the critical path.
• URAM Register Optimization: UltraRAM Register Optimization can move registers out of the
UltraRAM cell into the logic array or from logic to UltraRAM cells if it improves the delay on
the critical path.
For more information on these optimizations see Available Physical Optimizations in the Physical
Optimization section. Physical synthesis in the placer is run by default in all of the placer
directives. At the end of the physical synthesis phase, a table shows the summary of
optimizations.
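As referenced in the Fanout Optimization item above, the following is a minimal property sketch; the net name is hypothetical and the fanout limit is illustrative:
# Cluster loads per clock region first, then let the replicated drivers inherit
# FORCE_MAX_FANOUT to replicate further within each clock region
set_property MAX_FANOUT_MODE CLOCK_REGION [get_nets u_ctrl/enable_net]
set_property FORCE_MAX_FANOUT 8 [get_nets u_ctrl/enable_net]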
Detailed Placement
Detailed placement takes the design from the initial global placement to a fully-placed design,
generally starting with the largest structures (which serve as good anchors) down to the smallest.
The detail placement process begins by placing large macros such as multi-column URAM, block
RAM, and DSP block arrays, followed by LUTRAM array macros, and smaller macros such as
user-defined XDC Macros. Logic placement is iterated to optimize wirelength, timing, and
congestion. LUT-FF pairs are packed into CLBs with the additional constraints that registers in
the CLB must share common control sets.
Post-Placement Optimization
After all logic locations have been assigned, Post-Placement Optimization performs the final
steps to improve timing and congestion. These include improving critical path placement, BUFG
Replication, and the optional BUFG insertion phase. In the BUFG Replication phase, BUFG-driven
nets that span multiple SLRs receive their own BUFG driver for each SLR on non-Versal
devices. For Versal devices, the replication is done on a per-VNoC basis, and per-SLR replication is
done before clock region placement. The optimization is skipped in case of placement or routing
conflicts, constraints that would prevent replication, or timing degradation. In the BUFG insertion
phase, the placer can route high fanout nets on global routing tracks to free up fabric routing
resources. High-fanout nets (fanout > 1,000 for UltraScale and UltraScale+ and fanout > 10,000
for Versal) driving control signals with a slack greater than 1.0 ns are considered for this
optimization. The loads are split between critical loads and high positive slack loads. The high
positive slack loads are driven through a BUFGCE which is placed at the nearest available site to
the original driver, whereas the critical loads remain connected to the original driver. This
optimization is performed only if there is no timing degradation. The optimization is also skipped
if netlist editing required by the optimization fails. BUFG Insertion is on by default and can be
disabled with the -no_bufg_opt option.
RECOMMENDED: Run report_timing_summary after placement to check the critical paths. Paths
with very large negative setup slack might need logic restructuring, physical optimization, or floorplanning
to achieve timing closure.
place_design
The place_design command runs placement on the design. Like the other implementation
commands, place_design is re-entrant in nature. For a partially placed design, the Vivado
placer uses the existing placement as the starting point instead of starting from scratch.
place_design Syntax
The place_design example script places the in-memory design. It then writes a design
checkpoint after completing placement, generates a timing summary report, and writes the
report to the specified file.
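A sketch of the kind of script described here, assuming $outputDir and the file names are placeholders:
place_design
write_checkpoint -force $outputDir/post_place
report_timing_summary -file $outputDir/post_place_timing_summary.rpt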
place_design -clock_vtree_type
The -clock_vtree_type option is used in place_design to specify the type of clock tree to
be used. The valid values are balanced, intraSLR, and interSLR.
Use the -clock_vtree_type option to select the clock tree that minimizes the clock skew for
the types of timing challenges seen in the design:
Note: Placer clock v-tree type properties are case sensitive. Using the wrong case generates an error
message stopping the placer flow.
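For example, a minimal sketch selecting the intraSLR clock tree (the appropriate value depends on the timing challenges in the design):
place_design -clock_vtree_type intraSLR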
Using Directives
Directives provide different modes of behavior for the place_design command. Only one
directive can be specified at a time. The directive option is incompatible with other options with
the exception of -no_fanout_opt, -no_bufg_opt, -quiet, and -verbose. Use the
-directive option to explore different placement options for your design.
Placer Directives
Because placement typically has the greatest impact on overall design performance, the Placer
has the most directives of all commands. The following table shows which directives might
benefit which types of designs.
Available Directives
• Explore: Higher placer effort in detail placement and post-placement optimization.
• EarlyBlockPlacement: Timing-driven placement of RAM and DSP blocks. The RAM and DSP
block locations are finalized early in the placement process and are used as anchors to place
the remaining logic.
• ExtraNetDelay_high: Increases estimated delay of high fanout and long-distance nets. This
directive can improve timing of critical paths that meet timing after place_design but fail
timing in route_design due to overly optimistic estimated delays. Two levels of pessimism are
supported: high and low. ExtraNetDelay_high applies the highest level of pessimism.
• ExtraNetDelay_low: Increases estimated delay of high fanout and long-distance nets. This
directive can improve timing of critical paths that have met timing after place_design but
fail timing in route_design due to overly optimistic estimated delays. Two levels of
pessimism are supported: high and low. ExtraNetDelay_low applies the lowest level of
pessimism.
• SSI_SpreadLogic_high: Spreads logic throughout the SSI device to avoid creating congested
regions. Two levels are supported: high and low. SpreadLogic_high achieves the highest level
of spreading.
• SSI_SpreadLogic_low: Spreads logic throughout the SSI device to avoid creating congested
regions. Two levels are supported: high and low. SpreadLogic_low achieves a minimal level of
spreading.
• AltSpreadLogic_low: Spreads logic throughout the device to avoid creating congested regions.
Three levels are supported: high, medium, and low. AltSpreadLogic_low achieves a minimal
level of spreading.
• ExtraTimingOpt: Use an alternate set of algorithms for timing-driven placement during the
later stages.
• SSI_SpreadSLLs: Partition across SLRs and allocate extra area for regions of higher
connectivity.
• SSI_BalanceSLLs: Partition across SLRs while attempting to balance SLLs between SLRs.
• SSI_HighUtilSLRs: Force the placer to attempt to place logic closer together in each SLR.
• RuntimeOptimized: Run fewest iterations, trade higher design performance for faster run
time.
• Quick: Absolute, fastest run time, non-timing-driven, performs the minimum required for a
legal design.
Auto Directives
When closing timing on challenging designs, users may choose to run many different
place_design directives in order to select the best timing result. Auto directives use machine
learning to predict the best directives to run. Users can benefit by only running these directives
instead of the full sweep of directives listed in Available Directives.
Note: When running with the Auto directives, the directive setup happens slightly later in the flow than
when the directive is directly specified, which can result in slightly different results.
To enable the feature, set the place_design -directive <value> where value is:
INFO: [Place 30-746] Post Placement Timing Summary WNS=0.022. For the most
accurate timing information please run report_timing.
For greater accuracy at the expense of slightly longer run time, you can use the
-timing_summary option to force the placer to report the timing summary based on the results
from the static timing engine.
where:
The -verbose option is off by default due to the potential for a large volume of additional
messages. Use the -verbose option if you believe it might be helpful.
Auto-Pipelining
You can optionally insert additional pipeline registers during placement to address timing closure
challenges on specific buses and interfaces.
You can enable this feature in the IP Configuration Wizard. Set the Register Slice Options (REG_*)
to Multi SLR Crossing. In addition, set the Use timing-driven pipeline insertion for all Multi-SLR
channels option to 1 to enable auto-pipelining. The following figure shows an example.
Figure 15: Example AXI Register Slice IP Settings to Enable Auto-Pipelining Feature
All nets that belong to the same AUTOPIPELINE_GROUP must have an equal number of pipeline
registers inserted on each tagged signal. Following are additional considerations:
• Only apply the AUTOPIPELINE_* properties to registers with no clock enable and no reset
control signals.
• Create distinct hierarchies for both sides of the interface, and apply a different
USER_SLR_ASSIGNMENT with a different string to each side. The strings must not be
SLR<n>. The soft floorplanning constraints guide the Vivado placer to move the two groups of
registers to different SLRs as needed to improve timing QoR. For example, if hierarchy hierA
includes the source registers, and hierB includes the destination registers, you must add the
following constraints:
set_property USER_SLR_ASSIGNMENT apSrcGrpA [get_cells hierA]
set_property USER_SLR_ASSIGNMENT apDstGrpB [get_cells hierB]
IMPORTANT! The auto-pipelining feature changes the latency of the design. Therefore, you must ensure
the functionality remains correct for the specified AUTOPIPELINE_LIMIT range. If the handshake circuitry
is required, you must add appropriate logic, such as a FIFO, with enough depth to support backpressure
without losing data. The Vivado tools do not verify the correctness of the design logic.
Note: For the best timing QoR results, the auto-pipeline properties must be set on registers without clock
enable or reset logic.
The following figure shows how the auto-pipeline properties are used in the AXI Register Slice
RTL.
The following logic diagram shows one AXI channel of the AXI Register Slice with nets tagged
with auto-pipeline properties.
Figure: One AXI channel of the AXI Register Slice, showing AUTOPIPELINE_MODULE=1, AUTOPIPELINE_GROUP ("fwd" and "resp"), AUTOPIPELINE_LIMIT=24, and AUTOPIPELINE_INCLUDE="resp" applied around a fixed 32-deep FIFO between the AXI master and AXI slave.
• Summary of Latency Increase due to Auto-Pipeline Insertion: This table details the number of
pipeline stages inserted for each group.
• Summary of Physical Synthesis Optimizations: This table shows the total number of inserted
pipeline registers and the number of auto-pipeline groups optimized (Optimized Cells/Nets).
The following figure shows an example of the Summary of Latency Increase Due to Auto-Pipeline
Insertion table.
The following figure shows an example of the Summary of Physical Synthesis Optimizations
table.
Figure 19: Summary of Physical Synthesis Optimizations for Auto Pipeline Table
The inserted pipeline registers can be retrieved based on their names as follows:
The following figure shows the path from SLR2 to SLR0 where nine pipeline stages were
automatically inserted during place_design.
The following figure shows the same example in the Device view.
Physical Optimization
Physical optimization performs timing-driven optimization on the negative-slack paths of a
design. Physical optimization has two modes of operation: post-place and post-route.
In post-place mode, optimization is based on timing estimates derived from cell placement.
Physical optimization automatically incorporates netlist changes due to logic optimizations and
places cells as needed.
IMPORTANT! Post-route physical optimization is most effectively used on designs that have a few failing
paths. Using post-route physical optimization on designs with WNS<-0.200 ns or more than 200 failing
end points can result in long run time with little improvement to QoR.
Overall physical optimization is more aggressive in post-place mode, where there is more
opportunity for logic optimization. In post-route mode, physical optimization tends to be more
conservative to avoid disrupting timing-closed routing. Before running, physical optimization
checks the routing status of the design to determine which mode to use, post-place or post-
route.
If a design does not have negative slack, and a physical optimization with a timing based
optimization option is requested, the command exits quickly without performing optimization. To
balance runtime and design performance, physical optimization does not automatically attempt
to optimize all failing paths in a design. Only the top few percent of failing paths are considered
for optimization. So it is possible to use multiple consecutive runs of physical optimization to
gradually reduce the number of failing paths in the design.
The following lists each optimization; for both the post-place and post-route modes, the entries indicate whether the optimization is valid in that mode and whether it is enabled by default:
• Critical Cell Optimization: post-place valid Y1, default Y1; post-route valid Y1, default N
• Fanout Optimization: post-place valid Y1, default Y1; post-route valid N, default N/A
• Very High Fanout Optimization: post-place valid Y1, default Y1; post-route valid N, default N/A
• Interconnect Retiming: post-place valid Y2, default Y2; post-route valid Y2, default Y2
• Critical Cell Group Optimization: post-place valid Y2, default Y2; post-route valid N, default N/A
• Clock Optimization: post-place valid Y2, default Y2; post-route valid Y, default Y
• DSP Register Optimization: post-place valid Y, default Y; post-route valid N, default N/A
• Block RAM Register Optimization: post-place valid Y, default Y; post-route valid N, default N/A
• URAM Register Optimization: post-place valid Y, default Y; post-route valid N, default N/A
• Shift Register Optimization: post-place valid Y, default Y; post-route valid N, default N/A
• Critical Pin Optimization: post-place valid Y, default Y; post-route valid Y, default Y
• LUT Restructure Optimization: post-place valid Y, default Y; post-route valid Y, default Y
• Single LUT Optimization: post-place valid Y2, default Y2; post-route valid Y2, default Y2
• LUT Cascade Optimization: post-place valid Y2, default Y2; post-route valid N, default N/A
• Placement Optimization: post-place valid Y1, default Y1; post-route valid Y1, default Y1
• Routing Optimization: post-place valid N, default N/A; post-route valid Y, default Y
• Block RAM Enable Optimization: post-place valid Y1, default N; post-route valid N, default N/A
• Hold-Fixing: post-place valid Y, default N; post-route valid Y, default N
• Negative-Edge FF Insertion: post-place valid Y, default N; post-route valid N, default N/A
• Laguna Hold-Fix Optimization: post-place valid N, default N/A; post-route valid Y1, default N
• Forced Net Replication: post-place valid Y, default N; post-route valid N, default N/A
• SLR-Crossing Optimization: post-place valid Y1, default Y1; post-route valid Y1, default Y1
Notes:
1. For UltraScale only.
2. For Versal devices only.
Fanout Optimization
High-Fanout Optimization works as follows:
1. High fanout nets, with negative slack within a percentage of the WNS, are considered for
replication.
2. Loads are clustered based on proximity, and drivers are replicated and placed for each load
cluster.
TIP: Replicated objects are named by appending _replica to the original object name, followed by the
replicated object count.
Placement Optimization
Optimizes placement on the critical path by re-placing all the cells in the critical path to reduce
wire delays.
Routing Optimization
Optimizes routing on critical paths by re-routing nets and pins with shorter delays.
Restructure Optimization
Optimizes the critical path by swapping connections on LUTs to reduce the number of logic levels
for critical signals. LUT equations are modified to maintain design functionality.
Critical-Cell Optimization
Critical-Cell Optimization replicates cells in failing paths. If the loads on a specific cell are placed
far apart, the cell might be replicated with new drivers placed closer to load clusters. High fanout
is not a requirement for this optimization to occur, but the path must fail timing with slack within
a percentage of the worst negative slack.
Shift Register Optimization
If there are timing violations to or from shift register cells (SRL16E or SRLC32E), the optimization
extracts a register from the beginning or end of the SRL register chain and places it into the logic
fabric to improve timing. The optimization shortens the wirelength of the original critical path.
The optimization only moves registers from a shift register to logic fabric, but never from logic
fabric into a shift register, because the latter never improves timing.
• The SRL address must be one or greater, such that there are register stages that can be moved
out of the SRL.
• The SRL address must be a constant value, driven by logic 1 or logic 0.
• There must be a timing violation ending or beginning from the SRL cell that is among the
worst critical paths.
• SRLC32E that are chained together to form larger shift registers are not optimized.
• SRLC32E using the Q31 output pin are not optimized.
• SRL16E that are combined into a single LUT with both O5 and O6 output pins used are not optimized.
Registers moved from SRLs to logic fabric are FDRE cells. The FDRE and SRL INIT properties are
adjusted accordingly as is the SRL address. Following is an example.
A critical path begins at a shift register (SRL16E) srl_inste, as shown in the following figure.
After shift register optimization, the final stage of the shift register is pulled from the SRL16E and
placed in the logic fabric to improve timing, as shown in the following figure.
The srl_inste SRL16E address is decremented to reflect one fewer internal register stage.
The original critical path is now shorter as the srlopt register is placed closer to the downstream
cells and the FDRE cell has a relatively faster clock-to-output delay.
Consider the following logical path, SRL + FFs + SRL, where registers between SRLs have
AUTOPIPELINE attributes set.
Although the FFs have the AUTOPIPELINE attribute, they are combined into SRLs after shift register
optimization.
As a result, the above circuit is converted into the following SRL cell.
Pre-placement block RAM power optimization restructures the logic driving block RAM read and
write enable inputs, to reduce dynamic power consumption. After placement, the restructured
logic might become timing-critical. The block RAM enable optimization reverses the enable-logic
optimization to improve the slack on the critical enable-logic paths.
Hold-Fixing
Hold-Fixing attempts to improve the slack of high-hold violators by increasing the delay on the
hold critical path.
Aggressive Hold-Fixing
Performs optimizations to insert data path delay to fix hold-time violations. This optimization
considers significantly more hold violations than the standard hold-fix algorithm.
TIP: Hold-Fixing only fixes hold time violations above a certain threshold. This is because the router is
expected to fix any hold time violations that are less than the threshold.
Interconnect Retiming
Performs interconnect retiming to improve critical path timing by movement or replication of a
FF or LUT-FF pair. This is applicable to Versal devices only.
Forced Net Replication
Replication is based on load placements and requires manual analysis to determine if replication
is sufficient. If further replication is required, nets can be replicated repeatedly by successive
commands. Although timing is ignored, the net must be in a timing-constrained path to trigger
the replication.
SLR-Crossing Optimization
Performs post-place or post-route optimizations to improve the path delay of inter-SLR
connections. The optimization adjusts the locations of the driver, load, or both along the SLR
crossing. Replication is supported in post-route optimization if the driver has inter- and intra-SLR
loads. A TNS cleanup option is available by using the -tns_cleanup switch together with the
-slr_crossing_opt switch. TNS cleanup allows some slack degradation on other paths when
performing inter-SLR path optimization, as long as the overall WNS of the design does not
degrade. For UltraScale devices, either a TX_REG or an RX_REG SLL register can be targeted. In
UltraScale+ devices, both TX_REG and RX_REG registers on the same inter-SLR connection can
be targeted.
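For example, a minimal sketch enabling the SLR-crossing optimization together with TNS cleanup:
phys_opt_design -slr_crossing_opt -tns_cleanup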
Clock Optimization
Creates useful skew between critical path start and endpoints. To improve setup timing, buffers
are inserted to delay the destination clock.
TIP: Use the group_path Tcl command to set up the path groups that are targeted for optimization.
A summary, as shown in the following figure, is provided at the end of physical optimization
showing statistics of each optimization phase and its impact on design performance. This
highlights the types of optimizations that are most effective for improving WNS.
phys_opt_design
The phys_opt_design command runs physical optimization on the design. It can be run in
post-place mode after placement and in post-route mode after the design is fully-routed.
phys_opt_design Syntax
Note: The -tns_cleanup option can only be run in conjunction with the -slr_crossing_opt option.
open_checkpoint top_placed.dcp
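# The remaining commands below are a sketch of the example flow described
# next; checkpoint and report file names are assumptions
phys_opt_design
write_checkpoint -force $outputDir/post_place_phys_opt
report_timing_summary -file $outputDir/post_place_phys_opt_timing.rpt
route_design
write_checkpoint -force $outputDir/post_route
phys_opt_design
write_checkpoint -force $outputDir/post_route_phys_opt
report_timing_summary -file $outputDir/post_route_phys_opt_timing.rpt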
The phys_opt_design example script runs both post-place and post-route physical
optimization. First, the placed design is loaded from a checkpoint, followed by post-place
phys_opt_design. The checkpoint and timing results are saved. Next the design is routed,
with progress saved afterwards. That is followed by post-route phys_opt_design and saving
the results. Note that the same command phys_opt_design is used for both post-place and
post-route physical optimization. No explicit options are used to specify the mode.
Using Directives
Directives provide different modes of behavior for the phys_opt_design command. Only one
directive can be specified at a time, and the directive option is incompatible with other options.
The available directives are described below.
• Explore: Run different algorithms in multiple passes of optimization, including replication for
very high fanout nets, SLR crossing optimization, and a final phase called Critical Path
Optimization where a subset of physical optimizations are run on the top critical paths of all
endpoint clocks, regardless of slack.
• AggressiveExplore: Similar to Explore but with different optimization algorithms and more
aggressive goals. Includes an SLR crossing optimization phase that is allowed to degrade WNS,
which should be regained in subsequent optimization algorithms. Also includes a hold
violation fixing optimization.
• AddRetime: Performs the default phys_opt_design flow and adds register retiming.
• AlternateFlowWithRetiming: Perform more aggressive replication and DSP and block RAM
optimization, and enable register retiming.
• RuntimeOptimized: Run fewest iterations, trade higher design performance for faster run
time.
TIP: All directives are compatible with both post-place and post-route versions of phys_opt_design.
The -verbose option is off by default due to the potential for a large volume of additional
messages. Use the -verbose option if you believe it might be helpful.
IMPORTANT! The phys_opt_design command operates on the in-memory design. If run twice, the
second run optimizes the results of the first run.
For more information, see section Synthesis Attributes in the Vivado Design Suite User Guide:
Synthesis (UG901).
The DONT_TOUCH property is typically placed on leaf cells to prevent them from being
optimized. DONT_TOUCH on a hierarchical cell preserves the cell boundary, but optimization
can still occur within the cell.
The tools automatically add DONT_TOUCH properties of value TRUE to nets that have
MARK_DEBUG properties of value TRUE. This is done to keep the nets intact throughout the
implementation flow so that they can be probed at any design stage. This is the recommended
use of MARK_DEBUG. However, there might be rare occasions on which the DONT_TOUCH is
too restrictive and prevents optimizations such as replication and retiming, leading to more
difficult timing closure. In those cases DONT_TOUCH can be set to a value of FALSE while
keeping MARK_DEBUG TRUE. The consequence of removing the DONT_TOUCH properties is
that nets with MARK_DEBUG can be optimized away and no longer probed. If a MARK_DEBUG
net is replicated, only the original net retains MARK_DEBUG, not the replicated nets.
The reports are available only for post-placement phys_opt_design optimizations. The
reports are not cumulative. Each phys_opt run has a different phys_opt report that only
accounts for the changes made during that particular run of phys_opt_design.
The following report example shows the first entry of a fanout optimization involving a register
named pipeline_en. The following details are shown in the report:
1. The original driver pipeline_en drives 816 loads and the paths containing this high fanout net
fail timing with WNS of -1.057 ns.
2. The driver pipeline_en was replicated to create one new cell, pipeline_en_replica.
3. The 816 loads were split between pipeline_en_replica, which takes 386 loads, and the original
driver pipeline_en, which takes the remaining 430 loads.
4. After replication and placement of pipeline_en_replica, the WNS of pipeline_en_replica paths
is +0.464 ns, and the WNS of pipeline_en paths is reduced to zero.
5. The placement of the original driver pipeline_en was changed to improve WNS based on the
locations of its reduced set of loads.
Figure: Interactive physical optimization flow. In the original run, write_iphys_opt_tcl captures the phys_opt_design changes to a Tcl script; in the replay run, read_iphys_opt_tcl applies the iphys_opt_design commands before place_design.
Two runs are involved, which are the “original run,” where phys_opt_design is run after
place_design and the “replay run,” where phys_opt_design netlist changes are performed
before placement.
After the original run, the phys_opt_design optimizations are saved to a Tcl script file using
the Tcl command write_iphys_opt_tcl. The script contains a series of
iphys_opt_design Tcl commands to recreate exactly the design changes performed by
phys_opt_design in the original run. You can save the optimizations from the current design
in memory or after opening an implemented design or checkpoint where phys_opt_design
has performed optimization.
The same design and constraints are used for the replay run. Before place_design runs, the
read_iphys_opt_tcl command processes the iphys_opt_design command script and
applies the netlist changes from the original run. As a result of the netlist changes, the design in
the replay run might be more suitable for placement than the original run. The design now
incorporates the benefits of the phys_opt_design optimizations before placement, such as
fewer high-fanout nets after replication and fewer long distance paths from block RAM outputs.
Figure: Original run and replay run command sequences using write_iphys_opt_tcl and read_iphys_opt_tcl.
Typically, you would use this flow to gain more control over the post-place phys_opt_design
step. Custom "recipes" are created from combinations of replayed optimizations and new
optimizations resulting in many possibilities for exploration of design closure.
write_iphys_opt_tcl
This command writes a file containing the iphys_opt_design Tcl commands corresponding to
the physical optimizations performed in the current design.
Syntax:
The -place option directs the command to include placement information with the
iphys_opt_tcl commands. Use this option when you intend to apply placement with netlist
changes during iphys_opt_design command replay.
The write_iphys_opt_tcl command can be used any time after phys_opt_design has
been run.
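For example, a minimal sketch that captures the optimizations with placement information; the output path is an assumption:
write_iphys_opt_tcl -place $outputDir/iphys_opt.tcl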
read_iphys_opt_tcl
This command reads a file containing the iphys_opt_design Tcl commands corresponding to
the physical optimizations performed in a previous run.
Syntax:
[-restruct_opt] [-equ_drivers_opt]
[-include_skipped_optimizations] [-create_bufg]
[-insert_negative_edge_ffs] [-hold_fix]
[-slr_crossing_opt] [-quiet]
[-verbose] [<input>]
• -fanout_opt
• -critical_cell_opt
• -placement_opt
• -restruct_opt
• -dsp_register_opt
• -bram_register_opt
• -uram_register_opt
• -shift_register_opt
• -insert_negative_edge_ffs
• -slr_crossing_opt
• -critical_pin_opt
• -replicate_cell
• -forward_retime
• -backward_retime
• -shift_register_to_pipeline
• -auto_pipeline
• -pipeline_to_shift_register
• -equ_drivers_opt
• -create_bufg
Apply the skipped optimizations that are defined in the input Tcl script, as well as the standard
optimizations. These are optimizations identified by phys_opt_design that are skipped
because suitable locations for optimized logic cannot be found. When this option is specified, the
iphys_opt_design command will attempt to use the included skipped optimizations in the
pre-placement netlist.
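For example, a minimal replay sketch; the script path is an assumption:
read_iphys_opt_tcl -include_skipped_optimizations $outputDir/iphys_opt.tcl
place_design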
iphys_opt_design
RECOMMENDED: Avoid using the Tcl source command to execute a script of iphys_opt_design
commands. For most efficient processing of commands and for fastest runtime, use the
read_iphys_opt_tcl command instead.
Syntax
Routing
The Vivado router performs routing on the placed design, and performs optimization on the
routed design to resolve hold time violations.
The Vivado router starts with a placed design and attempts to route all nets. It can start with a
placed design that is unrouted, partially routed, or fully routed.
For a partially routed design, the Vivado router uses the existing routes as the starting point,
instead of starting from scratch. For a fully-routed design, the router checks for timing violations
and attempts to re-route critical portions to meet timing.
The router provides options to route the entire design or to route individual nets and pins.
When routing the entire design, the flow is timing-driven, using automatic timing budgeting
based on the timing constraints.
Routing individual nets and pins can be performed using two distinct modes:
• Interactive Router mode
• Auto-Delay mode
The Interactive Router mode uses fast, lightweight timing modeling for greater responsiveness in
an interactive session. Some delay accuracy is sacrificed with the estimated delays being
pessimistic. Timing constraints are ignored in this mode, but there are several choices to
influence the routing:
• Resource-based routing (default): The router chooses from the available routing resources,
resulting in the fastest router runtime.
• Smallest delay (the -delay option): The router tries to achieve the smallest possible delay
from the available routing resources.
• Delay-driven (the -max_delay and -min_delay options): Specify timing requirements
based on a maximum delay, minimum delay, or both. The router tries to route the net with a
delay that meets the specified requirements.
In Auto-Delay mode, the router runs the timing-driven flow with automatic timing budgeting
based on the timing constraints, but unlike the default flow, only the specified nets or pins are
routed. This mode is used to route critical nets and pins before routing the remainder of the
design. This includes nets and pins that are setup-critical, hold-critical, or both. Auto-Delay mode
is not intended for routing individual nets in a design containing a significant amount of routing.
Interactive routing should be used instead.
For best results when routing many individual nets and pins, prioritize and route these
individually. This avoids contention for critical routing resources.
Routing requires a one-time “run time hit” for initialization, even when editing routes of nets and
pins. The initialization time increases with the size of the design and with the size of the device.
The router does not need to be re-initialized unless the design is closed and reopened.
Routing Priorities
The Vivado Design Suite routes global resources first, such as clocks, resets, I/O, and other
dedicated resources.
This default priority is built into the Vivado router. The router then prioritizes data signals
according to timing criticality.
Before you experiment with router settings, make sure that you have validated the constraints
and the timing picture seen by the router. Validate timing and constraints by reviewing timing
reports from the placed design before routing.
• Cross-clock paths and multi-cycle paths in which a positive hold time requirement causes
route delay insertion
• Congested areas, which can be addressed by targeted fanout optimization in RTL synthesis or
through physical optimization
RECOMMENDED: Review timing constraints and correct those that are invalid (or consider RTL changes)
before exploring multiple routing options. For more information, see section Checking That Your Design is
Properly Constrained in UltraFast Design Methodology Guide for FPGAs and SoCs (UG949).
TIP: When you run route_design -directive Explore, the router timing summary is based on
signoff timing.
IMPORTANT! You must check the actual signoff timing using report_timing_summary or run
route_design with the -timing_summary option.
route_design
The route_design command runs routing on the design.
route_design Syntax
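A minimal usage sketch, assuming a placed design is open in memory; $outputDir and the file names are placeholders:
route_design
write_checkpoint -force $outputDir/post_route
report_timing_summary -file $outputDir/post_route_timing_summary.rpt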
Using Directives
When routing the entire design, directives provide different modes of behavior for the
route_design command. Only one directive can be specified at a time. The directive option is
incompatible with most other options to prevent conflicting optimizations. The following
directives are available:
• Explore: Allows the router to explore different critical path placements after an initial route.
• AggressiveExplore: Directs the router to further expand its exploration of critical path
routes while maintaining original timing budgets. The router runtime might be significantly
higher compared to the Explore directive because the router uses more aggressive
optimization thresholds to attempt to meet timing constraints.
• NoTimingRelaxation: Prevents the router from relaxing timing to complete routing. If the
router has difficulty meeting timing, it runs longer to try to meet the original timing
constraints.
• MoreGlobalIterations: Uses detailed timing analysis throughout all stages instead of just
the final stages, and runs more global iterations even when timing improves only slightly.
• HigherDelayCost: Adjusts the internal cost functions of the router to emphasize delay
over iterations, allowing a tradeoff of compile time for better performance.
• RuntimeOptimized: Run fewest iterations, trade higher design performance for faster run
time.
• Quick: Absolute, fastest compile time, non-timing-driven, performs the minimum required for
a legal design.
• NoTimingRelaxation
• MoreGlobalIterations
• HigherDelayCost
• AdvancedSkewModeling
• AggressiveExplore
• -nets: This limits operation to only the list of nets specified. The option requires an argument
that is a Tcl list of net objects. Note that the argument must be a net object, the value
returned by get_nets, as opposed to the string value of the net names.
• -pins: This limits operation only to the specified pins. The option requires an argument,
which is a Tcl list of pin objects. Note that the argument must be a pin object, the value
returned by get_pins, as opposed to the string value of the pin names.
• -delay: By default, the router routes individual nets and pins with the fastest run time, using
available resources without regard to timing criticality. The -delay option directs the router
to find the route with the smallest possible delay.
• -min_delay and -max_delay: These options can be used only with the -pins option to
specify a desired target delay in picoseconds. The -max_delay option specifies the
maximum desired slow-max corner delay for the routing of the specified pins. Similarly, the
-min_delay option specifies the minimum fast-min corner delay. The two options can be
specified simultaneously to create a desired delay range.
• -auto_delay: Use with -nets or -pins option to route in timing constraint-driven mode.
Timing budgets are automatically derived from the timing constraints so this option is not
compatible with -min_delay, -max_delay, or -delay.
• -preserve: This option routes the entire design while preserving existing routing. Without
-preserve, the existing routing is subject to being unrouted and re-routed to improve
critical-path timing. This option is most commonly used when "pre-routing" critical nets, that is,
routing certain nets first to ensure that they have best access to routing resources. After
achieving those routes, the -preserve option ensures they are not disrupted while routing
the remainder of the design. Note that -preserve is completely independent of the
FIXED_ROUTE and IS_ROUTE_FIXED net properties. The route preservation lasts only for the
duration of the route_design operation in which it is used. The -preserve option can be
used with -directive, with one exception: the -directive Explore option, which
modifies placement and, in turn, routing.
• -unroute: The -unroute option removes routing for the entire design or for nets and pins,
when combined with the nets or pin options. The option does not remove routing for nets
with FIXED_ROUTE properties. Removing routing on nets with FIXED_ROUTE properties
requires the properties to be removed first.
• -timing_summary: The router outputs a final timing summary to the log, based on its
internal estimated timing which might differ slightly from the actual routed timing due to
pessimism in the delay estimates. The -timing_summary option forces the router to call the
Vivado static timing analyzer to report the timing summary based on the actual routed delays.
This incurs additional run time for the static timing analysis. The -timing_summary option is
ignored when the -directive Explore option is used.
When the -directive Explore option is used, routing always calls the Vivado static
timing analyzer for the most accurate timing updates, whether or not the -timing_summary
option is used.
• -tns_cleanup: For optimal run time, the router focuses on improving the Worst Negative
Slack (WNS) path as opposed to reducing the Total Negative Slack (TNS). The -tns_cleanup
option invokes an optional phase at the end of routing, during which the router attempts to fix
all failing paths to reduce the TNS. Consequently, this option might reduce TNS at the
expense of run time but might not affect WNS. Use the -tns_cleanup option during
routing when you intend to follow router runs with post-route physical optimization. Use of
this option during routing ensures that physical optimization focuses on the WNS path and
that effort is not wasted on non-critical paths that can be fixed by the router. Running
route_design -tns_cleanup on an already routed design only invokes the TNS cleanup
phase of the router and does not affect WNS (TNS cleanup is re-entrant). This option is
compatible with -directive.
• -ultrathreads: This option shortens router runtime at the expense of repeatability. With -
ultrathreads, the router runs faster but there is a very small variation in routing between
identical runs.
For UltraScale+ designs, this step is required if placement and routing of registers was
changed as part of an ECO task.
• -no_timing_driven: This option disables timing-driven routing and is used primarily for
testing the routing feasibility of a design.
• -eco: This option is used with incremental mode to get a shorter runtime after some ECO
modifications to the design, while preserving routability and timing closure.
The router provides info in the log to indicate progress, such as the current phase (initialization,
global routing iterations, and timing updates). At the end of global routing, the log includes
periodic updates showing the current number of overlapping nets as the router attempts to
achieve a fully legalized design. For example:
The timing updates are provided throughout the flow showing timing closure progress.
Timing Summary
where:
Note: Hold time analysis can be skipped during intermediate routing phases. If hold time analysis is not
performed, the router shows a value of "N/A" for WHS and THS.
After routing is complete, the router reports a routing utilization summary and a final estimated
timing summary.
# preserve the routing for $preRoutes and continue with the rest of the design
route_design -preserve
In this example script, a few critical nets are routed first, followed by routing of the entire design.
It illustrates routing individual nets and pins (nets in this case), which is typically done to address
specific routing issues such as:
• Pre-routing critical nets and locking down resources before a full route.
• Manually unrouting non-critical nets to free up routing resources for more critical nets.
The first route_design command initializes the router and routes essential nets, such as
clocks.
# preserve the routing for $preRoutes and continue with the rest of the design
route_design -preserve
As in example 2, a few critical nets are routed first, followed by routing of the entire design. The
difference is the use of -auto_delay instead of -delay. The router performs timing-driven
routing of the critical nets, which sacrifices some runtime for greater accuracy. This is particularly
useful for situations in which nets are involved in both setup-critical and hold-critical paths, and
the routes must fall within a delay range to meet both setup and hold requirements.
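A minimal sketch of the timing-driven pre-route step described here, assuming $preRoutes holds the critical net objects:
# Route only the critical nets, deriving delay budgets from the timing constraints
route_design -nets $preRoutes -auto_delay
# Then route the remainder of the design while preserving those routes
route_design -preserve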
The strategy in this example script illustrates one possible way to address timing failures due to
congestion. In the example design, some critical nets represented by $myCritNets need routing
resources in the same device region as the nets in instance u0/u1. The nets in u0/u1 are not as
timing-critical, so they are unrouted to allow the critical nets $myCritNets to be routed first,
with the smallest possible delay. Then route_design -preserve routes the entire design.
The -preserve switch preserves the routing of $myCritNets while the unrouted u0/u1 nets
are re-routed. Table 12 summarizes the commands in the example.
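A sketch of the commands summarized in Table 12, assuming $myCritNets holds the congestion-critical nets and u0/u1 is the less critical instance:
# unroute the less critical nets to free up routing resources
route_design -unroute -nets [get_nets u0/u1/*]
# route the critical nets first with the smallest possible delay
route_design -nets $myCritNets -delay
# route the rest of the design while preserving the pre-routed critical nets
route_design -preserve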
Router Messaging
The router provides helpful messages when it struggles to meet timing goals due to congestion
or excessive hold violation fixing. When the router struggles in this way, it might provide further
warning messages when any of the following occurs:
• Congestion is expected to have negative timing closure impact, which typically occurs when
the congestion level is 5 or greater. Level 5 indicates a congested region measuring 32x32
(2^5 = 32).
• The overall router hold-fix effort is expected to be very high, which impacts the ability to meet
overall setup requirements.
• Specific endpoint pins become both setup-critical and hold-critical and it is difficult or
impossible to satisfy both. The message includes the names of up to ten pins for design
analysis. In addition, the router also generates a tight_setup_hold_pins.txt text file
that contains a list of the endpoint pins and the launch and capture clock.
TIP: Use the Pin values from the tight_setup_hold_pins.txt file and use the following to improve
the timing paths.
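For example, a path report for one of the listed pins can be generated as follows (a sketch; <pin> is a pin name taken from the tight_setup_hold_pins.txt file):
report_timing -to [get_pins <pin>] -delay_type min_max -max_paths 2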
• Specific CLBs experience high pin utilization or high routing resource utilization which results
in local congestion. The messages will include the names of up to ten of the most congested
CLBs.
• In extreme cases with severe congestion, the router warns that congestion is preventing the
router from routing all nets, and the router will prioritize the successful completion of routing
all nets over timing optimizations.
When targeting UltraScale devices or later, the router generates a table showing initial estimated
congestion when congestion might affect timing closure. The table does not show specific
regions but gives a measure of different types of congestion for an overall assessment. The
congestion is categorized into bins of Global (design-wide), Long (connections spanning several
CLBs), and Short congestion. The tables from different runs can be compared to determine which
have better chances of meeting performance goals without being too negatively impacted by
congestion.
Report Design Analysis provides complexity and congestion analysis that can give further insight
into the causes of congestion and potential solutions. The congestion reporting also includes an
Average Initial Routing Congestion, which is not exactly the same as the congestion reported by
the router, but can be analyzed against the pre-route design to determine which regions are
causing problems. For further information on Report Design Analysis, refer to the Vivado Design
Suite User Guide: Design Analysis and Closure Techniques (UG906).
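For example, the congestion analysis can be generated with the following command (a sketch; the output file name is illustrative):
report_design_analysis -congestion -file congestion_analysis.rpt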
Note: In some cases, when the design is congested, the router completes while leaving a few nets that are
not fully routed. In such cases, Vivado issues a critical warning instead of an error. This ensures that the
flow does not stop abruptly and that a DCP is available for further debugging.
You can change the severity of this message using the following command:
set_msg_config -id "Route 35-1" -new_severity "ERROR"
For the severity change to take effect on messages printed during route_design, it must be applied
before running route_design. In a project flow, this can be added to the pre-route Tcl script.
Use the report_route_status command to identify nets with routing errors. For more
information see section Routing in the UltraFast Design Methodology Guide for FPGAs and SoCs
(UG949).
The router reports routing congestion during Route finalize. The highest congested regions are
listed for each direction (North, East, South, and West). For each region, the information includes
the dimensions in routing tiles, the routing utilization labeled "Max Cong," and the bounding box
coordinates (lower-left corner to upper-right corner). The “INT_xxx” numbers are the coordinates
of the interconnecting routing tiles that are visible in the device routing resource view.
Command                   Function
report_route_status       Reports route status for nets
report_timing             Performs path endpoint analysis
report_design_analysis    Provides information about congested areas
For a complete description of the Tcl reporting commands and their options, see the Vivado
Design Suite Tcl Command Reference Guide (UG835).
Incremental Implementation
Incremental Implementation refers to the implementation phase of the incremental compile
design flow that:
• Preserves QoR predictability by reusing prior placement and routing from a reference design.
• Speeds up place and route compile time or attempts last mile timing closure.
A diagram of the incremental implementation design flow is provided in the following figure.
This diagram also illustrates the incremental synthesis flow. For more details about incremental
synthesis flow, see the "Incremental Synthesis" section in the Vivado Design Suite User Guide:
Synthesis (UG901).
Figure: Incremental Compile Flow — the reference netlist goes through normal place and route to produce the reference checkpoint; the revised netlist, together with the reference checkpoint, goes through incremental place and route in the incremental run to produce the revised checkpoint. (X16627-040716)
Reference Design
The reference design is preferably a fully routed checkpoint from a recent iteration of the same
design. To use a different variant of a design, it is important that the hierarchy names from the
reference design match the incremental design. Whilst it is possible to use a placed checkpoint as
a reference, there will be reduced benefits when compared to a routed checkpoint; timing will
not be as consistent and compile time will be higher.
The reference design must target the same device. Matching the tool version is also
recommended, but it is not a strict requirement.
Incremental Design
The incremental design is the updated design that is to be run through the implementation tools.
It can include RTL changes, netlist changes, or both, but these changes should typically affect
less than 5% of the design.
Prior to issuing the read_checkpoint -incremental command, the tools have no knowledge that
the incremental implementation flow is being used. It is therefore important not to introduce
significant netlist changes by changing synth_design or opt_design tool options compared
with the reference design.
Constraint changes are allowed, but a general tightening of constraints significantly impacts
placement and routing and is best done outside of the incremental flow.
• Physical optimizations that match the ones in the reference run are carried out on the
incremental design automatically.
• The netlist in the incremental design is compared to the reference design to identify matching
cells and nets.
• Placement from the reference design checkpoint is reused to place matching cells in the
incremental design.
• Routing is reused to route matching nets on a per-load-pin basis. If a load pin disappears due
to netlist changes, then its routing is discarded, otherwise it is reused. It is possible to have
partially-reused routes.
Placement and routing information that is reused initially can be discarded throughout the flow if
it improves the performance or aids routability.
Design objects that do not match between the reference design and the current design are
placed after incremental placement is complete and routed after routing is complete.
Incremental Mode
When incremental mode is selected by the user, the tool might still not run the full incremental
flow if it determines that the design has changed too much. Incremental implementation is typically run
if cell reuse is above 80%. Based on this assessment, the tool runs one of the following:
• A full place and route using incremental optimized algorithms. Placement and routing are
reused as much as is possible. Target WNS is determined by a combination of both the
reference checkpoint and the directive. Directives are selected based on the directive supplied
to the read_checkpoint -incremental -directive <directive> switch.
• A full place and route using the default algorithms. Placement and routing are not reused.
Target WNS is always 0.000. Directives are taken from the directive switch supplied to either
the place_design -directive <directive> or route_design -directive
<directive> commands.
This decision is made after the design modifications and the cell matching process that occur
during the read_checkpoint -incremental command. Because this assessment happens after
changes have been applied to the design to improve matching, and those changes are persistent,
the result is not the same as running a purely default flow.
INFO: [Place 46-42] Incremental Compile tool flow is being used. Default
place and route algorithms are bypassed.
Automatic Incremental
Automatic Incremental Implementation is designed to leverage the faster compile times of
incremental implementation whilst not impacting quality of results such as WNS. It is a subset of
the full incremental flow with tighter controls to ensure performance does not degrade. It works
to the following criteria:
1. Updating the reference checkpoint only when WNS is >= -0.250 ns. This is only actively
managed in project mode. In a non-project flow, you must use a script such as the one provided below.
2. Setting higher targets for WNS and reuse during the read_checkpoint -incremental
phase.
• 94% cell matching
• 90% net matching
• WNS >= -0.250 ns
When updating the checkpoint, a script such as the following ensures that WNS has not degraded
beyond acceptable limits:
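A minimal sketch for a non-project flow, assuming the current run was just written to routed.dcp and the reference checkpoint is reference.dcp:
# worst setup slack of the routed design
set wns [get_property SLACK [get_timing_paths -max_paths 1 -nworst 1]]
# update the reference checkpoint only if timing has not degraded beyond the limit
if {$wns >= -0.250} {
    file copy -force routed.dcp reference.dcp
}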
Incremental Directives
There are three directives that control how the incremental flow behaves. Incremental directives
are set using the read_checkpoint command:
read_checkpoint -incremental -directive <directive> <reference_checkpoint>.dcp
RuntimeOptimized
The RuntimeOptimized directive tries to reuse as much placement and routing information from
the reference run as possible. The timing target is the same as in the reference run. If the
reference run has a WNS of -0.050 ns, the incremental run does not try to close timing on this
design and instead also targets -0.050 ns. This impacts setup time only. This is the default behavior
when no directive is specified.
TimingClosure
The TimingClosure directive reuses placement and routing from the reference run but rips up
paths that do not meet timing and tries to close them. Some runtime-intensive algorithms are run
to gain as much timing improvement as possible, but because the placement is largely fixed up front,
gains are limited. This technique can be effective on designs with a reference WNS > -0.250 ns.
Note: For further chance of closing timing, run report_qor_suggestions to generate automated
design enhancements.
Quick
Quick is a special mode that does not call the timer during place and route and instead uses the
placement of related logic as a guide. It is the fastest mode but is not applicable to most designs.
Designs typically need a WNS > 1.000 ns for this mode to be effective; these are usually ASIC
emulation or prototyping designs.
Note: In versions 2019.1 and before, the same behavior was achieved via directive mapping at
place_design and route_design. The Explore directive was mapped to TimingClosure, Quick mapped to
Quick and other directives mapped to RuntimeOptimized.
CAUTION! Users upgrading from 2019.1 and earlier who are specifying the Explore or Quick directives for
place_design will need to specify the incremental directive to achieve the equivalent functionality in
2020.1.
Further Options
The following options are available when using the read_checkpoint -incremental
command.
-auto_incremental Option
-fix_objects Option
The -fix_objects option can be used to lock a subset of cells. These cells are not touched by
the incremental place and route tools. The -fix_objects option only works on cells that
match and are identified for cell reuse.
Examples
-force_incr Option
The -force_incr option can be used to force the incremental implementation flow
irrespective of the incremental criteria checks. When not specified the incremental
implementation flow might exit and continue in non-incremental or default flow.
This option can be used instead of modifying the incremental implementation configurations
values to update the minimum thresholds for cell matching, net matching, and WNS in the
automatic incremental flow.
• Examine cell, net, I/O and pin reuse in the current run
• Runtimes
• Timing WNS at each stage of the flow
• Tool options
• Tool versions
• iphys_opt_design replaying optimization
• QoR suggestions applied with the incremental flow
By examining the cell reuse and the other factors mentioned above, a user can determine the
effectiveness of the incremental flow. Where the flow is judged ineffective, a user would typically
update the checkpoint to a newer version of the design or adjust the tool flow. The report is split
into seven sections.
Flow Summary
This reports the general information for the current whole incremental flow:
Reuse Summary
This contains an overview of the cells, nets, pins, and ports that are reused. An example is:
• Matched - Cells that have the same instance name, REF_NAME property or have been
deemed a match if some information differs slightly. This is calculated at the end of
read_checkpoint -incremental
• Initial reuse - Once items are matched, this indicates whether the matched location information
was reused or not. Sometimes matched items cannot be reused due to illegal connectivity or
other design requirements.
• Current reuse - This indicates the reuse at the current design phase. Useful when compared
with initial reuse to see how much reuse is lost as you go through the tool flow to generate a
legal solution.
• Fixed - Items that the incremental flow cannot touch. When this number is high, the tools might
struggle to generate a legal solution.
This contains information about the reference checkpoint. From this section you can examine the
following:
An example is:
This contains useful metrics about a comparison with the reference run. From this section you
can compare the following:
• Runtime information
• WNS at each stage of the flow
• Tool options at each stage of the flow.
When using this report it is important to understand that the Incremental run should be
compared to the post route numbers of the Reference run as the placement and routing should
be reused. For the sections that are not reused, these might contain a mixture of unplaced,
placed, or routed information that can impact the accuracy of the metrics reported.
Note: For further understanding, conduct a timing analysis on a checkpoint written out after
read_checkpoint -incremental.
This section contains the iphys_opt_design replay information retrieved from the
reference DCP, along with the RQS suggestions derived, generated, and applied in the current
incremental flow. An example is:
Note: Physical optimizations that fail to replay result in lower reuse and can reduce the likelihood of
maintaining the current timing picture.
This section contains the commands executed for flow command comparison. An example is:
Note: In Incremental flow, iphys_opt_replay might not replay all optimizations from the reference run.
Any Non-reused optimization has an impact on the cell or net reuse.
Pay particular attention to the commands used prior to read_checkpoint -incremental to confirm
they are the same and reuse is maximized.
Non-reuse Information
This contains metrics about what was not reused and why. The following is an example:
• The totals used to calculate the matched percentages are based on the total number of cells in
the supplied list of cells. When no list is supplied, the totals are based on the number of cells in the
full design.
• Nets are assumed to be the nets attached to the list of cells provided.
In the following example, the report is limited to only report reuse on block RAM cells:
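One possible form of this command, assuming the reuse report command is report_incremental_reuse and using an illustrative filter for block RAM primitives:
report_incremental_reuse -cells [get_cells -hierarchical -filter {PRIMITIVE_TYPE =~ BLOCKRAM.*}]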
Note: The sample report has been truncated horizontally and vertically to fit.
The reuse status of each cell is reported, beginning with the top-level hierarchy, then covering
each level hierarchy contained within that level. The report progresses to the lowest level of
hierarchy contained within the first submodule, then moves on to the next one.
In this example, the top-level cell is mb_preset_wrapper with a cumulative reuse total of 5339
cells and 0 new cells. The row with mb_preset_wrapper in parentheses shows the cell reuse
status contained within mb_preset_wrapper but not its submodules. Of the 5339 cells, only
37 are directly within mb_preset_wrapper and the remainder are within its submodules.
There are five columns indicating cell reuse status at each level, although only the first one
Discarded(Illegal) is shown. These columns have footnote references in the report with further
reasons for discarding reused placement.
Instead of reporting all hierarchical levels, you can use the -hierarchical_depth option to
limit the number of submodules to an exact number of levels. The following is the previous
example, adding -hierarchical_depth of 1:
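Again assuming the report command is report_incremental_reuse, the command might look like:
report_incremental_reuse -hierarchical_depth 1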
This limits reporting to the top level mb_preset_wrapper. If you had used a -
hierarchical_depth of 2, the top and each level of hierarchy contained within
mb_preset_wrapper would be reported, but nothing below those hierarchical cells.
Timing Reports
After completing an incremental place and route, you can analyze timing with details of cell and
net reuse. Objects are tagged in timing reports to show the level of physical data reuse. This
identifies whether or not your design updates are affecting critical paths.
• (ROUTING): Both the cell placement and net routing are reused.
• (PLACEMENT): The cell placement is reused but the routing to the pin is not reused.
• (MOVED): Neither the cell placement nor the routing to the pin is reused.
• (NEW): The pin, cell, or net is a new design object, not present in the reference design.
To remove the labels from the timing report, use the report_timing -no_reused_label
option.
Object Properties
The read_checkpoint -incremental command assigns two cell properties which are
useful for analyzing incremental flow results using scripts or interactive Tcl commands.
• IS_REUSED: A boolean property on cell, port, net, and pin objects. The property is set to
TRUE on the respective object if any of the following incremental data is reused:
• A cell placement
• A package pin assignment for a port
• REUSE_STATUS: A string property on cells and nets denoting the reuse status after
incremental placement and routing. Values on cells include:
• New
• Reused
• Discarded placement to improve timing
• Discarded illegal placement due to netlist changes
Values on nets include:
• REUSED
• NON_REUSED
• PARTIALLY_REUSED
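A quick sketch of how these properties can be queried after incremental place and route (the cell name is illustrative):
# reuse status of an individual cell
get_property IS_REUSED    [get_cells u0/my_reg]
get_property REUSE_STATUS [get_cells u0/my_reg]
# count the fully reused nets in the design
llength [get_nets -hierarchical -filter {REUSE_STATUS == "REUSED"}]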
TIP: AMD has published several applications in XHUB, in the Incremental Compile package. These
applications include visualization of placement and routing reuse when analyzing critical path and other
design views. Also included is an application for automatic Incremental Compile for the project flow, which
automatically manages reference checkpoints for incremental design runs.
TIP: For more information on how to effectively use incremental compile, see section Incremental Flows in
the UltraFast Design Methodology Guide for FPGAs and SoCs (UG949).
The following is an example of Tcl commands that can set up the incremental flow to use the
TimingClosure directive and reference a static checkpoint:
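A minimal non-project sketch, assuming post_opt.dcp is the current design after opt_design and reference_routed.dcp is the static reference checkpoint:
open_checkpoint post_opt.dcp
read_checkpoint -incremental -directive TimingClosure reference_routed.dcp
place_design
route_design
write_checkpoint -force top_routed.dcp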
The following is an example of the Tcl commands required to set up the incremental flow to use
the RuntimeOptimized directive and automatically update the checkpoint:
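A possible non-project sketch using the -auto_incremental option (file names are illustrative); RuntimeOptimized is the default directive, so it is not specified explicitly:
open_checkpoint post_opt.dcp
if {[file exists reference_routed.dcp]} {
    read_checkpoint -incremental -auto_incremental reference_routed.dcp
}
place_design
route_design
write_checkpoint -force top_routed.dcp
# update the reference only if timing has not degraded (see the WNS check earlier in this chapter)
if {[get_property SLACK [get_timing_paths]] >= -0.250} {
    file copy -force top_routed.dcp reference_routed.dcp
}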
To disable incremental compile for the current run (or clear the reference to start over again
without a reference checkpoint in Automatic mode), do one of the following:
Vivado Synthesis
Vivado synthesis can be run incrementally with little setup required. This requires a checkpoint to
be read before synthesis. It is set up automatically when running in project mode, but for
non-project mode it must be added to scripts.
For more information, see Incremental synthesis in Vivado Design Suite User Guide: Synthesis
(UG901).
IP and Block Designs are automatically synthesized in out of context mode and will reuse cached
results when available. For more information see Out-of-Context Design Flow in Vivado Design
Suite User Guide: Design Flows Overview (UG892).
Synplify Synthesis
Synplify can preserve results using the Compile Point flow. Synplify provides two different
compile point flows, which are automatic and manual. In the automatic compile point mode,
compile points are automatically chosen by synthesis, based on existing hierarchy and utilization
estimates. This is a push button mode. Aside from enabling the flow, there is no action required
on your part. To enable, check the Auto Compile Point check box in the GUI or add the following
setting to the Synplify project:
set_option -automatic_compile_point 1
The manual compile point flow offers more flexibility, but requires more interaction to choose
compile points. The flow involves compiling the design, then using either the SCOPE editor
Compile Points tab or the define_compile_point setting. For further information on compile
point flows, see the Synplify online help.
Upon read_checkpoint incr.dcp, the Vivado tools determine that incremental data exists,
and the subsequent place_design and route_design commands run incrementally.
Even if you exit and restart the Vivado Design Suite, in the following command sequence the
route_design command is run in incremental mode, using the routing data from the original
reference checkpoint reference.dcp:
read_checkpoint top_placed.dcp
phys_opt_design
route_design
Constraint Conflicts
Constraints of the revised design can conflict with the physical data of the reference checkpoint.
When conflicts occur, the behavior depends on the constraint used. This is illustrated in the
following examples.
A constraint assigns a fixed location RAMB36_X0Y0 to a cell cell_A. However, in the reference
checkpoint reference.dcp, cell_A is placed at RAMB36_X0Y1 and a different cell, cell_B, is
placed at RAMB36_X0Y0.
In the reference checkpoint there are no Pblocks, but one has been added to the current run.
Where there is a conflict, the placement data from the reference checkpoint is used.
• Low levels of reuse. When reuse is low, having extensive pre-placement of cells can limit the
ability to find optimal locations for new logic. It is recommended to use the default flow in
these cases.
• Change in the critical path. If the path is made worse, for example by adding more cells, timing
is not able to maintain the same performance. For these cases it is recommended to review
the change and further optimize.
• Change in a congested area of the die. When there is limited scope in the design to accept
new cells, it can sometimes be preferable to find a new solution using the default flow. In
these cases, users may benefit from running both the default flow and the incremental flow
and seeing which performs best.
• Directive setting. When RuntimeOptimized is set as the directive, the tools will not try to
improve beyond what the reference is set to.
• The amount of change in timing-critical areas. If critical path placement and routing cannot be
reused, more effort is required to preserve timing. Also, if the small design changes introduce
new timing problems that did not exist in the reference design, higher effort and run time
might be required, and the design might not meet timing.
• The initialization portion of the place and route run time. In short place and route runs, the
initialization overhead of the Vivado placer and router might eliminate any gain from the
incremental place and route process. For designs with longer run times, initialization becomes
a small percentage of the run time.
INFO: [Place 46-2] During incremental compilation, routing data from the
original checkpoint is applied during place_design. As a result, dangling
route segments and route conflicts may appear in the post place_design
implementation due to changes between the original and incremental
netlists. These routes can be ignored as they will be subsequently resolved
by route_design. This issue will be cleaned up automatically in
place_design in a future software release.
report_config_implementation
config_implementation { {incr.ignore_user_clock_uncertainty true} }
Note: You can update more than one element at a time by grouping key value pairs in the same method
shown above within the outer brackets.
• Minimum thresholds for cell matching, net matching, and WNS in the automatic incremental
flow.
• Behavior of both synthesis and implementation when the automatic incremental flow criteria
is not met. This check happens at the beginning of the synthesis run and during
read_checkpoint -incremental for implementation. It can be set to Terminate which
stops the flow or SwitchToDefaultFlow which exits the incremental flow but continues
with default flow settings.
• Whether the flow ignores user clock uncertainty constraints that are typically used to
overconstrain the placer and force closer placement.
Chapter 3
• You can find a run status indicator in the project status bar at the upper right corner of the
AMD Vivado™ IDE, as shown in the following figure. The run status indicator displays a
scrolling bar to indicate that the run is in process. You can click Cancel to end the run.
• You can also find a run status indicator in the Design Runs window, as shown at the bottom
left of the following figure. It displays a circular arrow (noted in red in the figure) to indicate
that the run is in process. You can select the run and use the Reset Run command from the
popup menu to cancel the run.
Select Delete Generated Files to clear the run data from the local project directories.
RECOMMENDED: Delete any data created as a result of a canceled run to avoid conflicts with future
runs.
The Log window, shown in the following figure, can help you understand where different
messages originate to aid in debugging the implementation run.
Pausing Output
Click the Pause output button to pause the output to the Log window. Pausing allows you to
read the log while implementation continues running.
The project status is displayed in the Project summary and the Status bar. It allows you to
immediately see the status of a project when you open the project, or while you are running the
design flow commands, including:
• RTL elaboration
• Synthesis
• Implementation
• Bitstream generation
As the run progresses through the Synthesize, Implement, and Write Bitstream commands, the
Project Status Bar changes to show either a successful or failed attempt. Failures are displayed in
red text.
The project status bar shows an Out-of-Date status. Click more info to display which aspects of
the design are out of date. It might be necessary to rerun implementation, or both synthesis and
implementation.
TIP: The Force-up-to-date command is also available from the popup menu of the Design Runs window
when an out-of-date run is selected.
• Is the design fully placed and routed, or are there issues that need to be resolved?
• Have the timing constraints and design requirements been met, or are there additional changes
required to complete the design?
• Are you ready to generate the bitstream for the AMD part?
For more information on analysis of the implemented design, see section Interactive Design
Analysis in the IDE in the Vivado Design Suite User Guide: Design Analysis and Closure Techniques
(UG906).
In Project Mode, after an implementation run is complete in the Vivado IDE, you are prompted
for the next step, as shown in the following figure.
Viewing Messages
IMPORTANT! Review all messages. The messages might suggest ways to improve your design for
performance, power, area, and routing. Critical warnings might also expose timing constraint problems that
must be resolved.
RECOMMENDED: Open the log file in the Vivado text editor and review the results of all commands for
valuable insights.
• Click the expand and collapse tree widgets to view the individual messages.
• Check the appropriate check box in the banner to display errors, critical warnings, warnings,
and informational messages in the Messages window.
• Select a linked message in the Messages window to open the source file and highlight the
appropriate line in the file.
• Run Search for Answer Record from the Messages window popup menu to search the AMD
Customer Support database for answer records related to a specific message.
The following example of the Incremental Placement Summary includes a final assessment of cell
placement reuse and run time statistics.
+-------------------------------------------------------------------------------+
|Incremental Placement Summary |
+-------------------------------------------------------------------------------+
| Type | Count | Percentage |
+-------------------------------------------------------------------------------+
| Total instances | 33406 | 100.00 |
| Reused instances | 32390 | 96.96 |
| Non-reused instances | 1016 | 3.04 |
| New | 937 | 2.80 |
| Discarded illegal placement due to netlist changes | 16 | 0.05 |
| Discarded to improve timing | 63 | 0.19 |
+-------------------------------------------------------------------------------+
|Incremental Placement Runtime Summary |
+-------------------------------------------------------------------------------+
| Initialization time(elapsed secs) | 79.99 |
| Incremental Placer time(elapsed secs) | 31.19 |
+-------------------------------------------------------------------------------+
The Incremental Routing Summary displays reuse statistics for all nets in the design. The
categories reported include:
• Fully Reused: The entire routing for a net is reused from the reference design.
• Partially Reused: Some of the routing for a net from the reference design is reused. Some
segments are re-routed due to changed cells, changed cell placements, or both.
• New/Unmatched: The net in the current design was not matched in the reference design.
---------------------------------------------------------
|Incremental Routing Reuse Summary |
---------------------------------------------------------
|Type | Count | Percentage |
---------------------------------------------------------
|Fully reused nets | 30393| 96.73 |
|Partially reused nets | 0| 0.00 |
|Non-reused nets | 1028| 3.27 |
---------------------------------------------------------
• Uses the -file option to direct the output of the report to a file.
• Uses the -name option to direct the output of the report to a Vivado IDE window.
Figure 41: Control Sets Report shows an example of a report opened in a Vivado IDE window.
TIP: The directory to which the reports are to be written must exist before running the report, or the file
cannot be saved, and an error message will be generated.
The Reports window usually opens automatically after synthesis or implementation commands
are run. If the window does not open do one of the following:
TIP: The tcl.pre and tcl.post options of an implementation run let you output custom reports at
each step in the process. These reports are not listed in the Reports window, but can be customized to
meet your specific needs. For more information, see Changing Implementation Run Settings.
The reports available from the Reports window contain information related to the run. The
selected report opens in text form in the Vivado IDE, as shown in the following figure.
For example, the Reports window includes a text-based Timing Summary Report under Route
Design (as shown in Figure 40).
When analyzing timing, it is helpful to see the design data associated with critical paths, including
placement and routing resources in the Device window.
To regenerate the report in the Vivado IDE, select Tools → Timing → Report Timing Summary.
The resulting report allows you to cross-probe among the various views of the design.
For more information on analyzing reports and strategies for design closure, see the Vivado
Design Suite User Guide: Design Analysis and Closure Techniques (UG906).
Modifying Placement
The Vivado tools track two states for placed cells, Fixed and Unfixed, which describe how the
Vivado tools treat placed cells in the design.
Fixed Cells
Fixed cells are those that you have placed yourself, or the location constraints for the cells have
been imported from an XDC file.
Unfixed Cells
Unfixed cells have been placed by the Vivado tools in implementation, during the
place_design command, or on execution of one of optimization commands.
• The Vivado Design Suite treats these placed cells as Unfixed (or loosely placed).
• These cells can be moved by the implementation tools as needed in design iterations.
• The LUT in the following figure is shown in blue (default) to indicate that it is Unfixed.
Both LOCs and BELs can be fixed. The placement above generates BEL and LOC constraints similar to the following sketch (the cell name and site are illustrative):
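# illustrative cell name and site
set_property BEL AFF [get_cells {u0/my_reg}]
set_property LOC SLICE_X49Y60 [get_cells {u0/my_reg}]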
There is no placement constraint on the LUT. Its placement is unfixed, indicating that the
placement should not go into the XDC.
For more information on Tcl commands, see the Vivado Design Suite Tcl Command Reference Guide
(UG835), or type <command> -help.
TIP: When dragging logic to a location in the Device Window, the GUI allows you to drop the logic only on
legal locations. If the location is illegal (for example, because of control set restriction for Slice FFs), the
logic does not "snap" to the new location in the Device view, and it cannot be assigned.
Hand-placing logic can be slow, and used in specific situations only. The constraints are fragile
with respect to design changes because the cell name is used in the constraint.
TIP: When assigning logic to an illegal location (for example, because of control set restriction for Slice
FFs), the Tcl Console issues an error message, and the assignment is ignored.
Cells that have been placed using the place_cell Tcl command are treated as Fixed by the
Vivado tool.
Modifying Routing
The Device View allows you to modify the routing for your design. You can Unroute, Route, and
Fix Routing on any individual net.
TIP: All net commands are available from the context menu on a net.
Manual Routing
Manual routing allows you to select specific routing resources for your nets. This gives you
complete control over the routing paths that a signal is going to take. Manual routing does not
invoke route_design. Routes are directly updated in the route database.
You might want to use manual routing when you want to precisely control the delay for a net. For
example, assume a source synchronous interface, in which you want to minimize routing delay
variation to the capture registers in the device. To accomplish this, you can assign LOC and BEL
constraints to the registers and I/Os, and then precisely control the route delay from the IOB to
the register by manual routing the nets.
Manual routing requires detailed knowledge of the device interconnect architecture. It is best
used for a limited number of signals and for short connections.
• The driver and the load require a LOC constraint and a BEL constraint.
• Branching is not allowed during manual routing, but you can implement branches by starting a
new manual route from a branch point.
• LUT loads must have their pins locked.
• You must route to loads that are not already connected to a driver.
• Only complete connections are permitted. Antennas are not allowed.
• Overlap with existing unfixed routed nets is allowed. Run route_design after manual
routing to resolve any conflicts due to overlapping nets.
You are now in Manual Routing Mode. A Routing Assignment window, shown in the following
figure, appears next to the Device window.
The Routing Assignment window is divided into the Options, Assigned Nodes, and Neighbor
Nodes sections:
• The Options section, shown in the following figure, controls the settings for the Routing
Assignment window.
○ The Number of hops value allows you to specify the number of routing hops that can be
assigned for neighbor nodes. This also affects the Neighbor Nodes displayed. If the number
of hops is greater than 1, only the last node of the route is displayed in the Neighbor Nodes
section.
○ The Maximum number of neighbors value allows you to limit the number of neighbor
nodes that are displayed in the Neighbor Nodes section. Only the last node of the route is
displayed.
○ The Allow overlap with unfixed nets switch controls whether overlaps of assigned routing
with existing unfixed routing is allowed. Any overlaps need to be resolved by running the
route_design command after fixed route assignment.
The Options section is hidden by default. To show the Options section, click Show.
• The Assigned Nodes section shows the nodes that already have assigned routing. Each
assigned node is displayed as a separate line item.
In the Device window, nodes with assigned routing are highlighted in orange. Any gaps
between assigned nodes are shown in the Assigned Nodes section as a GAP line item. To
auto-route gaps:
○ Right-click a net gap in the Assigned Nodes section.
To assign the next routing segment, select an assigned node before or after a gap, or the last
assigned node in the Assigned Nodes section.
• The Neighbor Nodes section (shown in the following section) displays the allowed neighbor
nodes, highlights the currently selected nodes (in white), and highlights the allowed neighbor
nodes (white dotted) in the Device window.
To assign routing to a neighbor node, do one of the following:
• Right-click the node in the Neighbor Nodes section and select Assign Node.
• Double-click the node in the Neighbor Nodes section.
• Click the node in the Device View.
After you have assigned routing to a Neighbor Node, the node is displayed in the assigned nodes
section and highlighted in orange in the Device View.
Assign nodes until you have reached the load, or until you are ready to assign routing with a gap.
The Assign Routing dialog box is displayed, as shown in the following figure, allowing you to
verify the assigned nodes before they are committed.
When the routes are committed, the driver and load BEL and LOC are also fixed.
The following figure shows an example of an assigned and partially assigned route.
Branching
When assigning routing to a net with more than one load, you must route the net in the following
steps:
1. Assign routing to one load following the steps provided in Entering Assign Routing Mode.
2. Assign routing to all the branches of the net.
The following figure shows an example of a net that has assigned routing to one load and
requires routing to two additional loads.
1. Go to Device window.
2. Select the net to be routed.
3. Right-click and select Enter Assign Routing Mode.
The Assign Routing Mode: Target Load Cell Pin window opens, showing all loads.
Note: The loads that already have assigned routing have a checkmark in the Routed column of the
table.
6. Select the node from which you want to branch off the route for your selected load.
7. Click OK.
8. Follow the steps shown in Assigning Routing Nodes.
To prevent pin swapping by physical synthesis in the placer, a DONT_TOUCH constraint needs to
be applied to the LUT cell. The Tcl command takes the following form (the cell name is a placeholder):
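set_property DONT_TOUCH true [get_cells <lut_cell_name>]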
For nets that have fixed routing and multiple LUT loads, a Tcl script such as the following can be used to
lock the cell inputs of all the LUT loads.
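A minimal sketch, assuming $fixedNet holds the name of the net with fixed routing; the mapping of logical to physical LUT pins shown here is illustrative:
foreach pin [get_pins -leaf -of_objects [get_nets $fixedNet] -filter {DIRECTION == IN}] {
    set cell [get_cells -of_objects $pin]
    # only lock pins on LUT loads
    if {![string match LUT* [get_property REF_NAME $cell]]} { continue }
    # logical input pin (for example I0) and the physical LUT input it currently uses (for example A6)
    set logPin [get_property REF_PIN_NAME $pin]
    set belPin [lindex [split [get_bel_pins -of_objects $pin] /] end]
    set_property LOCK_PINS "${logPin}:${belPin}" $cell
}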
For example, consider the route described in the following figure. In this simplified illustration of
a route, the various elements are indicated as shown in the following table (Directed Routing
Constraints).
Elements            Indicated By
Driver and Loads    Orange rectangles
Nodes               Red lines
Switchboxes         Blue rectangles
{A B { D E T } C { F G H I M N } {O P Q} R J K L S }
Figure: Simplified illustration of a route from the driver through nodes A to T to loads L1 through L4 (X16628-040716)
For partially routed nets, the nodes can be found associated directly to the net. Refer to the
Vivado Design Suite Properties Reference Guide (UG912) for more information on the relationship
between these objects.
• A list of nodes representing the route path found from the start point to the end point.
Modifying Logic
Properties on logical objects that are not read-only can be modified after implementation in the
Vivado IDE as well as with Tcl.
Note: For more information about Tcl commands, see the Vivado Design Suite Tcl Command Reference Guide
(UG835), or type <command> -help.
These properties can include everything from block RAM INITs to the clock modifying
properties on MMCMs. There is also a special dialog box to set or modify INIT on LUT
objects. This dialog box allows you to specify the LUT equation and have the tools determine
the appropriate INIT.
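For example, the INIT of a LUT can be set directly from Tcl (a sketch; the cell name is illustrative):
# implements an XOR of the two inputs of a LUT2
set_property INIT 4'h6 [get_cells u0/my_lut2]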
Saving Modifications
• To capture the changes to the design made in memory, write a checkpoint of the design.
Because the assignments are not back-annotated to the design, you must add the assignments
to the XDC for them to impact the next run.
• To save the constraints to your constraints file in Project Mode, select File → Constraints →
Save.
• create_port
• remove_port
• create_cell
• remove_cell
• create_pin
• remove_pin
• create_net
• remove_net
• connect_net
• disconnect_net
Note: For more information about these Tcl commands, see the Vivado Design Suite Tcl Command Reference
Guide (UG835), or type <command> -help.
TIP: The Vivado tools allow you to make netlist changes unconditionally using the netlist modifying
commands. However, logical changes can lead to an invalid physical implementation. It is recommended to
run DRCs after performing your netlist changes. In addition, DRCs are run as part of the process of adding
the logical changes to the physical implementation. These DRCs flag any invalid netlist changes or new
physical restrictions that need to be addressed before physical implementation can commence.
Logical changes are reflected in the schematic view as soon as the netlist modifying commands
are executed. The following figure shows an example of a cell that was created using a LUT1 as a
reference cell.
When the output of the LUT1 is connected to an OBUF, the schematic reflects this change
showing the ECO_INV/O pin no longer with a "no-connect." The following figure shows the
resulting schematic view.
Use Cases
The following examples show some of the most common use cases for netlist modifications. The
examples show the schematic of the original logical netlist, list the netlist modifying Tcl
commands, and show the schematic of the resulting modified netlist.
The following Tcl commands show how to add an inverter between the output of the FDRE and
the OBUF. The sketch below is consistent with the description that follows; the register instance
and net names are illustrative:
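# create the inverting LUT1 cell and set its INIT to implement an inversion
create_cell -reference LUT1 ECO_INV
set_property INIT 2'h1 [get_cells ECO_INV]
# disconnect the existing output net from the Q pin of the FDRE
disconnect_net -net my_out_net -objects [get_pins my_ff_reg/Q]
# drive the OBUF input (still attached to my_out_net) from the inverter output
connect_net -net my_out_net -objects [get_pins ECO_INV/O]
# create a new net from the FDRE Q output to the inverter I0 input
create_net ECO_INV_in
connect_net -net ECO_INV_in -objects [list [get_pins my_ff_reg/Q] [get_pins ECO_INV/I0]]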
In this example script, LUT1 cell ECO_INV is created, and the INIT value is set to 2'h1, which
implements an inversion. The net between the FDRE and OBUF is disconnected from the Q
output pin of the FDRE, and the output of the inverting LUT1 cell ECO_INV is connected to the I
input pin of the OBUF. Finally, a net is created and connected between the Q output pin of the
FDRE and the I0 input pin of the inverting LUT1 cell.
The following figure shows the schematic of the resulting logical netlist changes.
After the netlist has been successfully modified, the logical changes must be implemented. The
LUT1 cell must be placed, and the nets to and from the cell routed. This must occur without
modifying placement or routing of parts of the design that have not been modified. The Vivado
implementation commands automatically use incremental mode when place_design is run on
the modified netlist, and the log file reflects that by showing the Incremental Placement
Summary:
+--------------------------------------------------+
|Incremental Placement Summary |
+--------------------------------------------------+
| Type | Count | Percentage |
+--------------------------+----------+------------+
| Total instances | 3834 | 100.00 |
| Reused instances | 3833 | 99.97 |
| Non-reused instances | 1 | 0.03 |
| New | 1 | 0.03 |
+--------------------------+----------+------------+
To preserve existing routing and route only the modified nets, use the route_design
command. This incrementally routes only the changes, as you can see in the Incremental Routing
Reuse Summary in the log file:
+--------------------------------------------------+
|Incremental Routing Reuse Summary |
+--------------------------------------------------+
|Type | Count | Percentage |
+---------------------+-----------+----------------+
|Fully reused nets | 6401| 99.97 |
|Partially reused nets| 0| 0.00 |
|Non-reused nets | 2| 0.03 |
+---------------------+-----------+----------------+
Instead of automatically placing and routing the modified netlist using the incremental
place_design and route_design commands, the logical changes can be committed using
manual placement and routing constraints. For more information see the Modifying Placement
and Modifying Routing sections earlier in this chapter.
The following Tcl script shows how to add a port to the existing design and route the internal
signal to the newly created port.
• Creates an OBUF that drives the debug port through net ECO_OBUF1_out.
• Creates a net to connect the output of the demuxState_reg register to the input of the OBUF.
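A sketch of such a script, consistent with the description above; the port name, package pin, and I/O standard are illustrative:
# create the debug port and assign it to a package pin
create_port -direction OUT debug_out
set_property PACKAGE_PIN AB12 [get_ports debug_out]
set_property IOSTANDARD LVCMOS18 [get_ports debug_out]
# create an OBUF that drives the debug port through net ECO_OBUF1_out
create_cell -reference OBUF ECO_OBUF1
create_net ECO_OBUF1_out
connect_net -net ECO_OBUF1_out -objects [list [get_pins ECO_OBUF1/O] [get_ports debug_out]]
# connect the output of the demuxState_reg register to the input of the OBUF
create_net ECO_OBUF1_in
connect_net -net ECO_OBUF1_in -objects [list [get_pins demuxState_reg/Q] [get_pins ECO_OBUF1/I]]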
The following figure shows the schematic of the resulting logical netlist changes.
After the netlist has been successfully modified, the logical changes must be implemented.
Because the port has been assigned to a package pin, the OBUF driving the port is automatically
placed in the correct location. The placer therefore has nothing to place, and incremental compile
is not triggered when running place_design followed by route_design. To route the newly added
net that connects the internal signal to the OBUF
input, use the route_design -nets command or route the net manually to avoid a full
route_design pass which might change the routing for other nets. Alternatively, you can run
route_design -preserve, which preserves existing routing. See Using Other route_design
Options.
The following Tcl script shows how to insert a pipeline register between the two LUT6 cells. The
register is implemented with the same control signals as the load register.
{egressLoop[4].egressFifo/buffer_fifo/infer_fifo.block_ram_performance.fifo_ram_reg/DOBDO[29]}]
connect_net -hierarchical -net {egressLoop[4].egressFifo/buffer_fifo/dout2_in[29]} -objects [list {ECO_pipe_stage[29]/Q}]
The following figure shows the schematic of the resulting logical netlist changes.
After the netlist has been successfully modified, the logical changes must be committed.
Accomplish this using the place_design and route_design commands.
Engineering change orders (ECOs) are modifications to the post implementation netlist with the
intent to implement the changes with minimal impact to the original design. Vivado provides an
ECO flow, which allows you to modify a design checkpoint, implement the changes, run reports
on the changed netlist, and generate programming files.
The advantage of the ECO flow is fast turn-around time by taking advantage of the incremental
place and route features of the Vivado tool.
The Vivado IDE provides a predefined layout to support the ECO flow. To access the ECO
Layout, select Layout → ECO.
ECO Navigator
The ECO Navigator provides access to the commands that are required to complete an ECO.
Scratch Pad
The scratch pad tracks netlist changes and place and route status for Cells, Pins, Ports, and Nets.
If the modified design is not fully placed, run Incremental Place first; otherwise you can skip
straight to Incremental Route. After that, you can save your changes to a new checkpoint, write
new programming and debug probe files, and open the Hardware Manager to program your
device. If you are satisfied with your changes, you can incorporate them into your original design.
Otherwise, repeat the ECO flow from the beginning until the design works as expected.
Figure: ECO flow — Open DCP, Modify Netlist, then Incremental Place (if the design is not fully placed) followed by Incremental Route; if the design is not working as expected, return to the netlist modification step.
TIP: When you re-run implementation in project mode, the results in the previous run directory are
deleted. Save the ECO checkpoint to a new directory, or create a new implementation run for your
subsequent compile, to preserve the changes to the ECO checkpoint.
Edit Section
The Edit section of the ECO Navigator (shown in the below figure) provides access to all the
commands that are required to modify the netlist.
• Create Net: Opens the Create Net dialog box, which allows you to create new nets in the
current loaded design. Nets can be created hierarchically from the top level of the design, or
within any level of the hierarchy by specifying the hierarchical net name. Bus nets can be
created with increasing or decreasing bus indexes, using negative and positive index values. To
create a bus net, turn on Create bus and specify the beginning and ending index values.
If you select a pin or port, you can have the newly created net automatically connect to them
by selecting the Connect selected pins and ports check box.
• Create Cell: Opens the Create Cell dialog box, which allows you to add cells to the netlist of
the currently loaded design. You can add new cell instances to the top level of the design, or
hierarchically within any module of the design. Instances can reference an existing cell from
the library or design source files, or you can add a black box instance that references cells that
have not yet been created. If a LUT cell is created, you can specify a LUT equation in the
Specify LUT Equation dialog box by selecting it.
• Create Port: Opens the Create Port dialog box, in which you can create a port and specify
such parameters as direction, width, single-ended, or differential. New ports are added at the
top level of the design hierarchy. You can create bus ports with increasing or decreasing bus
indexes, using negative and positive index values. You can also specify I/O standard, pull type,
and ODT type. When a Location is specified, the port is assigned to a package pin.
• Create Pin: Opens the Create Pin dialog box, which allows you to add single pins or bus pins
to the current design. You can define attributes of the pin, such as direction and bus width, as
well as the pin name. You can create bus pins with increasing or decreasing bus indexes, using
negative and positive index values. A pin must be created on an existing cell instance, or it is
considered a top-level pin, which should be created using the create_port command. If the
instance name of a cell is not specified, the pin cannot be created.
• Connect Net: The selected pin or port is connected to the selected net. If a net is not selected,
the Connect Net dialog box opens, which allows you to specify a net to connect to the
selected pins or ports in the design. The window displays a list of nets at the current selected
level of hierarchy that can be filtered dynamically by typing a net name in the search box. The
selected net will be connected across levels of hierarchy in the design, by adding pins and
hierarchical nets as needed to complete the connection.
• Disconnect Net: Disconnects the selected net, pin, port or cell from the net in the current
design. If a cell is selected, all nets connected to that cell will be disconnected.
• Replace Debug Probes: Opens the Replace Debug Probes dialog box, if a debug core has
previously been inserted into the design. The Replace Debug Probes dialog box contains
information about the nets that are probed in your design using the ILA and/or VIO cores. You
can modify the nets that are connected to the debug probe by clicking the icon next to the net
name in the Probe column. This opens the Choose Nets dialog box, which allows you to select
a new net to connect to the debug probe.
• Place Cell: Places the selected cell onto the selected device resource.
• Unplace Cell: Unplaces the selected cell from its current placement site.
• Delete Objects: Deletes the selected objects from the current design.
Run Section
The Run section of the ECO Navigator, shown in the figure below, provides access to all the
commands required to implement the current changes.
• Check ECO: Runs the ECO checks rule deck on the current design.
TIP: The Vivado tools allow you to make netlist changes unconditionally using the ECO commands.
However, logical changes can lead to invalid physical implementation. Run the Check ECO function to
flag any invalid netlist changes or new physical restrictions that need to be addressed before physical
implementation can commence.
• Optimize Logical Design: In some cases, it is desirable to run opt_design on the modified
design to optimize the netlist. This command opens the Optimize Logical Design dialog box,
allowing you to specify options for the opt_design command. Any options that are entered
in the dialog box are appended to the opt_design command as they are typed. For example,
to run opt_design -sweep, type -sweep under Options.
• Place Design: Runs incremental place_design on the modified netlist as long as 75% or
more of the placement can be reused. The Incremental Placement Summary at the end of
place_design provides statistics on incremental reuse. Selecting this command opens the
Place Design dialog box and allows you to specify options for the place_design command.
Any options that are entered in the dialog box are appended to the place_design
command as they are typed.
• Route Design: Selecting this command opens the Route Design dialog box. Depending on the
selection, this command allows you to perform an Incremental Route of the modifications
made to the design, Route the selected pin, or Route selected nets. If Incremental Route is
selected on a modified netlist that has less than 75% of reused nets, the tool reverts to the
non-incremental route_design.
Depending on your selection, you have four options to route the ECO changes:
Report Section
The Report Section of the ECO Navigator, shown in the figure below, provides access to all the
commands that are required to run reports on the modified design.
For more information on these commands, refer to the Vivado Design Suite User Guide: Using the
Vivado IDE (UG893).
Program Section
The Program section of the ECO Navigator, shown in the figure below, provides access to the
commands that allow you to save your modifications, generate a new BIT file for programming
and a new LTX file for your debug probes, and program the device.
• Save Checkpoint As: This command allows you to save your modifications to a new
checkpoint.
• Generate Bitstream: This command allows you to generate a new .bit file for programming.
• Write Debug Probes: This command allows you to generate a new .ltx file for your debug
probes. If you made changes to your debug probes using the Replace Debug Probes
command, you need to save the updated information to a new debug probes file (LTX) to
reflect the changes in the Vivado Hardware Manager.
Scratch Pad
The Scratch Pad is updated as changes are made to the loaded checkpoint. See the following
figure. The Object Name column displays hierarchical names of Cells, Nets, Ports, and Pins. The
Connectivity (Con) column tracks the connectivity of the objects and the Place and Route (PnR)
column tracks the place and route status of the objects. In the Scratch Pad shown in the
following figure, notice that check marks in the Con and PnR columns identify connectivity and
place/route status. Looking at this figure, you can identify the following:
• The port ingressFifoWrEn_debug has been added and assigned to a package pin.
• The net ingressFifoWrEn has been connected to the newly created Port, but the connection
has not yet been routed to the port.
• Collapse All: Displays objects by groups, and does not display individual members of the
group.
• Group by Type: Displays the objects by type, or in the order they have been added.
• Remove selected objects: Removes selected objects from the Scratch Pad.
• Add Objects to Scratch Pad: Adds unconnected, unplaced, or unrouted objects to the Scratch
Pad.
• Select Array Elements: Selects all the elements in an array if one element has been selected.
• Connect Net to Output Port: Opens the Connect Net to Output Port dialog box, which allows
you to connect the selected net to an external port. See the following figure.
• Elide Setting: Specifies how to truncate long object names that do not fit in the Object Name
column. Choices are Left, Middle, and Right.
• Report Net Route Status: Reports the route status of the selected net.
• Select Driver Pin: Selects the driver pin of the selected net.
• Configure I/O Ports: Assigns various properties of the selected I/O ports.
• Split Diff Pair: Removes the differential pair association from the selected port.
• Auto-place I/O Ports: Places I/O ports using the Autoplace I/O Ports wizard.
• Place I/O Ports in Area: Assigns the currently selected ports onto pins in the specified area.
• Place I/O Ports Sequentially: Assigns the currently selected ports individually onto package
pins.
• Highlight Leaf Cells: Highlights the primitive logic for the selected cell.
• Unhighlight Leaf Cells: Unhighlights the primitive logic for the selected cell.
• Find: Opens the Find dialog box to find objects in the current design or device by filtering Tcl
properties and objects.
• Export to Spreadsheet: Writes the contents of the Scratch Pad to a Microsoft Excel
spreadsheet.
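The state shown in the Scratch Pad above, and the Report Net Route Status check, map onto
ordinary Tcl commands. A minimal sketch; the package pin location and I/O standard are
assumptions for illustration only:

# Create the new output port and assign it to a package pin
create_port -direction OUT ingressFifoWrEn_debug
set_property PACKAGE_PIN A9 [get_ports ingressFifoWrEn_debug]
set_property IOSTANDARD LVCMOS18 [get_ports ingressFifoWrEn_debug]
# Connect the existing net to the newly created port
connect_net -net [get_nets ingressFifoWrEn] -objects [get_ports ingressFifoWrEn_debug]
# Check whether the modified net is fully routed (equivalent to Report Net Route Status)
report_route_status -of_objects [get_nets ingressFifoWrEn]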
Schematic Window
Logical changes are reflected in the schematic view as soon as the netlist is changed. The
following figure shows an updated schematic based on the netlist changes.
TIP: Use the Mark Objects and Highlight Objects commands to help you keep track of objects in the
Schematic Window as you make changes to the netlist.
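The marking and highlighting commands are also available in Tcl and can be scripted as part of
an ECO session. A minimal sketch; the cell name is a placeholder and the color choice is
arbitrary:

# Mark and highlight the objects touched by the ECO so they stand out in the schematic
mark_objects [get_cells my_new_cell]
highlight_objects -color yellow [get_nets ingressFifoWrEn]
# Clear the annotations when finished
unmark_objects [get_cells my_new_cell]
unhighlight_objects [get_nets ingressFifoWrEn]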
Appendix A
Overview
The AMD Vivado™ Integrated Design Environment (IDE) supports simultaneous parallel
execution of synthesis and implementation runs on multiple Linux hosts. You can accomplish this
manually by configuring individual hosts or by specifying the commands to launch jobs on
existing compute clusters.
Currently, Linux is the only operating system that Vivado supports for remote host configurations.
Remote host settings are accessible through the Tools menu by selecting Tools → Settings →
Remote Hosts.
Requirements
The requirements for launching synthesis and implementation runs on remote Linux hosts are:
• Vivado tools installation is assumed to be available from the login shell, which means that
$XILINX_VIVADO and $PATH are configured correctly in your .cshrc/.bashrc setup scripts.
$PATH is used by the shell to find the vivado executable, while $XILINX_VIVADO is used by
some Xilinx tools to obtain the Vivado installation path. It is best to set both of these
environment variables in your .cshrc/.bashrc setup scripts (a quick check from the Tcl console
is sketched after this list). Alternatively, for Manual Configuration, if you do not have Vivado
set up upon login (CSHRC or BASHRC), use the Run pre-launch script option, described below,
to define an environment setup script to be run prior to all jobs.
• Vivado IDE installation must be visible from the mounted file systems on remote machines. If
the Vivado IDE installation is stored on a local disk on your own machine, it might not be
visible from remote machines.
• Vivado IDE project files (.xpr) and directories (.data and .runs) must be visible from the
mounted file systems on remote machines. If the design data is saved to a local disk, it might
not be visible from remote machines.
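The environment can be double-checked from the Vivado Tcl console on a Linux host. A minimal
sketch of a quick sanity check, assuming both variables are set in the login shell:

# Both values should point into the same mounted installation tree
puts "XILINX_VIVADO = $::env(XILINX_VIVADO)"
puts "vivado executable: [exec which vivado]"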
Manual Configuration
Manual configuration of remote hosts allows you to specify individual machine names on which
Vivado can execute. Vivado opens a Secure Shell (SSH) connection to these machines and spawns
additional Vivado processes. Host names can be added by clicking the add button shown in the
following figure. Once added, the number of jobs per host can be selected, and hosts can
optionally be disabled. The specific command used to launch the jobs must be provided.
Optionally, you can configure pre- and post-launch scripts and an email address if you want to
be notified once the jobs complete.
IMPORTANT! Use caution when specifying the “launch jobs with” command. For example, removing
BatchMode=yes might cause the remote process to hang because the Secure Shell incorrectly prompts for
an interactive password.
RECOMMENDED: Test each host to ensure proper setup before submitting runs to the host.
A “greedy,” round-robin style algorithm is used to submit jobs to the remote hosts. Before
launching runs on multiple Linux hosts, it is important to configure SSH so that the host does not
require a password each time you launch a remote run.
Note: This is a one-time step. When successfully set-up, this step does not need to be repeated.
1. Run the following command at a Linux terminal or shell to generate a public key on your
primary machine. Though not required, it is good practice to enter (and remember) a private
key passphrase when prompted, for maximum security.
ssh-keygen -t rsa
2. Append the contents of your public key to an authorized_keys file on the remote
machine. Change remote_server to a valid host name:
cat ~/.ssh/id_rsa.pub | ssh remote_server "cat - >> ~/.ssh/authorized_keys"
3. Run the following command to prompt for your private key pass phrase, and enable key
forwarding:
ssh-add
You should now be able to ssh to any machine without typing a password. The first time you
access a new machine, it prompts you for a password. It does not prompt upon subsequent
access.
TIP: If you are always prompted for a password, contact your System Administrator.
Cluster Configurations
Compute clusters are groups of machines, configured through third-party tools, that accept jobs,
schedule them, and efficiently allocate the compute resources. Common compute clusters
include LSF, SGE, and SLURM. To add a custom compute cluster to Vivado, click the plus (+)
toolbar button shown in the following figure and provide a name for the cluster configuration.
You then need to specify the command necessary to submit a job to the cluster, the command to
cancel a job on the cluster, and the cluster type. Vivado natively supports LSF, SGE, and SLURM.
For any other cluster, choose CUSTOM in the combo box. For a CUSTOM cluster, you must
provide the path to a Tcl file that contains the logic to fetch the job ID and return it from a proc.
The name of this proc is used to populate the Job ID Proc field. Job ID Tcl and Job ID Proc can be
left empty for natively supported clusters, that is, LSF, SGE, and SLURM. The configuration can
be tested by clicking the Test Configuration button.
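For a CUSTOM cluster, the Job ID Tcl file might look like the following minimal sketch. The
interface shown is an assumption for illustration only: it presumes the proc named in the Job ID
Proc field receives the console output of the submit command and must return the numeric job
ID (for example, from a SLURM-style "Submitted batch job 12345" message). The file and proc
names are placeholders:

# my_cluster_jobid.tcl - hypothetical Job ID helper for a CUSTOM cluster configuration
proc get_custom_job_id {submit_output} {
    # Pull the first integer out of the submit command's output and treat it as the job ID
    if {[regexp {(\d+)} $submit_output -> job_id]} {
        return $job_id
    }
    error "Could not parse a job ID from: $submit_output"
}

In this sketch, get_custom_job_id would be entered in the Job ID Proc field and the file path in
the Job ID Tcl field.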
In this example, the client machine name is xcolc200189 and the scheduler machine name is
xcolc200185.
1. Set up SSH keys on client and scheduler to enable ssh without password.
2. Start Vivado on the client machine.
f. In the Launch Runs dialog box, choose Launch runs on cluster and in the combo box,
select the custom cluster name created above.
h. In a terminal, ssh into the scheduler machine and check to see the job running using the
squeue command on the scheduler machine.
i. See the job complete successfully in the Vivado session running on the client.
Appendix B
Implementation Categories,
Strategy Descriptions, and Directive
Mapping
Implementation Categories
Table 14: Implementation Categories

Category      Purpose
------------  ----------------------------------------
Performance   Improve design performance
Area          Reduce LUT count
Power         Add full power optimization
Flow          Modify flow steps
Congestion    Reduce congestion and related problems
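Each predefined strategy belongs to one of these categories and is applied to a run through its
strategy property. A minimal sketch, assuming the default implementation run name impl_1 and
the shipped Performance-category strategy Performance_Explore:

# Apply a Performance-category strategy to the default implementation run
set_property strategy Performance_Explore [get_runs impl_1]
# Confirm which strategy is now in effect
get_property STRATEGY [get_runs impl_1]

The directive used by each implementation step is exposed as a run property of the following
form: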
STEPS.<STEP>_DESIGN.ARGS.DIRECTIVE
Where <STEP> is one of SYNTH, OPT, PLACE, PHYS_OPT, or ROUTE. This property is an enum
type, so all supported values can be returned using list_property_value.
Following is an example:
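A minimal sketch, assuming the default implementation run name impl_1:

list_property_value STEPS.ROUTE_DESIGN.ARGS.DIRECTIVE [get_runs impl_1]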
The following Tcl example shows how to list the directives for each synthesis and implementation
command using a temporary, empty project:
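A minimal sketch of such a script; the project name and location are placeholders, and the
default run names synth_1 and impl_1 are assumed:

# Create a throwaway project only to query the run properties, then close it
create_project -force tmp_directives ./tmp_directives
foreach {step run} {SYNTH synth_1 OPT impl_1 PLACE impl_1 PHYS_OPT impl_1 ROUTE impl_1} {
    puts "Directives for ${step}_DESIGN:"
    foreach d [list_property_value STEPS.${step}_DESIGN.ARGS.DIRECTIVE [get_runs $run]] {
        puts "  $d"
    }
}
close_project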
Appendix C
Additional Resources and Legal Notices
The AMD Technical Information Portal is an online tool that provides robust search and
navigation for documentation using your web browser. To access the Technical Information
Portal, go to https://docs.amd.com.
Documentation Navigator
Documentation Navigator (DocNav) is an installed tool that provides access to AMD Adaptive
Computing documents, videos, and support resources, which you can filter and search to find
information. To open DocNav:
• From the AMD Vivado™ IDE, select Help → Documentation and Tutorials.
• On Windows, click the Start button and select Xilinx Design Tools → DocNav.
• At the Linux command prompt, enter docnav.
Note: For more information on DocNav, refer to the Documentation Navigator User Guide (UG968).
Design Hubs
AMD Design Hubs provide links to documentation organized by design tasks and other topics,
which you can use to learn key concepts and address frequently asked questions. To access the
Design Hubs:
Support Resources
For support resources such as Answers, Documentation, Downloads, and Forums, see Support.
References
These documents provide supplemental material useful with this guide:
Training Resources
AMD provides a variety of training courses and QuickTake videos to help you learn more about
the concepts presented in this document. Use these links to explore related training resources:
Revision History
The following table shows the revision history for this document.
Please Read: Important Legal Notices
AMD reserves the right to revise this information and to make changes from time to time to the
content hereof without obligation of
AMD to notify any person of such revisions or changes. THIS INFORMATION IS PROVIDED "AS
IS." AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE
CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES,
ERRORS, OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY
DISCLAIMS ANY IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY, OR
FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY
PERSON FOR ANY RELIANCE, DIRECT, INDIRECT, SPECIAL, OR OTHER CONSEQUENTIAL
DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF
AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
Copyright
© Copyright 2012-2024 Advanced Micro Devices, Inc. AMD, the AMD Arrow logo, UltraScale,
UltraScale+, Vivado, Versal, and combinations thereof are trademarks of Advanced Micro
Devices, Inc. Other product names used in this publication are for identification purposes only
and may be trademarks of their respective companies.