Datapath Synthesis in Encounter RTL Compiler
Product Version 11.2
March 2012
© 2003-2012 Cadence Design Systems, Inc. All rights reserved.
Printed in the United States of America.
Cadence Design Systems, Inc. (Cadence), 2655 Seely Ave., San Jose, CA 95134, USA.
Open SystemC, Open SystemC Initiative, OSCI, SystemC, and SystemC Initiative are trademarks or registered
trademarks of Open SystemC Initiative, Inc. in the United States and other countries and are used with
permission.
Trademarks: Trademarks and service marks of Cadence Design Systems, Inc. contained in this document are
attributed to Cadence with the appropriate symbol. For queries regarding Cadence's trademarks, contact the
corporate legal department at the address shown above or call 800.862.4522. All other trademarks are the
property of their respective holders.
Restricted Permission: This publication is protected by copyright law and international treaties and contains
trade secrets and proprietary information owned by Cadence. Unauthorized reproduction or distribution of this
publication, or any portion of it, may result in civil and criminal penalties. Except as specified in this permission
statement, this publication may not be copied, reproduced, modified, published, uploaded, posted, transmitted,
or distributed in any way, without prior written permission from Cadence. Unless otherwise agreed to by
Cadence in writing, this statement grants Cadence customers permission to print one (1) hard copy of this
publication subject to the following conditions:
1. The publication may be used only in accordance with a written agreement between Cadence and its
customer.
2. The publication may not be modified in any way.
3. Any authorized copy of the publication or portion thereof must include all original copyright, trademark,
and other proprietary notices and this permission statement.
4. The information contained in this document cannot be used in the development of like products or
software, whether for internal or external use, and shall not be used for the benefit of any other party,
whether or not for consideration.
Patents: Cadence Product Encounter RTL Compiler, described in this document, is protected by U.S. Patents
[5,892,687]; [6,470,486]; [6,772,398]; [6,772,399]; [6,807,651]; [6,832,357]; [7,007,247]; and [8,127,260].
Disclaimer: Information in this publication is subject to change without notice and does not represent a
commitment on the part of Cadence. Except as may be explicitly set forth in such agreement, Cadence does
not make, and expressly disclaims, any representations or warranties as to the completeness, accuracy or
usefulness of the information contained in this document. Cadence does not warrant that use of such
information will not infringe any third party rights, nor does Cadence assume any liability for damages or costs
of any kind that may result from use of such information.
Restricted Rights: Use, duplication, or disclosure by the Government is subject to restrictions as set forth in
FAR52.227-14 and DFAR252.227-7013 et seq. or its successor.
Datapath Synthesis in Encounter RTL Compiler
Contents
List of Figures
List of Examples
Preface
    About This Manual
    Additional References
    How to Use the Documentation Set
    Reporting Problems or Errors in Manuals
    Customer Support
        Cadence Online Support
        Other Support Offerings
    Messages
    Man Pages
    Command-Line Help
        Getting the Syntax for a Command
        Getting the Syntax for an Attribute
        Searching for Attributes
        Searching For Commands When You Are Unsure of the Name
    Documentation Conventions
        Text Command Syntax
1 Datapath Optimization
    Overview of Datapath Optimization
        Supported HDL Operators
    Sharing Transformations
        Controlling Sharing Transformations
    Speculation Transformations
        Controlling Speculation Transformations
2 Datapath Reporting
    Overview of Datapath Reporting
    Command Syntax
    Interpreting the Datapath Report
        Module and Instance Names
        Operator Type and Signed Type
        Architecture
        Width of Input and Output Operands
        Area
        File Name, Line Number, and Column Number
    Report Datapath for RTL Sharing
    Using the report datapath Command at Different Stages in the Design Flow
        elaboration Design Stage
        synthesize -to_generic Design Stage
        synthesize -to_mapped Design Stage
3 RC-Datapath (RC-DP) Functions
    Datapath Function Primitives
    Primitive Functionality
        $abs
        $blend
        $carrysave
        $intround
        $lead0
        $lead1
        $rotatel
        $rotater
        $round
    Parameter Functions
    ChipWare Components
4 Datapath Coding Styles
    Overview
    Starting from the RTL
    Importing the Gate-Level Netlist
    Keeping Relevant Operators in the Same Hierarchy
    Inferring Datapath Modules
    Applying Carry-save Arithmetic Automatically
    Inferring a Constant Multiplier
    Using Signed Arithmetic
        Using Signed Part-Select / Concatenation
        Implementing Programmable Unsigned and Signed Multiplier
        Mixing Unsigned and Signed Expressions
        Avoiding Manual Signal Extension Techniques
        VHDL Signed/Unsigned Type Conversion
    Inferring a Square Automatically
    Implying Upper-Bit Truncation
    Following Self-Determined Bit Width Rules
    Avoiding Instantiated Components
    Arithmetically Complementing an Operand
Index
List of Figures
Figure 1-1 Sharing
Figure 1-2 Speculation (Unsharing)
Figure 1-3 Carry-save Transformation on the Critical Path
Figure 1-4 Carry-save Transformation
Figure 1-5 Timing-Driven Architecture Selection
List of Examples
Example 1-1 HDL description of a multiplexer
Example 1-2 HDL description
Example 1-3 Merging Across HDL Statements
Example 1-4 Merging Product-of-Sum
Example 1-5 Merging a Comparator with Arithmetic Operators
Example 1-6 Merging One Upstream Operator to Multiple Downstream Operators
Example 1-7 Transforming CSA over a Multiplexer
Example 1-8 Transforming CSA over an Inverter
Example 1-9 Arithmetic with Lower-Bit Truncation, Truncation after Addition
Example 1-10 Instantiated Operators Cannot Be Merged
Example 1-11 Inferred Operators Can Be Merged
Example 1-12 Gate-Level Netlist Cannot Be Merged
Example 1-13 Non-Interacting Operators Cannot Be Merged
Example 2-1 report datapath Header Information
Example 2-2 Area Statistics
Example 2-3 Design with Datapath Operators
Example 2-4 Tcl Script
Example 2-5 Results of report datapath After Using the Elaborate Command
Example 2-6 Report datapath Results after Using synthesize -to_generic
Example 2-7 Verilog Design with Sharing
Example 2-8 report datapath for RTL Sharing that Shows the Mux Operator
Example 3-1 $abs Function Primitive
Example 3-2 $abs Simulation Model
Example 3-3 $blend Function Primitive
Example 3-4 $blend Simulation Model
Example 3-5 $carrysave Function Primitive
Example 3-6 $intround Function Primitive
Example 3-7 $lead0 Function Primitive
Preface
Additional References
The following sources are helpful references, but are not included with the product
documentation:
TclTutor, a computer aided instruction package for learning the Tcl language: http://www.msen.com/~clif/TclTutor.html
TCL Reference, Tcl and the Tk Toolkit, John K. Ousterhout, Addison-Wesley Publishing Company
IEEE Standard Hardware Description Language Based on the Verilog Hardware Description Language (IEEE Std. 1364-1995)
IEEE Standard Hardware Description Language Based on the Verilog Hardware Description Language (IEEE Std. 1364-2001)
IEEE Standard VHDL Language Reference Manual (IEEE Std. 1076-1987)
IEEE Standard VHDL Language Reference Manual (IEEE Std. 1076-1993)
Note: For information on purchasing IEEE specifications, go to http://shop.ieee.org/store/ and
click on Standards.
How to Use the Documentation Set
[Documentation map figure. The figure lists the following documents in the Encounter RTL Compiler documentation set.]
README File
New features and solutions to problems: What's New in Encounter RTL Compiler
HDL Modeling in Encounter RTL Compiler
ChipWare Developer in Encounter RTL Compiler
Library Guide for Encounter RTL Compiler
Setting Constraints and Performing Timing Analysis in Encounter RTL Compiler
Datapath Synthesis in Encounter RTL Compiler
Low Power in Encounter RTL Compiler
Design for Test in Encounter RTL Compiler
References: Attribute Reference for Encounter RTL Compiler, Command Reference for Encounter RTL Compiler, ChipWare in Encounter RTL Compiler, GUI Guide for Encounter RTL Compiler, Quick Reference for Encounter RTL Compiler
Customer Support
Cadence offers live and online support, as well as customer education and training programs.
http://support.cadence.com
http://www.cadence.com/support
Messages
From within RTL Compiler there are two ways to get information about messages.
Use the report messages command.
For example:
rc:/> report messages
This returns the detailed information for each message output in your current RTL
Compiler run. It also includes a summary of how many times each message was issued.
Use the man command.
Note: You can only use the man command for messages within RTL Compiler.
For example, to get more information about the TIM-11 message, type the following
command:
rc:/> man TIM-11
If you do not get the details that you need or do not understand a message, contact Cadence
Customer Support to file a CCR.
Man Pages
In addition to the Command and Attribute References, you can also access information about
the commands and attributes using the man pages in RTL Compiler. Man pages contain the
same content as the Command and Attribute References. To use the man pages from the
UNIX shell:
1. Set your environment to view the correct directory:
setenv MANPATH $CDN_SYNTH_ROOT/share/synth/man
2. Enter the name of the command or attribute that you want either in RTL Compiler or
within the UNIX shell. For example:
man check_dft_rules
man cell_leakage_power
You can also use the more command, which behaves like its UNIX counterpart. If the output
of a man page is too long to be displayed completely on the screen, use the more command
to break up the output. Use the spacebar to page forward.
rc:/> more report timing
Command-Line Help
You can get quick syntax help for commands and attributes at the RTL Compiler command-
line prompt. There are also enhanced search capabilities so you can more easily search for
the command or attribute that you need.
Note: The command syntax representation in this document does not necessarily match the
information that you get when you type help command_name. In many cases, the order of
the arguments is different. Furthermore, the syntax in this document includes all of the
dependencies, whereas the help information does this only to a certain degree.
If you have any suggestions for improving the command-line help, please e-mail them to:
[email protected]
For example:
rc:/> get_attribute max_transition * -help
Type a sequence of letters after the set_attribute command and press Tab to get a
list of all attributes that contain those letters. For example:
rc:/> set_attr li
ambiguous "li": lib_lef_consistency_check_enable lib_search_path libcell
liberty_attributes libpin library library_domain line_number
You can type a sequence of letters and press Tab to get a list of all commands that start
with those letters.
For example:
rc:/> path_<Tab>
Documentation Conventions
... Three dots (...) indicate that you can repeat the previous
argument. If the three dots are used with brackets (that is,
[argument]...), you can specify zero or more arguments. If
the three dots are used without brackets (argument...), you
must specify at least one argument.
1
Datapath Optimization
This chapter describes the datapath optimizations performed in RTL Compiler and explains how to
write RTL code that improves the quality of results when synthesizing datapath designs.
Sharing Transformations
Sharing is an optimization technique that reclaims design area (reduces gate count) without
degrading design performance. This optimization is based on the principle that two similar
arithmetic operations can be performed on one arithmetic hardware component if they are
never used at the same time.
This technique shares hardware resources across portions of the design. During the
synthesis flow, resource sharing is performed automatically to reduce area. Sharing
merges two datapath modules into one while guaranteeing that the design timing
constraints are not violated. This results in a design with smaller area and possibly
reduced delay.
Example 1-1 on page 25 shows the HDL description of a multiplexer with two multipliers and
a corresponding diagram. After sharing, there is only one multiplier on the output of the
multiplexer as shown in Figure 1-1 on page 26.
Verilog
module sharing_example (a, b, c, d, cond, y);
parameter w = 16;
input [w-1:0] a, b, c, d;
input cond;
output [w*2-1:0] y;
wire [w*2-1:0] a_times_b = a * b;
wire [w*2-1:0] c_times_d = c * d;
assign y = cond ? a_times_b : c_times_d;
endmodule
VHDL
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity sharing_example is
generic (w : natural := 16);
port ( a, b, c, d : in unsigned (w-1 downto 0);
cond : in std_logic;
y : out unsigned (w*2-1 downto 0) );
end sharing_example;
architecture rtl of sharing_example is
signal a_times_b, c_times_d : unsigned (w*2-1 downto 0);
begin
a_times_b <= a * b;
c_times_d <= c * d;
y <= a_times_b when (cond = '1') else c_times_d;
end rtl;
[Figure 1-1 Sharing. Left: implementation of HDL without sharing. Right: implementation of HDL after sharing optimization, in which the operands pass through multiplexers controlled by cond into a single multiplier.]
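The equivalence that sharing relies on can be checked numerically. The following Python sketch is not part of RTL Compiler; it only models the arithmetic of Example 1-1, and the function names unshared, shared, and check_sharing are illustrative assumptions.

```python
import random

W = 16  # operand width, matching the default of parameter w in Example 1-1


def unshared(a, b, c, d, cond):
    """Two multipliers feed a multiplexer, as the HDL is written."""
    a_times_b = a * b
    c_times_d = c * d
    return a_times_b if cond else c_times_d


def shared(a, b, c, d, cond):
    """Two operand multiplexers feed a single multiplier, as after sharing."""
    left = a if cond else c
    right = b if cond else d
    return left * right


def check_sharing(trials=1000, seed=0):
    """Return True if both structures agree on random W-bit operands."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b, c, d = (rng.randrange(1 << W) for _ in range(4))
        for cond in (0, 1):
            if unshared(a, b, c, d, cond) != shared(a, b, c, d, cond):
                return False
    return True


print(check_sharing())  # True
```

Because only one of the two products is ever selected, moving the selection ahead of the multiplier leaves the result unchanged while removing one multiplier.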
To turn off RTL sharing transformations, use the dp_sharing root attribute.
rc:/> set_attribute dp_sharing none /
Default: basic
To turn off sharing transformations within the design, use the dp_sharing design
attribute.
rc:/> set_attribute dp_sharing none [find /designs* -design name]
Default: inherited
To turn off sharing transformations within a particular subdesign, use the dp_sharing
subdesign attribute.
rc:/> set_attribute dp_sharing none [find /designs* -subdesign name]
Default: inherited
See Report Datapath for RTL Sharing on page 60 for an example report.
Speculation Transformations
If the select pin of a multiplexer (mux) is on the critical path, and the mux drives a datapath
component, then the operation at the output side of the mux can be duplicated at the input
side of the mux. This unsharing process speeds up the critical path.
Figure 1-2 on page 29 shows the select line of the multiplexer on the critical path. The adder
at the output side of the mux is duplicated and moved toward the input side of the mux,
thereby removing it from the critical path.
VHDL
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity ex3 is
generic (w : natural := 16);
port ( a, b, c : in unsigned (w-1 downto 0);
cond : in std_logic;
y : out unsigned (w-1 downto 0) );
end ex3;
architecture rtl of ex3 is
signal a_mux_b : unsigned (w-1 downto 0);
begin
a_mux_b <= a when (cond = '1') else b;
y <= a_mux_b + c;
end rtl;
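[Figure 1-2 Speculation (Unsharing). Before: a multiplexer selecting a or b drives an adder with c, and the select line of the multiplexer is on the critical path. After: the adder is duplicated (a + c and b + c) at the input side of the multiplexer.]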
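Speculation preserves functionality because the addition distributes over the selection. The following Python sketch models this for the ex3 design shown above; it is an illustration only, and the function names before_speculation and after_speculation are assumptions rather than tool terminology.

```python
import random

W = 16                # width of a, b, c, and y in the ex3 design
MASK = (1 << W) - 1   # y keeps only the lower W bits of the sum


def before_speculation(a, b, c, cond):
    """The multiplexer output drives one adder; cond arrives before the add."""
    a_mux_b = a if cond else b
    return (a_mux_b + c) & MASK


def after_speculation(a, b, c, cond):
    """Both sums are computed up front; a late cond only drives the multiplexer."""
    sum_a = (a + c) & MASK
    sum_b = (b + c) & MASK
    return sum_a if cond else sum_b


def check_speculation(trials=1000, seed=0):
    """Return True if both structures agree on random W-bit operands."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b, c = (rng.randrange(1 << W) for _ in range(3))
        for cond in (0, 1):
            if before_speculation(a, b, c, cond) != after_speculation(a, b, c, cond):
                return False
    return True


print(check_speculation())  # True
```

The duplicated adder costs area, which is why this transformation is the inverse of sharing and is applied when the select signal is timing critical.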
To turn off RTL speculation transformations globally, use the dp_speculation root
attribute.
rc:/> set_attribute dp_speculation none /
Default: basic
To turn off RTL speculation transformations for a design, use the dp_speculation
design attribute.
rc:/> set_attribute dp_speculation none [find /designs* -design name]
Default: inherited
To turn off speculation transformations within a particular subdesign, use the
dp_speculation subdesign attribute.
rc:/> set_attribute dp_speculation none [find /designs* -subdesign name]
Default: inherited
When a set of datapath operators is identified as a candidate for CSA transformation, RTL
Compiler looks at each intermediary signal between them and examines whether it is
equipped with an appropriate bit range to carry the needed precision of its computation. Any
disqualified intermediary signal becomes the boundary of CSA transformation.
A truncation may or may not block transformation. In the process of qualifying an intermediary
signal for CSA transformation, RTL Compiler looks into the intention behind the RTL code and
analyzes the information content it needs to carry. The qualifying or disqualifying decision is
based on the required precision, instead of the HDL-implied precision suggested by its source
operator alone.
CSA transformation does not span across hierarchical boundaries. The output ports of a
module are never made in carrysave form. CSA transformation does not span across clock
cycles, and a signal feeding a register is never made in carrysave form.
A carry-save adder (CSA) takes three numbers and produces two outputs, one formed with
the sums and the other with the carryouts.
The most straightforward way to add up a set of numbers is to employ an adder tree. Each
adder consumes two numbers and produces one. The adder at the root of the tree generates
the final sum.
Use the carrysave transformation technique to greatly improve both timing and area.
Figure 1-3 on page 32 shows the carrysave transformation technique.
[Figure 1-3 Carry-save Transformation on the Critical Path. Both diagrams sum the inputs a, b, c, d, e, and f to produce y: (a) Carry-propagate, (b) Carrysave.]
Diagram (a) shows three carry propagate adders on the critical path.
Diagram (b) has only one carry-propagate adder on the critical path and shows how special
carrysave blocks can be used to perform the same carry-propagate addition. By taking in
three input numbers and generating two output numbers, the carrysave block adds up three
numbers without resolving the carry propagation. At the end, the only two remaining numbers
are said to be the sum in a carrysave form. A carry-propagate adder is needed to add the two
numbers to produce the final sum.
In diagram (b), each carrysave block is a carry-save adder (CSA). The carrysave block does not incur the
delay of carry propagation. Its delay is small and is independent of the width of the operands.
This carrysave concept is applicable in various scenarios, such as the vector sum, which
adds up a set of numbers, or the Wallace tree, which adds up partial products inside of a
multiplier.
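The behavior of a carrysave block can be modeled with bitwise operations. The following Python sketch is a minimal model, assuming unsigned operands; the names csa, to_carrysave, and vector_sum are illustrative and do not correspond to RTL Compiler commands.

```python
def csa(x, y, z):
    """Carrysave block: three numbers in, two numbers out.

    Every bit position is handled independently, so no carry ripples
    across the word. This is why the delay does not depend on the width.
    """
    sums = x ^ y ^ z                                # per-bit sum
    carries = ((x & y) | (x & z) | (y & z)) << 1    # per-bit carry, one position up
    return sums, carries


def to_carrysave(values):
    """Reduce a list of numbers to at most two numbers using carrysave blocks."""
    operands = list(values)
    blocks = 0
    while len(operands) > 2:
        sums, carries = csa(operands[0], operands[1], operands[2])
        operands = operands[3:] + [sums, carries]
        blocks += 1
    return operands, blocks


def vector_sum(values):
    """Vector sum with exactly one carry-propagate addition at the end."""
    pair, _ = to_carrysave(values)
    return sum(pair)  # the single carry-propagate adder


values = [11, 22, 33, 44, 55, 66]
pair, blocks = to_carrysave(values)
print(blocks, sum(pair))  # 4 231
```

Six operands need four carrysave blocks to reach carrysave form, and the final sum still requires only one carry-propagate addition.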
Timing analysis on a multiplier or a vector sum often identifies the carry-propagate adder as
a significant portion of the critical path. Therefore, employing a technique that merges
arithmetic operators can significantly improve timing and area. Figure 1-4 on page 33 shows
how this works on a block computing y = a * b + c * d.
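[Figure 1-4 Carry-save Transformation. Both diagrams compute y = a * b + c * d from the inputs a, b, c, and d: (a) two discrete multipliers, each containing an adder (+), followed by an adder; (b) a single merged a*b+c*d operator.]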
Figure 1-4 (a) shows an implementation using discrete operators, that is, without any CSA
transformations. It takes two multipliers and one adder to implement this function. Each
multiplier also uses an adder internally, as indicated by the (+) in the diagram.
Traditionally, the synthesis tool labors to optimize each of these discrete operators
individually, without taking into account how they interact with each other. Each of these
operators has a carry-propagate adder; therefore, there are two carry-propagate adders that
land on the critical path.
Figure 1-4 (b) shows an implementation after CSA transformations. RTL Compiler looks at
the design at the operator level and recognizes that this is a cluster of arithmetic operators
that can be merged. Instead of implementing three discrete components, RTL Compiler
merges them into one larger complex operator and optimizes the entire merged operator. As a
result, there is only one carry-propagate adder on the critical path.
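The merged operator can be pictured as one pool of partial products that is reduced with carrysave blocks and resolved once. The following Python sketch illustrates the idea for unsigned operands; it is not the architecture RTL Compiler generates, and the names partial_products, reduce_to_carrysave, and merged_sum_of_products are assumptions.

```python
import random


def csa(x, y, z):
    """Carrysave block: three numbers in, two numbers out, no carry ripple."""
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1


def partial_products(x, y):
    """One shifted copy of x for every set bit of y (unsigned multiply)."""
    return [x << i for i in range(y.bit_length()) if (y >> i) & 1]


def reduce_to_carrysave(operands):
    """Compress any number of operands down to at most two."""
    ops = list(operands)
    while len(ops) > 2:
        sums, carries = csa(ops[0], ops[1], ops[2])
        ops = ops[3:] + [sums, carries]
    return ops


def merged_sum_of_products(a, b, c, d):
    """y = a * b + c * d with a single carry-propagate addition."""
    rows = partial_products(a, b) + partial_products(c, d)
    pair = reduce_to_carrysave(rows)
    return sum(pair)  # the only carry-propagate adder


rng = random.Random(0)
ok = True
for _ in range(500):
    a, b, c, d = (rng.randrange(1 << 16) for _ in range(4))
    ok = ok and merged_sum_of_products(a, b, c, d) == a * b + c * d
print(ok)  # True
```

In the discrete version, each multiplier resolves its own partial products before the adder runs; here the intermediate products are never resolved.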
Default: basic
To turn off carry-save transformations within a design, use the dp_csa design attribute.
rc:/> set_attribute dp_csa none [find /designs* -design name]
Default: inherited
To turn off carry-save transformations within a particular subdesign, use the dp_csa
subdesign attribute.
rc:/> set_attribute dp_csa none [find /designs* -subdesign name]
Default: inherited
For each merging candidate of any scenario, merging may or may not take place, depending
on whether the original functionality can be preserved.
The purpose of merging operators is to eliminate intermediary carry propagate adders. This
can be applied to a set of datapath operators interacting with each other. The most typical
scenarios are vector sum, multiply-add, sum of product, or a combination of these. Datapath
operators are transformed in the following scenarios.
Vector Sum
a + b + c
Multiply-Add
a * b + c
Sum-of-Product
a * b + c * d
a * b + c * d + e - f
Merging is not limited to operators inferred in the same HDL statement. As shown in
Example 1-3 on page 35, both of the HDL code segments implement signal y using a single
merged operator.
Verilog
p = a * b;
q = c * d;
y = p + q;
VHDL
p <= a * b;
q <= c * d;
y <= p + q;
Verilog
y = a * b + c * d;
VHDL
y <= a * b + c * d;
Product-of-Sum CSA
Verilog
wire [13:0] a, b, c, d;
wire [15:0] cs, p;
wire [31:0] q;
wire [32:0] y;
assign cs = a + b + c + d;
assign y = cs * p + q;
VHDL
signal a, b, c, d : unsigned (13 downto 0);
signal cs, p : unsigned (15 downto 0);
signal q : unsigned (31 downto 0);
signal y : unsigned (32 downto 0);
In this case, the internal signal (cs) is kept in carrysave form and the downstream multiplier
is equipped with a special architecture that takes care of inputs in carrysave form. This
effectively merges the upstream adders with the downstream multiplier. There is only one
carry propagate adder from each input to output.
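The following Python sketch shows why a multiplier can accept an operand in carrysave form: if cs is held as two numbers, the product distributes over both. This is an arithmetic illustration under that assumption, not a description of the special architecture used by RTL Compiler, and all function names are hypothetical. The widths follow the declarations in the example above.

```python
def csa(x, y, z):
    """Carrysave block: three numbers in, two numbers out, no carry ripple."""
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1


def partial_products(x, y):
    """One shifted copy of x for every set bit of y (unsigned multiply)."""
    return [x << i for i in range(y.bit_length()) if (y >> i) & 1]


def reduce_to_carrysave(operands):
    """Compress any number of operands down to at most two."""
    ops = list(operands)
    while len(ops) > 2:
        sums, carries = csa(ops[0], ops[1], ops[2])
        ops = ops[3:] + [sums, carries]
    return ops


def product_of_sum(a, b, c, d, p, q):
    """y = (a + b + c + d) * p + q, with cs never resolved to one number."""
    cs = reduce_to_carrysave([a, b, c, d])   # cs kept as two numbers
    rows = []
    for half in cs:                          # (s + c) * p == s * p + c * p
        rows += partial_products(p, half)
    rows.append(q)
    return sum(reduce_to_carrysave(rows))    # the only carry-propagate adder


# Largest values allowed by the declared widths: 14, 16, and 32 bits.
A_MAX, P_MAX, Q_MAX = (1 << 14) - 1, (1 << 16) - 1, (1 << 32) - 1
y_max = product_of_sum(A_MAX, A_MAX, A_MAX, A_MAX, P_MAX, Q_MAX)
print(4 * A_MAX < (1 << 16), y_max < (1 << 33))  # True True
```

The final check confirms that the 16-bit cs and the 33-bit y are wide enough to hold the full precision, which is the condition for the intermediate signal to qualify for CSA transformation.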
Comparator
Another common CSA transformation scenario is the comparison of the result of arithmetic
expressions, as shown in Example 1-5 on page 37. Both p and q in the example are in
carrysave format.
Verilog
wire [15:0] a, b, c, d;
wire [16:0] p, q;
wire is_greater, is_equal;
assign p = a + b;
assign q = c + d;
assign is_greater = (p > q);
assign is_equal = (p == q);
VHDL
signal a, b, c, d : unsigned (15 downto 0);
signal p, q : unsigned (16 downto 0);
signal is_greater, is_equal : std_logic;
p <= resize(a,17) + resize(b,17);
q <= resize(c,17) + resize(d,17);
is_greater <= '1' when (p > q) else '0';
is_equal <= '1' when (p = q) else '0';
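A comparison of two sums can be decided from the sign of their difference, so neither p nor q has to be resolved separately. The following Python sketch illustrates this with two's complement arithmetic; the 19-bit working width and the function names are assumptions chosen for the illustration, not details of the RTL Compiler implementation.

```python
import random

W = 19                # enough bits to hold a + b - c - d with its sign
MASK = (1 << W) - 1


def csa(x, y, z):
    """Carrysave block: three numbers in, two numbers out, no carry ripple."""
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1


def reduce_to_carrysave(operands):
    """Compress any number of operands down to at most two."""
    ops = list(operands)
    while len(ops) > 2:
        sums, carries = csa(ops[0], ops[1], ops[2])
        ops = ops[3:] + [sums, carries]
    return ops


def compare(a, b, c, d):
    """Return (is_greater, is_equal) for p = a + b and q = c + d."""
    # a + b - c - d == a + b + ~c + ~d + 2 (modulo 2**W)
    operands = [a, b, ~c & MASK, ~d & MASK, 2]
    pair = reduce_to_carrysave(operands)
    diff = sum(pair) & MASK               # the only carry-propagate addition
    negative = (diff >> (W - 1)) & 1      # sign bit of the difference
    is_equal = diff == 0
    is_greater = diff != 0 and negative == 0
    return is_greater, is_equal


rng = random.Random(0)
ok = True
for _ in range(1000):
    a, b, c, d = (rng.randrange(1 << 16) for _ in range(4))
    ok = ok and compare(a, b, c, d) == (a + b > c + d, a + b == c + d)
print(ok)  # True
```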
Multiple-Fanout
A more challenging scenario is when one mergeable operator feeds into more than one
operator downstream, as shown in Example 1-6 on page 37, where the one multiplier needs
to be merged with each of the two downstream adders.
Verilog
cs = a * b;
x = cs + c;
y = cs + d;
VHDL
cs <= a * b;
x <= cs + c;
y <= cs + d;
In this case, the internal signal (cs) is kept in carrysave form and the downstream clusters (x
and y) add them up. This effectively merges the upstream multiplier with each of the two
downstream adders. There is only one carry propagate adder from input to each output.
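The following Python sketch models the multiple-fanout case for unsigned operands: the product is reduced to carrysave form once, and each downstream cluster adds its own operand before resolving. It is an illustration only; the function names are assumptions.

```python
def csa(x, y, z):
    """Carrysave block: three numbers in, two numbers out, no carry ripple."""
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1


def partial_products(x, y):
    """One shifted copy of x for every set bit of y (unsigned multiply)."""
    return [x << i for i in range(y.bit_length()) if (y >> i) & 1]


def reduce_to_carrysave(operands):
    """Compress any number of operands down to at most two."""
    ops = list(operands)
    while len(ops) > 2:
        sums, carries = csa(ops[0], ops[1], ops[2])
        ops = ops[3:] + [sums, carries]
    return ops


def fanout(a, b, c, d):
    """x = a * b + c and y = a * b + d, sharing cs in carrysave form."""
    cs = reduce_to_carrysave(partial_products(a, b))  # shared, never resolved
    x = sum(reduce_to_carrysave(cs + [c]))  # one carry-propagate adder for x
    y = sum(reduce_to_carrysave(cs + [d]))  # one carry-propagate adder for y
    return x, y


print(fanout(3, 5, 7, 9))  # (22, 24)
```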
Example 1-7 on page 38 shows a transformation scenario involving CSA over a multiplexer.
The tmp signal is in carrysave format.
Verilog
module test(a, b, c, d, s, e, z);
input [7:0] a, b, c, d;
input [14:0] e;
input s;
output [17:0] z;
wire [16:0] tmp;
assign tmp = s ? a*b + c*d : a*c + b*d;
assign z = tmp + e;
endmodule
VHDL
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity test is
port ( a, b, c, d : in unsigned (7 downto 0);
s : in std_logic;
e : in unsigned (14 downto 0);
z : out unsigned (17 downto 0) );
end test;
architecture rtl of test is
signal tmp : unsigned (16 downto 0);
begin
tmp <= (resize(a*b,17) + resize(c*d,17)) when (s = '1')
else (resize(a*c,17) + resize(b*d,17));
z <= resize(tmp,18) + resize(e,18);
end rtl;
Example 1-8 on page 38 shows a transformation scenario involving CSA over an inverter.
The tmp signal is in carrysave format.
Verilog
module test (a, b, c, z);
input [15:0] a, b, c;
output [16:0] z;
wire [16:0] tmp = a + b;
assign z = ~tmp + c;
endmodule
VHDL
library ieee;
use ieee.numeric_std.all;
entity test is
port ( a, b, c : in unsigned (15 downto 0);
z : out unsigned (16 downto 0) );
end test;
architecture rtl of test is
signal tmp : unsigned (16 downto 0);
begin
tmp <= resize(a,17) + resize(b,17);
z <= (not tmp) + c;
end rtl;
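One way to see why an inverter does not have to break a carrysave chain is the two's complement identity ~x = -x - 1 (modulo 2^n). The Python sketch below checks that identity for the widths used in this example. It is an assumption that a merging engine exploits the identity in this form; the function names are hypothetical.

```python
WIDTH = 17                      # width of tmp and z in the example
MASK = (1 << WIDTH) - 1


def invert_then_add(a, b, c):
    """Literal reading of the RTL: z = ~(a + b) + c, truncated to WIDTH bits."""
    tmp = (a + b) & MASK
    return ((~tmp & MASK) + c) & MASK


def merged_sum(a, b, c):
    """Same value written as one signed sum: c - a - b - 1 (mod 2**WIDTH)."""
    return (c - a - b - 1) & MASK


samples = [(0, 0, 0), (1, 2, 3), (65535, 65535, 65535), (40000, 123, 77)]
results = [(invert_then_add(*s), merged_sum(*s)) for s in samples]
```

Because the inverted value can be rewritten as a negated operand plus a constant, the whole expression remains a plain sum of terms.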
Truncated CSA
Example 1-9 on page 39 shows a special CSA case known as a truncated CSA. The r signal
in the example will be transformed.
Verilog
wire [15:0] a, b, c, d;
wire [16:0] p, q;
wire [17:0] r;
wire [15:0] y;
assign p = a + b;
assign q = c + d;
assign r = p + q;
assign y = r[17:2]; // the three operators are merged
VHDL
signal a, b, c, d : unsigned (15 downto 0);
signal p, q : unsigned (16 downto 0);
signal r : unsigned (17 downto 0);
signal y : unsigned (15 downto 0);
p <= resize(a,17) + resize(b,17);
q <= resize(c,17) + resize(d,17);
r <= resize(p,18) + resize(q,18);
y <= r(17 downto 2); -- the three operators are merged
CSA transformations work on inferred operators, but not on instantiated ones, as shown in
Example 1-11 on page 41. There are two reasons for this:
1. An instantiated module is a user-defined design hierarchy that RTL Compiler must honor.
RTL Compiler is not allowed to automatically dissolve a user-defined module for the
purpose of CSA transformations, as shown in Example 1-10 on page 40.
2. RTL Compiler does not guess the arithmetic functionality of a user-defined module by its
module name, its input or output port names, or its input or output bit widths.
Verilog
module test (a, b, c, d, y);
input [7:0] a, b, c, d;
wire [15:0] p, q;
output [15:0] y;
VHDL
library ieee, cw;
use ieee.std_logic_1164.all;
use cw.components.all;
entity test is
port ( a, b, c, d : in std_logic_vector (7 downto 0);
y : out std_logic_vector (15 downto 0) );
end test;
architecture rtl of test is
signal p, q : std_logic_vector (15 downto 0);
begin
u1 : CW_mult generic map (wA => 8, wB => 8)
port map (A => a, B => b, TC => '0', Z => p);
u2 : CW_mult generic map (wA => 8, wB => 8)
port map (A => c, B => d, TC => '0', Z => q);
u3 : CW_sum generic map (nI => 2, wAi => 16, wZ => 16)
port map (A => (p & q), TC => '0', Z => y);
end rtl;
Verilog
module test (a, b, c, d, y);
input [7:0] a, b, c, d;
wire [15:0] p, q;
output [15:0] y;
assign p = a * b;
assign q = c * d;
assign y = p + q;
endmodule
VHDL
library ieee;
use ieee.numeric_std.all;
entity test is
port ( a, b, c, d : in unsigned (7 downto 0);
y : out unsigned (15 downto 0) );
end test;
architecture rtl of test is
signal p, q : unsigned (15 downto 0);
begin
p <= a * b;
q <= c * d;
y <= p + q;
end rtl;
CSA transformation works on inferred operators. RTL Compiler knows exactly what they do
and how they can be merged. RTL Compiler cannot merge operators represented by
imported gate-level netlists, for two reasons. With an instantiated component:
1. RTL Compiler must respect the user-defined design hierarchy.
2. RTL Compiler does not guess the functionality of a user-defined module and does not
try to reverse-engineer the hidden functionality in a gate-level representation. For
example, in Example 1-12 on page 41, RTL Compiler cannot determine whether the
numbers are being added or not.
Verilog
module add8 (y, a, b);
input [7:0] a, b;
output [7:0] y;
wire n1, n2, n3, n4, n5, n6, n7;
HA1 i0 (.A(a[0]), .B(b[0]), .S(y[0]), .CO(n1));
FA1A i1 (.A(a[1]), .B(b[1]), .CI(n1), .S(y[1]), .CO(n2));
FA1A i2 (.A(a[2]), .B(b[2]), .CI(n2), .S(y[2]), .CO(n3));
VHDL
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity add8 is
port ( y : out unsigned (7 downto 0);
a, b : in unsigned (7 downto 0) );
end add8;
architecture rtl of add8 is
component HA1 port ( A, B : in std_logic; S, CO : out std_logic );
end component;
component FA1A port ( A, B, CI : in std_logic; S, CO : out std_logic );
end component;
component EO3 port ( A, B, C : in std_logic; Z : out std_logic );
end component;
signal n1, n2, n3, n4, n5, n6, n7 : std_logic;
begin
i0 : HA1 port map ( A => a(0), B => b(0), S => y(0), CO => n1);
i1 : FA1A port map ( A => a(1), B => b(1), CI => n1, S => y(1), CO => n2);
i2 : FA1A port map ( A => a(2), B => b(2), CI => n2, S => y(2), CO => n3);
i3 : FA1A port map ( A => a(3), B => b(3), CI => n3, S => y(3), CO => n4);
i4 : FA1A port map ( A => a(4), B => b(4), CI => n4, S => y(4), CO => n5);
i5 : FA1A port map ( A => a(5), B => b(5), CI => n5, S => y(5), CO => n6);
i6 : FA1A port map ( A => a(6), B => b(6), CI => n6, S => y(6), CO => n7);
i7 : EO3 port map ( A => a(7), B => b(7), C => n7, Z => y(7) );
end rtl;
library ieee;
use ieee.numeric_std.all;
entity test is
port ( y : out unsigned (7 downto 0);
a, b, c : in unsigned (7 downto 0) );
end test;
architecture rtl of test is
component add8
port ( y : out unsigned (7 downto 0);
a, b : in unsigned (7 downto 0) );
end component;
signal p : unsigned (7 downto 0);
begin
-- y <= a + b + c;
u0 : add8 port map (a => a, b => b, y => p);
u1 : add8 port map (a => p, b => c, y => y);
end rtl;
Often there is RTL code that has multiple datapath operators, but RTL Compiler concludes
that they cannot be merged because these operators do not interact with each other, as
shown by the RTL code in Example 1-13 on page 43.
In Example 1-13 on page 43, the operators come from the same source and their outputs go
into the same mux, although on different pins. These operators do not interact with each
other.
VHDL
signal mode : unsigned (1 downto 0);
signal a, b, y : unsigned (15 downto 0);
variable p : unsigned (31 downto 0);
case mode is
when "00" => y <= a + b;
when "01" => y <= a - b;
when "10" => p := a * b; y <= p(15 downto 0);
when others => y <= a + b;
end case;
Architecture Selection
The best architecture for a datapath operator is a function of the design constraints,
technology library, and its surrounding logic. The choice should not be uniform among all
operators since each operator has its own environment. Whether done manually or
automatically through a synthesis tool, architecture selection requires accurate timing
analysis and precise decisions based on the delay and area calculations.
For each operator in the design, whether it is merged or discrete, RTL Compiler selects the
best architecture, also known as speed grade. There can be multiple architectures or speed
grades for datapath operators, such as addition, subtraction, multiplication, and division.
Furthermore, the implementation of the selected architecture is refined to improve the overall
quality of results. These architecture selection and implementation refinement decisions are
a function of timing constraints, surrounding control logic, and the target technology library.
Figure 1-5 is a simplified scenario of timing-driven architecture selection. The adder at the
upper half of the figure is more timing critical and may be implemented using a fast
architecture. The adder at the lower half of the figure has more slack and can be implemented
using a slower architecture. RTL Compiler evaluates the situation and may change the
architecture if it helps timing or area.
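[Figure 1-5: two adders (+) and a multiplier (*); the adder on the critical path is labeled fast, the other adder is labeled slow.]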
Note: If you choose to explicitly control this process through the user_speed_grade
attribute, then you must do so after using the synthesize -to_generic command.
Otherwise, RTL Compiler will ignore the user-specified speed grade and implement an
architecture that may or may not coincide with the specified speed grade.
The number of architectures available to a particular datapath operator is usually less than
five. Therefore, in some cases, RTL Compiler will map different speed grades to the same
architecture. For example, the very_slow and slow speed grades may actually be
implemented with the same architecture.
Find out the architecture of a particular datapath component using the speed_grade
attribute. For example:
rc:/> get_attribute speed_grade [find /designs* -subdesign name]
The speed_grade attribute is a read-only attribute that returns a string whose value is
either very_fast, fast, medium, slow, or very_slow.
Booth encoding may make the multiplier faster, smaller, or both depending on the width of the
multiplicand, the multiplier, and the underlying technology library.
To explicitly control the sub-architecture of a multiplier, use one of the following two methods:
Specify a valid implementation using the user_sub_arch attribute.
For example:
set_attribute user_sub_arch {booth | non_booth | radix8} \
[find /designs* -subdesign name]
If any string other than booth, non_booth or radix8 is provided as the sub-architecture,
then that value is ignored and a warning message is issued.
Once you have set the sub_arch of a multiplier, the rest of the RTL Compiler flow will honor
the setting.
Note: In a CSA_tree group, different participating multipliers can have different sub_arch
pragma settings. For example:
assign Z = a * /* cadence sub_arch booth */ b +
c * /* cadence sub_arch non_booth */ d;
Find out the sub-architecture of a particular multiplier component using the sub_arch
attribute. For example:
rc:/> get_attribute sub_arch [find /designs* -subdesign name]
The sub_arch attribute returns a string whose value is either booth, non_booth, or
radix8.
Default: false
Using this attribute accurately downsizes a datapath component without degrading timing.
However, there is a potential increase in run-time.
Note: Using this attribute is only effective during incremental optimization. This
operation is not performed if the synthesize -to_mapped -no_incremental
command is set.
Perform architecture upsizing after mapping by setting the dp_postmap_upsize
attribute to true. For example:
rc:/> set_attribute dp_postmap_upsize true /
Default: false
Using this attribute can improve slack when a datapath component affects the
critical path, but it may result in a longer run-time.
Note: Using this attribute is only effective during incremental optimization. This operation is
not performed if the synthesize -to_mapped -no_incremental command is set.
2
Datapath Reporting
Command Syntax
Report datapath operators that were inferred from the design using the following
command:
report datapath [-full_path] [-no_header] [-no_area_statistics] [-mux] [-all]
[design] [-max_width string] [-print_instantiated] [-print_inferred]
[> file]
-all reports all the datapath operators present in the design including muxes, as shown
in Example 2-5.
Note: The mux operators are different from the MUX library cells that are picked by the
mapper or are available in the technology library.
design specifies a particular design on which to report datapath operators. By default,
RTL Compiler reports on the current design.
file specifies the name of the file to write the report.
-full_path reports the full UNIX path and name of the file. By default, RTL
Compiler reports only the file name.
Using the -full_path option is useful when the HDL source files being synthesized
are located in one or more directories other than the directory where
RTL Compiler is being executed.
By default the report datapath command does not print the full path of the file, it only
prints the filename. For example, if RTL Compiler is run in the following directory:
/home/somebody/report_example
then by default, the report datapath command reports the example.v file in the
Filename column of the report.
If you use the -full_path option with the report datapath command, then the full
path of the file is displayed, which is the name of the file provided to the read_hdl
command. For example, the following is the relative path used with the read_hdl
command.
exec dir : /home/somebody/report_example
file dir : /home/somebody/report_example/src_files
Then, when you use the -full_path option, the path of the source file relative to the
current directory is reported. For example:
src_files/example.v
-no_area_statistics suppresses the table that shows the total area and
percentage information. The area and the percentage of the total area consumed by the
datapath operators in the design are only available after issuing the
synthesize -to_mapped command.
By default, when using the report datapath command on a mapped netlist containing
datapath operators, you will get the area statistics of the design, as shown in
Example 2-2.
------------------------------------
datapath modules 4938.00 100.00
mux modules 0.00 0.00
others 0.00 0.00
------------------------------------
total 4938.00 100.00
This information is useful in determining the percentage of the design that contains
datapath operators.
By default, the area report is suppressed for a netlist that contains only generic cells (no
library cells).
-mux reports muxes present in the design, as shown in Example 2-8. Muxes are not
reported by default. Using the -mux option only displays the muxes in the design and
suppresses the other datapath operators. To view both, use the -all option.
-print_inferred reports only the inferred datapath components in the design.
-print_instantiated reports only the instantiated datapath components in the
design.
Example 2-5 Results of report datapath After Using the Elaborate Command
Instantiated datapath components
Inferred components
Example 2-6 on page 56 shows the report datapath command after using the
synthesize -to_generic command.
The following section briefly describes the column headings in the report datapath
command report.
Module        Name of the datapath partition
Instance      Instance name of the datapath partition
Operator      Operator type of the individual datapath operator
Signedness    Sign type of the individual datapath operator
Architecture  Selected architecture of the individual datapath operator
Note: The no_value listed in the report means that the CSA tree contains partial
product generators of two multipliers and a carrysave tree of a*b*c+d. The partial
product generators have two sub-architectures, booth and non-booth. An adder does not
have a sub-architecture choice. Therefore, no_value indicates that there is no
sub-architecture.
Inputs        Bit-width of input operands of the individual datapath operator
Outputs       Bit-width of output operands of the individual datapath operator
Area          Area consumed by the datapath partition
Line          Line number in the RTL code where the operator is inferred
Col           Column number in the RTL code where the operator is inferred
Filename      File name of the RTL code that infers these operators
The report regarding the datapath modules is followed by a datapath area summary (under
the headings: Type, Area, and Percentage) that includes the entire design. This second part
of the report shows the percentage of area consumed by all the datapath operators in the
entire design.
The following sections describe the column headings of the datapath report in greater detail.
An artificial level of design hierarchy is created for each partition. The Module and Instance
columns show its module name and instance name, respectively. RTL Compiler generates the
module and instance names. They cannot be predetermined.
If the datapath operator is multiply (*), add (+), or subtract (-), then its Verilog symbol is
shown.
If a datapath partition has multiple operators, then these operators form a CSA tree, which
has only one carry-propagate adder on the timing path through it. Examining how operators
are grouped into partitions reveals how the carrysave transformation is applied.
Architecture
The Architecture column shows the selected architecture of an individual datapath operator.
Before using the synthesize -to_mapped command, the information in this column is
only intermediate, or temporary, because RTL Compiler has not yet chosen the optimal
architecture. Only after using the synthesize -to_mapped command will the final
implemented architecture appear.
There is only one input operand if the operator is a unary minus. There are three input
operands if the operator is, for example, an addition with a one-bit carry-in. There are two
output operands if the output is in carrysave format.
Area
The Area column shows the area occupied by a datapath partition. It is only available after
using the synthesize -to_mapped command.
The second part of the datapath report, under the Type, Area, and Percentage headings
summarizes the percentage of area taken by the datapath partition in the design. These
numbers are not meaningful until you have used the synthesize -to_mapped command.
When you set this attribute to false, the file, row, and column information is neither
printed nor available.
output [8:0] y;
reg [8:0] y;
always @(in1 or in2 or in3 or in4 or sel)
begin
if (sel == 1'b0)
y = in1 + in2;
else
y = in3 + in4;
end
endmodule // sharing
Example 2-8 report datapath for RTL Sharing that Shows the Mux Operator
Inferred components
Operator Signedness Inputs Outputs CellArea
========================================================
sharing
add_12_14
module:add_unsigned
very_fast + unsigned 8x8 9 130.50
========================================================
sharing
add_10_14
module:add_unsigned
very_fast + unsigned 8x8 9 130.50
========================================================
sharing
mux_out_9_10
module:mux
very_fast mux n/a 2x9x9 9 27.00
========================================================
3
RC-Datapath (RC-DP) Functions
Table 3-1 shows the function primitives. All of these functions have an output precision that
can be computed simply from the input signal attributes.
Syntax                          Function
$abs(X)                         Returns the absolute value of the signed input X.
$carrysave                      If the input can be generated in carrysave format, returns
                                a concatenation of the two words of the carrysave result.
$blend(X0, X1, alpha, alpha1)   Alpha blender. Returns the value obtained by interpolating
                                between the values represented by X0 and X1.
$intround                       Returns an approximate product of the first input by
                                rounding off as many least-significant bits of each partial
                                product as the value of the second input position.
$lead0(X)                       Returns the count of leading 0s. If the input is an all 0
                                signal, then the output is an all 1 signal.
$lead1(X)                       Returns the count of leading 1s. If the input is an all 1
                                signal, then the output is an all 1 signal.
$round(X, pos)                  Rounds off as many least significant bits (LSBs) of the
                                input value X as the value of pos.
$rotatel(X, Y)                  Rotates the data signal X left by as many bits as given by
                                the value of the amount signal Y.
$rotater(X, Y)                  Rotates the data signal X right by as many bits as given by
                                the value of the amount signal Y.
Of the primitives listed in this table, $blend and $abs are mergeable functions that are used
in sum of products operations. These built-in signal functions are potentially mergeable with
surrounding computations representing sum or sum-of-product. Detailed information about
the mergeability of individual primitives is available in the following section.
Primitive Functionality
This section describes the functionality of the primitives listed in Table 3-1. For all datapath
function primitives, each argument can be a signed or unsigned expression, though not a
word array or a word array slice. Therefore, function primitives cannot be used on arrays or
word arrays or on expressions whose result type is an array or word array slice. Whenever an
expression is used as an argument to a function primitive, the width and sign type of the
expression result are self-determined by Verilog rules. An integer constant has the width of
32 and an array reference has the same width and sign type as each array word.
$abs
Output Format
The width of the output of $abs is the same as the width of input X. The output sign type is
always unsigned.
Function
For the integer value represented by the input parameter, $abs outputs the absolute value
of the input integer as an unsigned integer.
The value of the most significant bit determines whether negation takes place. The input
signal is always treated as signed regardless of the sign type of the input signal at the parent
module that calls this $abs() function.
Note: The absolute value is potentially mergeable inside a sum-of-product computation. As
shown in Example 3-1, a computation, such as $abs(X) +a*b, is potentially mergeable and
can be synthesized with just one carry-propagate adder.
Example 3-2 shows the simulation model for the $abs primitive function. Change the
parameter values as needed.
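The behavior described above can also be sketched as a small Python reference model. This is an illustrative model under the stated rules (input always treated as signed, output unsigned and as wide as the input), not the simulation model shipped with the tool; the name abs_model is hypothetical.

```python
def abs_model(x, width):
    """Behavioral sketch of $abs for a width-bit input.

    The input bits are always interpreted as a signed (two's complement)
    value; the result is returned as an unsigned width-bit integer.
    """
    mask = (1 << width) - 1
    x &= mask
    if (x >> (width - 1)) & 1:      # most significant bit set: negate
        return (-x) & mask
    return x


positive = abs_model(0b0101, 4)      # +5 stays 5
minus_one = abs_model(0b1111, 4)     # -1 becomes 1
most_negative = abs_model(0b1000, 4) # -8 becomes 8, which fits in 4 unsigned bits
```

Because the output is unsigned, the most negative input value has a representable absolute value at the same width.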
$blend
Output Format
The width and sign type of the $blend output are determined by the widths of the X0, X1, and
alpha inputs and the sign types of the X0 and X1 inputs, as shown in Table 3-2.
Function
The blend operation represents the interpolation between two numeric values to produce
another numeric value. Define \(\alpha\) to be the sum of the integer values of alpha and alpha1.
Let \(\alpha_{max} = 2^{wa}\), where wa is the width of alpha. Let \(X_0\) and \(X_1\) represent the
integer values of X0 and X1. Then the blend function is mathematically given by the expression:
\[ X_0 \cdot (\alpha_{max} - \alpha) + X_1 \cdot \alpha \]
Note: When the above expression is divided by \(\alpha_{max}\), it represents a value that lies
between the extreme X0 and X1 values. Therefore, the above expression represents an
integer-normalized interpolation between X0 and X1.
The $blend output is a signed signal only if at least one of the X0 and X1 values is signed.
Internal to $blend, alpha1 is a single-bit unsigned input. Therefore, irrespective of the
width and sign type of the input to the alpha1 port, only the least significant bit of the connected
input is used internally as the value at alpha1.
Note: Regardless of the sign type of the signal connected to the alpha port, it is always
internally treated as an unsigned signal. Also, since $blend is essentially a sum-of-product
computation, it is mergeable with the surrounding computation, as long as the merged
Example 3-4 shows the simulation model for the $blend primitive function. Change the
parameter values as needed.
parameter wx0 = 2;
parameter sx0 = 0;
parameter wx1 = 2;
parameter sx1 = 0;
parameter walp = 2;
begin
X0_s = {sx0 ? X0[wx0-1] : 1'b0, X0};
X1_s = {sx1 ? X1[wx1-1] : 1'b0, X1};
al = alpha + alpha1;
alc = (1'b1 << walp) - al;
blend = al*X1_s + alc*X0_s;
end
endfunction
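The interpolation can be checked numerically with a Python sketch that mirrors the assignments in the model above. The function names are hypothetical, the parameter names follow the Verilog parameters, and the result is returned as an untruncated integer because the output width rules of Table 3-2 are not modeled here.

```python
def to_int(value, width, signed):
    """Interpret the low width bits of value as unsigned or two's complement."""
    value &= (1 << width) - 1
    if signed and (value >> (width - 1)) & 1:
        value -= 1 << width
    return value


def blend_model(x0, x1, alpha, alpha1, wx0=2, sx0=False, wx1=2, sx1=False, walp=2):
    """Behavioral sketch of $blend: X0*(alpha_max - al) + X1*al."""
    x0_i = to_int(x0, wx0, sx0)
    x1_i = to_int(x1, wx1, sx1)
    # alpha is always treated as unsigned; only the LSB of alpha1 is used.
    al = (alpha & ((1 << walp) - 1)) + (alpha1 & 1)
    alc = (1 << walp) - al
    return al * x1_i + alc * x0_i


all_x0 = blend_model(3, 1, alpha=0, alpha1=0)    # 4 * X0
all_x1 = blend_model(3, 1, alpha=3, alpha1=1)    # 4 * X1
halfway = blend_model(3, 1, alpha=2, alpha1=0)   # 2 * X0 + 2 * X1
```

Dividing each result by 2^walp (4 in this case) gives 3, 1, and 2, the two endpoints and the midpoint of the interpolation.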
$carrysave
Output Format
The width of the carrysave output is twice the width of the input. The X input can contain a
constant, a single signal name, or an arithmetic expression. The width of the input is
determined by Verilog's rules of self-determination of expression width. The sign type of the
carrysave output is unsigned.
Function
The carrysave function assumes that the result of the input expression can be generated in
carrysave format. If this is the case, then the output signal will be a concatenation of the two
words of the carrysave result. When the input expression cannot produce the result in
carrysave format, the output remains twice as wide as the input expression width. The left half
is all zeroes and the right half is the binary representation of the input expression's value.
Note: There is not a bit-accurate simulation model for the $carrysave function primitive.
The use of the tmp variable in the example is needed to determine the bit width of the sum
a + b + c.
Without such an intermediary variable whose bit width is explicitly specified, the bit width of
the sum would follow the rules of self-determined bit width in Section 4.4, "Expression bit
lengths," in the Verilog-2001 LRM (IEEE Std 1364-2001).
For example,
input [7:0] a, b, c;
wire [19:0] cs = $carrysave(a + b + c);
is equivalent to
input [7:0] a, b, c;
wire [7:0] tmp = a + b + c; // bit width of sum is same as bit width of widest term
wire [19:0] cs = $carrysave(tmp);
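The effect of the self-determined width can be illustrated with a short Python sketch. It only models truncation of a sum to a given bit width; the helper name sized_add and the operand values are hypothetical.

```python
def sized_add(operands, width):
    """Add the operands and keep only the low width bits, as a sized
    Verilog expression of that width would."""
    return sum(operands) & ((1 << width) - 1)


a, b, c = 200, 100, 50            # three 8-bit values

# Sum evaluated at the width of the widest term (8 bits): upper bits are lost.
sum_8bit = sized_add([a, b, c], 8)

# Sum evaluated at 10 bits: wide enough to hold the full result.
sum_10bit = sized_add([a, b, c], 10)
```

With these values the 8-bit sum wraps to 94, while the 10-bit sum keeps the full value 350, which is why the width of the intermediate variable matters.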
$intround
Output Format
The output width is the width of the X input. The output sign-type is the sign-type of the X
input.
Function
The X input is assumed to come from a sum-of-product tree. For each $intround()
operation, RTL Compiler traverses backward and identifies the product terms. A product term
is either a multiplication operator or a single operand.
The $intround() operation computes an approximate product of the first input by rounding
off as many least-significant bits (LSBs) of each partial product as the value of the second input
position. The rounding causes the final product bit vector, which is returned, to have as many
zero-valued least-significant bits as the number of positions. The approximation reduces area
at the cost of an inexact result.
Note: There is not a bit-accurate simulation model for the $intround function primitive.
$lead0
Output Format
Function
Computes the count of leading zeroes in the input argument and outputs this count as an
unsigned binary number. If the input is an all 0 string, then the output is an all 1 string.
Example 3-8 shows the simulation model for the $lead0 primitive function. Change the
parameter values as needed.
$lead1
Output format
Function
Computes the count of leading ones in the input argument and outputs this count as an
unsigned binary number. If the input is an all 1 string, then the output is an all 1 string.
Example 3-10 shows the simulation model for the $lead1 primitive function. Change the
parameter values as needed.
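The leading-bit counters $lead0 and $lead1 can be sketched in Python as follows. This is a behavioral illustration only. The output width is not stated in this section, so out_width is an assumption: it defaults to the number of bits needed to represent the input width, which keeps the all 1 output distinct from any real count.

```python
def lead0_model(x, width, out_width=None):
    """Count leading zeroes of a width-bit input.

    An all 0 input returns an all 1 output of out_width bits.
    out_width is an assumed parameter, not taken from the source.
    """
    if out_width is None:
        out_width = width.bit_length()
    x &= (1 << width) - 1
    if x == 0:
        return (1 << out_width) - 1
    return width - x.bit_length()


def lead1_model(x, width, out_width=None):
    """Count leading ones by counting the leading zeroes of the inverted input.

    An all 1 input inverts to all 0 and therefore returns an all 1 output.
    """
    mask = (1 << width) - 1
    return lead0_model(~x & mask, width, out_width)


zeros = lead0_model(0b00010000, 8)
ones = lead1_model(0b11100000, 8)
all_zero_case = lead0_model(0, 8)
all_one_case = lead1_model(0xFF, 8)
```

Expressing $lead1 through $lead0 makes the symmetric special cases (all 0 and all 1 inputs) fall out of a single rule.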
$rotatel
Output Format
The output width and sign type are the same as the input width and sign type.
Functionality
Rotates the signal given in the in input left, using the same width as in, by as many bits
as given by the value in the rotate signal.
If the width of rotate is larger than \(\log_2(\mathit{inwidth}) + 1\), then only the
\(\log_2(\mathit{inwidth}) + 1\) least significant bits of rotate are used.
Example 3-12 shows the simulation model for the $rotatel primitive function. Change the
parameter values as needed.
$rotater
Output Format
The output width and sign type are the same as the input width and sign type.
Functionality
Rotates the signal given in the in input right, using the same width as in, by as many bits
as given by the value in the rotate signal.
If the width of rotate is larger than \(\log_2(\mathit{inwidth}) + 1\), then only the
\(\log_2(\mathit{inwidth}) + 1\) least significant bits of rotate are used.
Example 3-14 shows the simulation model for the $rotater primitive function. Change the
parameter values as needed.
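Both rotate primitives can be sketched in Python as follows. This is a behavioral illustration, not the tool's simulation model. Two assumptions are made and marked in the code: log2(inwidth) is rounded down, and a rotate amount larger than the width wraps around modulo the width.

```python
def _used_amount(rotate, width):
    """Keep only the low floor(log2(width)) + 1 bits of rotate.

    Assumption: log2 is rounded down, which equals width.bit_length() bits.
    """
    used_bits = width.bit_length()
    return rotate & ((1 << used_bits) - 1)


def rotatel_model(x, rotate, width):
    """Rotate the width-bit value x left by the used part of rotate."""
    mask = (1 << width) - 1
    x &= mask
    n = _used_amount(rotate, width) % width   # assumption: amounts wrap
    return ((x << n) | (x >> (width - n))) & mask


def rotater_model(x, rotate, width):
    """Rotate the width-bit value x right by the used part of rotate."""
    mask = (1 << width) - 1
    x &= mask
    n = _used_amount(rotate, width) % width   # assumption: amounts wrap
    return ((x >> n) | (x << (width - n))) & mask


left = rotatel_model(0b10010110, 3, 8)
back = rotater_model(left, 3, 8)
masked = rotatel_model(0b10010110, 0b10011, 8)   # upper bit of rotate ignored
```

For an 8-bit input only the low four bits of rotate are used in this sketch, so a rotate value of 0b10011 behaves like 0b0011.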
$round
Output Format
The output has the same width and sign type as the in input.
Functionality
The input value coming in through port in is rounded off at bit position pos. In other words,
the output value is generated by starting with the same value as in, then setting as many
least significant bits to 0 as the value of pos. The bit at position pos-1 of in is then added
at bit position pos.
Example 3-16 shows the simulation model for the $round primitive function. Change the
parameter values as needed.
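The rounding rule described above can be sketched in Python. This is a behavioral illustration with a hypothetical function name. It treats the input as a plain bit vector and assumes that a carry out of the top bit is discarded, which the text does not state.

```python
def round_model(x, pos, width):
    """Clear the pos least significant bits of x, then add the bit that was
    at position pos-1 into position pos (round half up)."""
    mask = (1 << width) - 1
    x &= mask
    if pos == 0:
        return x                               # nothing to round off
    round_bit = (x >> (pos - 1)) & 1
    cleared = x & ~((1 << pos) - 1)
    # Assumption: the result wraps to width bits if the addition overflows.
    return (cleared + (round_bit << pos)) & mask


rounded_up = round_model(0b10110110, 3, 8)     # 182 -> 184
rounded_down = round_model(0b10110011, 3, 8)   # 179 -> 176
```

Both results are multiples of 2^pos, so the pos least significant bits of the output are always 0.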
Parameter Functions
Table 3-3 describes global system functions that assist in writing parameterized code. The
inputs to these functions must be constant expressions. Because these functions are
themselves constant expressions, they do not imply any hardware.
Table 3-3 Provided Global System Functions
Note: The input arguments for $max, $min, and $log2 must evaluate to constants.
ChipWare Components
Some datapath functions can be conveniently described with ChipWare components. RTL
Compiler supports its proprietary ChipWare components, as well as DesignWare, GTECH,
and AmbitWare libraries.
See ChipWare in Encounter RTL Compiler for more information on ChipWare and
third-party libraries.
4
Datapath Coding Styles
Overview
Starting from the RTL
Importing the Gate-Level Netlist
Keeping Relevant Operators in the Same Hierarchy
Inferring Datapath Modules
Applying Carry-save Arithmetic Automatically
Inferring a Constant Multiplier
Using Signed Arithmetic
Using Signed Part-Select / Concatenation
Implementing Programmable Unsigned and Signed Multiplier
Mixing Unsigned and Signed Expressions
Avoiding Manual Signal Extension Techniques
VHDL Signed/Unsigned Type Conversion
Inferring a Square Automatically
Implying Upper-Bit Truncation
Following Self-Determined Bit Width Rules
Avoiding Instantiated Components
Arithmetically Complementing an Operand
Overview
Immediately after reading in the design, RTL Compiler separates datapath computations from
control-related logic and creates one datapath partition for each set of merged datapath
operators. When examining opportunities to merge operators, the highest priority is to
preserve the original functionality. More merging usually leads to better quality of silicon
(QoS). Based on a set of intelligent rules, RTL Compiler performs as much operator merging
as possible.
However, RTL Compiler does not understand the overall design specification behind the RTL
code. There are scenarios where the RTL coding style affects merging activities, and
therefore affects the QoS. In these scenarios the RTL coding style imposes more restrictions
on operator merging than necessary. The following sections discuss typical coding scenarios
and guidelines that interfere with or support maximum merging of operators.
Tip
Guideline: If two arithmetic operators are directly interacting with each other, keep
them at the same level of hierarchy, that is, in the same module.
A great deal of RTL code keeps an adder, subtractor, or multiplier in a module by itself for
various reasons. CSA transformation respects user-defined design hierarchies and does not
merge across hierarchical boundaries. Therefore, a standalone operator inside a module
cannot be merged with operators in other modules, which has a negative impact on QoS.
The design in Example 4-1 separates the interacting operators and operands in different
modules thereby preventing any CSA opportunities.
The designs in Example 4-2 on page 83 and Example 4-3 on page 83 keep the interacting
operators and operands in the same module.
Verilog
module mult (y, a, b);
input [7:0] a, b;
output [15:0] y;
assign y = a * b;
endmodule
module add (y, a, b);
input [15:0] a, b;
output [15:0] y;
assign y = a + b;
endmodule
module test (y, a, b, c);
input [7:0] a, b;
input [15:0] c;
output [15:0] y;
wire [15:0] p;
mult U1 (p, a, b);
add U2 (y, p, c);
endmodule
VHDL
library ieee;
use ieee.numeric_std.all;
entity mult is
port ( y : out unsigned (15 downto 0);
a, b : in unsigned (7 downto 0)
);
end mult;
architecture rtl of mult is
begin
y <= a * b;
end rtl;
library ieee;
use ieee.numeric_std.all;
entity add is
port ( y : out unsigned (15 downto 0);
a, b : in unsigned (15 downto 0)
);
end add;
architecture rtl of add is
begin
y <= a + b;
end rtl;
library ieee;
use ieee.numeric_std.all;
entity test is
port ( y : out unsigned (15 downto 0);
a, b : in unsigned (7 downto 0);
c : in unsigned (15 downto 0)
);
end test;
architecture rtl of test is
component mult
port ( y : out unsigned (15 downto 0);
a, b : in unsigned (7 downto 0)
);
end component;
component add
port ( y : out unsigned (15 downto 0);
a, b : in unsigned (15 downto 0)
);
end component;
signal p : unsigned (15 downto 0);
begin
U1: mult port map ( p, a, b);
U2: add port map ( y, p, c);
end rtl;
Verilog
module test (y, a, b, c);
input [7:0] a, b;
input [15:0] c;
output [15:0] y;
wire [15:0] p;
assign p = a * b;
assign y = p + c;
endmodule
VHDL
library ieee;
use ieee.numeric_std.all;
entity test is
port ( y : out unsigned (15 downto 0);
a, b : in unsigned (7 downto 0);
c : in unsigned (15 downto 0)
);
end test;
architecture rtl of test is
signal p : unsigned (15 downto 0);
begin
p <= a * b;
y <= p + c;
end rtl;
Verilog
module test (y, a, b, c);
input [7:0] a, b;
input [15:0] c;
output [15:0] y;
assign y = a * b + c;
endmodule
VHDL
library ieee;
use ieee.numeric_std.all;
entity test is
port ( y : out unsigned (15 downto 0);
a, b : in unsigned (7 downto 0);
c : in unsigned (15 downto 0)
);
end test;
architecture rtl of test is
begin
y <= a * b + c;
end rtl;
Tip
Guideline: Infer adders, subtractors, or multipliers, rather than creating them
yourself.
You can often handcraft arithmetic operators. For example, instead of inferring a multiplier,
you may choose to devise a certain architecture for the multiplier and describe its
implementation in detail, including Booth encoding, partial product generation, carry-save
reduction, and so on. Sometimes, a part of the architecture may be described at an
abstraction level as low as logic equations.
Note: Although at quite a low level, this is still called RTL coding since it does not directly
instantiate gates in the target library.
If you consider the performance of an individual adder, subtractor, or multiplier, then the
architectures built into the datapath engine are usually as good as what can be accomplished
by handcrafting. However, when you consider the overall design, carry-save transformation
becomes the differentiator between inferring and handcrafting.
Tip
Guideline: Do not handcraft the carry-save technique. Let RTL Compiler apply the
carry-save technique through CSA transformations automatically.
Although this practice of handcrafting arithmetic operators does work, it limits RTL Compiler
to the architecture and the implementation described in the RTL code. It also makes the RTL
code difficult to read and maintain.
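As a rough illustration of this guideline, the following simulation-only sketch compares a handcrafted 3:2 carry-save stage with a plain three-operand sum. The module and signal names (tb_csa_sketch, y_manual, y_infer) are hypothetical and do not come from the original examples. Both forms compute the same value, but the handcrafted form fixes the structure in the RTL, while the inferred form leaves the choice to the tool.

```verilog
// Simulation-only sketch; all names here are illustrative assumptions.
module tb_csa_sketch;
  reg  [15:0] a, b, c;

  // Handcrafted carry-save stage: bitwise sum and carry vectors,
  // followed by one final adder.
  wire [17:0] s        = a ^ b ^ c;
  wire [17:0] cy       = ((a & b) | (a & c) | (b & c)) << 1;
  wire [17:0] y_manual = s + cy;

  // Inferred form: a single arithmetic expression.
  wire [17:0] y_infer  = a + b + c;

  integer i, errors;
  initial begin
    errors = 0;
    for (i = 0; i < 1000; i = i + 1) begin
      a = $random; b = $random; c = $random;
      #1;
      if (y_manual !== y_infer) errors = errors + 1;
    end
    $display("mismatches = %0d", errors);
    $finish;
  end
endmodule
```

Because the two forms are functionally equal, the simulation should report zero mismatches. The difference lies only in how much freedom the synthesis tool keeps.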
Tip
Guideline: Do not perform manual shift-and-add for a constant multiplication. Just
infer a multiplier.
The traditional method to synthesize constant multiplication is to start from a full multiplier and
later let logic optimization remove all of the redundant logic.
Just like other handcrafting, this manual shift-and-add approach hurts operator merging.
Datapath synthesis does shift-and-add whenever it helps QoS. Handcrafted shift-and-add is
no longer needed.
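The equivalence between an inferred constant multiplier and a manual shift-and-add can be checked with a small simulation sketch such as the one below. The testbench and signal names are hypothetical; the constant 76 and the operand widths follow the constant multiplication examples in this section.

```verilog
// Simulation-only sketch; tb_const_mult_sketch, y_infer, and y_manual
// are illustrative names, not part of the original examples.
module tb_const_mult_sketch;
  reg  [15:0] a;

  // Inferred constant multiplier (recommended style).
  wire [23:0] y_infer  = a * 76;

  // Handcrafted shift-and-add: 76 = 64 + 8 + 4.
  wire [23:0] y_manual = (a << 6) + (a << 3) + (a << 2);

  integer i, errors;
  initial begin
    errors = 0;
    for (i = 0; i < 65536; i = i + 1) begin
      a = i;   // lower 16 bits of the loop counter
      #1;
      if (y_infer !== y_manual) errors = errors + 1;
    end
    $display("mismatches = %0d", errors);
    $finish;
  end
endmodule
```

The two expressions agree for every 16-bit value of a, so nothing is lost functionally by writing the multiplication directly.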
To give RTL Compiler more freedom to optimize, use the RTL coding style for unsigned
constant multiplication shown in Example 4-5.
Verilog
wire [15:0] a;
wire [23:0] y;
assign y = a * 76; // i.e. a * 8'b01001100
VHDL
signal a : unsigned (15 downto 0);
signal y : unsigned (23 downto 0);
y <= a * "01001100"; -- i.e. a * 76
To give RTL Compiler more freedom to optimize, use the coding style for signed constant
multiplication shown in Example 4-7.
Verilog
wire signed [15:0] a;
wire signed [23:0] y;
assign y = a * 76; // i.e. a * 8'sb01001100
VHDL
signal a : signed (15 downto 0);
signal y : signed (23 downto 0);
y <= a * "01001100"; -- i.e. a * 76
Tip
Guideline: Use signed operators for any design that needs signed arithmetic.
Using unsigned operators to perform signed arithmetic requires lots of explicit, manual sign
extension, resulting in lengthy and error-prone RTL code.
Verilog-1995 does not support a signed data type. RTL Compiler supports Verilog-2001,
which has a signed data type and associated signed operators.
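As a minimal sketch of the difference, the following testbench performs the same signed addition twice: once with unsigned declarations and manual sign extension, and once with Verilog-2001 signed declarations. The names used here (tb_signed_add_sketch, y_manual, y_signed) are assumptions made for illustration only.

```verilog
// Simulation-only sketch; all names here are illustrative assumptions.
module tb_signed_add_sketch;
  reg [15:0] a, b;   // raw 16-bit two's-complement data

  // Unsigned style: the sign bits are replicated by hand.
  wire [16:0] y_manual = {a[15], a} + {b[15], b};

  // Verilog-2001 style: signed declarations, automatic sign extension.
  wire signed [15:0] a_s = a;
  wire signed [15:0] b_s = b;
  wire signed [16:0] y_signed = a_s + b_s;

  integer i, errors;
  initial begin
    errors = 0;
    for (i = 0; i < 1000; i = i + 1) begin
      a = $random; b = $random;
      #1;
      if (y_manual !== y_signed) errors = errors + 1;
    end
    $display("mismatches = %0d", errors);
    $finish;
  end
endmodule
```

Both styles give the same bit pattern, but the signed style needs no explicit concatenation and states the intent directly.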
The coding styles shown in Example 4-8 and Example 4-9 both work for signed addition.
When in doubt, use the coding style shown in Example 4-9.
Verilog-2001
wire signed [15:0] a, b;
wire signed [16:0] y;
assign y = a + b; // infers a 16x16 signed adder
VHDL
signal a, b : signed (15 downto 0);
signal y : signed (16 downto 0);
Example 4-10 and Example 4-11 both work for signed multiplication. When in doubt, use
Example 4-11.
Verilog-2001
wire signed [16:0] a;
wire signed [14:0] b;
wire signed [31:0] y;
assign y = a * b; // infers a 17x15 signed multiplier
VHDL
signal a : signed (16 downto 0);
signal b : signed (14 downto 0);
signal y : signed (31 downto 0);
y <= a * b;
Tip
Guideline: Do not use part-selects that specify an entire vector.
Tip
Guideline: Selectively zero- or sign-extend the operands and use a single datapath to
implement a switchable unsigned and signed datapath, as shown in Example 4-15.
When both a signed and an unsigned version of an operator are needed, and one bigger
operator can implement both, using a single operator usually produces a better
QoS.
Example 4-14 Using Two Operators to Implement Switchable Unsigned and Signed
Multiplier
Verilog
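module ex_4_14 (a, b, tc, y); //bad QoS
parameter width = 16;
input [width-1:0] a, b;
input tc;
output [2*width-1:0] y;
wire signed [width-1:0] a_sgn, b_sgn;
wire signed [2*width-1:0] y_sgn;
wire [2*width-1:0] y_uns;
assign a_sgn = a;
assign b_sgn = b;
assign y_sgn = a_sgn * b_sgn; // a 16x16 signed multiplier
assign y_uns = a * b; // a 16x16 unsigned multiplier
assign y = tc ? y_sgn : y_uns;
endmodule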
VHDL
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity ex_4_14 is
generic (width : natural := 16);
port ( a : in unsigned (width-1 downto 0);
b : in unsigned (width-1 downto 0);
tc : in std_logic;
y : out unsigned (width*2-1 downto 0) );
end ex_4_14;
architecture rtl of ex_4_14 is
signal a_sgn, b_sgn : signed (width-1 downto 0);
signal y_sgn : signed (width*2-1 downto 0);
signal y_uns : unsigned (width*2-1 downto 0);
begin
a_sgn <= signed(a);
b_sgn <= signed(b);
y_sgn <= a_sgn * b_sgn; -- a 16x16 signed multiplier
y_uns <= a * b; -- a 16x16 unsigned multiplier
y <= unsigned(y_sgn) when (tc = '1') else y_uns;
end rtl;
Example 4-15 Using a Single Operator to Implement Switchable Unsigned and Signed
Multiplier
Verilog
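module ex_4_15 (a, b, tc, y); //good QoS
parameter width = 16;
input [width-1:0] a, b;
input tc;
output [2*width-1:0] y;
wire a_msb, b_msb;
wire signed [width:0] a_sgn, b_sgn;
assign a_msb = tc & a[width-1];
assign b_msb = tc & b[width-1];
assign a_sgn = {a_msb, a};
assign b_sgn = {b_msb, b};
assign y = a_sgn * b_sgn; // a 17x17 signed multiplier
endmodule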
VHDL
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity ex_4_15 is
generic (width : natural := 16);
port ( a : in unsigned (width-1 downto 0);
b : in unsigned (width-1 downto 0);
tc : in std_logic;
y : out unsigned (width*2-1 downto 0) );
end ex_4_15;
architecture rtl of ex_4_15 is
signal a_msb, b_msb : std_logic;
signal a_sgn, b_sgn : signed (width downto 0);
signal y_extra : signed (width*2+1 downto 0);
begin
a_msb <= tc and a(width-1);
b_msb <= tc and b(width-1);
a_sgn (width) <= a_msb;
b_sgn (width) <= b_msb;
a_sgn (width-1 downto 0) <= signed(a);
b_sgn (width-1 downto 0) <= signed(b);
y_extra <= a_sgn * b_sgn; -- a 17x17 signed multiplier
y <= unsigned (y_extra (width*2-1 downto 0));
end rtl;
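To see that the single-operator scheme of Example 4-15 matches the two-operator scheme of Example 4-14, you can run a simulation sketch along the following lines. The testbench is not part of the original examples; its name and the signals y_ref and y_one are hypothetical, and the logic is written inline so that the block is self-contained.

```verilog
// Simulation-only sketch; all names here are illustrative assumptions.
module tb_switchable_mult_sketch;
  parameter width = 16;
  reg [width-1:0] a, b;
  reg             tc;    // 1 = signed, 0 = unsigned

  // Two-operator reference: separate signed and unsigned multipliers.
  wire signed [width-1:0]   a_s   = a;
  wire signed [width-1:0]   b_s   = b;
  wire signed [2*width-1:0] y_sgn = a_s * b_s;
  wire        [2*width-1:0] y_uns = a * b;
  wire        [2*width-1:0] y_ref = tc ? y_sgn : y_uns;

  // Single-operator version: extend each operand by one bit that is the
  // sign bit when tc is 1 and zero otherwise, then multiply as signed.
  wire signed [width:0]     a_x     = {tc & a[width-1], a};
  wire signed [width:0]     b_x     = {tc & b[width-1], b};
  wire signed [2*width+1:0] y_extra = a_x * b_x;
  wire        [2*width-1:0] y_one   = y_extra[2*width-1:0];

  integer i, errors;
  initial begin
    errors = 0;
    for (i = 0; i < 1000; i = i + 1) begin
      a = $random; b = $random; tc = $random;
      #1;
      if (y_one !== y_ref) errors = errors + 1;
    end
    $display("mismatches = %0d", errors);
    $finish;
  end
endmodule
```

The extra operand bit costs a slightly wider multiplier, but only one multiplier is needed instead of two plus a multiplexer.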
Tip
Guideline: Do not unintentionally mix unsigned and signed expression types in one
expression.
Avoid mixing unsigned and signed signals in an arithmetic expression. VHDL does not allow
mixing signed and unsigned operands under a single operator.
Verilog allows mixing signed and unsigned operands under a single operator.
When this happens, as defined by the IEEE Standard, the operator is unsigned if any one of
its operands is unsigned.
Example 4-16 shows a coding style where you may not be aware that RTL Compiler is
synthesizing an unsigned multiplier. If the intention is to have a multiplier taking an unsigned
operand and a signed operand, then Example 4-17 shows the correct coding style.
Example 4-18 shows a coding style where you may not be aware that RTL Compiler is
synthesizing an unsigned constant multiplier. If the intention is to have a constant multiplier
taking a signed signal and an unsigned constant, then Example 4-19 shows the correct coding
style.
Example 4-16 Mixing Operands of Different Sign Type Results in Unsigned Functionality
wire [15:0] a; //unsigned
wire signed [15:0] b; //signed
wire signed [31:0] prod;
assign prod = a * b; //an unsigned multiplier
Example 4-17 Inferring a Signed Multiplier when Mixing Operands of Different Sign Type
wire [15:0] a;
wire signed [15:0] b;
wire signed [31:0] prod;
// Prepend 0 as the sign bit, then cast to signed
assign prod = $signed ({1'b0, a}) * b; //a signed multiplier
Example 4-18 Mixing Operands of Different Sign Type Results in Unsigned Functionality
wire signed [15:0] a;
wire signed [19:0] prod;
assign prod = a * 4'b1011; //an unsigned constant multiplier
// 4'b1011 is an unsigned constant
Example 4-19 Inferring a Signed Multiplier when Mixing Operands of Different Sign Type
wire signed [15:0] a;
wire signed [19:0] prod;
assign prod = a * 5'sb01011; //a signed constant multiplier
// 5'sb01011 is a signed constant
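The effect of the sign-type rule can be observed in simulation. The following sketch, with hypothetical names (tb_mixed_sign_sketch, prod_mixed, prod_signed), evaluates the expressions of Example 4-16 and Example 4-17 for one pair of input values.

```verilog
// Simulation-only sketch; all names here are illustrative assumptions.
module tb_mixed_sign_sketch;
  reg        [15:0] a;   // unsigned operand
  reg signed [15:0] b;   // signed operand

  // Mixed operand types: the multiplication is evaluated as unsigned,
  // so b is zero-extended instead of sign-extended.
  wire signed [31:0] prod_mixed  = a * b;

  // Both operands signed: a gets a leading 0 and is cast to signed.
  wire signed [31:0] prod_signed = $signed({1'b0, a}) * b;

  initial begin
    a = 16'd3;
    b = -16'sd2;
    #1;
    $display("mixed  = %0d", prod_mixed);
    $display("signed = %0d", prod_signed);
    $finish;
  end
endmodule
```

With a = 3 and b = -2, the mixed expression treats b as 65534 and yields 196602, while the signed expression yields -6.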
Tip
Guideline: Declare appropriate signal types to automatically handle signal
extension.
Manual signal extension is popular in RTL coding of real-world designs. It can be
zero-extension for unsigned data or sign-extension for signed data.
RTL Compiler performs behavioral analysis and understands the real intention behind the
concatenation. However, when it comes to explicit manual signal extension, as shown in
Example 4-20 and Example 4-22, there is the potential to create a scenario of mixed signed
and unsigned operands. If this happens, using the intended sign type, as shown in
Example 4-21 and Example 4-23 can be a better practice.
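One way such a mixed scenario can arise is sketched below. This is not one of the original examples: the module and signal names are hypothetical, and the sketch only illustrates the Verilog rule that a concatenation is always unsigned, so a manually sign-extended operand turns the whole expression unsigned.

```verilog
// Simulation-only sketch; all names here are illustrative assumptions.
module tb_sign_ext_sketch;
  reg signed [7:0] a, b;

  // Manual sign extension by concatenation. The concatenation is
  // unsigned, so the product is evaluated as unsigned and b is
  // zero-extended rather than sign-extended.
  wire signed [15:0] y_manual = {{8{a[7]}}, a} * b;

  // Declared sign type: both operands are sign-extended automatically.
  wire signed [15:0] y_typed  = a * b;

  initial begin
    a = -8'sd1;
    b = -8'sd1;
    #1;
    $display("manual = %0d, typed = %0d", y_manual, y_typed);
    $finish;
  end
endmodule
```

For a = b = -1, the typed version gives 1, while the manually extended version gives -255 because b is treated as 255.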
Tip
Guideline: When using numeric packages, make sure you clearly specify whether
arithmetic operations are unsigned or signed.
Do not declare a signal type that is inconsistent with the intended operation, as shown in
Example 4-24.
Declare the signal type that matches the intended operation, as shown in Example 4-25.
Use the ieee.numeric_std IEEE numeric package for numeric types and functions.
Use only one numeric package at a time.
Use the unsigned and signed numeric types in all arithmetic expressions.
Use the report datapath command to list the datapath blocks and the input and output
operand types.
Verilog
wire [15:0] a;
wire [31:0] y;
assign y = a * a;
VHDL
signal a : unsigned (15 downto 0);
signal y : unsigned (31 downto 0);
y <= a * a;
Verilog
wire signed [15:0] a;
wire [31:0] y;
assign y = a * a;
VHDL
signal a : signed (15 downto 0);
signal y : signed (31 downto 0);
y <= a * a;
Tip
Guideline: Be aware of implied upper-bit truncation in addition and subtraction.
When there is a sequence of computation by addition, subtraction, or multiplication,
unless disallowed in the algorithm, keep full precision until the end of the data flow.
Do truncation at the end of the sequence, which facilitates the most operator
merging and usually leads to the best QoS.
Example 4-28 shows a scenario where implied upper-bit truncation does not affect operator
merging.
Example 4-28 Operator Merging is Allowed if Truncation Does Not Affect Final
Outcome
Verilog
wire [15:0] a, b, c, d; // operators merged
wire [15:0] p, q;
wire [15:0] y;
assign p = a + b; // implied upper-bit truncation
assign q = c + d; // implied upper-bit truncation
assign y = p + q; // implied upper-bit truncation
VHDL
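signal a, b, c, d : unsigned (15 downto 0); -- operators merged
signal p, q : unsigned (15 downto 0);
signal y : unsigned (15 downto 0);
p <= a + b; -- implied upper-bit truncation
q <= c + d; -- implied upper-bit truncation
y <= p + q; -- implied upper-bit truncation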
Because the final output y requires a precision of only 16 bits, the intermediate implied
truncations in generating p and q do not cause any loss of information in y. Therefore, the three
additions are mergeable in spite of implied upper-bit truncation.
Example 4-29 carries full precision everywhere, allowing the three adders to be merged
without introducing any mathematical error.
Verilog
wire [15:0] a, b, c, d; // operators merged
wire [16:0] p, q;
wire [17:0] y;
assign p = a + b; // full precision
assign q = c + d; // full precision
assign y = p + q; // full precision
VHDL
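signal a, b, c, d : unsigned (15 downto 0); -- operators merged
signal p, q : unsigned (16 downto 0);
signal y : unsigned (17 downto 0);
p <= resize(a,17) + resize(b,17); -- full precision
q <= resize(c,17) + resize(d,17); -- full precision
y <= resize(p,18) + resize(q,18); -- full precision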
Note: Adding two 16-bit numbers with full precision leads to a 17-bit sum. Similarly, adding
two 17-bit numbers leads to an 18-bit sum.
Example 4-30 contains both implied upper-bit truncation and full precision. The calculations of
p and q throw away the carry-out. The calculation of y accommodates the carry-out. If the
three adders were merged, the calculations of p and q would be treated as full precision without
throwing away the carry-out. This would make the merged operator mathematically different
from the original design. This is a case where the operators cannot be merged.
Example 4-30 Mixture of Implied Upper-Bit Truncation and Full Precision Arithmetic
May Hurt Operator Merging
Verilog
wire [15:0] a, b, c, d; // operators not merged
wire [15:0] p, q;
wire [17:0] y;
assign p = a + b; // implied upper-bit truncation
assign q = c + d; // implied upper-bit truncation
assign y = p + q; // full precision
VHDL
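signal a, b, c, d : unsigned (15 downto 0); -- operators not merged
signal p, q : unsigned (15 downto 0);
signal y : unsigned (17 downto 0);
p <= a + b; -- implied upper-bit truncation
q <= c + d; -- implied upper-bit truncation
y <= resize(p,18) + resize(q,18); -- full precision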
Example 4-31 shows another scenario where it is safe to merge the three additions as one
operator.
Verilog
wire [15:0] a, b, c, d; // merged as one cluster
wire [16:0] p, q;
wire [15:0] y;
assign p = a + b; // full precision, no truncation
assign q = c + d; // full precision, no truncation
assign y = p + q; // explicit upper-bit truncation
VHDL
signal a, b, c, d : unsigned (15 downto 0); -- merged as one cluster
signal p, q : unsigned (16 downto 0);
signal y_extra : unsigned (16 downto 0);
signal y : unsigned (15 downto 0);
p <= resize(a,17) + resize(b,17); -- full precision, no truncation
q <= resize(c,17) + resize(d,17); -- full precision, no truncation
y_extra <= p + q;
y <= y_extra (15 downto 0); -- explicit upper-bit truncation
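The truncation scenarios in Example 4-28 through Example 4-31 can be compared numerically with a sketch like the one below. The testbench and its signal names (p16, p17, y_trunc, y_full, y_mixed) are hypothetical; the widths follow the examples.

```verilog
// Simulation-only sketch; all names here are illustrative assumptions.
module tb_truncation_sketch;
  reg [15:0] a, b, c, d;

  // Implied upper-bit truncation everywhere (style of Example 4-28).
  wire [15:0] p16     = a + b;
  wire [15:0] q16     = c + d;
  wire [15:0] y_trunc = p16 + q16;

  // Full precision everywhere (style of Example 4-29).
  wire [16:0] p17    = a + b;
  wire [16:0] q17    = c + d;
  wire [17:0] y_full = p17 + q17;

  // Truncated intermediates, full-precision output (style of Example 4-30).
  wire [17:0] y_mixed = p16 + q16;

  initial begin
    a = 16'hFFFF; b = 16'hFFFF; c = 16'hFFFF; d = 16'hFFFF;
    #1;
    $display("y_trunc      = %h", y_trunc);
    $display("y_full[15:0] = %h", y_full[15:0]);
    $display("y_full       = %h", y_full);
    $display("y_mixed      = %h", y_mixed);
    $finish;
  end
endmodule
```

For these inputs, y_trunc and the lower 16 bits of y_full are both FFFC, which is why truncating only at the end is safe. In contrast, y_full is 3FFFC while y_mixed is 1FFFC, which is why the mixed style cannot be treated as one full-precision sum.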
Tip
Guideline: Do not use self-determined expressions. Make widths of arithmetic
expressions unambiguous using intermediate signals and additional assignments.
However, when the RTL code falls into the self-determined bit-width rules defined in the
Verilog LRM (IEEE Std 1364-1995), the width of y is as shown in Table 4-2. This can have a
negative impact on overall QoS.
Example 4-32 is a design that relies on the self-determined bit-width rule for the two adders
in the comparison. According to the LRM rule, Example 4-32 is equivalent to Example 4-33.
The three operators (two adders and one comparator) are not merged.
Many times, the needed functionality can be implemented by using either the coding style
shown in Example 4-33 or Example 4-34. However, the coding style shown in Example 4-34
leads to better QoS because it is more inclined to facilitate operator merging.
Example 4-35 is a design that relies on the self-determined bit-width rule for the
two multipliers in the comparison. According to the LRM rule, Example 4-35 is equivalent to
Example 4-36. The three operators (two multipliers and one comparator) are not
merged.
Many times, the needed functionality can be implemented by using either Example 4-36 or
Example 4-37. However, Example 4-37 is more inclined to facilitate operator
merging and will lead to a better QoS.
Note: Be aware of the LRM self-determined bit-width rules. Avoid the self-determined rules
by explicitly declaring width of intermediary signals. As long as functionality allows, facilitate
as much operator merging as possible.
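The following sketch illustrates how the width of sums inside a comparison is decided when no intermediate signals are declared. It is not one of the original examples; the names (tb_compare_width_sketch, gt_narrow, gt_full) are hypothetical, and 16-bit operands are assumed.

```verilog
// Simulation-only sketch; all names here are illustrative assumptions.
module tb_compare_width_sketch;
  reg [15:0] a, b, c, d;

  // The sums are sized only by the operands of the comparison, so each
  // one is evaluated at 16 bits and its carry-out is dropped.
  wire gt_narrow = (a + b) > (c + d);

  // Explicit 17-bit intermediate signals keep the carry-out.
  wire [16:0] p = a + b;
  wire [16:0] q = c + d;
  wire gt_full = p > q;

  initial begin
    a = 16'hFFFF; b = 16'h0001; c = 16'h0000; d = 16'h0001;
    #1;
    $display("gt_narrow = %b, gt_full = %b", gt_narrow, gt_full);
    $finish;
  end
endmodule
```

Here a + b wraps to 0 at 16 bits, so gt_narrow is 0, while the 17-bit version compares 65536 with 1 and gt_full is 1. Declaring the intermediate widths makes the intended behavior explicit.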
Tip
Guideline: Do not instantiate arithmetic DesignWare components because
instantiated components prevent datapath optimization and have a negative effect
on QoS, as shown in Example 4-38. Instead, write arithmetic expressions in the
RTL, as shown in Example 4-39.
Verilog
module gd11_1 (a, b, c, d0, d1, z0, z1); //instantiation of DW component
input [7:0] a, b, c;
input [15:0] d0, d1;
output [15:0] z0, z1;
VHDL
library ieee, dw01, dw02;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
use ieee.std_logic_unsigned.all;
use dw01.dw01_components.all;
use dw02.dw02_components.all;
Verilog
module gd11_2 (a, b, c, d0, d1, z0, z1); //without instantiation of DW component
input [7:0] a, b, c;
input [15:0] d0, d1;
output [15:0] z0, z1;
assign z0 = a * b + d0;
assign z1 = a * c + d1;
endmodule
VHDL
library ieee;
use ieee.numeric_std.all;
entity gd11_2 is -- without instantiation of DW component
port ( a, b, c : in unsigned (7 downto 0);
d0, d1 : in unsigned (15 downto 0);
z0, z1 : out unsigned (15 downto 0) );
end gd11_2;
architecture rtl of gd11_2 is
begin
-- single datapath with:
-- automatic sum-of-products
-- implicit usage of carry-save internally
z0 <= a * b + d0;
z1 <= a * c + d1;
end rtl;
Tip
Guideline: Complement operands arithmetically using the - operator, as shown in
Example 4-40.
Operands complemented this way can be extracted easily from a larger datapath, which has a
positive impact on QoS.
Do not manually negate operands because this prevents Sum-of-Products (SoP) extraction
and has a negative impact on QoS, as shown in Example 4-41.
Verilog
module gd13_2 (a, b, c, sign, z); //good QoS
VHDL
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity gd13_2 is -- good QoS
port ( a, b : in signed (15 downto 0);
c : in signed (31 downto 0);
sign : in std_logic;
z : out signed (31 downto 0) );
end gd13_2;
architecture rtl of gd13_2 is
signal a_int : signed (16 downto 0);
signal z_extra : signed (32 downto 0);
begin
-- complement multiplier instead of product
a_int <= -resize(a,17) when (sign = '1') else resize(a,17);
z_extra <= a_int * b + c;
z <= z_extra (31 downto 0);
end rtl;
Verilog
module gd13_1 (a, b, c, sign, z); //poor QoS
VHDL
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity gd13_1 is -- poor QoS
port ( a, b : in signed (15 downto 0);
c : in signed (31 downto 0);
sign : in std_logic;
z : out signed (31 downto 0) );
end gd13_1;
architecture rtl of gd13_1 is
signal p, px : signed (31 downto 0);
constant CONSTVAL1 : signed (1 downto 0) := "01";
constant CONSTVAL2 : signed (1 downto 0) := "00";
begin
p <= a * b;
px <= (not p) when (sign = '1') else p;
z <= px + ('0' & sign) + c;
end rtl;
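The two complementing styles compute the same result, which the following Verilog sketch checks in simulation. It mirrors the VHDL of Example 4-40 and Example 4-41 in a single testbench; the testbench name and the signals z_arith and z_manual are assumptions made for illustration.

```verilog
// Simulation-only sketch; all names here are illustrative assumptions.
module tb_complement_sketch;
  reg signed [15:0] a, b;
  reg signed [31:0] c;
  reg               sign;

  // Arithmetic complement of one multiplier input using the - operator.
  wire signed [16:0] a_int   = sign ? -a : a;
  wire signed [31:0] z_arith = a_int * b + c;

  // Manual complement of the product: invert, then add the sign bit.
  wire signed [31:0] p        = a * b;
  wire signed [31:0] px       = sign ? ~p : p;
  wire signed [31:0] z_manual = px + sign + c;

  integer i, errors;
  initial begin
    errors = 0;
    for (i = 0; i < 1000; i = i + 1) begin
      a = $random; b = $random; c = $random; sign = $random;
      #1;
      if (z_arith !== z_manual) errors = errors + 1;
    end
    $display("mismatches = %0d", errors);
    $finish;
  end
endmodule
```

Since the results are identical, the arithmetic form can be used freely; it keeps the expression in a shape that a sum-of-products extraction can recognize.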