Generic Logic Synthesis Meets RTL Synthesis: Heinz Riener Mathias Soeken Eleonora Testa Giovanni de Micheli
Abstract—We present an integration of Generic Logic Synthesis, a recent methodology for developing logic synthesis algorithms that are independent of a specific technology-independent logic representation, in an RTL synthesis flow. This integration allows us to choose different multi-level logic representations during the synthesis process and judge the impact of this choice on the overall synthesis result. We propose a prototypical implementation that combines the open-source RTL synthesis framework Yosys with the EPFL logic synthesis library mockturtle. In an experimental evaluation, we show the synthesis results for different arithmetic and cryptographic benchmarks from OpenCores.org and for a hand-crafted modular multiplier using four different multi-level logic representations in technology-independent logic optimization.

I. INTRODUCTION

Generic Logic Synthesis [1] is a recent methodology in technology-independent logic synthesis for developing algorithms that can be generically applied to different (multi-level) logic representations. As a consequence, this methodology enables a fast realization of complex synthesis flows for tailored logic representations such as majority-inverter graphs (MIGs) or XOR-AND graphs (XAGs) without the need for replicating and adapting large portions of source code.

Each logic synthesis algorithm is parameterized with a logic representation and generically implemented using a set of naming conventions with agreed semantics defined by a common network interface API. Logic representations then have to implement (a subset of) the network interface API. When a generic logic synthesis algorithm is instantiated with a concrete logic representation, static checking ensures at compile time that the logic representation implements all methods of the network interface API required by the logic synthesis algorithm. If one (or more) methods of the network interface API are missing, or if the interface is not correctly implemented, a compile-time error is reported. Otherwise, if compilation succeeds, a highly tailored (and optimized) logic synthesis algorithm for the concrete logic representation at hand is generated.

A prototypical implementation of Generic Logic Synthesis [1] has been presented for the scalable peephole synthesis framework introduced by Mishchenko and Brayton [2]. The implementation is publicly available in the EPFL logic synthesis library mockturtle [3].

In this paper, we demonstrate how the mockturtle library can be integrated into an RTL synthesis flow and show the impact of the choice of a multi-level logic representation in RTL synthesis after LUT mapping.

II. LOGIC SYNTHESIS INTEGRATED IN RTL SYNTHESIS

A. Synthesis and verification flow

In this section, we illustrate the overall RTL synthesis flow by means of a running example. In the example, we synthesize a multiplier module provided in the hardware description language Verilog.

    module top(input clk, input[7:0] a,b, output reg[15:0] c);
      always @(posedge clk) c <= a * b;
    endmodule // top

Synthesis. As the synthesis suite, we use the open-source synthesis framework Yosys [4]. Yosys comes with a shell interface which allows sessions like the following:

    yosys> read_verilog file.v
    yosys> prep
    yosys> techmap
    yosys> cirkit -script optimize.cs
    yosys> flatten
    yosys> write_verilog file_optimized.v

In the session above, a conservative RTL synthesis flow (‘prep’), a technology mapping step (‘techmap’), and a flattening step (‘flatten’) are carried out on the Verilog file.

We have implemented a new command ‘cirkit’ in Yosys that enables us to run logic optimization scripts composed of optimizing transformations provided by the mockturtle library.¹ Each combinational part of the technology-mapped implementation is extracted and logically optimized using the script ‘optimize.cs’ (‘cirkit’). Input and output of the ‘cirkit’ command are provided in the form of LUT networks. The logically optimized combinational parts are finally re-composed and flattened into one gate-level netlist, and the optimized gate-level netlist is written into a new Verilog file.

¹ The modified Yosys synthesis suite can be found online: https://github.com/hriener/yosys/

Fig. 1 depicts the overall interaction of the RTL synthesis engine with the logic synthesis framework.

Verification. The correctness of the optimizing transformations carried out by the proposed flow can be verified in two stages:

1) Combinational equivalence checking (CEC): We prove that the LUT networks provided as input and output to CirKit are functionally equivalent using combinational equivalence checking implemented in ABC [5].
2) Weak sequential equivalence checking (SEC): We prove that the outputs of the initial RTL design and the obtained gate-level netlist do not diverge for a fixed
   number of time steps when fed with the same inputs, using sequential equivalence checking implemented in Yosys.

Fig. 1. RTL synthesis with integrated logic optimization. [Diagram labels: RT level design; RTL synthesis (Yosys); gate level netlist; LUT network (twice); logic synthesis (CirKit).]

B. Optimization scripts

CirKit² is a front-end for the mockturtle library that instantiates the implemented algorithms and provides a shell interface for executing sequences of optimizing transformations. In the following, we focus on the subset of the supported commands used to interface with the RTL synthesis suite Yosys. A typical CirKit optimization script for this purpose looks as follows:

    cirkit> read_blif <TMP_DIR>/input.blif
    cirkit> lut_resynthesis --mig
    cirkit> cut_rewrite --mig
    cirkit> lut_mapping
    cirkit> collapse_mapping
    cirkit> write_blif <TMP_DIR>/output.blif

² https://github.com/msoeken/cirkit

First, the LUT network is read (‘read_blif’). This LUT network is composed of LUTs with a potentially large number of fan-ins. The goal of LUT resynthesis (‘lut_resynthesis’) is to decompose these LUTs into primitive gates of a homogeneous gate library to express the LUT network in the form of a multi-level logic representation. The additional parameter ‘--mig’ specifies that the gate library of MIGs should be used, which consists only of majority-of-three (MAJ-3) gates and inverters. Multi-level logic optimization algorithms, such as Boolean rewriting [6] (‘cut_rewrite’), are then carried out to optimize the logic representation with respect to a cost function. Finally, the optimized multi-level logic representation is mapped back into a LUT network (‘lut_mapping’ and ‘collapse_mapping’). The obtained LUT network is written into a file (‘write_blif’) to be read by Yosys.

In practice, the first and last commands (‘read_blif’ and ‘write_blif’) of a CirKit optimization script are automatically generated by Yosys and need not be part of the optimization script.

C. LUT resynthesis

LUT resynthesis is an algorithm that translates a LUT network into an arbitrary gate-based network. Each LUT is synthesized into a subnetwork of the targeted gate-based network type, based on the LUT’s truth table. The subnetworks are composed according to the LUT structure of the original LUT network.

The flexibility lies in the underlying synthesis algorithm. Note that for LUT resynthesis the synthesis algorithm must find a subnetwork for every LUT function f in the LUT network. Possible synthesis algorithms are Shannon decomposition, database lookup, bi-decomposition, exact synthesis [7], or disjoint-support decomposition (DSD).

D. Cut rewriting

Cut rewriting is based on the idea of improving a logic network by changing the gate-level structure of a subcircuit. Subcircuits are enumerated using cut enumeration [8]. For each cut, several alternative gate-level structures are computed, and the effect of substituting each structure for the current one is evaluated. Note that not all gates may be removed when removing the current structure, because some of them are required by other gates outside of the cut. Similarly, the new gate-level structure to be inserted in the logic network may make use of already existing logic. Analysing the impact and gain in this setting is referred to as DAG-aware rewriting [9].

Cut rewriting is implemented as a generic algorithm in mockturtle, where the synthesis algorithm to compute alternative gate-level structures for a cut is passed as a parameter. In fact, the same synthesis algorithms that are used in LUT resynthesis can be used for cut rewriting, which is another manifestation of generic logic synthesis in mockturtle.

E. LUT mapping

LUT mapping (see, e.g., [10]) addresses the problem of resynthesizing a logic network with small gates into a logic network with larger gates, where a gate’s size refers to its number of inputs. It can be considered the dual problem of LUT resynthesis, in which large gates are resynthesized into smaller gates. A typical instance of LUT mapping, used for example in FPGA programming, is to map a gate-level network (whose gates have at most 3 inputs) into a logic network whose gates can implement arbitrary 6-input functions.

Most LUT mapping algorithms are based on cut enumeration. First, cuts are enumerated for all gates in the input logic network. Then some of the gates are determined to be in the mapped network by selecting one of the gate’s cuts, such that the following conditions hold: (i) all outputs must be mapped, (ii) if a gate is mapped, then each leaf in the gate’s selected cut must be either a primary input or mapped as well.

LUT mapping usually only determines which gates are mapped and which cuts are chosen; it does not create the larger LUT network. We follow this convention in CirKit, where the command ‘lut_mapping’ performs the mapping by annotating gates in the logic network, and ‘collapse_mapping’ creates the LUT network from these annotations.

III. EXPERIMENTAL RESULTS

Experimental setup. We have evaluated the quality and performance of the proposed RTL synthesis framework considering four different multi-level logic representations: (1) AND-inverter graphs (AIGs), (2) majority-inverter graphs (MIGs),
(3) XOR-AND graphs (XAGs), and (4) XOR-majority graphs (XMGs).

As benchmarks, we use different arithmetic and cryptographic designs obtained from OpenCores.org (sha1, sha256, sha512, md5) and a hand-crafted modular multiplier design (modular_mul). The benchmarks are optimized with the flow script described in Section II, where ‘cut_rewrite’ is either repeated 5× or repeated until convergence.

All experiments have been conducted on an Intel® Core™ i7-7567U CPU with 3.50 GHz and 16 GB RAM. We use a global time limit of 100 minutes for executing an RTL synthesis flow on a benchmark.

Experimental results. Table I presents synthesis results for individual modules of the RTL design. Each row corresponds to one module of a design. Bold font denotes a top-level module. The first five columns from left to right show the module name (Name), the number of primary inputs (I), the number of primary outputs (O), the number of LUTs (LUTs) and the number of levels (Levels) after LUT mapping of the individual modules. The remaining columns show the number of LUTs (LUT) and levels (Levels) after logic synthesis and LUT mapping for the four different multi-level logic representations, respectively.

TABLE I. NUMBER OF LUTS OF INDIVIDUAL MODULES AFTER RTL SYNTHESIS, LOGIC SYNTHESIS, AND LUT MAPPING

Table II presents synthesis results after flattening the individual modules into one monolithic netlist. The first three columns from left to right show the name of the top-level module (Name), the number of LUTs (LUTs) and the number of flip-flops (DFFs) of the flattened benchmarks without additional logic optimization. The remaining columns list the number of LUTs (LUTs) and the total runtime (Time) required for RTL synthesis with integrated logic synthesis for the four multi-level logic representations, respectively. We
mark the best area result in terms of LUTs in green color.

Discussion. Logic synthesis techniques allow us to reduce different cost functions (area, depth, etc.) when integrated into an RTL synthesis flow. In this work, we focus on area reduction for FPGAs. In our experiments, applying a logic synthesis flow reduces the number of LUTs by up to 51.23%, depending on the intermediate logic representation in use. In general, cut rewriting for majority-based intermediate representations tends to converge faster, whereas allowing XOR gates in the intermediate representation leads to a better average reduction of the number of LUTs. The latter effect is likely due to the choice of arithmetic and cryptographic benchmarks, which both typically contain many XOR gates. The time limit of 100 minutes is reached only once, for the benchmark md5, for which LUT resynthesis into ANDs and inverters does not finish in time. Overall, XAGs and XMGs perform best: the number of LUTs is reduced by 51.23% in 59.82 min when using XAGs and by 44.30% in 40.67 min when using XMGs.

IV. CONCLUSION

We have proposed an integration of RTL synthesis with Generic Logic Synthesis. This integration allows us to run RTL synthesis while choosing one of many multi-level logic representations in technology-independent logic optimization. We have shown the synthesis results after LUT mapping for FPGAs for four different RTL designs from OpenCores.org and a hand-crafted modular multiplier, considering four different logic representations (AIGs, MIGs, XAGs, XMGs). The experimental results indicate that high-effort logic optimization techniques are capable of reducing the number of LUTs of these benchmarks by up to 51.23%. The results also show that the choice of the multi-level logic representation has an important impact on the compaction of the logic and the performance of the synthesis process. For instance, switching from AIGs to XAGs for the hand-crafted modular multiplier leads to a reduction of approximately 100,000 LUTs (27.45%) and reduces the total runtime by almost 2× (47.47%).

The proposed integration of Generic Logic Synthesis and RTL synthesis enables several promising directions for future research:

1) Mixing logic representations: In the experiments, we chose the same logic representation for all modules in the RTL design. The presented approach, however, is capable of choosing a different logic representation for each module of an RTL design. Designs such as processors, which integrate arithmetic as well as control logic, may substantially benefit from a more fine-tuned selection of logic representations. An interesting challenge for future work is to automatically decide which logic representation to choose for a module at hand.
2) Mapping into technology libraries: The proposed synthesis flow interacts with logic synthesis at the granularity of modules. Each combinational module is extracted from the RTL design, optimized, and re-integrated into the design. The modules are represented as technology-mapped LUT networks using the Yosys standard gate library. Mapping from this standard gate library into the respective logic representation and back leads in some cases to suboptimal results. For instance, the number of LUTs increases after re-integrating the optimized MIG of modular_mul into Yosys, such that optimizing the MIG 5× leads to a better result than optimizing the MIG up to convergence. Experimenting with different gate libraries has the potential to result in better synthesis quality or runtime.

V. ACKNOWLEDGMENTS

This research was supported by the Swiss National Science Foundation (200021-169084 MAJesty), by the European Research Council in the project H2020-ERC-2014-ADG669354 CyberCare, and by the EPFL Open Science Fund.

REFERENCES

[1] H. Riener, E. Testa, W. Haaswijk, A. Mishchenko, L. G. Amarù, G. De Micheli, and M. Soeken, “Scalable generic logic synthesis: One approach to rule them all,” in Proceedings of the 56th Annual Design Automation Conference 2019, DAC 2019, Las Vegas, NV, USA, June 02-06, 2019, pp. 70:1–70:6. [Online]. Available: https://doi.org/10.1145/3316781.3317905
[2] A. Mishchenko and R. K. Brayton, “Scalable logic synthesis using a simple circuit structure,” in International Workshop on Logic Synthesis, 2006, pp. 15–22.
[3] M. Soeken, H. Riener, W. Haaswijk, and G. De Micheli, “The EPFL logic synthesis libraries,” May 2018, arXiv:1805.05121.
[4] D. Shah, E. Hung, C. Wolf, S. Bazanski, D. Gisselquist, and M. Milanovic, “Yosys+nextpnr: An open source framework from verilog to bitstream for commercial FPGAs,” in 27th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, FCCM 2019, San Diego, CA, USA, April 28 - May 1, 2019, pp. 1–4. [Online]. Available: https://doi.org/10.1109/FCCM.2019.00010
[5] R. K. Brayton and A. Mishchenko, “ABC: an academic industrial-strength verification tool,” in Computer Aided Verification, 22nd International Conference, CAV 2010, Edinburgh, UK, July 15-19, 2010. Proceedings, 2010, pp. 24–40. [Online]. Available: https://doi.org/10.1007/978-3-642-14295-6_5
[6] H. Riener, W. Haaswijk, A. Mishchenko, G. De Micheli, and M. Soeken, “On-the-fly and DAG-aware: Rewriting Boolean networks with exact synthesis,” in Design, Automation & Test in Europe Conference & Exhibition, DATE 2019, Florence, Italy, March 25-29, 2019, pp. 1649–1654. [Online]. Available: https://doi.org/10.23919/DATE.2019.8715185
[7] W. Haaswijk, M. Soeken, A. Mishchenko, and G. De Micheli, “SAT-based exact synthesis: Encodings, topology families, and parallelism,” IEEE Trans. on Computer-Aided Design, 2019, accepted, in press. [Online]. Available: https://ieeexplore.ieee.org/document/8634910
[8] J. Cong, C. Wu, and Y. Ding, “Cut ranking and pruning: Enabling a general and efficient FPGA mapping solution,” in Proceedings of the 1999 ACM/SIGDA Seventh International Symposium on Field Programmable Gate Arrays, FPGA 1999, Monterey, CA, USA, February 21-23, 1999, pp. 29–35. [Online]. Available: https://doi.org/10.1145/296399.296425
[9] A. Mishchenko, S. Chatterjee, and R. K. Brayton, “DAG-aware AIG rewriting: a fresh look at combinational logic synthesis,” in Proceedings of the 43rd Design Automation Conference, DAC 2006, San Francisco, CA, USA, July 24-28, 2006, pp. 532–535. [Online]. Available: https://doi.org/10.1145/1146909.1147048
[10] J. Cong and Y. Ding, “FPGA technology mapping,” in Encyclopedia of Algorithms, 2016, pp. 773–777. [Online]. Available: https://doi.org/10.1007/978-1-4939-2864-4_148
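
As a supplementary illustration of the two mapping conditions stated in Section II-E, the following minimal Python sketch checks whether a selection of cuts forms a valid cover: (i) all outputs are mapped, and (ii) every leaf of a mapped gate's selected cut is a primary input or is itself mapped. This is not the mockturtle or CirKit implementation. The function name `is_valid_cover`, the dictionary `selected_cut`, and the toy node names are hypothetical and chosen only for this example. The sketch additionally assumes that an output driven directly by a primary input needs no LUT.

```python
from typing import Dict, Iterable, Set, Tuple


def is_valid_cover(
    primary_inputs: Set[str],
    outputs: Iterable[str],
    selected_cut: Dict[str, Tuple[str, ...]],
) -> bool:
    """Check conditions (i) and (ii) for a set of mapped gates.

    selected_cut maps every mapped gate to the leaves of its chosen cut.
    All names here are illustrative, not part of any real library.
    """
    mapped = set(selected_cut)

    # (i) Every output must be mapped. Assumption of this sketch: an
    # output that is itself a primary input does not need a LUT.
    for out in outputs:
        if out not in mapped and out not in primary_inputs:
            return False

    # (ii) Each leaf of a mapped gate's selected cut must be a primary
    # input or a mapped gate.
    for leaves in selected_cut.values():
        for leaf in leaves:
            if leaf not in primary_inputs and leaf not in mapped:
                return False

    return True


# Toy network: n1 = f(a, b), n2 = g(n1, c), n3 = h(n2, a); output is n3.
pis = {"a", "b", "c"}

# One LUT whose cut reaches down to the primary inputs: valid.
one_lut = is_valid_cover(pis, ["n3"], {"n3": ("a", "b", "c")})

# n3 uses leaf n2, but n2 is neither mapped nor a primary input: invalid.
dangling = is_valid_cover(pis, ["n3"], {"n3": ("n2", "a")})

# Two LUTs, n2 over the primary inputs and n3 over (n2, a): valid.
two_luts = is_valid_cover(pis, ["n3"], {"n3": ("n2", "a"), "n2": ("a", "b", "c")})
```

A check of this kind only validates a given selection. Choosing cuts that minimize the number of LUTs or levels is the actual optimization problem addressed by the mapping algorithms cited in [10].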