Robert J. Francis
0-8186-3010-8/92 $03.00 © 1992 IEEE
a) Boolean Network
Figure 2: Network and Circuit

technology.

The modifications performed by logic optimization typically include redundancy removal and common sub-expression elimination. The intention is to improve the final circuit by simplifying the underlying network. For example, consider the network shown in Figure 3a. The common sub-expression e + f can be factored out of the functions x and y, leading to the simplified network covered by the circuit shown in Figure 3b. Conventional techniques for logic optimization can be effective for LUT circuits, particularly at a level of granularity where factors have more than K inputs. These techniques have been summarized in [2] and will not be discussed here.

Technology mapping selects sub-networks of the optimized network to be implemented by the available circuit elements. In the case of LUT-based FPGAs, any sub-network with at most K inputs can be implemented by a K-input LUT. The final circuit must include a LUT implementing each of the primary outputs and all of the LUT inputs that are not primary inputs.

The optimization goal for the synthesis of LUT circuits is typically the minimization of the total number of LUTs, the number of levels of LUTs, or both. Minimizing the number of LUTs in the circuit increases the size of designs that can fit into the fixed number of LUTs available in a given FPGA. The minimization of the number of levels of LUTs can improve the performance of the circuit by reducing the number of logic block delays and programmable routing delays on the longest path. To illustrate the issues that differentiate LUT synthesis from conventional logic synthesis, this tutorial will focus on the minimization of the total number of LUTs in the final circuit. The following section discusses the limitations of conventional library-based synthesis when applied to LUT circuits, and Section 3 discusses approaches to logic synthesis that deal specifically with LUT circuits.

2 Library-Based Synthesis

Standard Cells and Mask Programmed Gate Arrays both implement combinational functions using a limited set of simple gates. The most successful approach to synthesis for these ASIC technologies has used library-based technology mapping [3]. This approach traverses the network from the primary inputs to the primary outputs, and at each node the local structure of the network is matched against a library of patterns representing the set of available gates. For each successful match, the cost of the circuit using that gate is calculated from the cost of the gate and the cost of the previously constructed circuits implementing the inputs to the gate. The minimum cost circuit among all the matches is then retained.

The major obstacle to applying library-based technology mapping to LUT circuits is the large number of different functions that a K-input LUT can implement. The function implemented by a K-input LUT is determined by the values stored in its 2^K memory bits. Since each bit can independently be either 0 or 1, there are 2^(2^K) different Boolean functions of K variables. For values of K greater than 3 the library required to represent a K-input LUT becomes impractically large.
K    without permutations    with permutations    with permutations
     and inversions                               and inversions
2    16                      12                   4
3    256                     80                   14
4    65536                   3984                 232

Table 1: Number of Patterns for a K-Input LUT

The size of the library can be reduced by noting that some patterns are equivalent after a permutation of inputs [4]. The inversion of outputs or inputs, which is trivially accomplished with a LUT, can also produce equivalent patterns. Table 1 lists the number of different patterns, with and without permutations and inversions, for K = 2, 3, and 4. To match a sub-network against a pattern in the reduced library it may be necessary to permute or invert the sub-network. Hashing functions have been proposed to simplify the matching of permuted patterns [5], but the increased complexity of pattern matching limits the benefits of the reduced library.

Another alternative is to use a partial library tuned to take advantage of the network structure likely to be produced by technology independent logic optimization [6]. The limitation of this approach is that it precludes some opportunities for optimization of the final circuit. The following section discusses approaches to LUT synthesis that exploit the full functionality of a K-input LUT to obtain improved results.

3 LUT-Specific Synthesis

There has been a great deal of recent work on logic synthesis that deals specifically with LUT circuits [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18]. The key to all of these approaches is the ability of a K-input LUT to implement all functions of K variables. This completeness simplifies the matching of a sub-network to a LUT. To determine if a sub-network matches a K-input LUT it is not necessary to match the sub-network against a library of separate patterns, as described in the preceding section. It is sufficient to count the number of inputs to the sub-network, and verify that the number of inputs does not exceed the constraint K.

Technology mapping optimizes the final circuit by selecting which sub-networks are covered by LUTs. If the original network includes nodes with more than K inputs, referred to as infeasible nodes, it may not be possible to find a circuit of LUTs covering the network. In many mapping algorithms, to ensure that a circuit covering the network exists, each infeasible node is decomposed into a set of feasible nodes, each with at most K inputs. In addition, the decomposition of both feasible and infeasible nodes presents an opportunity to optimize the final circuit.

The next section discusses the decomposition of infeasible nodes, Section 3.2 discusses how decomposition and covering can be combined to improve the final circuit, and Section 3.3 describes how covering can exploit fanout nodes in the original network.

3.1 Decomposition of Infeasible Nodes

The general strategy for the decomposition of infeasible nodes is to decompose each infeasible node into sub-functions that use fewer inputs than the original infeasible node. Any sub-function that uses no more than K inputs is feasible and is decomposed no further. Any sub-function that has more than K inputs is recursively decomposed. Eventually the original infeasible node is decomposed into a set of feasible nodes. Four methods that have been proposed for the decomposition of infeasible nodes are: disjoint decomposition, algebraic factorization, AND-OR decomposition, and Shannon cofactoring.

3.1.1 Disjoint Decomposition

A disjoint decomposition is based on a partition of the inputs to the infeasible node into two disjoint sets referred to as the bound set and the free set. One or more functions of the bound set are extracted from the infeasible node, and the infeasible node is replaced by a function of the outputs of the extracted functions and the inputs in the free set. The attraction of a disjoint decomposition is that the number of inputs in each of the two sets must be less than the number of inputs to the infeasible node.

Disjoint decompositions can be found by searching through all possible partitions of the inputs to the infeasible node, and using well known methods such as residues [19] to determine if each partition leads to a disjoint decomposition. A residue function is obtained by replacing the inputs in the free set with constant values. If the set of all possible residue functions for a given partition consists of the constants 0 or 1, or a single function h of the bound variables, or its inverse h' (where ' denotes complement), then the partition is a disjoint decomposition with one extracted function. For example, consider the 4-input function f = ab + ab'cd + a'bc' + a'bd' shown in Figure 4a, and the partition of its inputs into the free set {a, b} and the bound set {c, d}. The set of residue functions for this partition, shown in Figure 4b, consists of the constants 0 and 1, and the function cd and its inverse (cd)'. Therefore, this partition leads to the disjoint decomposition of the function f shown in Figure 4c.

The number of partitions grows exponentially with the number of inputs to the infeasible node, and the search for disjoint decompositions can become prohibitively expensive if the infeasible node has a large number of inputs.

3.1.2 Algebraic Decomposition

Algebraic factorization techniques developed for technology independent logic optimization can also be used for the decomposition of infeasible nodes [7]. For example, the function x = ac + bc + bd + ce can be algebraically factored into the factor y = a + b + e, and the remainder x = cy + bd. Since the variable b is used by
Original 4-input OR node, 4 LUTs
a) Fanin LUTs
The FFD algorithm begins with an empty list of bins. The boxes are sorted by size and then each box, beginning with the largest, is packed into the first bin in the list into which it fits. If the box does not fit into any bin then it is packed into a new bin added to the end of the list. In Figure 7b the FFD algorithm has packed the fanin LUTs from Figure 7a into LUTs having filled capacities of 5, 4, and 2. Note that packing boxes into bins implies decomposition of the node being mapped.
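The packing procedure described above can be sketched in a few lines. This is a minimal illustration, not the implementation of any particular mapper. The function name first_fit_decreasing is hypothetical, and the box sizes and the bin capacity K = 5 are assumed values chosen only because they reproduce filled capacities of 5, 4, and 2; the actual sizes in Figure 7a are not given in the text.

```python
def first_fit_decreasing(box_sizes, bin_capacity):
    """Pack boxes into bins: largest box first, each into the first bin it fits."""
    bins = []  # each bin is the list of box sizes packed into it
    for box in sorted(box_sizes, reverse=True):
        for contents in bins:
            if sum(contents) + box <= bin_capacity:
                contents.append(box)  # first bin in the list with enough room
                break
        else:
            bins.append([box])  # no existing bin fits: add a new bin at the end
    return bins

K = 5                              # assumed LUT input count (bin capacity)
fanin_lut_sizes = [3, 2, 2, 2, 2]  # hypothetical fanin LUT sizes (boxes)
packed = first_fit_decreasing(fanin_lut_sizes, K)
filled_capacities = [sum(contents) for contents in packed]
```

With these assumed sizes the three bins hold boxes of sizes (3, 2), (2, 2), and (2), so the filled capacities are 5, 4, and 2.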
c) Edges (xz), (yz) invisible, 2 LUTs
Figure 11: Covering Using Edge Visibility

References

[1] S. D. Brown, R. J. Francis, J. Rose, Z. G. Vranesic, Field-Programmable Gate Arrays, Kluwer Academic Publishers, 1992.
[2] R. K. Brayton, G. D. Hachtel, A. Sangiovanni-Vincentelli, “Multilevel Logic Synthesis,” Proc. of IEEE, Vol. 78, No. 2, Feb. 1990, pp. 264-300.
[3] K. Keutzer, “DAGON: Technology Binding and Local Optimization by DAG Matching,” Proc. 24th DAC, June 1987, pp. 341-347.
[4] S. Trimberger, “A Small Complete Mapping Library for Lookup-Table-Based FPGAs,” 2nd Intl. Workshop on Field-Programmable Logic and Applications, Aug. 1992.
grammable Gate Arrays,” Proc. 27th DAC, June 1990, pp. 613-619.
[7] R. Murgai, Y. Nishizaki, N. Shenoy, R. K. Brayton, A. Sangiovanni-Vincentelli, “Logic Synthesis for Programmable Gate Arrays,” Proc. 27th DAC, June 1990, pp. 620-625.
[16] K. C. Chen, “Logic Minimization of Lookup-Table Based FPGAs,” 1st Intl. Workshop on FPGAs, Feb. 1992, pp. 71-76.
[17] P. Sawkar, D. Thomas, “Area and Delay Mapping for Table-Look-Up Based Field Programmable Gate Arrays,” Proc. 29th DAC, June 1992, pp. 368-373.
[18] J. Cong, T. Ding, A. Kahng, P. Trajmar, “Graph Based FPGA Technology Mapping for Delay Optimization,” Proc. ICCD, Oct. 1992.
[19] E. J. McCluskey, Logic Design Principles, Prentice Hall, 1986.
[20] C. E. Shannon, “The Synthesis of Two-Terminal Switching Circuits,” Bell Syst. Tech. Journal, Vol. 28, 1949, pp. 59-98.
[21] M. R. Garey, D. S. Johnson, Computers and Intractability, A Guide to the Theory of NP-Completeness, W. H. Freeman and Co., 1979.
[22] R. J. Francis, Technology Mapping for Lookup Table-Based FPGAs, Ph.D. Thesis in preparation, University of Toronto, Department of Electrical Engineering.