Power and Timing Optimization Using Multibit Flip-Flop
Sheng-Wei Yang, Tzu-Hsuan Chen, Jhih-Wei Hsu, Ting Wei Li, Cindy Shen
Synopsys, Inc.
0. Revision History
2024-02-14 - Second Revision. Added the definition of ‘displacement’ and QpinDelay with
more pictures. Also clarified the goal.
1. Introduction
In modern technology nodes, power and area minimization has become one of the most studied topics in
the semiconductor industry. One commonly applied technique is to replace single-bit flip-flops with
multibit flip-flops. Because one multibit flip-flop takes less area than the single-bit flip-flops it
replaces, using one multibit flip-flop in place of multiple single-bit flip-flops frees up area in the
design. Furthermore, before banking, each single-bit flip-flop has its own power, ground, and clock pin
connections; replacing them with one multibit flip-flop also reduces power, ground, and clock net routing
complexity. Therefore, in modern technology nodes, this technique has been widely applied and is referred
to as “multibit flip-flop banking”.
However, as technology nodes advance, timing, power, and area optimization has become a much more
convoluted problem. For some timing-critical nets, multibit flip-flop banking may worsen timing, which
hampers the overall optimization objective. Therefore, we sometimes have to split a multibit flip-flop
into several single-bit flip-flops to further optimize timing-critical nets. This technique is often
referred to as “multibit flip-flop debanking”.
In this contest problem, we simulate multibit flip-flop banking and debanking decisions on virtual
designs used as testcases, so contestants need to consider timing, power, and area objectives together
to find the best possible optimization solution for each testcase.
Figure 1(a): A simple transistor layout of two single-bit flip-flops and one multibit flip-flop. (left)
Figure 1(b): A diagram illustrating how dynamic power is reduced due to the reduction of on-chip clock
tree length. (right)
Figure 2: A simplified diagram of how two single-bit flip-flops can be banked into one two-bit flip-flop,
and debanked vice versa. In this diagram, the upper-left D pin of the single-bit flip-flop is mapped to the
D0 pin of the two-bit flip-flop; the lower-left D pin of the single-bit flip-flop is mapped to the D1 pin of
the two-bit flip-flop.
2. Contest Objective
The contestants need to develop a banking and debanking algorithm that takes any given testcase and
produces a placement result that satisfies the cell density constraints while optimizing the resulting
timing, power, and area. No cell overlap is allowed in the placement result. A cost metric is given for
each testcase to specify the weight of each timing, power, and area term. The cost metric for this
contest is calculated as follows:

$$\text{Cost} = \sum_{\forall i \in \text{FF}} \left( \alpha \cdot \text{TNS}(i) + \beta \cdot \text{Power}(i) + \gamma \cdot \text{Area}(i) \right) + \delta \cdot D$$

where i ranges over every flip-flop instance in the design. TNS(i) is the total negative slack of the
flip-flop, Power(i) is the power consumption of the flip-flop, and Area(i) is the area cost of the
flip-flop. The design is divided into several bins, and a utilization rate threshold is defined for each
bin; D is the number of bins that violate this threshold. 𝛼, 𝛽, 𝛾, and 𝛿 are the weights of the
respective costs. The weighted sum of these four cost terms represents the quality of result used in
this contest.
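A minimal sketch of the cost metric above. The container `FlipFlopTerms` and the function `total_cost` are hypothetical names, and TNS is assumed to be supplied as a non-negative magnitude; how per-pin slacks are turned into TNS(i) is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class FlipFlopTerms:
    """Per-instance inputs to the cost metric (hypothetical container)."""
    tns: float    # total negative slack magnitude; assumed to be given as >= 0
    power: float  # GatePower of the instance's library cell
    area: float   # width * height of the instance's library cell


def total_cost(ffs, num_violating_bins, alpha, beta, gamma, delta):
    """Sum over flip-flops of alpha*TNS + beta*Power + gamma*Area, plus delta*D."""
    per_ff = sum(alpha * f.tns + beta * f.power + gamma * f.area for f in ffs)
    return per_ff + delta * num_violating_bins
```

For example, two FF1 instances of the sample library (power 10, area 5 × 10) with no negative slack and no violating bins cost 2 × (5·10 + 5·50) = 600 under the weights 1, 5, 5, 1.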
Figure 3 shows an example of the density constraint of a placement. The whole placement region is
divided into uniform bins. Each bin has a density threshold representing the upper bound of the cell
area allowed within the bin. If the total cell area in a bin exceeds the threshold, the bin violates the
density constraint and incurs a penalty weighted by the density weight 𝛿. In this contest, the same
threshold applies to every bin and is defined by BinMaxUtil in the input file.
Figure 3: A representation of the cell density constraint. The whole placement region is divided into
uniform bins, and a threshold is applied to every bin. If the total cell area within a bin exceeds the
threshold, a density constraint penalty is accrued.
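3. Input and Output Format
Cost weights: Alpha, Beta, Gamma, and Delta give the weights 𝛼, 𝛽, 𝛾, and 𝛿 of the cost metric.
Syntax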
Alpha <alphaValue>
Beta <betaValue>
Gamma <gammaValue>
Delta <deltaValue>
Example
Alpha 1
Beta 5
Gamma 5
Delta 10
Die size and input/output ports: DieSize describes the dimensions of the die, namely the placement area
of the design. NumInput gives the number of input pins of the die, and each “Input” line gives an input
pin name and its location. NumOutput gives the number of output pins of the die, and each “Output” line
gives an output pin name and its location. Contestants cannot enlarge the die (the placement region) to
resolve density violations or flip-flop overlaps. The goal is to optimize multiple objectives, namely
timing and power, without increasing the die size.
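A minimal legality check for the two placement rules stated in this document (cells stay inside the die, and no two cells overlap). It assumes a cell is `(x, y, width, height)` with `(x, y)` the bottom-left corner and that the four DieSize numbers are the lower-left and upper-right corners; the function names are hypothetical, and placement-row alignment is not checked.

```python
def inside_die(cell, die):
    """cell = (x, y, width, height), (x, y) = bottom-left corner;
    die = (lower-left x, lower-left y, upper-right x, upper-right y) (assumed order)."""
    x, y, w, h = cell
    lx, ly, ux, uy = die
    return lx <= x and ly <= y and x + w <= ux and y + h <= uy


def overlaps(a, b):
    """True if two rectangles share positive area (touching edges is allowed)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def is_legal(cells, die):
    """Every cell inside the die and no pair of cells overlapping (O(n^2) pairwise check)."""
    if not all(inside_die(c, die) for c in cells):
        return False
    return not any(
        overlaps(cells[i], cells[j])
        for i in range(len(cells))
        for j in range(i + 1, len(cells))
    )
```

The pairwise loop is only meant to make the rule concrete; a real flow would use a spatial index or a row-based sweep for large designs.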
Syntax
NumInput <inputCount>
NumOutput <outputCount>
Example
NumInput 2
Input INPUT0 0 25
Input INPUT1 0 5
NumOutput 2
Cell library:
Syntax
Example
FlipFlop 1 FF1 5 10 2
Pin D 0 8
Pin Q 5 8
FlipFlop 2 FF2 8 10 4
Pin D0 0 9
Pin D1 0 6
Pin Q0 8 9
Pin Q1 8 6
FlipFlop 2 FF2A 10 10 4
Pin D0 0 9
Pin D1 0 6
Pin Q0 8 9
Pin Q1 8 6
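A sketch of reading the cell library lines. The field order of a `FlipFlop` line (bit count, name, width, height, pin count) is inferred from the example above (FF1 has 2 pins, FF2 has 4) and is an assumption, as is treating each `Pin` offset as relative to the cell's bottom-left corner. `FlipFlopCell` and `parse_library` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class FlipFlopCell:
    name: str
    bits: int
    width: float
    height: float
    pins: dict = field(default_factory=dict)  # pin name -> (x, y) offset


def parse_library(lines):
    """Collect FlipFlop entries and the Pin lines that directly follow each of them."""
    lib = {}
    current = None
    for line in lines:
        tok = line.split()
        if not tok:
            continue
        if tok[0] == "FlipFlop":
            # Assumed field order: bit count, name, width, height, pin count.
            bits, name, width, height = tok[1:5]
            current = FlipFlopCell(name, int(bits), float(width), float(height))
            lib[name] = current
        elif tok[0] == "Pin" and current is not None:
            current.pins[tok[1]] = (float(tok[2]), float(tok[3]))
        else:
            current = None  # any other keyword ends the current cell
    return lib
```

Resetting `current` on any other keyword keeps netlist `Pin` lines from being attached to the last library cell if the whole file is passed in.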
Placement result: The x and y coordinates describe the bottom-left corner of the cell:
Syntax
NumInstances <instanceCount>
Example:
NumInstances 2
Inst C1 FF1 50 20
Inst C2 FF1 50 0
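Because the instance coordinates are the bottom-left corner, an absolute pin position can be obtained by adding the library pin offset. This sketch assumes the offsets are relative to that corner; `parse_instances` and `pin_location` are hypothetical helpers.

```python
def parse_instances(lines):
    """Map instance name -> (library cell, x, y) from 'Inst' lines."""
    insts = {}
    for line in lines:
        tok = line.split()
        if tok and tok[0] == "Inst":
            name, cell, x, y = tok[1:5]
            insts[name] = (cell, float(x), float(y))
    return insts


def pin_location(inst, pin_offsets, pin):
    """Absolute pin position: instance bottom-left corner + library pin offset."""
    cell, x, y = inst
    dx, dy = pin_offsets[cell][pin]
    return (x + dx, y + dy)
```

With the example above, C1 (FF1 at (50, 20)) has its D pin at (50, 28) and C2 has its Q pin at (55, 8).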
Netlist:
Syntax
NumNets <netCount>
Pin <instName>/<libPinName>
Example
NumNets 4
Net N1 2
Pin INPUT0
Pin C1/D
Net N2 2
Pin INPUT1
Pin C2/D
Net N3 2
Pin C1/Q
Pin OUTPUT0
Net N4 2
Pin C2/Q
Pin OUTPUT1
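A sketch of reading the netlist into a map from net name to its pin references. A reference is either a die I/O pin (`INPUT0`) or `<instName>/<libPinName>` (`C1/D`); the declared pin count on each `Net` line is not validated here. The names `parse_nets` and `split_pin` are hypothetical.

```python
def parse_nets(lines):
    """Map net name -> list of pin references such as 'INPUT0' or 'C1/D'."""
    nets = {}
    current = None
    for line in lines:
        tok = line.split()
        if not tok:
            continue
        if tok[0] == "Net":
            current = tok[1]
            nets[current] = []
        elif tok[0] == "Pin" and current is not None:
            nets[current].append(tok[1])
        else:
            current = None  # e.g. 'NumNets' or a library keyword
    return nets


def split_pin(ref):
    """'C1/D' -> ('C1', 'D'); a die I/O pin such as 'INPUT0' -> (None, 'INPUT0')."""
    inst, sep, pin = ref.partition("/")
    return (inst, pin) if sep else (None, inst)
```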
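Bins: BinWidth and BinHeight give the size of each uniform bin, and BinMaxUtil gives the utilization
rate threshold applied to every bin.
Syntax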
BinWidth <width>
BinHeight <height>
Example
BinWidth 100
BinHeight 100
BinMaxUtil 75
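The utilization ratio of a bin is formulated as:

$$\text{Utilization ratio of a bin} = \frac{\sum (\text{cell area within the bin})}{\text{BinWidth} \times \text{BinHeight}}$$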
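A sketch of counting D, the number of bins whose utilization exceeds BinMaxUtil. It assumes bins tile the die from its lower-left corner, every bin has area BinWidth × BinHeight (edge bins are not shrunk), a cell contributes only the part of its area that overlaps a bin, and BinMaxUtil is a percentage (75 meaning 75%). `count_violating_bins` is a hypothetical name.

```python
import math


def count_violating_bins(cells, die, bin_w, bin_h, max_util_percent):
    """Count bins whose utilization strictly exceeds BinMaxUtil (in percent).

    cells: list of (x, y, width, height), (x, y) = bottom-left corner.
    die: (lx, ly, ux, uy).
    """
    lx, ly, ux, uy = die
    nx = math.ceil((ux - lx) / bin_w)
    ny = math.ceil((uy - ly) / bin_h)
    used = [[0.0] * ny for _ in range(nx)]
    for x, y, w, h in cells:
        # Only visit bins that the cell's bounding box can touch.
        i0 = max(0, int((x - lx) // bin_w))
        i1 = min(nx, math.ceil((x + w - lx) / bin_w))
        j0 = max(0, int((y - ly) // bin_h))
        j1 = min(ny, math.ceil((y + h - ly) / bin_h))
        for i in range(i0, i1):
            for j in range(j0, j1):
                bx, by = lx + i * bin_w, ly + j * bin_h
                ox = min(x + w, bx + bin_w) - max(x, bx)
                oy = min(y + h, by + bin_h) - max(y, by)
                if ox > 0 and oy > 0:
                    used[i][j] += ox * oy
    # Compare area * 100 against threshold * bin area to avoid float division.
    limit = max_util_percent * bin_w * bin_h
    return sum(1 for column in used for a in column if a * 100 > limit)
```

In the sample of Section 4, the banked FF2 at (8, 10) covers 20% of one 10 × 10 bin and 60% of its neighbor, so no bin exceeds BinMaxUtil 79.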
Placement rows: the given rows start at (0, 0) and cover the entire die:
Syntax
Example
PlacementRows 0 0 2 10
Timing slack and delay information: for each instance pin in the design, we give its timing slack. The
delay model consists of displacement delay and Q-pin delay. Displacement is defined as the Manhattan
distance between the original pin location and the new pin location. For any cell displacement, the
displacement delay is the coefficient multiplied by the displacement distance. For every flip-flop
defined in the library, we define a Q-pin delay.
Syntax
DisplacementDelay <coefficient>
Example
DisplacementDelay 0.01
QpinDelay FF1 1
QpinDelay FF2 3
QpinDelay FF2A 2
TimingSlack C1 D 1
TimingSlack C1 Q 0
TimingSlack C2 D 1
TimingSlack C2 Q 0
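A minimal sketch of the displacement-delay rule, assuming pin locations are absolute (x, y) points. In the sample of Section 4, C1's D pin starts at (15, 28) and, after banking into FF2 at (8, 10), lands on D0 at (8, 19); the displacement is 7 + 9 = 16, giving 0.01 × 16 = 0.16. How displacement delay and Q-pin delay combine into updated slack is not specified in this section, so the sketch stops at the delay itself.

```python
def manhattan(p, q):
    """Manhattan distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def displacement_delay(coefficient, old_pin, new_pin):
    """DisplacementDelay coefficient times the Manhattan displacement of a pin."""
    return coefficient * manhattan(old_pin, new_pin)
```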
Power consumption information: each library cell has a power consumption rate.
Syntax
Example
GatePower FF1 10
GatePower FF2 17
GatePower FF2A 18
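A sketch of the total power term, assuming each instance contributes the GatePower of its library cell. With the values above, two FF1 instances consume 20, while banking them into FF2 gives 17 (or 18 with FF2A). `total_power` is a hypothetical name.

```python
def total_power(instances, gate_power):
    """Sum of GatePower over instances; instances maps instance name -> library cell."""
    return sum(gate_power[cell] for cell in instances.values())
```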
Output format: the optimized cell instances (CellInst) followed by the updated netlist.
Syntax
CellInst <InstCount>
NumNets <netCount>
Pin <instName>/<libPinName>
Example
CellInst 1
Inst C1 FF2 48 10
NumNets 4
Net N1 2
Pin INPUT0
Pin C1/D0
Net N2 2
Pin INPUT1
Pin C1/D1
Net N3 2
Pin C1/Q0
Pin OUTPUT0
Net N4 2
Pin C1/Q1
Pin OUTPUT1
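A sketch of emitting the output format shown above. `write_output` is a hypothetical helper; coordinates are printed with Python's default formatting, so integers print as `8` and floats as `8.0`.

```python
def write_output(instances, nets):
    """Render the CellInst section followed by the netlist.

    instances: list of (name, lib_cell, x, y); nets: dict net -> list of pin refs.
    """
    lines = [f"CellInst {len(instances)}"]
    for name, cell, x, y in instances:
        lines.append(f"Inst {name} {cell} {x} {y}")
    lines.append(f"NumNets {len(nets)}")
    for net, pins in nets.items():
        lines.append(f"Net {net} {len(pins)}")
        lines.extend(f"Pin {p}" for p in pins)
    return "\n".join(lines) + "\n"
```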
4. Example
We take the following circuit as a sample input:
                          1-bit FF   2-bit FF
Normalized Area           1          1.6
Normalized Power          1          1.7
Normalized Q-Pin Delay    1          2
Alpha 1
Beta 5
Gamma 5
Delta 1
DieSize 0 0 50 30
NumInput 2
Input INPUT0 0 25
Input INPUT1 0 5
NumOutput 2
Output OUTPUT0 50 25
Output OUTPUT1 50 5
FlipFlop 1 FF1 5 10 2
Pin D 0 8
Pin Q 5 8
FlipFlop 2 FF2 8 10 4
Pin D0 0 9
Pin D1 0 6
Pin Q0 8 9
Pin Q1 8 6
NumInstances 2
Inst C1 FF1 15 20
Inst C2 FF1 15 0
NumNets 4
Net N1 2
Pin INPUT0
Pin C1/D
Net N2 2
Pin INPUT1
Pin C2/D
Net N3 2
Pin C1/Q
Pin OUTPUT0
Net N4 2
Pin C2/Q
Pin OUTPUT1
BinWidth 10
BinHeight 10
BinMaxUtil 79
PlacementRows 0 0 2 10
DisplacementDelay 0.01
QpinDelay FF1 1
QpinDelay FF2 2
TimingSlack C1 D 1
TimingSlack C1 Q 0
TimingSlack C2 D 1
TimingSlack C2 Q 0
GatePower FF1 10
GatePower FF2 17
One possible output, banking C1 and C2 into a single FF2 instance:
CellInst 1
Inst C1 FF2 8 10
NumNets 4
Net N1 2
Pin INPUT0
Pin C1/D0
Net N2 2
Pin INPUT1
Pin C1/D1
Net N3 2
Pin C1/Q0
Pin OUTPUT0
Net N4 2
Pin C1/Q1
Pin OUTPUT1
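A worked check of the power, area, and density terms for this sample, before and after banking. The TNS term is left out because this section does not fully specify how displacement and Q-pin delay update slack. Both placements have D = 0: before banking, the two FF1 cells each fill 50% of a bin; after banking, FF2 fills 20% and 60% of two bins, all below BinMaxUtil 79. The helper name is hypothetical.

```python
# Values copied from the sample input: weights, FF1/FF2 sizes, and GatePower.
BETA, GAMMA, DELTA = 5, 5, 1
LIB = {"FF1": (5, 10, 10), "FF2": (8, 10, 17)}  # (width, height, GatePower)


def power_area_density_cost(cells, violating_bins):
    """beta*Power + gamma*Area summed over cells, plus delta*D (TNS term omitted)."""
    power = sum(LIB[c][2] for c in cells)
    area = sum(LIB[c][0] * LIB[c][1] for c in cells)
    return BETA * power + GAMMA * area + DELTA * violating_bins


before = power_area_density_cost(["FF1", "FF1"], 0)  # 5*20 + 5*100 = 600
after = power_area_density_cost(["FF2"], 0)          # 5*17 + 5*80 = 485
print(before, after)
```

Banking lowers these terms from 600 to 485; whether it pays off overall also depends on the 𝛼-weighted TNS term.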
5. Evaluation
The expected output should satisfy the following constraints:
The initial score is calculated from the weighted timing, power, area, and utilization rate terms, using
the cost metric defined in Section 2.
If the program or its output violates any of the constraints above, the corresponding testcase receives
a score of 0.
Your program can use 8 CPU cores during the evaluation. The runtime factor is:

$$\text{runtime factor} = 0.02 \times \log\left(\frac{\text{elapsed time of test binary}}{\text{median elapsed time}}\right)$$
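A sketch of the runtime factor. The base of the logarithm is not recoverable from this text, so the caller must pass it explicitly; how the factor is applied to the final score is also not stated here. `runtime_factor` is a hypothetical name.

```python
import math


def runtime_factor(elapsed, median_elapsed, base):
    """0.02 * log_base(elapsed / median_elapsed); the base is an explicit assumption."""
    return 0.02 * math.log(elapsed / median_elapsed, base)
```

The factor is zero when a binary matches the median elapsed time, positive when it is slower, and negative when it is faster.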
The rule for the binary name will be given on the contest website. Contestants should follow the naming
description; otherwise, the score will be annulled.
Contestants will have access to the machine provided by ICCAD. Contestants are required to upload their
binary along with every module required to run it. Note that if the public account cannot run a
contestant’s binary for any reason, the result will be counted as a failure.