Power and Timing Optimization Using Multibit Flip-Flop

Sheng-Wei Yang, Tzu-Hsuan Chen, Jhih-Wei Hsu, Ting Wei Li, Cindy Shen

Synopsys, Inc.

0. Revision History
2024-02-14 - Second Revision. Added the definition of ‘displacement’ and QpinDelay with
more pictures. Also clarified the goal.

2024-02-01 - First Major Revision

1. Introduction
In modern technology nodes, power and area minimization has become one of the most studied
topics in the semiconductor industry. One commonly applied technique is to replace single-bit flip-flops
with multibit flip-flops. Replacing multiple single-bit flip-flops with one multibit flip-flop frees up area
in the design, because one multibit flip-flop takes less area to place than the single-bit flip-flops it
replaces. Furthermore, before the replacement, each single-bit flip-flop has its own power, ground, and
clock pin connections. Replacing them with one multibit flip-flop therefore also reduces power, ground,
and clock net routing complexity. This technique is widely applied in modern technology nodes and is
referred to as “multibit flip-flop banking”.

However, as technology nodes advance, timing, power, and area optimization has become a
much more convoluted problem. For some timing-critical nets, multibit flip-flop banking can
worsen the timing, which hampers the overall optimization objective. Therefore, we sometimes
have to divide a multibit flip-flop into several single-bit flip-flops to further optimize
timing-critical nets. This technique is often referred to as “multibit flip-flop debanking”.

In this contest problem, we simulate multibit flip-flop banking and debanking decisions in some
virtual designs as testcases, so that contestants need to consider timing, power, and area objectives
together to find the best possible optimization solution for each testcase.

Figure 1(a): A simple transistor layout of two single-bit flip-flops and one multibit flip-flop. (left)
Figure 1(b): A diagram illustrating how dynamic power is reduced due to the reduction of on-chip clock
tree length. (right)

Figure 2: A simplified diagram of how two single-bit flip-flops can be banked into one two-bit flip-flop,
and debanked vice versa. In this diagram, the upper left D pin of the single-bit flip-flop is mapped to the
D0 pin of the two-bit flip-flop; the lower left D pin of the single-bit flip-flop is mapped to the D1 pin of
the two-bit flip-flop.

2. Contest Objective
The contestants need to develop a banking and debanking algorithm that can take any given testcase and
produce a placement result that satisfies the cell density constraints, with the resultant timing, power,
and area optimized. No cell overlapping is allowed in the placement result. A cost metric will be given for
each testcase to identify the weight of the timing, power, and area terms. The cost metric for this
contest is as follows:

$$\sum_{\forall i \in FF} \left( \alpha \cdot TNS(i) + \beta \cdot Power(i) + \gamma \cdot Area(i) \right) + \delta \cdot D$$

Here i stands for every flip-flop instance in the design. TNS(i) stands for the total negative slack of the
flip-flop. Power(i) stands for the power consumption of the flip-flop. Area(i) stands for the area cost of
the flip-flop. The design is divided into several bins, and we define a utilization rate threshold for
each bin in the design. D stands for the number of bins that violate the utilization rate threshold. 𝛼, 𝛽,
𝛾, and 𝛿 are the weights for each cost. The weighted summation of the four cost metrics above represents
the quality of result used in this contest.
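
As an illustration of how the weighted sum could be evaluated, the following is a minimal Python sketch. The names FlipFlopCost and total_cost are hypothetical and not part of the contest materials, and the sketch assumes TNS(i) is supplied as a non-negative magnitude for each flip-flop.

```python
from dataclasses import dataclass


@dataclass
class FlipFlopCost:
    """Per-flip-flop quantities used by the cost metric (hypothetical container)."""
    tns: float    # total negative slack of this flip-flop, assumed given as a magnitude >= 0
    power: float  # power consumption of the flip-flop
    area: float   # area of the flip-flop


def total_cost(flip_flops, violated_bins, alpha, beta, gamma, delta):
    """Sum alpha*TNS + beta*Power + gamma*Area over all flip-flops, then add delta*D."""
    per_ff = sum(alpha * ff.tns + beta * ff.power + gamma * ff.area
                 for ff in flip_flops)
    return per_ff + delta * violated_bins


if __name__ == "__main__":
    ffs = [FlipFlopCost(tns=0.0, power=10, area=50),
           FlipFlopCost(tns=2.0, power=10, area=50)]
    print(total_cost(ffs, violated_bins=1, alpha=1, beta=5, gamma=5, delta=10))
```

The density term is added once for the whole design, outside the per-flip-flop sum, which matches the reading that D counts bins rather than flip-flops.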

Figure 3 shows an example of the density constraint of a placement. The whole placement region is divided
into several uniform bins. For each bin, we set up a density threshold representing the upper bound of
the cell area allowed to be placed within the bin. Should the total cell area exceed the threshold, we
consider this bin to violate the density constraint, and a penalty weighted by the density weight 𝛿 is
accrued. In the contest, the same threshold is applied to every uniform bin; it is defined by BinMaxUtil
in the input file.
Figure 3: A representation of the cell density constraint. The whole placement region is divided into
several uniform bins, and a threshold is applied to every bin. Should the total cell area within a bin
exceed the threshold, a density constraint penalty score is accrued.

3. Problem Formulation and Input/Output Format


Given:
1. Weight factors for cost metrics
2. Die size and input/output ports
3. Cell library and flip-flop library information
4. A cell placement result with flip-flop cells
5. Max placement utilization ratio
6. Netlist
7. Timing slack and delay information
8. Power information
Output:
1. A cell placement solution
2. Netlist connectivity
The resultant cell placement solution must have no cell overlaps while satisfying the max placement
utilization ratio. The max placement utilization ratio is a cell density constraint implemented by dividing
the design into several placement bins; each bin has a maximum ratio that limits the cell area
allowed in it.

3.1 Format of Input Data


Weight factors of the cost metrics: 𝛼, 𝛽, 𝛾, and 𝛿 values are given as Alpha, Beta, Gamma, and Delta,
respectively.

Syntax

Alpha <alphaValue>

Beta <betaValue>
Gamma <gammaValue>

Delta <deltaValue>

Example

Alpha 1

Beta 5

Gamma 5

Delta 10

Die size and input/output ports: DieSize describes the dimension of the die, namely the placement area
of the design. NumInput describes the number of input pins of the die. Each “Input” line describes an
input pin name and its location. NumOutput describes the number of output pins of the die. Each
“Output” line describes an output pin name and its location. Contestants cannot extend the cell region
to solve density violations or flip-flop overlap issues. The goal is to optimize multiple
objectives, timing and power, without increasing the cell region or area (die size).

Syntax

DieSize <lowerLeftX> <lowerLeftY> <upperRightX> <upperRightY>

NumInput <inputCount>

Input <inputName> <x-coordinate> <y-coordinate>

NumOutput <outputCount>

Output <outputName> <x-coordinate> <y-coordinate>

Example

DieSize 0 0 500 450

NumInput 2

Input INPUT0 0 25

Input INPUT1 0 5

NumOutput 2

Output OUTPUT0 500 25

Output OUTPUT1 500 5

Cell library:

Syntax

FlipFlop <bits> <flipFlopLibCellName> <libCellWidth> <libCellHeight> <pinCount>


Pin <pinName> <pinLocationX> <pinLocationY>

Example

FlipFlop 1 FF1 5 10 2

Pin D 0 8

Pin Q 5 8

FlipFlop 2 FF2 8 10 4

Pin D0 0 9

Pin D1 0 6

Pin Q0 8 9

Pin Q1 8 6

FlipFlop 2 FF2A 10 10 4

Pin D0 0 9

Pin D1 0 6

Pin Q0 8 9

Pin Q1 8 6
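
To show how the FlipFlop and Pin lines fit together, here is a small Python parsing sketch. The names LibCell and parse_flip_flops are hypothetical, and the sketch assumes that each FlipFlop line is followed by exactly pinCount Pin lines (blank lines are skipped) and that pin locations are offsets from the lower-left corner of the cell.

```python
from dataclasses import dataclass, field


@dataclass
class LibCell:
    """One flip-flop library cell (hypothetical container)."""
    bits: int
    name: str
    width: float
    height: float
    pins: dict = field(default_factory=dict)  # pin name -> (x, y) offset


def parse_flip_flops(lines):
    """Collect every 'FlipFlop' record and its 'Pin' lines into a dict by cell name."""
    cells = {}
    it = iter(lines)
    for line in it:
        tok = line.split()
        if not tok or tok[0] != "FlipFlop":
            continue  # ignore lines that belong to other input sections
        cell = LibCell(int(tok[1]), tok[2], float(tok[3]), float(tok[4]))
        pin_count = int(tok[5])
        read = 0
        while read < pin_count:
            p = next(it).split()
            if not p:
                continue  # skip blank lines between records
            cell.pins[p[1]] = (float(p[2]), float(p[3]))
            read += 1
        cells[cell.name] = cell
    return cells


if __name__ == "__main__":
    sample = ["FlipFlop 1 FF1 5 10 2", "Pin D 0 8", "Pin Q 5 8"]
    lib = parse_flip_flops(sample)
    print(lib["FF1"].width * lib["FF1"].height, lib["FF1"].pins)
```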

Placement result: The x and y coordinates describe the bottom-left corner of the cell:

Syntax

NumInstances <instanceCount>

Inst <instName> <libCellName> <x-coordinate> <y-coordinate>

Example:

NumInstances 2

Inst C1 FF1 50 20

Inst C2 FF1 50 0

Netlist:

Syntax

NumNets <netCount>

Net <netName> <numPins>

Pin <instName>/<libPinName>

Example
NumNets 4

Net N1 2

Pin INPUT0

Pin C1/D

Net N2 2

Pin INPUT1

Pin C2/D

Net N3 2

Pin C1/Q

Pin OUTPUT0

Net N4 2

Pin C2/Q

Pin OUTPUT1

Max placement utilization ratio:

Syntax

BinWidth <width>

BinHeight <height>

BinMaxUtil <util>

Example

BinWidth 100

BinHeight 100

BinMaxUtil 75
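The utilization ratio of a bin is formulated as:

$$\text{utilization ratio of a bin} = \frac{\sum(\text{area of cells within the bin})}{\text{area of the bin}}$$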
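
The bin check can be sketched in Python as below. This is only an illustration under several assumptions that the text does not spell out: BinMaxUtil is read as a percentage, bins tile the die starting from its lower-left corner, bins at the die edge are clipped to the die, and a cell that straddles bins contributes only its overlapped area to each bin. The function names are hypothetical.

```python
def overlap_1d(a_lo, a_hi, b_lo, b_hi):
    """Length of the overlap between intervals [a_lo, a_hi] and [b_lo, b_hi]."""
    return max(0, min(a_hi, b_hi) - max(a_lo, b_lo))


def count_violated_bins(die, bin_w, bin_h, max_util_percent, cells):
    """Count bins whose cell area exceeds max_util_percent of the bin area.

    die   = (x_lo, y_lo, x_hi, y_hi)
    cells = list of (x, y, width, height) with (x, y) the lower-left corner
    """
    x_lo, y_lo, x_hi, y_hi = die
    violated = 0
    y = y_lo
    while y < y_hi:
        x = x_lo
        while x < x_hi:
            bx_hi, by_hi = min(x + bin_w, x_hi), min(y + bin_h, y_hi)
            used = sum(overlap_1d(cx, cx + w, x, bx_hi) *
                       overlap_1d(cy, cy + h, y, by_hi)
                       for cx, cy, w, h in cells)
            bin_area = (bx_hi - x) * (by_hi - y)
            # Compare used/bin_area against the percentage without dividing.
            if used * 100 > max_util_percent * bin_area:
                violated += 1
            x += bin_w
        y += bin_h
    return violated


if __name__ == "__main__":
    # One 8x10 cell at (8, 10) on a 50x30 die with 10x10 bins.
    print(count_violated_bins((0, 0, 50, 30), 10, 10, 50, [(8, 10, 8, 10)]))
```

The double loop visits every bin and every cell, which is fine for small examples; a larger design would more likely accumulate each cell only into the bins it touches.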

Placement rows. The given rows start from (0, 0) and cover the entire die:

Syntax

PlacementRows <startX> <startY> <rowWidth> <rowHeight>

Example

PlacementRows 0 0 2 10
Timing slack and delay information. For each instance pin in the design, we give its timing slack.
The delay model is formulated by displacement delay and Q-pin delay. Displacement is defined as
the Manhattan distance between the original pin location and the new pin location. For
any cell displacement, we multiply the coefficient by the displacement distance to get the displacement
delay. For every flip-flop gate defined in the library, we define a Q-pin delay.

Syntax

DisplacementDelay <coefficient>

QpinDelay <libCellName> <delay>

TimingSlack <instanceCellName> <PinName> <slack>

Example

DisplacementDelay 0.01

QpinDelay FF1 1

QpinDelay FF2 3

QpinDelay FF2A 2

TimingSlack C1 D 1

TimingSlack C1 Q 0

TimingSlack C2 D 1

TimingSlack C2 Q 0
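
The displacement delay defined above can be sketched in Python as follows. The helper names are hypothetical, and pin locations are assumed to be offsets from the lower-left corner of the instance. How the resulting delay is combined with the Q-pin delay to update each slack is not specified in this section, so the sketch stops at the displacement delay itself.

```python
def pin_position(inst_xy, pin_offset):
    """Absolute pin location = instance lower-left corner + pin offset in the cell."""
    return (inst_xy[0] + pin_offset[0], inst_xy[1] + pin_offset[1])


def displacement_delay(coefficient, old_pin_xy, new_pin_xy):
    """DisplacementDelay coefficient times the Manhattan distance the pin moved."""
    manhattan = (abs(old_pin_xy[0] - new_pin_xy[0]) +
                 abs(old_pin_xy[1] - new_pin_xy[1]))
    return coefficient * manhattan


if __name__ == "__main__":
    # C1/D of an FF1 placed at (50, 20) moves to the D0 pin of an FF2 placed at (48, 10).
    old = pin_position((50, 20), (0, 8))   # (50, 28)
    new = pin_position((48, 10), (0, 9))   # (48, 19)
    print(displacement_delay(0.01, old, new))
```

With these numbers the pin moves 2 units in x and 9 units in y, so the Manhattan distance is 11 and the displacement delay is about 0.11.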

Power consumption information: for every cell gate there is a power consumption rate.

Syntax

GatePower <libCellName> <powerConsumption>

Example

GatePower FF1 10

GatePower FF2 17

GatePower FF2A 18

3.2 Format of Output Data


The expected output is a list of bottom-left coordinates of each cell instance in the design. Contestants
are expected to utilize multibit flip-flop banking and debanking techniques; therefore, the cell instance
count could change. The expected output should also describe the net connectivity information, and
all net connections must remain functionally equivalent to the input data, with no open or short nets.

Syntax
CellInst <InstCount>

Inst <instName> <locationX> <locationY> <orientation>

NumNets <netCount>

Net <netName> <numPins>

Pin <instName>/<libPinName>

Example

CellInst 1

Inst C1 FF2 48 10

NumNets 4

Net N1 2

Pin INPUT0

Pin C1/D0

Net N2 2

Pin INPUT1

Pin C1/D1

Net N3 2

Pin C1/Q0

Pin OUTPUT0

Net N4 2

Pin C1/Q1

Pin OUTPUT1
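
One possible way to produce such an output after banking is sketched below in Python. The function names and the pin_map structure are hypothetical. The sketch follows the layout of the example above (instance name, library cell name, x, y) rather than the Syntax line, which is an assumption about which of the two the evaluator expects.

```python
def bank_nets(nets, pin_map):
    """Rewrite net pins using pin_map, e.g. {"C1/D": "C1/D0"}; other pins are kept."""
    return {name: [pin_map.get(p, p) for p in pins] for name, pins in nets.items()}


def format_output(instances, nets):
    """Render instances [(name, lib_cell, x, y), ...] and nets as output text."""
    lines = [f"CellInst {len(instances)}"]
    for name, lib_cell, x, y in instances:
        lines.append(f"Inst {name} {lib_cell} {x} {y}")
    lines.append(f"NumNets {len(nets)}")
    for name, pins in nets.items():
        lines.append(f"Net {name} {len(pins)}")
        lines.extend(f"Pin {p}" for p in pins)
    return "\n".join(lines)


if __name__ == "__main__":
    nets = {"N1": ["INPUT0", "C1/D"], "N2": ["INPUT1", "C2/D"],
            "N3": ["C1/Q", "OUTPUT0"], "N4": ["C2/Q", "OUTPUT1"]}
    # Bank C1 and C2 (both FF1) into one FF2 that reuses the name C1.
    pin_map = {"C1/D": "C1/D0", "C2/D": "C1/D1", "C1/Q": "C1/Q0", "C2/Q": "C1/Q1"}
    print(format_output([("C1", "FF2", 48, 10)], bank_nets(nets, pin_map)))
```

Because every original pin is mapped to exactly one pin of the new cell and net membership is otherwise untouched, no net is opened or shorted by the rewrite.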

4. Example
We take the following circuit as a sample input:

                          1-bit FF    2-bit FF
Normalized Area           1           1.6
Normalized Power          1           1.7
Normalized Q-Pin Delay    1           2

Alpha 1

Beta 5

Gamma 5

Delta 1

DieSize 0 0 50 30

NumInput 2

Input INPUT0 0 25

Input INPUT1 0 5

NumOutput 2

Output OUTPUT0 50 25

Output OUTPUT1 50 5
FlipFlop 1 FF1 5 10 2

Pin D 0 8

Pin Q 5 8

FlipFlop 2 FF2 8 10 4

Pin D0 0 9

Pin D1 0 6

Pin Q0 8 9

Pin Q1 8 6

NumInstances 2

Inst C1 FF1 15 20

Inst C2 FF1 15 0

NumNets 4

Net N1 2

Pin INPUT0

Pin C1/D

Net N2 2

Pin INPUT1

Pin C2/D

Net N3 2

Pin C1/Q

Pin OUTPUT0

Net N4 2

Pin C2/Q

Pin OUTPUT1

BinWidth 10

BinHeight 10

BinMaxUtil 79

PlacementRows 0 0 2 10

DisplacementDelay 0.01
QpinDelay FF1 1

QpinDelay FF2 2

TimingSlack C1 D 1

TimingSlack C1 Q 0

TimingSlack C2 D 1

TimingSlack C2 Q 0

GatePower FF1 10

GatePower FF2 17

The simple input

Based on the given input, here’s an example of output:

CellInst 1

Inst C1 FF2 8 10

NumNets 4

Net N1 2

Pin INPUT0

Pin C1/D0
Net N2 2

Pin INPUT1

Pin C1/D1

Net N3 2

Pin C1/Q0

Pin OUTPUT0

Net N4 2

Pin C1/Q1

Pin OUTPUT1
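
To make the trade-off in this example concrete, the short Python sketch below evaluates only the power and area terms of the cost for the sample input (two FF1 instances) and the sample output (one FF2 instance), using Beta 5 and Gamma 5 from the input. The timing term depends on the resulting slacks and is left out here, and the helper name is hypothetical.

```python
BETA, GAMMA = 5, 5

# Area is width x height from the FlipFlop lines; power is from the GatePower lines.
LIB = {
    "FF1": {"area": 5 * 10, "power": 10},
    "FF2": {"area": 8 * 10, "power": 17},
}


def power_area_terms(lib_cells):
    """Sum beta*Power + gamma*Area over a list of library cell names."""
    return sum(BETA * LIB[c]["power"] + GAMMA * LIB[c]["area"] for c in lib_cells)


before = power_area_terms(["FF1", "FF1"])  # 5*(10+10) + 5*(50+50) = 600
after = power_area_terms(["FF2"])          # 5*17 + 5*80 = 485

if __name__ == "__main__":
    print(before, after)
```

Banking lowers the combined power and area terms from 600 to 485 in this sample; whether the total cost improves also depends on how the pin displacement and the larger Q-pin delay of FF2 affect the slacks.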

5. Evaluation
The expected output should satisfy the following constraints:

A. All instances must be placed within the die region.


B. All instances must be placed on the rows without overlap.
C. Nets connected to the flip-flops must remain functionally equivalent to the data input. The result
should not leave any open or short net.
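
Constraints A and B can be checked with a short Python sketch like the one below. It assumes integer coordinates, and it interprets "placed on the rows" as the x offset being a multiple of the rowWidth value and the y offset being a multiple of rowHeight, measured from the row start; that interpretation and the function name are assumptions. Cells that merely abut are not treated as overlapping.

```python
def is_legal(die, row_start, site_w, row_h, cells):
    """Return True if every cell is inside the die, on the row grid, and overlap-free.

    die       = (x_lo, y_lo, x_hi, y_hi)
    row_start = (startX, startY) from the PlacementRows line
    cells     = list of (x, y, width, height) with integer coordinates
    """
    x_lo, y_lo, x_hi, y_hi = die
    for x, y, w, h in cells:
        if x < x_lo or y < y_lo or x + w > x_hi or y + h > y_hi:
            return False  # constraint A: outside the die
        if (x - row_start[0]) % site_w != 0 or (y - row_start[1]) % row_h != 0:
            return False  # constraint B: not aligned to the rows
    for i, (ax, ay, aw, ah) in enumerate(cells):
        for bx, by, bw, bh in cells[i + 1:]:
            if ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah:
                return False  # constraint B: two cells overlap
    return True


if __name__ == "__main__":
    print(is_legal((0, 0, 50, 30), (0, 0), 2, 10, [(8, 10, 8, 10)]))
```

The pairwise overlap test is quadratic in the number of cells; it is meant as a reference check, and a sweep-line or per-row check would scale better on large designs.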

The initial score is calculated by taking the weighted timing, power, area, and utilization rate
constraints into consideration. The cost metric is as follows:

$$\text{Initial score} = \sum_{\forall i \in FF} \left( \alpha \cdot TNS(i) + \beta \cdot Power(i) + \gamma \cdot Area(i) \right) + \delta \cdot D$$

If the program or its output data violates any of the constraints above, you will get a score of 0 for
the corresponding test case.

5.1 Runtime factor


The runtime limit is 60 minutes for each case on the evaluation machine. The hidden cases will be of
the same scale as the public cases. We introduce a runtime factor in this contest, formulated
below, to encourage ideas with faster turnaround time.

The number of CPU cores available to your program in the evaluation is 8.

$$\text{runtime factor} = 0.02 \times \log_2\left(\frac{\text{elapsed time of test binary}}{\text{median elapsed time}}\right)$$

$$\text{runtime factor bounded} = \max\left(-0.1,\ \min(0.1,\ \text{runtime factor})\right)$$

$$\textbf{Final score} = \text{Initial score} \times (1.0 + \text{runtime factor bounded})$$


With this runtime factor, a binary that is 2x faster/slower gets a 2% initial score
advantage/disadvantage, and a binary that is 4x faster/slower gets a 4% initial score
advantage/disadvantage. The runtime factor is bounded to 10% in either direction.
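
The runtime adjustment can be sketched in Python as below. The logarithm is taken in base 2, which is an inference from the statement that a 2x speed difference yields 2% and a 4x difference yields 4%. The function name is hypothetical.

```python
import math


def final_score(initial_score, elapsed, median_elapsed):
    """Scale the initial score by (1 + runtime factor), with the factor clamped to +/-0.1."""
    factor = 0.02 * math.log2(elapsed / median_elapsed)
    bounded = max(-0.1, min(0.1, factor))
    return initial_score * (1.0 + bounded)


if __name__ == "__main__":
    for ratio in (0.25, 0.5, 1.0, 2.0, 4.0, 1000.0):
        print(ratio, final_score(100.0, ratio, 1.0))
```

Under this formula a slower binary gets a larger final score, so calling that a disadvantage implies that a lower score is better, consistent with the score being a cost.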

5.2 Program Requirements


Your program should be executable as follows:

./$binary_name <input.txt> <output.txt>

The rule for the binary name will be given on the contest website. Please note that contestants should
follow the naming description; otherwise the score will be annulled.

Contestants will have access to the machine provided by ICCAD. Contestants are required to upload their
binary along with every module required to run it. Contestants should note that if the public
account cannot run the contestant’s binary for any reason, the result will be counted as a failure.
