electronics

Article
An Investigation of Clock Skew Using a Wirelength-Aware
Floorplanning Process in the Pre-Placement Stages of MSV Layouts
B. Srinath 1 , Rajesh Verma 2 , Abdulwasa Bakr Barnawi 2 , Ramkumar Raja 2 , Mohammed Abdul Muqeet 2 ,
Neeraj Kumar Shukla 2 , A. Ananthi Christy 3 , C. Bharatiraja 4, * and Josiah Lange Munda 5

1 Consultant, MissionX Pvt, Mahindra 603002, India; [email protected]


2 Department of Electrical Engineering, College of Engineering, King Khalid University,
Abha, Asir 61411, Saudi Arabia; [email protected] (R.V.); [email protected] (A.B.B.);
[email protected] (R.R.); [email protected] (M.A.M.); [email protected] (N.K.S.)
3 Department of Electrical and Electronics Engineering, Saveetha School of Engineering,
SIMATS Saveetha University, Chennai 600077, India; [email protected]
4 Department of Electrical and Electronics Engineering, SRM Institute of Science and Technology,
Chennai 603203, India
5 Department of Electrical Engineering, Tshwane University of Technology, Pretoria 0001, South Africa;
[email protected]
* Correspondence: [email protected]

Abstract: Managing the timing constraints has become an important factor in the physical design of multiple supply voltage (MSV) integrated circuits (IC). Clock distribution and module scheduling are some of the conventional methods used to satisfy the timing constraints of a chip. In this paper, we propose a simulated annealing-based MSV floorplanning methodology for the design of ICs within the timing budget. Additionally, we propose a modified SKB tree representation for floorplanning the modules in the design. Our algorithm finds the optimal dimensions and position of the clocked modules in the design to reduce the wirelength and satisfy the timing constraints. The proposed algorithm is implemented in IWLS 2005 benchmark circuits and considers power, wirelength, and timing as the optimization parameters. Simulation results were obtained from the Cadence Innovus digital system taped-out at 45 nm. Our simulation results show that the proposed algorithm satisfies timing constraints through a 30.6% reduction in wirelength.

Keywords: timing constraint; multiple supply voltage; physical design; floorplanning

Citation: Srinath, B.; Verma, R.; Barnawi, A.B.; Raja, R.; Muqeet, M.A.; Shukla, N.K.; Christy, A.A.; Bharatiraja, C.; Munda, J.L. An Investigation of Clock Skew Using a Wirelength-Aware Floorplanning Process in the Pre-Placement Stages of MSV Layouts. Electronics 2021, 10, 2795. https://doi.org/10.3390/electronics10222795

Academic Editor: Paul Leroux

Received: 8 August 2021
Accepted: 6 November 2021
Published: 15 November 2021

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Electronics 2021, 10, 2795. https://doi.org/10.3390/electronics10222795 https://www.mdpi.com/journal/electronics

1. Introduction
The emergence of system-on-chip (SoC) technology has had a remarkable impact on mobile and wearable applications. This is mainly due to the high-speed processing and low power consumption of SoCs. During the back-end design of these integrated circuits, the electronic design automation (EDA) software creates a clock network for the modules in the layout through routing from the clock source. The size of the clock network determines the operating time of the modules, which in turn affects high-speed operation. Because these ICs contain large numbers of modules, packing the modules compactly with a reduced clock network within a fixed die size introduces new complexities in the physical design of SoC-based ICs, whose modules operate at multiple supply voltages (MSV) to reduce power.
In MSV designs, the modules operating at the same voltage level are floorplanned in a region called the voltage island in order to reduce the area of the power distribution network. Compared with single VDD designs, the aforementioned clock network size is more pronounced in MSV designs. In spite of the conventional clock tree distribution techniques, a zero clock skew-based clocking methodology is necessary for the successful distribution of clock signals to the modules which contain the clock as one of their pins. The H tree, X tree, method of mean and median (MMM), recursive geometric matching (RGM), and zero clock tree are some of the clock distribution methodologies available in EDA tools. These methodologies follow iterative methods for the distribution of clock signals from the source to the sink nodes in the layout.

Related Works
This section gives insights into works in literature which address and propose novel
methods for simultaneous power reduction that satisfies the timing constraints. All the
methodologies in these works were experimented on using the MCNC and GSRC bench-
mark circuits. Most of the previous research provides solutions to clock tree generation
and distribution for a single VDD [1–3]. The implementation of methods for multiple
supply voltage designs results in a large skew which degrades the speed of the operations
performed in the IC.
Dynamic voltage frequency scaling (DVFS) and adaptive voltage scaling are the most
effective techniques for power reduction, which function with a design that operates at
different voltage modes [4]. Level converters are used in those designs to save power and
improve speed [5]. To handle clocking strategies in DVFS, separate clock trees were
generated for different operating modes. In order to reduce power consumption due to
random logic circuits, a clustered voltage scaling scheme with row-by-row optimization
in power was introduced [2]. In order to operate the IC within the predefined timing
constraints, the critical mode optimization and surrogate-based optimization methods
were proposed [3]. These methods insert buffers to meet the timing constraints in the clock
path present in the voltage islands. In addition, this method reduces the setup violations
before clock tree synthesis (CTS) since the buffers are inserted between modules without
changing their positions. However, after detailed routing, it is observed that it increases
wirelength; this will affect the performance of the chip. In order to overcome this trade-off
problem, an algorithm named the deferred-merge embedding algorithm was proposed [6]
which uses the DeFer algorithm for the optimization of wirelength.
Some of the works in the literature, such as [7–9], proposed methodologies for zero
skew with sharp clock edge rates at the clock utilization points. Several design methodolo-
gies focused on techniques to optimize the process of clock tree synthesis [10–12]. Power
gating [13], buffer sizing [14], and the insertion of multi-bit flip-flops (MBFFs) [15–17] were
introduced for the reduction in power consumption and to satisfy the necessary timing
constraints. In some designs, clock skew was also present in the intra levels of a clock
tree. Adjustable delay buffers were inserted in interconnections to reduce the effect of this
clock skew in the timing of the IC [18]. A legalization-based placement algorithm was
also proposed [19,20] for an accurate timing analysis. In [15], MBFFs were used during the
placement stage for the reduction in power and clock skew. In this method, modules in the
layout were clustered for the reduction in clock skew in the clock distribution networks.
Even though the clustering reduced the levels of the clock tree, it increased the intra-cluster
delay. To resolve the routing complexities due to clock nets, the generated clock tree is
segmented to achieve zero skew [21,22]. A power network distribution model was proposed
in [23], in which simultaneously optimizing the IR drop through power planning reduced
wirelength. Since the clock distribution also had an effect on the latency of the design, a
constraint-aware clock tree construction algorithm was proposed in [24–26].
In this paper, we propose a floorplan-aware clock tree generation methodology that
identifies the clocked modules in the design and floorplans those modules to balance
the clock tree. The modules of the floorplan are initially represented as a skewed binary
tree (SKB) [27]. During the perturbation of modules, our proposed methodology ranks
the clocked modules, which reduces the length of the clock network.
The remainder of this paper is organized as follows. Section 2 describes the
SKB tree representation and its drawbacks. Sections 3 and 4 present an introduction to
floorplanning representation and our proposed algorithm with its pseudo-code. Simulation
and experimental results with IWLS benchmarks using the Cadence Innovus system are
presented in Section 5.

2. Problem Formulation
Given a design of the initial floorplan consisting of its functional modules along with their operating voltage levels, we represent the modules in the voltage island using a skewed binary tree (SKB). Consider the floorplan of a simple design in Figure 1. This floorplan has modules which include the clock as one of their pins. The re-positioning of these clocked modules in the floorplan results in geometric violations and increases the pre-defined width of the voltage island. This may lead to changes in the placement of modules in the neighboring voltage island, which increases the routing resource in terms of wirelength [27,28]. For the purpose of quality floorplanning and to satisfy the voltage island constraint, we propose an algorithm to determine the optimal dimensions of the clocked modules in the voltage island. Then, we incorporate a placement method analogous to the SKB tree for the positioning of modules in their respective voltage islands. To further improve the timing constraints, we optimize the resulting floorplans iteratively for the reduction in the length of the clock tree. We evaluate our resulting floorplan with a cost function which has wirelength, skew, delay, and power consumption as parameters. After the implementation of the proposed floorplanning methodology in the Cadence Innovus system, simulation results showed that our algorithm scales down the length of the clock tree by reducing the total wirelength in the design. As a result of repeated optimizations, our proposed algorithm also offers power saving and reduces delay compared to the existing SKB methodology.

Figure 1. (a) Floorplan representation, and (b) placement of modules in the layout.

3. Preliminaries
Since our proposed methodology is based on SKB tree representation, first we review the floorplanning representation and the floorplanning that is used to satisfy the voltage island constraint in [29].

3.1. Floorplanning Representation
Given an initial floorplan with the module dimensions (width, height) and its oper-
ating voltage levels, we construct the SKB tree. For the floorplan shown in Figure 1a, we
construct the SKB tree shown in Figure 1b. Each level of this tree represents a voltage level,
and the nodes in its branches are the modules operating at the respective voltage range.
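
As a rough illustration of this representation, the sketch below groups modules by operating voltage so that each group plays the role of one tree level. It is a simplified stand-in, not the SKB tree of [27]: the `Module` class, the `build_levels` and `traversal_order` helpers, the module dimensions, and the highest-voltage-first ordering are all assumptions made for this example (only the supply voltages 1.1 V and 0.9 V come from Table 2).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Module:
    name: str
    width: float
    height: float
    voltage: float          # operating voltage in volts
    clocked: bool = False   # True if the module has a clock pin


def build_levels(modules):
    """Group modules by operating voltage: one tree level per voltage."""
    by_voltage = defaultdict(list)
    for m in modules:
        by_voltage[m.voltage].append(m)
    # Ordering levels from highest to lowest voltage is an assumption of this sketch.
    return [by_voltage[v] for v in sorted(by_voltage, reverse=True)]


def traversal_order(levels):
    """Visit level after level and, within a level, module after module."""
    return [m.name for level in levels for m in level]


# Hypothetical modules; dimensions are made up for illustration.
modules = [
    Module("m1", 4, 3, 1.1, clocked=True),
    Module("m2", 2, 5, 0.9),
    Module("m3", 3, 3, 1.1),
]
levels = build_levels(modules)
```

Keeping the levels as plain lists makes it easy to hand each voltage group to a placement routine later on.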

3.2. Placement
The tree structure is traversed using the depth-first search process for the placement
of modules from the left corner in the core area of the chip. In case of the formation of a
voltage island, the width of the island, $W_i$, is determined using Equation (1):

$$W_i = \frac{a_i}{a_t}\, W_c\, (1 + \gamma) \qquad (1)$$

In Equation (1), $a_i$ refers to the total area of modules in the power domain,
$a_t$ refers to the total area of the chip, $W_c$ refers to the width of the chip, and $\gamma$ refers to the
allowable dead space. A queue structure is maintained for the modules which fail to fit
inside the estimated width of the voltage island. Before the placement of the next module,
priority is given to the modules in the queue structure. Thus, the algorithm performs the
placement process in the voltage island. The algorithm has a unique feature called a cluster
constraint, to limit the density of modules in the voltage island. This feature helps to avoid
a high density in the voltage island; thereby, it reduces congestion due to wirelength.
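
The width estimate and the queue-based placement described above can be sketched as follows. This is a minimal one-dimensional illustration under stated assumptions: `island_width` evaluates Equation (1) directly, while `pack_rows` is a hypothetical simplification in which modules are reduced to their widths and any module that does not fit in the current row waits in a queue and is considered first when the next row starts. The numeric inputs are made up.

```python
from collections import deque


def island_width(a_i, a_t, w_c, gamma):
    """Equation (1): island width from the area share, chip width and dead space."""
    return (a_i / a_t) * w_c * (1.0 + gamma)


def pack_rows(widths, limit):
    """Place module widths row by row inside an island of width `limit`.

    Modules that fail to fit in the current row are queued and get
    priority when the next row is started (simplifying assumption).
    """
    waiting = deque(widths)
    rows = []
    while waiting:
        row, used = [], 0.0
        deferred = deque()
        while waiting:
            w = waiting.popleft()
            if w > limit:
                raise ValueError("module wider than the voltage island")
            if used + w <= limit:
                row.append(w)
                used += w
            else:
                deferred.append(w)   # wait for the next row
        rows.append(row)
        waiting = deferred           # queued modules are served first
    return rows


# Hypothetical numbers: the island holds 25% of the chip area, 10% dead space.
w_i = island_width(a_i=25.0, a_t=100.0, w_c=40.0, gamma=0.1)
rows = pack_rows([4, 3, 5, 2], limit=8)
```

Each row places at least one module, so the loop always terminates once every module is narrower than the island.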

3.3. Perturbations
To satisfy the dead space constraint, the algorithm refines abnormal
modules through rotation and perturbations. Three different perturbations are performed
in this algorithm: the exchange of modules between voltage islands, the exchange of modules
inside a voltage island, and a change in the order of the voltage islands.
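
The three perturbations can be pictured with the short sketch below. It assumes, purely for illustration, that a floorplan is a list of voltage islands and that each island is a list of module names; the function names are hypothetical and the sketch expects at least two non-empty islands.

```python
import random


def swap_between_islands(islands, rng):
    """Exchange one module between two different voltage islands."""
    a, b = rng.sample(range(len(islands)), 2)
    i = rng.randrange(len(islands[a]))
    j = rng.randrange(len(islands[b]))
    islands[a][i], islands[b][j] = islands[b][j], islands[a][i]


def swap_inside_island(islands, rng):
    """Exchange two modules inside a single voltage island."""
    candidates = [k for k, isl in enumerate(islands) if len(isl) >= 2]
    if not candidates:
        return  # no island has two modules to exchange
    k = rng.choice(candidates)
    i, j = rng.sample(range(len(islands[k])), 2)
    islands[k][i], islands[k][j] = islands[k][j], islands[k][i]


def reorder_islands(islands, rng):
    """Change the order of two voltage islands."""
    a, b = rng.sample(range(len(islands)), 2)
    islands[a], islands[b] = islands[b], islands[a]


def perturb(islands, rng):
    """Apply one randomly chosen perturbation in place and report which one."""
    move = rng.choice([swap_between_islands, swap_inside_island, reorder_islands])
    move(islands, rng)
    return move.__name__


islands = [["m1", "m2"], ["m3"], ["m4", "m5"]]
rng = random.Random(1)
for _ in range(20):
    perturb(islands, rng)
```

All three moves only rearrange modules, so the set of modules in the floorplan is preserved.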

3.4. Our Contributions


Our proposed algorithm undergoes the process explained in Section 4. Different from
the SKB tree methodology, our proposed algorithm avoids cluster constraints so as to
achieve less routing area. Additionally, our proposed algorithm avoids the refinement of
modules in order to fulfill the current flow constraint during the routing stage.

4. Proposed Early Clock Planning Algorithm


In this section, we explain our proposed early clock planning algorithm for reducing
the length of the clock tree.
Given an initial floorplan, we use the SKB representation to arrange the modules of the
floorplan. After the arrangement of modules, we identify the modules that have a clock
pin. The algorithmic description of our proposed method is given in Algorithm 1, the early clock
planning algorithm (T).
Steps 1–2 insert the modules in the tree structure as shown in Figure 1.
The depth-first search process is used to traverse the tree, visiting each module in
the levels of tree T. In steps 3–6, for the modules in every level of the tree, we use the
function which determines the optimal dimensions of the clocked modules. In Step 7, if
the search locates a clocked module, it marks that module as CLK in the tree T. After
distinguishing the modules with clock pins, we apply our proposed procedure
Clock_Opt_dimension($m_i$, $m_j$), which identifies suitable dimensions for the clocked
modules and the modules connected to them in the tree structure.
Using Algorithm 1 for all visited clocked modules, we identify the optimal dimensions.
Since the modules $m_{i-1}$ and $m_{i+1}$ may also connect with the clock pin module $m_i$,
Algorithm 2 also finds the optimal dimensions of $m_{i-1}$ and $m_{i+1}$.
Algorithm 1: Early clock planning algorithm (T)
1. A tree T, with nodes representing modules in the design
2. level ← 1, where level ∈ T
3. for level do
4.    M ∈ T; m_i, m_j ∈ M
5.    Traverse tree T using DFS;
6.    mark clocked modules as CLK;
7.    Choose m_i and m_j
8.    Clock_Opt_dimension(m_i, m_j);
9.    update m_i and m_j;
10. end for

Algorithm 2: Clock_Opt_dimension(m_i, m_j)
1. for m_i = CLK do
2.    Opt_dimension(m_i, m_{i−1})
3.    Opt_dimension(m_i, m_{i+1})
4. end for
5. Update m_i, m_{i−1}, and m_{i+1}
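
The control flow of Algorithms 1 and 2 can be sketched in Python as below. The paper does not spell out Opt_dimension, so `opt_dimension` here is a hypothetical stand-in: it tries both orientations of each module in a pair and keeps the combination with the smallest half-perimeter when the two are placed side by side. The `Module` class and the example dimensions are likewise assumptions.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Module:
    name: str
    width: float
    height: float
    clocked: bool = False   # plays the role of the CLK mark


def opt_dimension(a, b):
    """Hypothetical stand-in for Opt_dimension: choose orientations of a and b
    that minimise the half-perimeter of the pair placed side by side."""
    best = None
    for rot_a, rot_b in product((False, True), repeat=2):
        wa, ha = (a.height, a.width) if rot_a else (a.width, a.height)
        wb, hb = (b.height, b.width) if rot_b else (b.width, b.height)
        cost = (wa + wb) + max(ha, hb)
        if best is None or cost < best[0]:
            best = (cost, wa, ha, wb, hb)
    _, a.width, a.height, b.width, b.height = best


def clock_opt_dimension(level, i):
    """Algorithm 2: size clocked module m_i against m_{i-1} and m_{i+1}."""
    if i > 0:
        opt_dimension(level[i], level[i - 1])
    if i + 1 < len(level):
        opt_dimension(level[i], level[i + 1])


def early_clock_planning(levels):
    """Algorithm 1: visit every level and process each clocked module."""
    for level in levels:
        for i, m in enumerate(level):
            if m.clocked:
                clock_opt_dimension(level, i)
    return levels


level = [Module("A", 2, 6), Module("B", 6, 2, clocked=True), Module("C", 1, 1)]
early_clock_planning([level])
```

Only orientations are changed in this sketch, so every module keeps its area; a soft-module variant would instead search over aspect ratios.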

Placement
After obtaining the optimal dimensions of the clocked modules, they are updated in the
tree structure so as to perform the placement of modules inside the core area of the chip
as shown in Figure 2. Before the placement of modules, the width of the voltage island is
obtained from Equation (1). After applying our proposed methodology, we follow a placement
process similar to that of SKB, as it reduces the computational complexity.

Figure 2. Placement process after implementation of proposed methodology. (a) Tree structure; (b) placement of modules inside the core area of the chip 1; (c) placement of modules inside the core area of the chip 2; (d) placement of modules inside the core area of the chip 3.

Figure 2a shows the updated dimensions of modules in the tree structure. Figure 2b–d illustrates the arrangement of modules from the left corner of the chip. Even though the floorplan is with optimal dimensions, there is a need for optimization in order to reduce the dead space. For the reduction in this unused space, we optimize the resultant floorplan with simulated annealing with the cost function given in Equation (2).

$$\mathrm{Cost}(F_1) = P + WL + D \qquad (2)$$

To further improve the routing and avoid the skew induced due to routing, we optimize the floorplans with the cost function in Equation (3).

$$\mathrm{Cost}(F_2) = P + WL + D \qquad (3)$$

In the above Equations (2) and (3), P refers to power in nW, WL refers to wirelength in µm, D refers to delay in picoseconds, and skew is measured in picoseconds.
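
A generic simulated annealing loop driven by a cost of the form in Equation (2) might look like the sketch below. It is not the authors' implementation: the cooling schedule, the parameter defaults (`t0`, `alpha`, `steps`), the acceptance rule and the toy state used in the example are assumptions; only the unweighted sum of power, wirelength and delay is taken from the text.

```python
import math
import random


def cost(power_nw, wirelength_um, delay_ps):
    """Equation (2): unweighted sum of power, wirelength and delay."""
    return power_nw + wirelength_um + delay_ps


def anneal(state, evaluate, neighbor, t0=1.0, alpha=0.95, steps=200, seed=0):
    """Generic simulated annealing; returns the best state seen and its cost."""
    rng = random.Random(seed)
    current, current_cost = state, evaluate(state)
    best, best_cost = current, current_cost
    t = t0
    for _ in range(steps):
        candidate = neighbor(current, rng)
        c = evaluate(candidate)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if c <= current_cost or rng.random() < math.exp((current_cost - c) / t):
            current, current_cost = candidate, c
            if c < best_cost:
                best, best_cost = candidate, c
        t *= alpha  # geometric cooling (assumed schedule)
    return best, best_cost


# Toy usage: the "floorplan" is a single integer and the cost is (x - 3)^2.
toy_eval = lambda x: float((x - 3) ** 2)
toy_move = lambda x, rng: x + rng.choice((-1, 1))
best_state, best_cost = anneal(10, toy_eval, toy_move)
```

In a floorplanning setting, `neighbor` would apply one of the perturbations of Section 3.3 and `evaluate` would call `cost` with values reported by the physical design tool.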

5. Simulation Results
In this section, we showcase the results of three different experiments which were
performed on IWLS benchmark circuits. Table 1 shows the hardware description of IWLS
benchmarks after synthesis in 45 nm technology.
Table 1. The hardware description of IWLS benchmarks for the specification of design at the 45 nm node.

Benchmark   Function     Sequential   Inverter   Buffer   Logic    Total
AES_CORE    AES Cipher   530          5589       274      14,402   20,795
DSP_CORE    16-bit DSP   3611         5258       42       23,523   32,436
DMA_CORE    DMA          2192         2678       253      13,995   19,118
AC_97CTRL   WISHBONE     2199         1525       111      8020     11,855

In this work, we size the clock buffers of the tree to balance the generated clock tree.
We use the engineering change order (ECO) in the Cadence Innovus EDA to choose the
clock buffers. Currently, multiple supply voltage designs are the choice for the reduction
in the dynamic power consumption of high-speed ICs. Hence, in this work, we use
low-threshold voltage standard cells from the technology library.
The experimental setup carried out using Cadence Innovus is shown in Table 2.
In the first experiment, we perform iterations for the convergence of the cost
function given in Equations (2) and (3) to enhance the performance of the proposed
floorplan. In the second experiment, we compare our proposed floorplan with the existing SKB
tree methodology with three different aspect ratios to satisfy the fixed outline constraint.
Finally, in the third experiment, we compare the slack time, the number of sub-trees,
and the number of levels in the clock tree with the existing method after clock tree
synthesis (CTS).

Table 2. The hardware description of Cadence Innovus for the specification of design at the 45 nm node.

Benchmark Circuit         AES_CORE        DSP_CORE        DMA_CORE   AC_97ctrl
No. of Modules present    17              28              15         16
No. of Clocked Modules    2               6               5          7
Aspect Ratio              1:1, 2:1, 3:1   1:1, 2:1, 3:1   1:01       1:01
Core utilization          70%             70%             70%        70%
Supply Voltages (V)       1.1, 0.9        1.1, 0.9        1.1, 0.9   1.1, 0.9

5.1. Performance of the Floorplan in Iterations


The floorplan is further optimized over several iterations of the
proposed early clock planning methodology. We terminate the iterations when two consecutive
cost function values are equal. Tables 3 and 4 show the floorplanning results for
AES_CORE and DSP_CORE. The first column shows the number of iterations performed.
Columns 2–4 show the results of total power consumption, delay, and wirelength in
different power domains. Column 5 shows the cost function results which are calculated
using Equation (2). Figures 3 and 4 depict the layout after the implementation of the
proposed methodology in AES_CORE and DSP_CORE.
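
The stopping rule can be written as a small loop, sketched here under assumptions: `step` stands for one hypothetical floorplanning iteration that returns its cost, `max_iters` is a safety cap, and `tol` allows "equal" to be read with a tolerance (the default of zero means exact equality). The cost sequence in the example is made up.

```python
def run_until_stable(step, max_iters=50, tol=0.0):
    """Call step() repeatedly; stop when two consecutive costs agree within tol."""
    costs = []
    for _ in range(max_iters):
        costs.append(step())
        if len(costs) >= 2 and abs(costs[-1] - costs[-2]) <= tol:
            break
    return costs


# Toy usage with a made-up cost sequence.
sequence = iter([9.0, 7.5, 7.5, 6.0])
history = run_until_stable(lambda: next(sequence))
```

With measured costs that rarely repeat exactly, a small positive `tol` is the more practical choice.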

Table 3. Iterative optimization in AES_CORE floorplan.

              Total Power (nW)   Delay (ps)       Wirelength (µm)   Cost Function
Iterations    PD-1    PD-2       PD-1    PD-2     PD-1    PD-2      PD-1    PD-2
Iteration-1   2.64    3.073      3.62    2.9      0.53    0.536     6.923   6.638
Iteration-2   3.18    2.136      2.43    4.559    0.57    0.482     6.312   7.309
Iteration-3   3.37    2.095      2.93    5.36     0.55    0.484     6.982   8.066
Iteration-4   2.2     2.109      6.26    2.984    0.56    0.497     9.152   5.723
Iteration-5   3.19    2.118      4.89    3.169    0.54    0.5       8.752   5.92
Iteration-6   3.13    2.12       4.79    3.172    0.54    0.5       8.59    5.924

Table 4. Iterative optimization in DSP_CORE floorplan.

              Total Power (nW)   Delay (ps)       Wirelength (µm)   Cost Function
Iterations    PD-1    PD-2       PD-1    PD-2     PD-1    PD-2      PD-1    PD-2
Iteration-1   6.31    6.902      1.96    2.361    0.87    0.115     9.51    9.6
Iteration-2   6.61    5.304      2.301   2.325    0.88    0.119     10      7.97
Iteration-3   5.21    8.039      3.92    2.561    0.8     0.176     10.2    11
Iteration-4   6.72    7.384      2.251   2.385    0.85    0.135     10      10
Iteration-5   6.71    7.265      2.313   2.222    0.86    0.135     10.1    9.95

Figure 3. Layout: After implementation of proposed early clock floorplanning in AES_CORE.

Figure 4. Layout: After implementation of proposed early clock floorplanning in DSP_CORE.

5.2. Comparisons between the Proposed Floorplan Methodology and the SKB Tree-Based Floorplan after CTS
Here, we compare our results after performing clock tree synthesis and calculate the number of levels, number of sub-trees, skew, and slack time of both the proposed and existing methods. Table 5 shows the skew results after the implementation of the proposed algorithm. For the reduction in induced clock skew, we optimize the resulting floorplans using Equation (3).

Table 5. Skew results after CTS using our proposed methodology.

                                   Skew
Circuit      Levels   Sub-Trees    Rise Time (ps)   Fall Time (ps)   Slack Time
AES_CORE     1        1            0                0                0.029
DSP_CORE     3        49           86.9             99.3             3.215
DMA_CORE     3        25           27.4             23.8             0.091
AC97_ctrl    5        53           18.2             18.1             −1.3

Table 5 shows that the clock tree of the proposed method has fewer levels than that of the existing method in Table 6. We can observe that the number of sub-trees, skew, and slack time in the clock distribution depend on the number of levels of the clock tree. The lower the number of levels, the fewer the sub-trees, the lower the skew, and the closer the slack time is to zero. Table 5 lists the results of our proposed method and Table 6 those of the existing method. All the comparisons given in the tables show that our proposed method performs better than the existing method. Figure 5 shows the clock tree generated after implementing our proposed methodology in DSP_CORE.

Table 6. Skew results using existing methodology.

                                   Skew
Circuit      Levels   Sub-Trees    Rise Time (ps)   Fall Time (ps)   Slack Time
AES_CORE     1        1            0                0                0.029
DSP_CORE     5        53           106.3            107.8            −4.31
DMA_CORE     5        57           101.5            112.5            −3.13
AC97_ctrl    7        64           28.2             29.3             −2.89
Figure 5. Layout: Clock tree after implementation of proposed methodology in DSP_Core.


6. Conclusions
6. Conclusions
The emerging VLSI integrated circuits and applications require a general method-
ologyThe emerging
for the design ofVLSI integrated
the chip circuits
for high speeds and applications
require a general methodology for the design of the chip for high speeds of operation and
for processing information. The hardware elements in the critical path of the design have
impacts on power and timing constraints. In this paper, we focused on these two issues by
presenting an early clock plan-based floorplanning algorithm that optimizes the fixed outline
and voltage island constraints and reduces delay substantially. The proposed framework, with
its iterative optimization algorithm, proves practical on the IWLS benchmark netlist
synthesized at 45 nm. The experimental results reveal that the proposed floorplanning
methodology shows significant improvements in clock skew, delay, and power saving
through a reduction in wirelength. 3D and monolithic-based integrated circuit designs are
the most promising techniques for the compact fabrication of chips consisting of
millions of transistors. Although our proposed floorplanning methodology provides
solutions for multiple supply voltage designs, which aid in satisfying timing constraints by
reducing the size of the clock tree, it needs to be reconstructed for 3D and monolithic ICs.
This provides the motivation for our future work, which is to propose a novel physical
design methodology that satisfies the desired timing constraints while designing 3D and
monolithic ICs.
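
To make the link between module placement and wirelength concrete, the following is a minimal sketch and not the authors' implementation. It assumes the common half-perimeter wirelength (HPWL) estimate, since this section does not restate the paper's exact cost function. The module names, coordinates, nets, and the helper names `hpwl`, `total_wirelength`, and `fits_outline` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x, y, width, height)


def hpwl(net: List[str], centers: Dict[str, Point]) -> float:
    """Half-perimeter wirelength of one net: width plus height of its bounding box."""
    xs = [centers[m][0] for m in net]
    ys = [centers[m][1] for m in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_wirelength(nets: List[List[str]], centers: Dict[str, Point]) -> float:
    """Sum of HPWL over all nets (an assumed cost term, not the paper's exact one)."""
    return sum(hpwl(net, centers) for net in nets)


def fits_outline(rects: Dict[str, Rect], width: float, height: float) -> bool:
    """Check a fixed-outline constraint: every module lies inside the chip outline."""
    return all(
        x >= 0 and y >= 0 and x + w <= width and y + h <= height
        for (x, y, w, h) in rects.values()
    )


# Hypothetical nets: a clock net driving modules A and B, and a signal net A-C.
nets = [["clk", "A", "B"], ["A", "C"]]

# Hypothetical module centers before and after pulling clocked modules together.
spread = {"clk": (0, 0), "A": (10, 0), "B": (0, 8), "C": (10, 8)}
compact = {"clk": (0, 0), "A": (4, 0), "B": (0, 3), "C": (4, 3)}

wl_spread = total_wirelength(nets, spread)    # 18 + 8 = 26
wl_compact = total_wirelength(nets, compact)  # 7 + 3 = 10

# Hypothetical module rectangles checked against a 12 x 10 outline.
rects = {"A": (3, 0, 2, 2), "B": (0, 2, 2, 2), "C": (3, 2, 2, 2)}
inside = fits_outline(rects, 12, 10)
```

In an annealing-style floorplanner, a cost of this kind would be evaluated after each candidate move, and moves that violate the outline check would be rejected or penalized. Shorter clock nets generally mean a smaller clock tree, which is the mechanism the conclusion credits for the skew and delay improvements.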

Author Contributions: Conceptualization, B.S., R.V., A.B.B., R.R., M.A.M., N.K.S., C.B. and J.L.M.;
methodology, B.S., R.V., A.B.B., R.R., M.A.M., N.K.S., A.A.C., C.B. and J.L.M.; software,
B.S., A.A.C., C.B. and J.L.M.; validation, B.S., R.V., A.B.B., R.R., M.A.M., N.K.S., A.A.C., C.B. and J.L.M.;
formal analysis, B.S., R.V., A.B.B., R.R., M.A.M., N.K.S., A.A.C., C.B. and J.L.M.; investigation, B.S.,
R.V., A.B.B., R.R., M.A.M., N.K.S., A.A.C., C.B. and J.L.M.; resources, B.S., R.V., A.B.B., R.R., M.A.M.,
N.K.S., C.B. and J.L.M.; data curation, B.S., C.B. and J.L.M.; writing (original draft preparation), B.S. and
C.B.; writing (review and editing), B.S., J.L.M. and C.B.; visualization, B.S., R.V., A.B.B., R.R., M.A.M.,
N.K.S., A.A.C., C.B. and J.L.M.; supervision, B.S., R.V., A.B.B., R.R., M.A.M., N.K.S., A.A.C., C.B.
and J.L.M.; project administration, B.S., R.V., A.B.B., R.R., M.A.M., N.K.S., A.A.C., C.B. and J.L.M.;
funding acquisition, J.L.M. and C.B. All authors have read and agreed to the published version
of the manuscript.
Funding: The authors extend their appreciation to the Deanship of Scientific Research at King Khalid
University, Kingdom of Saudi Arabia, for funding this work through the General Research Project under
grant number RGP. 1/262/42.
Data Availability Statement: Experimental data is available upon request.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Pangjun, J.; Sapatnekar, S.S. Low-power clock distribution using multiple voltages and reduced swings. In IEEE Transactions on
Very Large Scale Integration (VLSI) Systems; IEEE: New York, NY, USA, 2002; Volume 10, pp. 309–318.
2. Igarashi, M.; Usami, K.; Nogami, K.; Minami, F.; Kawasaki, Y.; Aoki, T.; Takano, M.; Sonoda, S.; Ichida, M.; Hatanaka, N. A
low-power design method using multiple supply voltages. In Proceedings of the 1997 International Symposium on Low Power
Electronics and Design, Monterey, CA, USA, 18–20 August 1997; pp. 36–41.
3. Tsai, C.C.; Lin, T.H.; Tsai, S.H.; Chen, H.M. Clock planning for multi-voltage and multi-mode designs. In Proceedings of the 2011
12th International Symposium on Quality Electronic Design, Santa Clara, CA, USA, 14–16 March 2011; pp. 1–5.
4. Lung, C.L.; Zeng, Z.Y.; Chou, C.H.; Chang, S.C. Clock skew optimization considering complicated power modes. In Proceedings of
the 2010 Design, Automation Test in Europe Conference Exhibition (DATE 2010), Dresden, Germany, 8–12 March 2010; pp. 1474–1479.
5. Rajesh, A.; Raju, B.L.; Reddy, K.C.K. Design of voltage scaled level converters in low power clock distribution networks. In
Proceedings of the 2016 IEEE International Conference on Recent Trends in Electronics, Information Communication Technology
(RTEICT), Bangalore, India, 20–21 May 2016; pp. 579–583.
6. Tsai, J.-L.; Chen, T.-H.; Chen, C.C.P. Zero skew clock-tree optimization with buffer insertion/sizing and wire sizing. In IEEE
Transactions on Computer-Aided Design of Integrated Circuits and Systems; IEEE: New York, NY, USA, 2004; Volume 23, pp. 565–572.
7. Roy, S.; Mattheakis, P.M.; Masse-Navette, L.; Pan, D.Z. Evolving challenges and techniques for nanometer soc clock network
synthesis. In Proceedings of the 2014 12th IEEE International Conference on Solid-State and Integrated Circuit Technology
(ICSICT), Guilin, China, 28–31 October 2014; pp. 1–4.
8. Balboni, A.; Costi, C.; Pellencin, M.; Quadrini, A.; Sciuto, D. Clock skew reduction in ASIC logic design: A methodology for clock
tree management. In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems; IEEE: New York, NY, USA,
1998; Volume 17, pp. 344–356.
9. Chan, T.B.; Kahng, A.B.; Li, J. Nolo: A no-loop, predictive useful skew methodology for improved timing in ic implementation. In
Proceedings of the Fifteenth International Symposium on Quality Electronic Design, Santa Clara, CA, USA, 3–5 March 2014; pp. 504–509.
10. Lin, M.P.H.; Hsu, C.C.; Chang, Y.T. Recent research in clock power saving with multi-bit flip-flops. In Proceedings of the 2011
IEEE 54th International Midwest Symposium on Circuits and Systems (MWSCAS), Seoul, Korea, 7–10 August 2011; pp. 1–4.
11. Tengand, S.K.; Soin, N. Low power clock gates optimization for clock tree distribution. In Proceedings of the 2010 11th
International Symposium on Quality Electronic Design (ISQED), San Jose, CA, USA, 22–24 March 2010; pp. 488–492.
Electronics 2021, 10, 2795 10 of 10

12. Dev, M.P.; Baghel, D.; Pandey, B.; Pattanaik, M.; Shukla, A. Clock gated low power sequential circuit design. In Proceedings of the
2013 IEEE Conference on Information Communication Technologies, Thuckalay, India, 11–12 April 2013; pp. 440–444.
13. Chen, S.Y.; Lin, R.B.; Tung, H.H.; Lin, K.W. Power gating design for standard-cell-like structured ASICs. In Proceedings of the
2010 Design, Automation Test in Europe Conference Exhibition (DATE 2010), Dresden, Germany, 8–12 March 2010; pp. 514–519.
14. Balasubramanian, S.; Panchanathan, A.; Chokkalingam, B.; Padmanaban, S.; Leonowicz, Z. Module Based Floorplanning
Methodology to Satisfy Voltage Island and Fixed Outline Constraints. Electronics 2018, 7, 325. [CrossRef]
15. Lin, M.P.H.; Hsu, C.C.; Chen, Y.C. Clock-tree aware multibit flip-flop generation during placement for power optimization. In IEEE
Transactions on Computer-Aided Design of Integrated Circuits and Systems; IEEE: New York, NY, USA, 2015; Volume 34, pp. 280–292.
16. Yan, J.T.; Chen, Z.W. Construction of constrained multi-bit flip-flops for clock power reduction. In Proceedings of the 2010
International Conference on Green Circuits and Systems, Shanghai, China, 21–23 June 2010; pp. 675–678.
17. Shyu, Y.T.; Lin, J.M.; Huang, C.P.; Lin, C.W.; Lin, Y.Z.; Chang, S.J. Effective and efficient approach for power reduction by using multi-bit
flip-flops. In IEEE Transactions on Very Large Scale Integration (VLSI) Systems; IEEE: New York, NY, USA, 2013; Volume 21, pp. 624–635.
18. Jooand, D.; Kim, T. Managing clock skew in clock trees with local clock skew requirements using adjustable delay buffers. In
Proceedings of the 2015 International SoC Design Conference (ISOCC), Gyeongju, Korea, 2–5 November 2015; pp. 137–138.
19. Moon, H.; Kim, T. Design and allocation of loosely coupled multi-bit flip-flops for power reduction in post- placement optimiza-
tion. In Proceedings of the 2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC), Macao, China, 25–28
January 2016; pp. 268–273.
20. Chen, Z.W.; Yan, J.T. Routability-driven flip-flop merging process for clock power reduction. In Proceedings of the 2010 IEEE
International Conference on Computer Design, Amsterdam, The Netherlands, 3–6 October 2010; pp. 203–208.
21. Chong, A.B. ASIC clock tree estimation in design planning. In Proceedings of the 2013 4th International Conference on Intelligent
Systems, Modelling and Simulation, Bangkok, Thailand, 29–31 January 2013; pp. 619–626.
22. Krishnamoorthy, R.; Krishnan, K.; Chokkalingam, B.; Padmanaban, S.; Leonowicz, Z.; Holm-Nielsen, J.B.; Mitolo, M. Systematic
Approach for State-of-the-Art Architectures and System-on-Chip Selection for Heterogeneous IoT Applications. IEEE Access 2021,
9, 25594–25622. [CrossRef]
23. Huang, S.-H.; Wang, C.-L. An effective floorplan- based power distribution network design methodology under reliability
constraints. In Proceedings of the 2002 IEEE International Symposium on Circuits and Systems. Proceedings (Cat No.02CH37353),
Phoenix-Scottsdale, AZ, USA, 26–29 May 2002; Volume 1, pp. 353–356.
24. Man, X.; Kimura, X. Comparison of optimized multi-stage clock gating with structural gating approach. In Proceedings of the
TENCON 2011—2011 IEEE Region 10 Conference, Bali, Indonesia, 21–24 November 2011; pp. 651–656.
25. Guo, J.; Cao, P.; Wu, J.; Liu, Z.; Yang, J. Analytical Gate Delay Variation Model with Temperature Effects in Near-Threshold
Region Based on Log-Skew-Normal Distribution. Electronics 2019, 8, 501. [CrossRef]
26. Balasubramanian, S.; Chokkalingam, B.; Krishnamoorthy, R.; Adedayo, Y. Power optimization through FuzzyMinProduct
algorithm for voltage assignment in SOC design. J. Appl. Sci. Eng. 2020, 23, 655–659.
27. Khan, N.; Castro-Godinez, J.; Xue, S.; Henkel, J.; Becker, J. Automatic Floorplanning and Standalone Generation of Bitstream-Level
IP Cores. IEEE Trans. Very Large Scale Integr. Syst. 2021, 29, 38–50. [CrossRef]
28. Sadeghi, A.; Zolfy Lighvan, M.; Prinetto, P. Automatic and Simultaneous Floorplanning and Placement in Field-Programmable Gate
Arrays with Dynamic Partial Reconfiguration Based on Genetic Algorithm. Can. J. Electr. Comput. Eng. 2020, 43, 224–234. [CrossRef]
29. Lin, J.-M.; Hung, Z.-X. SKB-tree: A fixed-outline driven representation for modern floorplanning problems. IEEE Trans. Very
Large Scale Integr. (VLSI) Syst. 2011, 20, 473–484. [CrossRef]
