Cloud Computing Fplan
IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING
Received 12 September 2014; revised 26 February 2015 and 27 May 2015; accepted 5 June 2015.
Date of publication 11 September 2014; date of current version 9 December 2015.
Digital Object Identifier 10.1109/TETC.2015.2443714
ABSTRACT Cloud computing has recently emerged as a promising computing paradigm, which offers
unprecedented computing power and flexibility in the distributed computing environment. Despite the trend
that the electronic design automation industry has prepared to embrace the cloud concept, there is still no research
publication on designing VLSI floorplanning algorithms for a cloud computing platform. This paper proposes
the first such algorithm for thermal driven floorplanning. Since the existing floorplanning techniques are
based on simulated annealing, which is a sequential algorithm and difficult to parallelize, a new thermal driven
floorplanning algorithm is proposed, which can be easily parallelized in a cloud computing environment.
This algorithm uses an advanced adjacency probability cross entropy optimization and a new integer linear
programming-based resources provisioning to efficiently use the heterogeneous computation resources and
handle the uncertainty of machine waiting time in a cloud. The experimental results on the standard
GSRC benchmark circuits demonstrate that the proposed algorithm can significantly reduce the peak
temperature (up to 24 °C) compared with the simulated annealing technique. In the simulated cloud
computing environment, it runs over 30% faster than the simulated annealing technique with moderate
overhead in monetary expense due to the fact that the proposed algorithm is parallelization friendly. Further,
our algorithm can effectively compute the scheduling solutions considering the uncertainty in waiting time.
INDEX TERMS Cloud computing, green computing, energy-aware computing, VLSI design.
I. INTRODUCTION
As an emerging computing paradigm featuring computing as a service, cloud computing has recently become a research focus in the parallel and distributed computing community. One of the most popular cloud computing infrastructures is Software-as-a-Service (SaaS), which can provide massive computing power to greatly improve the computational efficiency of software. The software provided by the software service provider (or software provider for short) can be installed on various cloud computers and accessed over the Internet by customers. Many existing commercial cloud service providers, such as Amazon EC2 and Rackspace Cloud, adopt this infrastructure.
Although the cloud computing paradigm has been successfully adopted in many research areas [1]–[5], the EDA community has only recently started to embrace the cloud concept [6]. In fact, Synopsys has already used Amazon Web Services for training EDA tools internally and Cadence has already launched its own SaaS [7].
Two features of cloud computing matter here, namely, the massive computational resources and the pay-per-usage based pricing model. To take advantage of the massive computational resources offered in a cloud, the algorithm needs to be parallelized. On the other hand, since the usage of software and computing resources is charged on a pay-per-usage basis, the resource provisioning (scheduling) algorithm, which allocates the parallelized components to different cloud computers, needs to take the pricing into consideration.
Refer to Figure 1(a) for the money flow in the EDA cloud computing infrastructure. The EDA software customer only pays the EDA software provider, while the EDA software provider pays the cloud computing service provider for the machine usage (e.g., an amount of monetary
2168-6750 © 2015 IEEE. Translations and content mining are permitted for academic research only.
Personal use is also permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. VOLUME 3, NO. 4, DECEMBER 2015
Authorized licensed use limited to: NXP Semiconductors. Downloaded on March 02,2020 at 07:37:32 UTC from IEEE Xplore. Restrictions apply.
IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING
Chen et al.: Cloud Computing for VLSI Floorplanning Considering Peak Temperature Reduction
pricing model and the heterogeneous computational resources.
• The resource provisioning technique is enhanced to consider the uncertainty in machine waiting time.
• The experimental results on the standard GSRC benchmark circuits demonstrate that the proposed thermal driven floorplanning technique can significantly reduce the peak temperature (up to 24 °C) compared to the simulated annealing technique while running over 30% faster in the simulated cloud computing environment, because the proposed algorithm is parallelization friendly. Further, it can effectively compute the schedule considering the uncertainty in waiting time.
The rest of the paper is organized as follows. Section II formulates the thermal driven floorplanning problem. Section III describes the parallelization friendly cross entropy based thermal driven floorplanning technique. Section IV describes the new resource provisioning technique. Section V presents the experimental results with analysis. A summary of the work is given in Section VI.
II. PROBLEM FORMULATION
A. THERMAL MODEL
The thermal profile of a floorplan, or the temperature \(T_i\) at each block \(i\), can be computed using the thermal conductance matrix \(R\) and the power density vector \(P\) of each block.
\[
\begin{bmatrix} T_1 \\ T_2 \\ \vdots \\ T_n \end{bmatrix}
=
\begin{bmatrix}
R_{11} & R_{12} & \cdots & R_{1n} \\
R_{21} & R_{22} & \cdots & R_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
R_{n1} & R_{n2} & \cdots & R_{nn}
\end{bmatrix}
\cdot
\begin{bmatrix} P_1 \\ P_2 \\ \vdots \\ P_n \end{bmatrix}
\tag{1}
\]
This model has been widely used in VLSI research, for example in [16], [22], and [23]. It is a fast compact resistive model that does not take much running time in the proposed design [24]. Because this thermal simulation is nevertheless computationally expensive, fast estimation techniques have been proposed. For example, [18] designs a simple power density based thermal estimation equation \(T = \delta \cdot P\), where \(\delta\) refers to the ratio of the thickness of the chip over the thermal conductivity of the material and \(P\) refers to the power density. Based on it, a model called the heat diffusion model is proposed in [18], which uses the total heat diffusion of a block as an approximation of the temperature of the block. Basically, the total heat diffusion of any block is computed as the sum of the heat diffusion between this block and each of its adjacent blocks, where the heat diffusion between two adjacent blocks is proportional to the difference in their power densities as well as their shared boundary length.
B. PROBLEM FORMULATION
Following the literature, a set of \(n\) blocks is given in the floorplanning problem. They are interconnected by some nets, where each net consists of some pins located at some blocks. The standard wirelength driven floorplanning problem asks to place the blocks in the fixed outline (chip size) such that they do not overlap with each other and the total wirelength is minimized. The half-perimeter wirelength (HPWL) of a net is often used as the approximation of the wirelength in the literature. It is defined as the half perimeter of the smallest bounding box of all the pins in the net. Following the existing works [16]–[18], the thermal-driven floorplanning problem can be formulated as follows.
Floorplanning for Peak Temperature Minimization: Given a set of \(n\) blocks, denoted by \(\{b_1, b_2, \ldots, b_n\}\), the size of each block, and the chip size, the problem asks to place these blocks such that they do not overlap with each other and the peak temperature is minimized while the total wirelength (HPWL) is still small.
III. THE CROSS ENTROPY BASED THERMAL DRIVEN FLOORPLANNING ALGORITHM
To deploy a thermal driven floorplanning technique in a cloud computing environment, one needs to first parallelize it (decompose the original floorplanning problem into a set of subproblems) and then design a resource provisioning technique to schedule the subproblems to cloud computers. Since most existing floorplanning techniques are simulated annealing based techniques [8], [18]–[21], they are essentially sequential algorithms and are difficult to parallelize. Therefore, we will first develop a cross entropy optimization based thermal-driven floorplanning technique which can be easily parallelized, and then propose a resource provisioning technique in Section IV to deploy it in a cloud computing environment.
A. CROSS ENTROPY FRAMEWORK OVERVIEW
Our parallel thermal driven floorplanning technique is based on the advanced cross entropy optimization framework. Such a framework was originally proposed in [25] and has been successfully applied to a number of optimization problems such as the vehicle routing problem [26] and the buffer allocation problem [27]. In [28], this optimization scheme was first introduced to the EDA community. It solves the decoupling capacitor insertion problem for power grid noise mitigation, which is quite different from our problem.
The main idea of the cross entropy optimization framework is to cast a deterministic optimization problem into a stochastic optimization problem with mathematical rigor. In our problem context, the thermal-driven floorplanning problem will be tackled iteratively. A key component in the cross entropy optimization is a probability density function (PDF) called the solution PDF. It models the distribution of candidate floorplanning solutions, which will be constructed and updated throughout the optimization. During each iteration, a set of cross entropy samples will be generated according to the solution PDF, where each sample represents a candidate floorplan. Each cross entropy sample will then be evaluated. That is, the peak temperature of a candidate floorplan can be evaluated through performing the thermal
simulation on it. Subsequently, a few samples with the lowest peak temperatures will be used to update the solution PDF. In the next iteration, the updated solution PDF will be used to generate new cross entropy samples. They are then evaluated through thermal simulation and the top few samples are used to update the solution PDF. This process is repeated until convergence. In our problem, the adjacency probability matrix will be used as the solution PDF as indicated in Section III-B. Refer to Figure 2 for a simple illustration of the above process.
with high possibility. Subsequently, one can just find the maximum lower bound \(x^*\) such that \(\theta(f(x^*))\) still approaches 0. As \(\theta(f(x^*))\) cannot be directly computed in a closed form, Monte Carlo simulations will be performed in the cross entropy optimization. For this, \(m\) samples, denoted by \(X = \{x_1, x_2, \ldots, x_m\}\), are generated according to the solution PDF \(\Pr(x, u)\). One then computes \(\psi(f(x_i) \le f(x^*))\) for each sample and computes the average among all samples to estimate \(\theta(f(x^*))\) as [25]
\[
\tilde{\theta}(f(x^*)) = \frac{1}{m} \sum_{i=1}^{m} \psi\bigl(f(x_i) \le f(x^*)\bigr). \tag{2}
\]
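The estimator in (2) can be illustrated with a minimal Python sketch. It assumes that ψ is the indicator function (1 when its argument holds, 0 otherwise), which the surviving text does not state explicitly. The cost function `toy_cost`, the threshold, and the uniform sampling are hypothetical stand-ins for the floorplan cost f, the value f(x*), and draws from the solution PDF.

```python
import random

def estimate_theta(samples, f, threshold):
    """Monte Carlo estimate in the style of Eq. (2): the fraction of
    samples whose cost f(x_i) does not exceed the threshold f(x*)."""
    hits = sum(1 for x in samples if f(x) <= threshold)
    return hits / len(samples)

def toy_cost(x):
    """Hypothetical cost function; not the paper's thermal cost."""
    return (x - 0.5) ** 2

rng = random.Random(0)                           # fixed seed for repeatability
samples = [rng.random() for _ in range(1000)]    # stand-in for draws from the solution PDF
theta = estimate_theta(samples, toy_cost, 0.01)  # share of samples with cost <= 0.01
```

A smaller threshold makes the event rarer, so more samples are needed before the estimate becomes stable.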
matrix \(A\) is the solution PDF for our problem. Solution quality is evaluated by a weighted sum of the thermal cost and the half-perimeter wirelength (HPWL) of the floorplan. The thermal cost value can be computed by the model discussed in Section II-A, and the HPWL value of the floorplan is calculated by summing up the HPWL of all nets.
At a high level, our cross entropy based thermal driven floorplanning algorithm works as follows. In the first iteration, starting from an initial floorplan, a set of \(k\) samples, which are candidate floorplans, is generated according to an initial adjacency probability matrix \(A\). The thermal profile of each sample will be evaluated and the top \(\omega\) samples with the lowest peak temperatures will be used to update the adjacency probability matrix, where \(\omega\) is a user specified parameter. In the second iteration, with the updated adjacency probability matrix \(A\), a new set of \(k\) samples will be generated. The thermal profile of each sample will be evaluated and the top \(\omega\) samples will be used to update the adjacency probability matrix. This process is repeated until a certain stopping criterion is satisfied.
We illustrate the details of the above process using a simple one-dimensional floorplanning problem. Suppose that our floorplanning problem consists of four blocks \(b_1, \ldots, b_4\) and an initial floorplan is given where the four blocks \(b_1, b_2, b_3, b_4\) are located from left to right in one dimension. Given an initial adjacency PDF, we describe how to generate a single cross entropy sample; other cross entropy samples can be generated similarly. In generating a cross entropy sample, suppose that \(b_1\) is to be relocated while the relative locations among all other blocks are kept unchanged. Recall that the adjacency probability matrix models the probability for blocks to be adjacent to each other. The probability of relocating \(b_1\) to be left of \(b_2\) (i.e., no relocation is performed) is \(a_{1,2}\), to be between \(b_2\) and \(b_3\) is \(a_{1,2} \cdot a_{1,3}\), to be between \(b_3\) and \(b_4\) is \(a_{1,3} \cdot a_{1,4}\), and to the right of \(b_4\) is \(a_{1,5}\). These probability values will be used to determine the new location of \(b_1\). For this, one can sum up the above four probability values and generate a random number between 0 and the sum according to the uniform distribution. Each probability value is treated as an interval, and all the intervals are linked in a non-overlapping fashion to form a single long interval. The generated random number can be viewed as a point in the long interval, and the interval which the random number falls in can then be found. Subsequently, the block \(b_1\) can be relocated to the location corresponding to that interval. The probabilities for relocating other blocks can be computed similarly.
To generate a cross entropy sample, \(z\) relocations will be performed, where \(z\) is a user specified parameter. In each relocation, the block to be relocated is first determined and the location it will be moved to is determined by the above process. Which block is to be relocated will be determined by the adjacency probability using the current floorplan. In the above initial floorplan, \(b_1, \ldots, b_4\), the probability to choose \(b_1\) for relocation is \(1 - a_{1,2}\), since \(b_1\) is currently only adjacent to \(b_2\) and a smaller \(a_{1,2}\) indicates that \(b_1\) and \(b_2\) tend not to be adjacent to each other. Similarly, the probability to choose \(b_2\) for relocation is \(1 - a_{1,2} \cdot a_{2,3}\), to choose \(b_3\) for relocation is \(1 - a_{2,3} \cdot a_{3,4}\), and to choose \(b_4\) for relocation is \(1 - a_{3,4}\).
After all \(k\) samples are generated, their thermal profiles will be evaluated. Since evaluating the thermal profile of a floorplan using the full-fledged thermal simulation is computationally expensive, a thermal evaluation strategy which integrates the rough estimation with the accurate thermal simulation will be used. Basically, for several iterations the full-fledged thermal simulation will be performed using [22], while for the other iterations the fast estimation using [18] will be used.
After evaluating the \(k\) samples, the top \(\omega\) samples with the lowest peak temperatures will be used to update the adjacency probability matrix as follows. In each of these top cross entropy samples, if \(b_i\) is adjacent to \(b_j\), then the entry \(a_{i,j}\) in the adjacency probability matrix will be increased by a small value \(\delta\). This will be performed for all of the \(\omega\) cross entropy samples. After this update, the adjacency probability matrix might not represent a valid probability distribution, and thus scaling will be performed to ensure \(\sum_{j=1}^{n} a_{i,j} = 1\) for each \(i\). The stopping criterion for the cross entropy based thermal driven floorplanning technique is designed as follows. When the average peak temperature of the top \(\omega\) solutions in the current iteration is sufficiently close to that in the previous iteration, the cross entropy algorithm finishes. In the implementation, the algorithm stops when either the solution quality changes only within a small range or the maximum iteration number is reached. In the experiments, this number is set to 1000 to ensure that the runtime of the proposed algorithm is always within 15 minutes for the real case.
IV. RESOURCE PROVISIONING FOR THERMAL DRIVEN FLOORPLANNING
In the cross entropy based thermal-driven floorplanning, a set of cross entropy samples is generated in each iteration, whose thermal profiles are then evaluated using either fast estimation [18] or full-fledged thermal simulation [22]. Our experiments show that the runtime bottleneck of our technique is due to the full-fledged thermal simulations. Thus, the process of evaluating cross entropy samples using full-fledged thermal simulations will be parallelized in the cloud computing environment. We call the problem of evaluating a cross entropy sample using the full-fledged thermal simulation the subproblem. We design a new resource provisioning (scheduling) technique to efficiently schedule subproblems in the cloud in Section IV-A and then enhance it to consider the uncertainty in waiting time in Section IV-B.
Recall that the EDA software provider, who computes the floorplan using our technique in the cloud, is a customer of the cloud computing service provider (Amazon EC2) and it needs to pay for using the cloud computers. Such a charge depends on the type of the machines it uses and the time it uses.
Moreover, the waiting time, which is defined as the duration between the time when subproblems are sent to a cloud computer and the actual starting time of solving the subproblems, is important. Waiting time could be significant since when the software provider reserves a cloud computer, it tends to maximize its usage through launching various applications (floorplanning, routing, etc.) on the reserved machine. Further, since it is difficult to know the exact waiting time during the scheduling, the uncertainty in waiting time needs to be considered.
It is worth noting that although there are a large number of computers in a commercial cloud, the number of computers where the target software has been installed, and which thus can be used in optimization, would be quite limited. In our problem, the number of cloud computers installed with thermal simulators is limited due to the machine reservation fee and the maintenance cost. For example, Amazon EC2 could charge \$1820 per year (depending on the type of computer) for reserving only a single cloud computer. Note that, as indicated by Amazon EC2, when a cloud computer is actually used, it will be further charged at a rate proportional to the runtime.
A. ILP FORMULATION
We will first characterize cloud computers using types. One can use multiple metrics such as frequency and memory to define a type. In this work, for illustration purposes the processors are characterized by frequencies. However, other metrics can be integrated into our technique as well. In this subsection, the nominal waiting time of each processor is used, while in the next subsection, the variations in waiting time will be considered. The nominal waiting time can be obtained by taking the average waiting time of the recent runs on the cloud computers.
A new integer linear programming (ILP) based resource provisioning technique considering pricing and waiting time is designed as follows. Recall that in the cross entropy optimization framework, a subproblem corresponds to a sample and the evaluation of each sample takes a similar amount of time. Suppose that there are \(m\) types of processors, where \(m\) should be small in practice since the cloud computers installed with the thermal simulators are limited. Denote by \(k\) the number of subproblems to be scheduled, i.e., there are \(k\) cross entropy samples in each iteration. Let \(c_i\) (\(1 \le i \le m\)) denote the number of subproblems assigned to type-\(i\) processors. Denote by \(f_i\) the frequency of a type-\(i\) processor. Since each subproblem takes similar time to solve, solving a single subproblem on a processor with frequency \(f_0\) can be assumed to take approximately \(\gamma\) seconds, which can be estimated offline by performing the thermal simulation on a subproblem. When the subproblem executes on a type-\(i\) processor, the runtime is \(f_0 / f_i \cdot \gamma\). In the scheduling, we assume that only \(m_i\) processors are available for type-\(i\) processors, and then the runtime spent on type-\(i\) processors, denoted by \(r_i\), can be approximately computed as \(r_i = f_0 / f_i \cdot \gamma \cdot c_i / m_i\). Since the waiting time needs to be considered, \(r_i\) is updated to include \(w_i\), which denotes the average waiting time on type-\(i\) processors. Let \(r_f\) denote the finish time, i.e., the finishing time of the last subproblem. It can be computed as \(r_f = \max\{r_1, r_2, \ldots, r_m\}\). The finish time needs to be bounded by a deadline \(T\), specified by the user. The objective of the ILP is to minimize the overhead cost, which depends on the processor types and how long a processor is used. Let \(o_i\) denote the overhead cost when the type-\(i\) processor has been employed for one second; \(o_i\) is provided by the cloud service provider. Note that the runtime on type-\(i\) processors \(r_i\) is the actual finishing time on this processor, which includes waiting time. In Amazon EC2, waiting time will not be charged and only the time between the beginning and the end of a program is charged. Thus, the variable \(t_i\) is introduced to compute the actual usage on the type-\(i\) processor. Subsequently, the overhead cost can be computed as \(\sum_{i=1}^{m} o_i t_i\). Finally, note that although the waiting time is not charged, it still needs to be in the formulation since it impacts the scheduling solution due to the deadline constraint. The ILP can be formulated as follows.
\[
\begin{aligned}
\min \quad & \sum_{i=1}^{m} o_i t_i \\
\text{s.t.} \quad & \sum_{i=1}^{m} c_i = k, \\
& r_i = \frac{\gamma f_0 c_i}{f_i m_i} + w_i, \quad \forall i \\
& r_f = \max\{r_1, r_2, \ldots, r_m\}, \\
& r_f \le T, \\
& t_i = \frac{\gamma f_0 c_i}{f_i}, \quad \forall i \\
& 0 \le c_i \le k, \quad \forall i \\
& c_i \in \mathbb{N}, \quad \forall i
\end{aligned}
\tag{4}
\]
Note that the above ILP formulation only computes a rough assignment of the subproblems, and it will be discretized in a straightforward fashion to compute the actual schedule. In addition, the scalability of the ILP formulation is not an issue in our cross entropy based floorplanning technique. The reason is that each time only a small number (50 in our experiments) of samples are generated, and thus at most the same number of computers are needed and the number of machine types is even fewer.
B. UNCERTAINTY-AWARE RESOURCE PROVISIONING
The above ILP formulation for resource provisioning does not consider the uncertainty in waiting time, which is however important. Thus, an uncertainty-aware ILP resource provisioning technique will be designed. It follows our previous work [30], which designs a fault adaptation parameterized technique to handle faults in real-time scheduling. In this work, we similarly introduce a parameter \(\alpha\) to handle the uncertainty-aware optimization in our problem context. The waiting time \(w_i\) in type-\(i\) processors will be modeled as \((1 - \alpha) L(w_i) + \alpha U(w_i)\), where \(L(w_i)\) denotes the lower bound of \(w_i\) and \(U(w_i)\) denotes the upper bound of \(w_i\). In practice,
use heat diffusion based fast thermal estimation [18] to evaluate samples. Among the latter, there are 8 iterations each of which involves 1 full-fledged thermal simulation for further improving the thermal profile. There are 50 samples generated in each iteration of the cross entropy optimization. Our experiments show that the cross entropy iterations involving full-fledged thermal simulations are the runtime bottleneck of our algorithm, contributing over 90% of the total runtime. Thus, we only need to parallelize the 3 iterations with full-fledged thermal simulations in the cloud. For the remaining part, we assume that it runs on a machine with frequency 2.0 GHz.
For comparison, we also implement the simulated annealing based thermal driven floorplanning technique. Note that it is a sequential algorithm which cannot be parallelized, and we assume that it runs on a machine with frequency 2.0 GHz as above. Comparing the two solutions, the peak temperature (evaluated using the full-fledged thermal simulation [22]) of the simulated annealing solution is 136.4 °C, which is significantly (about 24 °C) higher than the peak temperature of 112.5 °C of our cross entropy optimization technique. Their thermal profiles are shown in Figure 4, Figure 5 and Figure 6. It is clear that our algorithm leads to a much more balanced thermal distribution compared to the simulated annealing technique. In addition, because our cross entropy optimization is parallelization friendly, its (simulated parallel) runtime is only 820.3 seconds, about 34% less than the 1235.6 seconds of the simulated annealing technique. Since the total machine usage over all computers of our technique is more than that of simulated annealing, the monetary cost of our technique is larger. However, the large improvement in thermal profile outweighs the overhead in monetary cost. Further, it can be seen that the wirelength degradation of our cross entropy technique is minor compared to the simulated annealing solution.
FIGURE 4. Thermal profile for n100. (a) Floorplan from thermal driven simulated annealing optimization with peak temperature 62.9 °C. (b) Floorplan from thermal driven cross entropy optimization with peak temperature 56.9 °C.
TABLE 1. Comparison of floorplanning results on large GSRC benchmark circuits. HPWL refers to the half-perimeter wirelength, peak
temp. refers to the peak temperature, SA refers to simulated annealing, CE refers to cross entropy, and runtime refers to the runtime in
seconds in the simulated cloud computing environment. No waiting time is considered on any computer.
TABLE 2. Comparison of best case, worst case and uncertainty-aware resource provisioning of the proposed ILP formulation on the
floorplanning solution considering the variations in waiting time.
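The waiting-time experiment behind Table 2 assumes a Gaussian waiting time with µ = 10 seconds and σ = 2 seconds, uses µ − 3σ and µ + 3σ as the best-case and worst-case bounds, and checks each schedule against sampled waiting times. A minimal Python sketch of that check follows. It uses plain pseudo-random sampling in place of the Latin Hypercube sampling used in the experiments, and the function names and the `compute_time` values are hypothetical.

```python
import random

MU, SIGMA = 10.0, 2.0                          # waiting-time distribution from the text
LOWER, UPPER = MU - 3 * SIGMA, MU + 3 * SIGMA  # best-case 4 s, worst-case 16 s

def design_wait(alpha):
    """Waiting time assumed at scheduling time for a given alpha."""
    return (1 - alpha) * LOWER + alpha * UPPER

def deadline_hit_rate(compute_time, deadline, trials=200, seed=0):
    """Fraction of sampled waiting times for which compute_time + wait
    stays within the deadline (negative draws are clipped to zero)."""
    rng = random.Random(seed)
    met = sum(1 for _ in range(trials)
              if compute_time + max(0.0, rng.gauss(MU, SIGMA)) <= deadline)
    return met / trials

# A schedule sized for the best case leaves only 4 s of slack before a
# 100 s deadline, while one sized for the worst case leaves 16 s.
best_case_rate = deadline_hit_rate(compute_time=96.0, deadline=100.0)
worst_case_rate = deadline_hit_rate(compute_time=84.0, deadline=100.0)
```

Sweeping `alpha` between 0 and 1 moves the assumed waiting time between the two bounds, which is the tradeoff the uncertainty-aware design explores.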
• The simulated runtime for our cross entropy optimization includes the runtime for ILP solving, which is quite efficient (<1 second) because there are only 50 samples which need to be scheduled. Thus, our ILP formulation is fast enough for resource provisioning in our problem.
• Although our cross entropy based optimization leads to more monetary cost, our technique is still more desirable than the simulated annealing technique since the large improvement in thermal profile outweighs the cost overhead.
We next perform the experiments considering the uncertainty in waiting time using our uncertainty-aware ILP based resource provisioning. In the experiments, since we cannot access the historical data of the waiting time in Amazon EC2, we assume that the waiting time follows a Gaussian distribution with µ = 10 seconds and σ = 2 seconds on each machine. Multiple values of α are searched in the experiments, from 0.1 to 1.0 with a step size of 0.1. For each α, an ILP is formulated and solved, whose solution is then evaluated using 200 Latin Hypercube Monte Carlo samples generated according to the distribution on waiting time. The deadline constraint is still set to 100 seconds for each cross entropy iteration involving full-fledged thermal simulations. R is set to 99% to account for most cases in practice. Note that the solution quality of floorplanning does not change. Only the scheduling solution, and thus the cost and runtime, are changed (note that even if Amazon EC2 does not charge the user for waiting time, the cost still changes since the scheduling solution is different). The cost and time corresponding to the 99%-th sample in the Monte Carlo simulation are summarized in Table 2. We make the following observations.
• The best case design computes the schedule assuming that the waiting time is always set to the lower bound on the waiting time (i.e., µ − 3σ). Thus, it tends to use slow machines as it is relatively easy to satisfy the deadline constraint. However, when the scheduling solution is evaluated using the Monte Carlo simulations, its runtime actually violates the deadline constraint. One can see that the runtime of Table 1 is similar to the runtime in the best case design since both cases assume no or little waiting time.
• The worst case design computes the schedule assuming that the waiting time is always set to the upper bound on the waiting time (i.e., µ + 3σ). Thus, it tends to use fast machines as it is relatively hard to satisfy the deadline constraint. Due to this, its scheduling is conservative and it needs to use many fast machines to meet the deadline. Consequently, its monetary cost is high.
• The uncertainty-aware ILP formulation computes a good tradeoff between the above two methods. Compared to the worst case design, it saves runtime, while compared to the best case design, it meets the deadline in 99% of cases.
VI. CONCLUSION
In this paper, we propose the first thermal driven floorplanning technique suitable for a cloud computing environment. Our technique includes a parallelization friendly adjacency probability cross entropy optimization based thermal driven floorplanning and a new integer linear programming based resource provisioning technique. The experimental results on GSRC benchmark circuits demonstrate that our parallelized technique can reduce the peak temperature by up to 24 °C compared to the simulated annealing technique while still running over 30% faster in the simulated cloud computing environment.
ACKNOWLEDGMENT
This study was supported by the Fundamental Research Funds for Central Universities, China University of Geosciences (Wuhan) (No. CUG140612).
REFERENCES
[1] F. Marozzo, D. Talia, and P. Trunfio, “A cloud framework for parameter sweeping data mining applications,” in Proc. IEEE 3rd Int. Conf. Cloud Comput. Technol. Sci. (CloudCom), Nov./Dec. 2011, pp. 367–374.
[2] Y. Charalabidis, S. Koussouris, and A. Ramfos, “A cloud infrastructure for collaborative digital public services,” in Proc. IEEE 3rd Int. Conf. Cloud Comput. Technol. Sci. (CloudCom), Nov./Dec. 2011, pp. 340–347.
[3] D. Yuan, Y. Yang, X. Liu, and J. Chen, “A local-optimisation based strategy for cost-effective datasets storage of scientific applications in the cloud,” in Proc. IEEE Int. Conf. Cloud Comput. (CLOUD), Jul. 2011, pp. 179–186.
[4] H. Li, L. Zhong, J. Liu, B. Li, and K. Xu, “Cost-effective partial migration of VoD services to content clouds,” in Proc. IEEE Int. Conf. Cloud Comput. (CLOUD), Jul. 2011, pp. 203–210.
[5] S. Chaisiri, B.-S. Lee, and D. Niyato, “Optimization of resource provisioning cost in cloud computing,” IEEE Trans. Services Comput., vol. 5, no. 2, pp. 164–177, Apr./Jun. 2012.
[6] W. Haider and A. Wahab, “A review on cloud computing architectures and applications,” Comput. Eng. Intell. Syst., vol. 2, no. 4, pp. 206–210, 2011.
[7] N. Mokhoff, “DAC: EDA preps to embrace cloud computing,” EETimes, San Francisco, CA, USA, Tech. Rep., Jun. 2010.
[8] Y.-C. Chang, Y.-W. Chang, G.-M. Wu, and S.-W. Wu, “B*-trees: A new representation for non-slicing floorplans,” in Proc. IEEE/ACM Design Autom. Conf. (DAC), Jun. 2000, pp. 458–463.
[9] M. Armbrust et al., “Above the clouds: A Berkeley view of cloud computing,” Dept. Elect. Eng. Comput. Sci., Univ. California at Berkeley, Berkeley, CA, USA, Tech. Rep. UCB/EECS-2009-28, 2009.
[10] Q. Tang, S. K. S. Gupta, and G. Varsamopoulos, “Energy-efficient thermal-aware task scheduling for homogeneous high-performance computing data centers: A cyber-physical approach,” IEEE Trans. Parallel Distrib. Syst., vol. 19, no. 11, pp. 1458–1472, Nov. 2008.
[11] G. Chen et al., “Energy-aware server provisioning and load dispatching for connection-intensive Internet services,” in Proc. 5th Int. Symp. Netw. Syst. Design Implement. (NSDI), 2008, pp. 337–350.
[12] E. Pakbaznia and M. Pedram, “Minimizing data center cooling and server power costs,” in Proc. Int. Symp. Low Power Electron. Design (ISLPED), 2009, pp. 145–150.
[13] E. Pakbaznia, M. Ghasemazar, and M. Pedram, “Temperature-aware dynamic resource provisioning in a power-optimized datacenter,” in Proc. Design Autom. Test Eur., Mar. 2010, pp. 124–129.
[14] J. Xu and J. Fortes, “A multi-objective approach to virtual machine management in datacenters,” in Proc. ACM Int. Conf. Auto. Comput., 2011, pp. 225–234.
[15] J. Cong and Y. Zhang, “Thermal-aware physical design flow for 3-D ICs,” in Proc. 23rd Int. VLSI Multilevel Interconnection Conf. (VMIC), 2006, pp. 73–80.
[16] J. Cong, J. Wei, and Y. Zhang, “A thermal-driven floorplanning algorithm for 3D ICs,” in Proc. IEEE Int. Conf. Comput.-Aided Design (ICCAD), Nov. 2004, pp. 306–313.
[17] V. Nookala, D. J. Lilja, and S. S. Sapatnekar, “Temperature-aware floorplanning of microarchitecture blocks with IPC-power dependence modeling and transient analysis,” in Proc. Int. Symp. Low Power Electron. Design (ISLPED), 2006, pp. 298–303.
[18] Y. Han and I. Koren, “Simulated annealing based temperature aware floorplanning,” J. Low Power Electron., vol. 3, no. 2, pp. 141–155, 2007.
[19] T.-C. Chen and Y.-W. Chang, “Modern floorplanning based on fast simulated annealing,” in Proc. ACM Int. Symp. Phys. Design (ISPD), 2005, pp. 104–112.
[20] T.-C. Chen, Y.-W. Chang, and S.-C. Lin, “IMF: Interconnect-driven multilevel floorplanning for large-scale building-module designs,” in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD), Nov. 2005, pp. 159–164.
[21] S. Logan and M. R. Guthaus, “Fast thermal-aware floorplanning using white-space optimization,” in Proc. 17th Int. Conf. Very Large Scale Integr. (VLSI-SoC), Oct. 2009, pp. 65–70.
[22] Z. Feng and P. Li, “Fast thermal analysis on GPU for 3D-ICs with integrated microchannel cooling,” in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD), Nov. 2010, pp. 551–555.
[23] J. Cong, G. Luo, and Y. Shi, “Thermal-aware cell and through-silicon-via co-placement for 3D ICs,” in Proc. 48th ACM/EDAC/IEEE Design Autom. Conf. (DAC), Jun. 2011, pp. 670–675.
[24] CFD-ACE+ Module Manual, Version I, ESI Group, Paris, France, 2002.
[25] R. Y. Rubinstein and D. P. Kroese, The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning. New York, NY, USA: Springer-Verlag, Jul. 2004.
[26] K. Chepuri and T. Homem-de-Mello, “Solving the vehicle routing problem with stochastic demands using the cross-entropy method,” Ann. Oper. Res., vol. 134, no. 1, pp. 153–181, 2005.
[27] G. Alon, D. P. Kroese, T. Raviv, and R. Y. Rubinstein, “Application of the cross-entropy method to the buffer allocation problem in a simulation-based environment,” Ann. Oper. Res., vol. 134, no. 1, pp. 137–151, 2005.
[28] X. Zhao, Y. Guo, X. Chen, Z. Feng, and S. Hu, “Hierarchical cross-entropy optimization for fast on-chip decap budgeting,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 30, no. 11, pp. 1610–1620, Nov. 2011.
[29] P.-T. de Boer, D. P. Kroese, S. Mannor, and R. Y. Rubinstein, “A tutorial on the cross-entropy method,” Ann. Oper. Res., vol. 134, no. 1, pp. 19–67, 2005.
[30] T. Wei, X. Chen, and S. Hu, “Reliability-driven energy-efficient task scheduling for multiprocessor real-time systems,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 30, no. 10, pp. 1569–1573, Oct. 2011.
[31] M. D. McKay, R. J. Beckman, and W. J. Conover, “Comparison of three methods for selecting values of input variables in the analysis of output from a computer code,” Technometrics, vol. 21, no. 2, pp. 239–245, 1979.

XIAODAO CHEN received the B.Eng. degree in telecommunication from the Wuhan University of Technology, Wuhan, China, in 2006, the M.Sc. degree in electrical engineering from Michigan Technological University, Houghton, USA, in 2009, and the Ph.D. degree in computer engineering from Michigan Technological University, Houghton, USA, in 2012.
He is currently an Associate Professor with the School of Computer Science, China University of Geosciences, Wuhan.

LIZHE WANG (SM’09) received the B.Eng. (Hons.) degree and the M.Eng. degree from Tsinghua University, Beijing, China, and the D.Eng. (magna cum laude) degree in applied computer science from the Karlsruhe Institute of Technology, Karlsruhe, Germany.
He is a 100-Talent Program Professor with the Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing, China and a ChuTian Chair Professor with the School of Computer Science, China University of Geosciences, Wuhan, China.
Prof. Wang is a fellow of IET and BCS.

ALBERT Y. ZOMAYA (F’04) is currently the Chair Professor of High Performance Computing and Networking and an Australian Research Council Professorial Fellow with the School of Information Technologies, The University of Sydney. He is also the Director of the Centre for Distributed and High Performance Computing, which was established in late 2009.
Prof. Zomaya received the Ph.D. degree from the Department of Automatic Control and Systems Engineering, Sheffield University, U.K. He held the CISCO Systems Chair Professor of Internetworking from 2002 to 2007, and also was the Head of School for 2006–2007 in the same school. Prior to his current appointment, he was a Full Professor with the School of Electrical, Electronic and Computer Engineering, University of Western Australia, where he also led the Parallel Computing Research Laboratory from 1990 to 2002. He served as an Associate-, a Deputy-, and an Acting-Head in the same department, and held numerous visiting positions and has extensive industry involvement.

LIN LIU is currently pursuing the Ph.D. degree with the Department of Electrical and Computer Engineering, Michigan Technological University, Houghton, USA.

SHIYAN HU (SM’10) received the Ph.D. degree in computer engineering from Texas A&M University, College Station, in 2008.
He is currently an Assistant Professor with the Department of Electrical and Computer Engineering, Michigan Technological University, Houghton, where he serves as the Director of the Michigan Tech VLSI CAD Research Laboratory. He was a Visiting Professor with the IBM Austin Research Laboratory, Austin, TX, in 2010. He has over 50 journal and conference publications. His current research interests include computer-aided design for very large-scale integrated circuits on nanoscale interconnect optimization, low-power optimization, and design for manufacturability.
Dr. Hu has served as a Technical Program Committee Member of a few conferences, such as ICCAD, ISPD, ISQED, ISVLSI, and ISCAS. He received the Best Paper Award Nomination from ICCAD 2009.