
IEEE TRANSACTIONS ON

EMERGING TOPICS
IN COMPUTING

Received 12 September 2014; revised 26 February 2015 and 27 May 2015; accepted 5 June 2015.
Date of publication 11 September 2014; date of current version 9 December 2015.
Digital Object Identifier 10.1109/TETC.2015.2443714

Cloud Computing for VLSI Floorplanning
Considering Peak Temperature Reduction
XIAODAO CHEN1 , LIZHE WANG1 , (Senior Member, IEEE),
ALBERT Y. ZOMAYA2 , (Fellow, IEEE), LIN LIU3 , AND SHIYAN HU3 , (Senior Member, IEEE)
1 School of Computer Science, China University of Geosciences, Wuhan 430074, China
2 School of Information Technologies, The University of Sydney, Sydney, NSW 2006, Australia
3 Department of Electrical and Computer Engineering, Michigan Technological University, Houghton, MI 49931 USA

CORRESPONDING AUTHOR: L. WANG ([email protected])

ABSTRACT Cloud computing has recently emerged as a promising computing paradigm, which offers unprecedented computing power and flexibility in the distributed computing environment. Although the electronic design automation (EDA) industry has prepared to embrace the cloud concept, there is still no research publication on designing VLSI floorplanning algorithms for a cloud computing platform. This paper proposes the first such algorithm for thermal driven floorplanning. Since the existing floorplanning techniques are based on simulated annealing, which yields sequential algorithms that are difficult to parallelize, a new thermal driven floorplanning algorithm is proposed which can be easily parallelized in a cloud computing environment. This algorithm uses an advanced adjacency probability cross entropy optimization and a new integer linear programming based resource provisioning to efficiently use the heterogeneous computation resources and handle the uncertainty of machine waiting time in a cloud. The experimental results on the standard GSRC benchmark circuits demonstrate that the proposed algorithm can significantly reduce the peak temperature (up to 24 °C) compared with the simulated annealing technique. In the simulated cloud computing environment, it runs over 30% faster than the simulated annealing technique with moderate overhead in monetary expense due to the fact that the proposed algorithm is parallelization friendly. Further, our algorithm can effectively compute the scheduling solutions considering the uncertainty in waiting time.

INDEX TERMS Cloud computing, green computing, energy-aware computing, VLSI design.

I. INTRODUCTION
As an emerging computing paradigm featuring computing as a service, cloud computing has recently become a research focus in the parallel and distributed computing community. One of the most popular cloud computing infrastructures is called Software-as-a-Service (SaaS), which can provide massive computing power to largely improve the computational efficiency of software. The software provided by the software service provider (or software provider in short) can be installed in various cloud computers and accessed over the Internet by customers. Many existing commercial cloud service providers adopt this infrastructure, such as Amazon EC2 and Rackspace Cloud.
Despite that the cloud computing paradigm has been successfully adopted in many research areas [1]–[5], the EDA community only recently started to embrace the cloud concept [6]. In fact, Synopsys has already used Amazon Web Services for training EDA tools internally and Cadence has already launched its own SaaS [7].
Two features of cloud computing are central here, namely, the massive computational resources and the pay-per-usage pricing model. To take advantage of the massive computational resources offered in a cloud, the algorithm needs to be parallelized. On the other hand, since the usage of software and computing resources is charged on a pay-per-usage basis, the resource provisioning (scheduling) algorithm, which allocates the parallelized components to different cloud computers, needs to take the pricing into consideration.
Refer to Figure 1 (a) for the money flow in the EDA cloud computing infrastructure. The EDA software customer only pays the EDA software provider, while the EDA software provider pays the cloud computing service provider for the machine usage (e.g., an amount of monetary
2168-6750
2015 IEEE. Translations and content mining are permitted for academic research only.
Personal use is also permitted, but republication/redistribution requires IEEE permission.
534 See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. VOLUME 3, NO. 4, DECEMBER 2015

Authorized licensed use limited to: NXP Semiconductors. Downloaded on March 02,2020 at 07:37:32 UTC from IEEE Xplore. Restrictions apply.

expense per hour). This motivates the EDA software provider to minimize the cloud machine usage so as to maximize the profit. For example, in Amazon EC2, a software provider can reserve some cloud computers and install its software on them. An annual reservation fee will be charged and, more importantly, running the software on these computers will be further charged. The rate of the latter is determined by the runtime of the software as well as the machine type (e.g., processor and memory) involved in the computation. Figure 1 (b) shows a scenario chart for ordinary usage of EDA cloud computing. The EDA company receives EDA design input files from users and schedules the whole workload to the cloud according to different strategies. The strategy, for example, can be to save as much money as possible. The communication is insignificant both between the users and the EDA company and between the EDA company and the cloud. The communication only happens when the procedure starts or completes. Furthermore, the transferred data are normally from several kilobytes to several megabytes [8].
This work targets minimizing the monetary expense due to the cloud machine usage from the perspective of the EDA software provider. Of course, such a goal needs to be accomplished in accordance with the high solution quality of the software to make the product competitive.

FIGURE 1. (a) The money flow in the cloud computing infrastructure. (b) The scenario chart of cloud computing for EDA.

There exist various works on designing cloud computing algorithms for different applications [1]–[5], but none of them is related to VLSI floorplanning algorithms. This work designs the first such algorithm. Note that a cloud actually refers to both software infrastructure and hardware infrastructure [9]. The hardware used in a cloud is typically the datacenter, and several algorithms have been proposed to optimize the power or thermal profile of the computers in a datacenter [10]–[14]. In contrast, this work focuses on the software infrastructure through designing a new thermal driven floorplanning technique and a new resource provisioning algorithm for it.
With fast technology scaling, thermal effects have manifested increasingly stronger impacts on threshold voltage, carrier mobility, saturation velocity and circuit reliability [15]. Thus, thermal driven floorplanning, which targets minimizing the peak temperature, has become a popular research topic recently [15]–[18]. Following the cloud computing literature, a processor or a core in a multi-core processor is referred to as a computational resource or a computational node. In our thermal-driven floorplanning problem, the customer first sends a request to a so-called dispatching node, which is a computational node with the software of the proposed algorithm installed. At the dispatching node, the software will compute the floorplanning solution. During the computation, a set of floorplanning subproblems are formulated and are dispatched (using the resource provisioning algorithm) to the other available computational nodes for execution to exploit the parallelism offered in the cloud.
To design the above parallelization friendly thermal driven floorplanning technique suitable for the cloud computing environment, a significant amount of work is needed. First of all, one needs to design a parallel floorplanning algorithm. However, as is well known, most existing floorplanning techniques are simulated annealing based [8], [18]–[21], and are difficult to parallelize. Therefore, we will develop an advanced adjacency probability based cross entropy thermal-driven floorplanning technique which is parallelization friendly. Second, one needs to design a new resource provisioning technique to dispatch the subproblems formulated in the above thermal driven floorplanning technique to computational nodes in the cloud. For this, an integer linear programming based resource provisioning technique will be developed. Further, as the cloud computers are heterogeneous and often carry different workloads, the job waiting time (the difference between the time the job is submitted and the time it is actually launched) needs to be considered. Thus, our resource provisioning technique is further enhanced to handle the uncertainty. The main contribution of this paper is summarized as follows.
• An advanced adjacency probability matrix based cross entropy optimization technique is proposed for thermal driven floorplanning, which can be easily parallelized in a cloud computing environment.
• A new integer linear programming based resource provisioning technique is proposed to handle the


pricing model and the heterogeneous computational resources.
• The resource provisioning technique is enhanced to consider the uncertainty in machine waiting time.
• The experimental results on the standard GSRC benchmark circuits demonstrate that the proposed thermal driven floorplanning technique can significantly reduce the peak temperature (up to 24 °C) compared to the simulated annealing technique while running over 30% faster in the simulated cloud computing environment due to the fact that the proposed algorithm is parallelization friendly. Further, it can effectively compute the schedule considering the uncertainty in waiting time.
The rest of the paper is organized as follows. Section II formulates the thermal driven floorplanning problem. Section III describes the parallelization friendly cross entropy based thermal driven floorplanning technique. Section IV describes the new resource provisioning technique. Section V presents the experimental results with analysis. A summary of the work is given in Section VI.

II. PROBLEM FORMULATION
A. THERMAL MODEL
The thermal profile of a floorplan, or the temperature Ti at each block i, can be computed using the thermal conductance matrix R and the power density vector P of each block:

  [T1]   [R11 R12 ··· R1n]   [P1]
  [T2]   [R21 R22 ··· R2n]   [P2]
  [ ⋮] = [ ⋮   ⋮   ⋱   ⋮ ] · [ ⋮]        (1)
  [Tn]   [Rn1 Rn2 ··· Rnn]   [Pn]

This model has been widely used in the VLSI research field, such as in [16], [22], and [23]; it is a fast compact resistive model which does not take too much running time in the proposed design [24]. But since this thermal simulation is still computationally expensive, fast estimation techniques have been proposed. For example, [18] designs a simple power density based thermal estimation equation T = δ·P, where δ refers to the ratio of the thickness of the chip over the thermal conductivity of the material and P refers to the power density. Based on it, a model called the heat diffusion model is proposed in [18], which uses the total heat diffusion of a block as an approximation of the temperature of the block. Basically, the total heat diffusion of any block is computed as the sum of the heat diffusion between this block and each of its adjacent blocks, where the heat diffusion between two adjacent blocks is proportional to the difference in their power densities as well as their shared boundary length.

B. PROBLEM FORMULATION
Following the literature, a set of n blocks are given in the floorplanning problem. They are interconnected by some nets, where each net consists of some pins located at some blocks. The standard wirelength driven floorplanning problem asks to place the blocks in the fixed outline (chip size) such that they do not overlap with each other and the total wirelength is minimized. The half-perimeter wirelength (HPWL) of a net is often used as the approximation of the wirelength in the literature. It is defined as the half perimeter of the smallest bounding box of all the pins in the net. Following the existing works [16]–[18], the thermal-driven floorplanning problem can be formulated as follows.
Floorplanning for Peak Temperature Minimization: Given a set of n blocks, denoted by {b1, b2, ..., bn}, the size of each block, and the chip size, the problem asks to place these blocks such that they do not overlap with each other and the peak temperature is minimized while the total wirelength (HPWL) is still small.

III. THE CROSS ENTROPY BASED THERMAL DRIVEN FLOORPLANNING ALGORITHM
To deploy a thermal driven floorplanning technique in a cloud computing environment, one needs to first parallelize it (decompose the original floorplanning problem into a set of subproblems) and then design a resource provisioning technique to schedule the subproblems to cloud computers. Since most existing floorplanning techniques are simulated annealing based [8], [18]–[21], they are essentially sequential algorithms and are difficult to parallelize. Therefore, we will first develop a cross entropy optimization based thermal-driven floorplanning technique which can be easily parallelized, and then propose a resource provisioning technique in Section IV to deploy it in a cloud computing environment.

A. CROSS ENTROPY FRAMEWORK OVERVIEW
Our parallel thermal driven floorplanning technique is based on the advanced cross entropy optimization framework. Such a framework was originally proposed in [25] and has been successfully applied to a number of optimization problems such as the vehicle routing problem [26] and the buffer allocation problem [27]. In [28], this optimization scheme was first introduced to the EDA community. It solves the decoupling capacitor insertion problem for power grid noise mitigation, which is quite different from our problem.
The main idea of the cross entropy optimization framework is to cast a deterministic optimization problem into a stochastic optimization problem with mathematical rigor. In our problem context, the thermal-driven floorplanning problem will be tackled iteratively. A key component in the cross entropy optimization is a probability density function (PDF) called the solution PDF. It models the distribution of candidate floorplanning solutions, which will be constructed and updated throughout the optimization. During each iteration, a set of cross entropy samples will be generated according to the solution PDF, where each sample represents a candidate floorplan. Each cross entropy sample will then be evaluated. That is, the peak temperature of a candidate floorplan can be evaluated through performing the thermal


simulation on it. Subsequently, a few samples with the lowest peak temperatures will be used to update the solution PDF. In the next iteration, the updated solution PDF will be used to generate new cross entropy samples. They are then evaluated through thermal simulation and the top few samples are used to update the solution PDF. This process is repeated until convergence. In our problem, the adjacency probability matrix will be used as the solution PDF, as indicated in Section III-B. Refer to Figure 2 for a simple illustration of the above process.

FIGURE 2. Flow chart for the cross entropy based thermal driven floorplanning algorithm.

In detail, the cross entropy optimization iteratively generates a set of cross entropy samples, which will gradually converge to the approximate optimal solution of the problem [25], [29]. Let min f(x) denote a constrained minimization problem, where x denotes the set of variables restricted to a certain space X. A set of solution PDFs, denoted by Pr(x, u), are constructed on x and u, where u is some parameter to be discussed later. Cross entropy optimization will compute a solution x* to be an approximate optimal solution of f(x) through making f(x) ≤ f(x*) a rare event as follows. Define an indicator function, denoted by ψ(·), such that ψ(f(x) ≤ f(x*)) is equal to 1 when f(x) ≤ f(x*) and equal to 0 otherwise. Subsequently, denote by Eu[·] the expectation function associated with Pr(x, u). Define θ(f(x*)) as the expectation of ψ(f(x) ≤ f(x*)). That is, θ(f(x*)) = Eu[ψ(f(x) ≤ f(x*))] [25]. As θ(f(x*)) is close to 0, i.e., a rare event, x* becomes a lower bound of the optimal solution since f(x) is greater than f(x*) with high probability. Subsequently, one can just find the maximum lower bound x* such that θ(f(x*)) still approaches 0. As θ(f(x*)) cannot be directly computed in closed form, Monte Carlo simulations will be performed in the cross entropy optimization. For this, m samples, denoted by X = {x1, x2, ..., xm}, are generated according to the solution PDF Pr(x, u). One then computes ψ(f(x) ≤ f(x*)) for each sample and takes the average among all samples to estimate θ(f(x*)) as [25]

  θ̃(f(x*)) = (1/m) Σ_{i=1}^{m} ψ(f(xi) ≤ f(x*)).        (2)

In our problem, each xi is a candidate floorplan and f(x) refers to the peak temperature of each x.
During the above process, Monte Carlo samples are generated according to the solution PDF Pr(x, u) parameterized by u. To determine the value of u, cross entropy optimization uses the importance sampling technique through minimizing the Kullback-Leibler distance between two probability distribution functions. In importance sampling, another PDF Pr(x, ũ) is used, where ũ is known. The Kullback-Leibler distance, which is also called the cross entropy in information theory, is defined as the distance between the two PDFs Pr(x, u) and Pr(x, ũ). The Kullback-Leibler distance between Pr(x, u) and Pr(x, ũ), denoted by d(Pr(x, u), Pr(x, ũ)), is computed as follows [25].

  d(Pr(x, u), Pr(x, ũ)) = Eu[ ψ(f(x) ≤ f(x*)) · ln( Pr(x, u) / Pr(x, ũ) ) ].        (3)

Minimizing the Kullback-Leibler distance d(Pr(x, u), Pr(x, ũ)) is equivalent to looking for a ũ that maximizes Eu[ψ(f(x) ≤ f(x*)) ln Pr(x, ũ)] [25]. Denote by u* the optimal such parameter. One has u* = arg max_{ũ} Eu[ψ(f(x) ≤ f(x*)) ln Pr(x, ũ)]. Refer to [25] for further details.

B. THE PROPOSED FLOORPLANNING ALGORITHM
Our algorithm works under the cross entropy framework. Refer to Figure 2 for an overview. To develop the cross entropy optimization algorithm for thermal-driven floorplanning, the key is to design a salient solution PDF. In this work, the adjacency probability between each pair of blocks in the floorplan is proposed to be the solution PDF. Precisely, given any two blocks bi and bj, denote by ai,j the adjacency probability associated with them, which indicates the probability to place them adjacent to each other in the floorplanning solution. Clearly, a large value of the adjacency probability means that the corresponding blocks will be placed together with high probability. Given n blocks, one can form an adjacency probability matrix A = {ai,j}, where the entry ai,j is the adjacency probability associated with blocks bi and bj for 1 ≤ i, j ≤ n. In particular, we set ai,i = 0 for all i for the convenience of illustration. The adjacency probability


matrix A is the solution PDF for our problem. The solution quality is evaluated by a weighted sum of both the thermal cost and the half-perimeter wirelength (HPWL) of the floorplan. The thermal cost value can be computed by the model discussed in Section II.A, and the HPWL value of the floorplan is calculated by summing up the HPWL of all nets.
At a high level, our cross entropy based thermal driven floorplanning algorithm works as follows. In the first iteration, starting from an initial floorplan, a set of k samples which are candidate floorplans are generated according to an initial adjacency probability matrix A. The thermal profile of each sample will be evaluated and the top ω samples with the lowest peak temperatures will be used to update the adjacency probability matrix, where ω is a user specified parameter. In the second iteration, with the updated adjacency probability matrix A, a set of k new samples will be generated. The thermal profile of each sample will be evaluated and the top ω samples will be used to update the adjacency probability matrix. This process is repeated until a certain stopping criterion is satisfied.
We illustrate the details of the above process using a simple one-dimensional floorplanning problem. Suppose that our floorplanning problem consists of four blocks b1, ..., b4 and an initial floorplan is given where the four blocks b1, b2, b3, b4 are located from left to right in one dimension. Given an initial adjacency PDF, we describe how to generate a single cross entropy sample; other cross entropy samples can be similarly generated. In generating a cross entropy sample, suppose that b1 is to be relocated while the relative locations among all other blocks will be kept unchanged. Recall that the adjacency probability matrix models the probability for blocks to be adjacent to each other. The probability of relocating b1 to the left of b2 (i.e., no relocation is performed) is a1,2, to between b2 and b3 is a1,2 · a1,3, to between b3 and b4 is a1,3 · a1,4, and to the right of b4 is a1,4. These probability values will be used to determine the new location of b1. For this, one can sum up the above four probability values and generate a random number between 0 and the sum according to the uniform distribution. Treat each probability value as an interval and link all the intervals in a non-overlapping fashion to form a single long interval. The generated random number can be viewed as a point in the long interval, and the interval which the random number falls in can then be found. Subsequently, the block b1 can be relocated to the location corresponding to that interval. The probabilities corresponding to relocating the other blocks can be similarly computed.
To generate a cross entropy sample, z relocations will be performed, where z is a user specified parameter. In each relocation, the block to be relocated is first determined and the location it will be moved to is determined by the above process. Which block is to be relocated will be determined by the adjacency probability using the current floorplan. In the above initial floorplan, b1, ..., b4, the probability to choose b1 for relocation is 1 − a1,2 since b1 is currently only adjacent to b2, and a smaller a1,2 indicates that b1 and b2 tend not to be adjacent to each other. Similarly, the probability to choose b2 for relocation is 1 − a1,2 · a2,3, to choose b3 for relocation is 1 − a2,3 · a3,4, and to choose b4 for relocation is 1 − a3,4.
After all k samples are generated, their thermal profiles will be evaluated. Since evaluating the thermal profile of a floorplan using the full-fledged thermal simulation is computationally expensive, a thermal evaluation strategy which integrates the rough estimation with the accurate thermal simulation will be used. Basically, for several iterations the full-fledged thermal simulation will be performed using [22], while for the other iterations the fast estimation using [18] will be used.
After evaluating the k samples, the top ω samples with the lowest peak temperatures will be used to update the adjacency probability matrix as follows. In each of these top cross entropy samples, if bi is adjacent to bj, then the entry ai,j in the adjacency probability matrix will be increased by a small value δ. This will be performed for all of the ω cross entropy samples. After this update, the adjacency probability matrix might not represent a valid probability distribution, and thus scaling will be performed to ensure Σ_{j=1}^{n} ai,j = 1 for each i. The stopping criterion for the cross entropy based thermal driven floorplanning technique is designed as follows. When the average peak temperature of the top ω solutions in the current iteration is sufficiently close to that in the previous iteration, the cross entropy algorithm finishes. In the implementation, the convergence condition is set to be either that the solution qualities are changing within a small range or that the algorithm reaches the maximum iteration number. In the experimental part, this number is set to 1000 to ensure the runtime of the proposed algorithm is always within 15 minutes for the real case.

IV. RESOURCE PROVISIONING FOR THERMAL DRIVEN FLOORPLANNING
In the cross entropy based thermal-driven floorplanning, a set of cross entropy samples are generated in each iteration, whose thermal profiles are then evaluated using either the fast estimation [18] or the full-fledged thermal simulation [22]. Our experiments show that the runtime bottleneck of our technique is due to the full-fledged thermal simulations. Thus, the process of evaluating cross entropy samples using full-fledged thermal simulations will be parallelized in the cloud computing environment. We call the problem of evaluating a cross entropy sample using the full-fledged thermal simulation a subproblem. We are to design a new resource provisioning (scheduling) technique to efficiently schedule subproblems in the cloud in Section IV-A and then enhance it to consider the uncertainty in waiting time in Section IV-B. Recall that the EDA software provider, who computes the floorplan using our technique in the cloud, is a customer of the cloud computing service provider (Amazon EC2) and needs to pay for using the cloud computers. Such a charge depends on the type of the machines it uses and the time it uses.

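For concreteness, the assignment problem that Section IV-A encodes as an ILP — distribute the k subproblems over m processor types so as to minimize the usage charge Σ oi·ti subject to a deadline on the finish time — can be solved for small instances by exhaustive search. The sketch below is a toy stand-in, not the paper's ILP solver; following the constraints of formulation (4), it assumes γ·f0·ci/fi is the total charged compute time on type i and mi is the number of reserved type-i machines (the formal symbol definitions fall partly outside this excerpt).

```python
from itertools import product

def provision(k, types, gamma, f0, T):
    """Brute-force the subproblem counts c_i per processor type,
    minimizing the usage charge sum_i o_i * t_i under the deadline
    max_i r_i <= T. Each type is a tuple (f_i, m_i, w_i, o_i):
    frequency, reserved machine count, nominal waiting time, and
    cost per second of machine usage."""
    m = len(types)
    best = None  # (cost, counts)
    for counts in product(range(k + 1), repeat=m):
        if sum(counts) != k:
            continue
        # t_i: charged usage time on type i (waiting time is not charged)
        usage = [gamma * f0 * c / f for c, (f, _, _, _) in zip(counts, types)]
        # r_i: finishing time on type i, including the waiting time w_i
        finish = [gamma * f0 * c / (f * mach) + w
                  for c, (f, mach, w, _) in zip(counts, types)]
        if max(finish) > T:
            continue  # violates the deadline r_f <= T
        cost = sum(o * t for t, (_, _, _, o) in zip(usage, types))
        if best is None or cost < best[0]:
            best = (cost, counts)
    return (None, None) if best is None else (best[1], best[0])
```

With a loose deadline the search sends everything to the cheapest type; tightening T forces work onto faster but more expensive types, which is exactly the pricing-versus-deadline trade-off the ILP captures.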

Moreover, the waiting time, which is defined as the duration between the time when subproblems are sent to a cloud computer and the actual starting time of solving the subproblems, is important. Waiting time could be significant since, when the software provider reserves a cloud computer, it tends to maximize its usage through launching various applications (floorplanning, routing, etc.) on the reserved machine. Further, since it is difficult to know the exact waiting time during the scheduling, the uncertainty in waiting time needs to be considered.
It is worth noting that although there are a large number of computers in a commercial cloud, the number of computers where the target software has been installed, and which thus can be used in the optimization, would be quite limited. As in our problem, the number of cloud computers installed with thermal simulators is limited due to the machine reservation fee and the maintenance cost. For example, Amazon EC2 could charge $1820 per year (depending on the type of computers) for reserving only a single cloud computer. Note that, as indicated in Amazon EC2, when a cloud computer is actually used, it will be further charged and the rate is proportional to the runtime.

A. ILP FORMULATION
We will first characterize cloud computers using types. One can use multiple metrics such as frequency and memory to define a type. In this work, for illustration purposes the processors are characterized by frequencies. However, other metrics can be integrated into our technique as well. In this subsection, the nominal waiting time of each processor is used, while in the next subsection, the variations in waiting time will be considered. The nominal waiting time can be obtained through taking the average waiting time of the recent runs on the cloud computers.
A new integer linear programming (ILP) based resource provisioning technique considering pricing and waiting time is designed as follows. Recall that in the cross entropy optimization framework, a subproblem corresponds to a sample and the evaluation of each sample takes a similar amount of time. Suppose that there are m types of processors, where m should be small in practice since the cloud computers installed with the thermal simulators are limited. Denote by k the number of subproblems to be scheduled, i.e., there are k cross entropy samples in each iteration. Let ci (1 ≤ i ≤ m) denote the number of subproblems assigned to type-i processors. Denote by fi the frequency

to be considered, ri is updated to include wi, which denotes the average waiting time on type-i processors. Let rf denote the finish time, i.e., the finishing time of the last subproblem. It can be computed as rf = max{r1, r2, ..., rm}. The finish time needs to be bounded by a deadline T, specified by the user. The objective of the ILP is to minimize the overhead cost, which depends on the processor types and how long a processor is used. Let oi denote the overhead cost when the type-i processor has been employed for one second; oi is provided by the cloud service provider. Note that the runtime on the type-i processor ri is the actual finishing time on this processor, which includes waiting time. In Amazon EC2, waiting time will not be charged and only the time between the beginning and the end of a program is charged. Thus, the variable ti is introduced to compute the actual usage on the type-i processor. Subsequently, the overhead cost can be computed as Σ_{i=1}^{m} oi ti. Finally, note that although the waiting time is not charged, it still needs to be in the formulation since it impacts the scheduling solution due to the deadline constraint. The ILP can be formulated as follows.

  min  Σ_{i=1}^{m} oi ti
  s.t. Σ_{i=1}^{m} ci = k,
       ri = γ f0 ci / (fi mi) + wi,   ∀i
       rf = max{r1, r2, ..., rm},
       rf ≤ T,
       ti = γ f0 ci / fi,   ∀i
       0 ≤ ci ≤ k,   ∀i
       ci ∈ N,   ∀i                        (4)

Note that the above ILP formulation only computes a rough assignment of the subproblems and it will be discretized in a straightforward fashion to compute the actual schedule. In addition, the scalability of the ILP formulation is not an issue in our cross entropy based floorplanning technique. The reason is that each time only a small number (50 in our experiments) of samples are generated, and thus at most the same number of computers are needed and the number of machine types is even fewer.

B. UNCERTAINTY-AWARE RESOURCE PROVISIONING
The above ILP formulation for resource provisioning does not
of a type-i processor. Since each subproblem takes similar consider the uncertainty in waiting time, which is however
time to solve, solving a single subproblem on a processor important. Thus, an uncertainty-aware ILP resource provi-
with frequency f0 can be assumed to approximately take sioning technique will be designed. It follows our previous
γ seconds which can be estimated offline through performing work [30] which designs an fault adaptation parameterized
the thermal simulation on a subproblem. When the subprob- technique to handle faults in real-time scheduling. In this
lem executes on a type-i processor, the runtime is f0 /fi · γ . work, we similarly introduce a parameter α to handle the
In the scheduling, we assume that only mi processors are uncertainty-aware optimization in our problem context. The
available for type-i processors and then the runtime spent waiting time wi in type-i processors will be modeled as
on type-i processors, denoted by ri , can be approximately (1 − α)L(wi ) + αU (wi ), where L(wi ) denotes the lower bound
computed as ri = f0 /fi ·γ ·ci /mi . Since the waiting time needs of wi and U (wi ) denotes the upper bound of wi . In practice,

VOLUME 3, NO. 4, DECEMBER 2015 539

IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING
Chen et al.: Cloud Computing for VLSI Floorplanning Considering Peak Temperature Reduction

one can use the following procedure to estimate the bounds.


Assuming that wi follows a Gaussian distribution with mean µi and standard deviation σi, setting L(wi) = µi − 3σi and U(wi) = µi + 3σi can account for over 99% of cases. It is worth noting that this technique is not restricted to the Gaussian distribution; other distributions can be easily handled as well. Different values of α can be used to model different variations in waiting time. For example, when α = 0, the minimum waiting time (lower bound) will be used, and when α = 1, the maximum waiting time (upper bound) will be used. By varying α, different tradeoffs can be achieved.
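As a minimal sketch of this model (the function names are ours, and the Gaussian parameters µ = 10 s and σ = 2 s are the ones assumed later in the experiments), the bounds and the α-parameterized waiting time can be computed as follows:

```python
def wait_bounds(mu, sigma):
    """Lower/upper bounds of the waiting time under a Gaussian model,
    using the mu - 3*sigma and mu + 3*sigma rule from the text."""
    return mu - 3.0 * sigma, mu + 3.0 * sigma

def effective_wait(mu, sigma, alpha):
    """alpha-parameterized waiting time (1 - alpha)*L(w) + alpha*U(w):
    alpha = 0 gives the lower bound, alpha = 1 the upper bound."""
    lo, hi = wait_bounds(mu, sigma)
    return (1.0 - alpha) * lo + alpha * hi

print(effective_wait(10.0, 2.0, 0.0))  # 4.0  (lower bound)
print(effective_wait(10.0, 2.0, 0.5))  # 10.0 (the nominal mean)
print(effective_wait(10.0, 2.0, 1.0))  # 16.0 (upper bound)
```

Sweeping alpha over 0.1, 0.2, ..., 1.0, as done below, simply interpolates between these two extremes.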
We call the objective of the ILP formulation, $\sum_{i=1}^{m} o_i t_i$, the cost
of the scheduling solution. Define the R-percent cost of a
scheduling solution as the cost which can be achieved with
at least R-percent chance when uncertainty in waiting time
is considered. For example, one can set R = 99% to account
for most cases in practice. To compute the R-percent cost, the
following procedure can be used. Generate a large number
(e.g., 5000) of Monte Carlo samples according to the PDF on
waiting time (i.e., the Gaussian distribution with µ and σ ),
and evaluate the cost of the scheduling solution for each
sample. Subsequently, sort all the costs; the R-percent cost can then be obtained as the value below which R percent of the 5000 sampled costs fall.
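This percentile procedure can be sketched as follows. The per-sample evaluation of a schedule is abstracted here into a fixed base runtime plus one sampled waiting time, which is an illustrative stand-in rather than the paper's full schedule evaluation; a simple Latin Hypercube generator, of the kind the paper later uses for acceleration, is shown alongside plain Monte Carlo:

```python
import random
from statistics import NormalDist

def r_percent(values, r=0.99):
    """Value achieved with at least an r-fraction chance: sort ascending
    and take the element below which an r-fraction of samples fall."""
    s = sorted(values)
    return s[min(len(s) - 1, int(r * len(s)))]

def mc_waits(n, mu, sigma, rng):
    """Plain Monte Carlo samples of the Gaussian waiting time."""
    return [rng.gauss(mu, sigma) for _ in range(n)]

def lhs_waits(n, mu, sigma, rng):
    """Latin Hypercube samples: one uniform draw per equal-probability
    stratum of [0, 1], mapped through the inverse Gaussian CDF."""
    dist = NormalDist(mu, sigma)
    probs = [(i + rng.random()) / n for i in range(n)]
    rng.shuffle(probs)
    return [dist.inv_cdf(p) for p in probs]

rng = random.Random(0)
# Stand-in evaluation: 90 s of base runtime plus the sampled wait.
base = 90.0
mc  = [base + w for w in mc_waits(5000, 10.0, 2.0, rng)]
lhs = [base + w for w in lhs_waits(200, 10.0, 2.0, rng)]
print(round(r_percent(mc), 1), round(r_percent(lhs), 1))  # both near 104.7
```

The stratification is why 200 Latin Hypercube samples approximate the tail about as well as thousands of plain Monte Carlo samples here.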
Our scheduling problem has thus been cast as a stochastic scheduling problem, which asks to compute a scheduling solution such that the 99% cost is minimized. To solve it, one first formulates multiple ILP formulations using different α. That is, $r_i = \frac{\gamma f_0 c_i}{f_i m_i} + w_i$ is changed to $r_i = \frac{\gamma f_0 c_i}{f_i m_i} + (1 - \alpha)L(w_i) + \alpha U(w_i)$, where α can be set to, e.g., 0.1, 0.2, . . . , 1.0.

FIGURE 3. Flow chart for the whole algorithm, including cross entropy based thermal driven floorplanning and uncertainty-aware ILP based resource provisioning.
Subsequently, each ILP formulation will be solved using a standard solver. Each scheduling solution will be evaluated to compute the R-percent cost. The scheduling solution with the smallest R-percent cost will be returned. The above process needs to perform Monte Carlo simulations. It usually needs a large number of Monte Carlo samples, which is not computationally efficient. Thus, as in [30], the standard Latin Hypercube sampling technique [31] will be used for acceleration. Basically, such a technique allows the usage of much fewer (e.g., 200) samples to approximate the simulation results generated from the full-fledged Monte Carlo simulation. Refer to [31] for the details. The flow chart of the whole technique is shown in Figure 3.

V. EXPERIMENTAL RESULTS
A. EXPERIMENTAL SETUP
The proposed adjacency probability matrix based cross entropy thermal-driven floorplanning and the uncertainty-aware resource provisioning technique are implemented in C++ and tested on a machine with a 2.66 GHz CPU and 8 GB memory. The GSRC floorplanning benchmark circuits are used in the experiments. We simulate a realistic cloud computing environment as provided in Amazon EC2 and map the runtime on our machine to the simulated cloud computing environment; this is called the simulated runtime in this paper.

For the reservation based model provided in Amazon EC2, we choose two types of computers for Linux OS. They are "Standard Reserved Instance (Large) at US West (Northern California)" running at frequency f1 with rate $0.196 per hour and "High-CPU Reserved Instances (Extra Large) at US West (Northern California)" running at frequency f2 with rate $0.5 per hour. Note that Amazon EC2 does not specify the frequencies of those computers; instead, they are measured with a metric called the Amazon EC2 Compute Unit. In this work, we assume that f1 is 2.0 GHz and f2 is 3.25 GHz, which are reasonable estimates of commonly used slow and fast machines. Note that this assumption will not impact the conclusions of our work, since we also implement a simulated annealing based thermal floorplanning technique and deploy it in the same simulated cloud computing environment for comparison.

B. EXPERIMENTAL RESULTS ON GSRC BENCHMARK CIRCUITS
As a case study, we first analyze the performance of the algorithm on the largest GSRC benchmark circuit, n300, which consists of 300 blocks. In the cross entropy based thermal driven floorplanning technique, there are 120 cross entropy iterations. Three of them use the full-fledged thermal simulation [22] to evaluate samples and the other iterations


use heat diffusion based fast thermal estimation [18] to evaluate samples. Among the latter, there are 8 iterations, each of which involves one full-fledged thermal simulation for further improving the thermal profile. There are 50 samples generated in each iteration of the cross entropy optimization. Our experiments show that the cross entropy iterations involving full-fledged thermal simulations are the runtime bottleneck of our algorithm, contributing over 90% of the total runtime. Thus, we only need to parallelize the 3 iterations with full-fledged thermal simulations in the cloud. For the remaining part, we assume that it runs on a machine with frequency 2.0 GHz.

The average evaluation time of the full-fledged thermal simulation is 23.0 seconds per sample on our machine. Thus, it is scaled to 23.0 · 2.66/3.25 = 18.8 seconds for running on a processor with frequency 3.25 GHz, and to 23.0 · 2.66/2.0 = 30.6 seconds on a processor with frequency 2.0 GHz. We assume that 10 machines are available for each processor type. In the objective of the ILP formulation, oi is set to $0.196 per hour for the 2.0 GHz computer and $0.5 per hour for the 3.25 GHz computer, respectively. The deadline constraint in the ILP formulation is set to 100 seconds. Thus, each cross entropy iteration involving full-fledged thermal simulations needs to finish in 100 seconds.

We first investigate the performance of the proposed algorithm without considering waiting time, i.e., the waiting time is assumed to be 0 seconds at each computer. For comparison, we also implement the simulated annealing based thermal driven floorplanning technique. Note that it is a sequential algorithm which cannot be parallelized, and we assume that it runs on a machine with frequency 2.0 GHz as above. Comparing the two solutions, the peak temperature (evaluated using the full-fledged thermal simulation [22]) of the simulated annealing solution is 136.4 °C, which is significantly (about 24 °C) higher than the peak temperature of 112.5 °C of our cross entropy optimization technique. Their thermal profiles are shown in Figure 4, Figure 5 and Figure 6. It is clear that our algorithm leads to a much more balanced thermal distribution compared to the simulated annealing technique. In addition, due to the fact that our cross entropy optimization is parallelization friendly, its (simulated parallel) runtime is only 820.3 seconds, in contrast to the 1235.6 seconds of the simulated annealing technique (i.e., ours is about 34% faster). Since the total machine usage over all computers of our technique is more than that of simulated annealing, the monetary cost of our technique is larger. However, the large improvement in thermal profile outweighs the overhead in monetary cost. Further, it can be seen that the wirelength degradation of our cross entropy technique is minor compared to the simulated annealing solution.

FIGURE 4. Thermal profile for n100. (a) Floorplan from thermal driven simulated annealing optimization with peak temperature 62.9 °C. (b) Floorplan from thermal driven cross entropy optimization with peak temperature 56.9 °C.

FIGURE 5. Thermal profile for n200. (a) Floorplan from thermal driven simulated annealing optimization with peak temperature 92.7 °C. (b) Floorplan from thermal driven cross entropy optimization with peak temperature 86.3 °C.

FIGURE 6. Thermal profile for n300. (a) Floorplan from thermal driven simulated annealing optimization with peak temperature 136.4 °C. (b) Floorplan from thermal driven cross entropy optimization with peak temperature 112.5 °C.

We perform the above experiments on the largest three GSRC benchmark circuits, and the results are summarized in Table 1. We make the following observations.
• Comparing our cross entropy based thermal driven floorplanning with the simulated annealing technique, our algorithm leads to significant peak temperature reduction over all testcases. Our algorithm has similar wirelength compared to the simulated annealing technique.
• The proposed cross entropy based optimization technique is parallelization friendly. In addition, the proposed ILP based resource provisioning technique can effectively and efficiently compute the scheduling solution. The simulated parallel runtime of our technique is much less than the simulated runtime of the simulated annealing technique, due to the fact that the simulated annealing technique is not parallelization friendly.
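Under the setup above (50 samples, 10 machines per type, per-sample runtimes of 30.6 s at 2.0 GHz and 18.8 s at 3.25 GHz, rates of $0.196 and $0.5 per hour, a 100-second deadline, and zero waiting time), the provisioning problem is small enough to solve by exhaustively enumerating the split of the samples between the two types. The sketch below does exactly that; it illustrates the formulation and is not the paper's ILP solver:

```python
def provision(k=50, deadline=100.0,
              per_sample=(30.6, 18.8),   # runtime per sample in s (2.0 / 3.25 GHz)
              machines=(10, 10),         # m_i machines available per type
              rate_hr=(0.196, 0.5)):     # o_i in dollars per hour
    """Enumerate c1 (c2 = k - c1); keep the cheapest feasible split.
    Cost is sum(o_i * t_i), with t_i the total busy time on type i."""
    best = None
    for c1 in range(k + 1):
        c = (c1, k - c1)
        # Finish time per type: c_i samples spread over m_i machines.
        r = [per_sample[i] * c[i] / machines[i] for i in range(2)]
        if max(r) > deadline:
            continue  # violates the deadline constraint r_f <= T
        t = [per_sample[i] * c[i] for i in range(2)]  # busy seconds per type
        cost = sum(rate_hr[i] / 3600.0 * t[i] for i in range(2))
        if best is None or cost < best[0]:
            best = (cost, c)
    return best

cost, (c1, c2) = provision()
print(c1, c2, round(cost, 3))  # 32 18 0.1 -- the slow type is cheaper per
                               # sample, so it is loaded up to the deadline
```

With these numbers, 32 samples on the slow type finish in 30.6 · 32/10 = 97.92 s, just inside the 100-second deadline, and the remaining 18 go to the fast type.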


TABLE 1. Comparison of floorplanning results on large GSRC benchmark circuits. HPWL refers to the half-perimeter wirelength, peak temp. refers to the peak temperature, SA refers to simulated annealing, CE refers to cross entropy, and runtime refers to the runtime in seconds in the simulated cloud computing environment. No waiting time is considered in each computer.

TABLE 2. Comparison of best case, worst case and uncertainty-aware resource provisioning of the proposed ILP formulation on the
floorplanning solution considering the variations in waiting time.

• The simulated runtime of our cross entropy optimization includes the runtime for ILP solving, which is quite efficient (<1 second) due to the fact that there are only 50 samples which need to be scheduled. Thus, our ILP formulation is fast enough for resource provisioning in our problem.
• Although our cross entropy based optimization leads to more monetary cost, our technique is still more desirable than the simulated annealing technique, since the large improvement in thermal profile outweighs the cost overhead.

We next perform the experiments considering the uncertainty in waiting time using our uncertainty-aware ILP based resource provisioning. In the experiments, since we cannot access the historical data of the waiting time in Amazon EC2, we assume that the waiting time follows a Gaussian distribution with µ = 10 seconds and σ = 2 seconds on each machine. Multiple α are searched in the experiments, from 0.1 to 1.0 with a step size of 0.1. For each α, an ILP is formulated and solved, whose solution is then evaluated using 200 Latin Hypercube Monte Carlo samples generated according to the distribution of the waiting time. The deadline constraint is still set to 100 seconds for each cross entropy iteration involving full-fledged thermal simulations. R is set to 99% to account for most cases in practice. Note that the solution quality of the floorplanning does not change; only the scheduling solution, and thus the cost and runtime, are changed (note that even though Amazon EC2 does not charge the user for waiting time, the cost still changes since the scheduling solution is different). The cost and time corresponding to the 99%-th sample in the Monte Carlo simulation are summarized in Table 2. We make the following observations.
• The best case design computes the schedule assuming that the waiting time is always at its lower bound (i.e., µ − 3σ). Thus, it tends to use slow machines, as it is relatively easy to satisfy the deadline constraint. However, when the scheduling solution is evaluated using the Monte Carlo simulations, its runtime actually violates the deadline constraint. One can see that the runtime in Table 1 is similar to the runtime of the best case design, since both cases assume no or little waiting time.
• The worst case design computes the schedule assuming that the waiting time is always at its upper bound (i.e., µ + 3σ). Thus, it tends to use fast machines, as it is relatively hard to satisfy the deadline constraint. Due to this, its scheduling is conservative and it needs to use many fast machines to meet the deadline. Consequently, its monetary cost is high.
• The uncertainty-aware ILP formulation computes a good tradeoff between the above two methods. Compared to the worst case design, it saves runtime, while compared to the best case design, it meets the deadline in 99% of cases.

VI. CONCLUSION
In this paper, we propose the first thermal driven floorplanning technique suitable for a cloud computing environment. Our technique includes a parallelization friendly adjacency probability cross entropy optimization based thermal driven floorplanning and a new integer linear programming based resource provisioning technique. The experimental results on GSRC benchmark circuits demonstrate that our parallelized technique can reduce the peak temperature by up to 24 °C compared to the simulated annealing technique while still running over 30% faster in the simulated cloud computing environment.

ACKNOWLEDGMENT
This study was supported by the Fundamental Research Funds for Central Universities, China University of Geosciences (Wuhan) (No. CUG140612).

REFERENCES
[1] F. Marozzo, D. Talia, and P. Trunfio, "A cloud framework for parameter sweeping data mining applications," in Proc. IEEE 3rd Int. Conf. Cloud Comput. Technol. Sci. (CloudCom), Nov./Dec. 2011, pp. 367–374.


[2] Y. Charalabidis, S. Koussouris, and A. Ramfos, "A cloud infrastructure for collaborative digital public services," in Proc. IEEE 3rd Int. Conf. Cloud Comput. Technol. Sci. (CloudCom), Nov./Dec. 2011, pp. 340–347.
[3] D. Yuan, Y. Yang, X. Liu, and J. Chen, "A local-optimisation based strategy for cost-effective datasets storage of scientific applications in the cloud," in Proc. IEEE Int. Conf. Cloud Comput. (CLOUD), Jul. 2011, pp. 179–186.
[4] H. Li, L. Zhong, J. Liu, B. Li, and K. Xu, "Cost-effective partial migration of VoD services to content clouds," in Proc. IEEE Int. Conf. Cloud Comput. (CLOUD), Jul. 2011, pp. 203–210.
[5] S. Chaisiri, B.-S. Lee, and D. Niyato, "Optimization of resource provisioning cost in cloud computing," IEEE Trans. Services Comput., vol. 5, no. 2, pp. 164–177, Apr./Jun. 2012.
[6] W. Haider and A. Wahab, "A review on cloud computing architectures and applications," Comput. Eng. Intell. Syst., vol. 2, no. 4, pp. 206–210, 2011.
[7] N. Mokhoff, "DAC: EDA preps to embrace cloud computing," EETimes, San Francisco, CA, USA, Tech. Rep., Jun. 2010.
[8] Y.-C. Chang, Y.-W. Chang, G.-M. Wu, and S.-W. Wu, "B*-trees: A new representation for non-slicing floorplans," in Proc. IEEE/ACM Design Autom. Conf. (DAC), Jun. 2000, pp. 458–463.
[9] M. Armbrust et al., "Above the clouds: A Berkeley view of cloud computing," Dept. Elect. Eng. Comput. Sci., Univ. California at Berkeley, Berkeley, CA, USA, Tech. Rep. UCB/EECS-2009-28, 2009.
[10] Q. Tang, S. K. S. Gupta, and G. Varsamopoulos, "Energy-efficient thermal-aware task scheduling for homogeneous high-performance computing data centers: A cyber-physical approach," IEEE Trans. Parallel Distrib. Syst., vol. 19, no. 11, pp. 1458–1472, Nov. 2008.
[11] G. Chen et al., "Energy-aware server provisioning and load dispatching for connection-intensive Internet services," in Proc. 5th Int. Symp. Netw. Syst. Design Implement. (NSDI), 2008, pp. 337–350.
[12] E. Pakbaznia and M. Pedram, "Minimizing data center cooling and server power costs," in Proc. Int. Symp. Low Power Electron. Design (ISLPED), 2009, pp. 145–150.
[13] E. Pakbaznia, M. Ghasemazar, and M. Pedram, "Temperature-aware dynamic resource provisioning in a power-optimized datacenter," in Proc. Design Autom. Test Eur., Mar. 2010, pp. 124–129.
[14] J. Xu and J. Fortes, "A multi-objective approach to virtual machine management in datacenters," in Proc. ACM Int. Conf. Auto. Comput., 2011, pp. 225–234.
[15] J. Cong and Y. Zhang, "Thermal-aware physical design flow for 3-D ICs," in Proc. 23rd Int. VLSI Multilevel Interconnection Conf. (VMIC), 2006, pp. 73–80.
[16] J. Cong, J. Wei, and Y. Zhang, "A thermal-driven floorplanning algorithm for 3D ICs," in Proc. IEEE Int. Conf. Comput.-Aided Design (ICCAD), Nov. 2004, pp. 306–313.
[17] V. Nookala, D. J. Lilja, and S. S. Sapatnekar, "Temperature-aware floorplanning of microarchitecture blocks with IPC-power dependence modeling and transient analysis," in Proc. Int. Symp. Low Power Electron. Design (ISLPED), 2006, pp. 298–303.
[18] Y. Han and I. Koren, "Simulated annealing based temperature aware floorplanning," J. Low Power Electron., vol. 3, no. 2, pp. 141–155, 2007.
[19] T.-C. Chen and Y.-W. Chang, "Modern floorplanning based on fast simulated annealing," in Proc. ACM Int. Symp. Phys. Design (ISPD), 2005, pp. 104–112.
[20] T.-C. Chen, Y.-W. Chang, and S.-C. Lin, "IMF: Interconnect-driven multilevel floorplanning for large-scale building-module designs," in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD), Nov. 2005, pp. 159–164.
[21] S. Logan and M. R. Guthaus, "Fast thermal-aware floorplanning using white-space optimization," in Proc. 17th Int. Conf. Very Large Scale Integr. (VLSI-SoC), Oct. 2009, pp. 65–70.
[22] Z. Feng and P. Li, "Fast thermal analysis on GPU for 3D-ICs with integrated microchannel cooling," in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD), Nov. 2010, pp. 551–555.
[23] J. Cong, G. Luo, and Y. Shi, "Thermal-aware cell and through-silicon-via co-placement for 3D ICs," in Proc. 48th ACM/EDAC/IEEE Design Autom. Conf. (DAC), Jun. 2011, pp. 670–675.
[24] CFD-ACE+ Module Manual, Version I, ESI Group, Paris, France, 2002.
[25] R. Y. Rubinstein and D. P. Kroese, The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning. New York, NY, USA: Springer-Verlag, Jul. 2004.
[26] K. Chepuri and T. Homem-de-Mello, "Solving the vehicle routing problem with stochastic demands using the cross-entropy method," Ann. Oper. Res., vol. 134, no. 1, pp. 153–181, 2005.
[27] G. Alon, D. P. Kroese, T. Raviv, and R. Y. Rubinstein, "Application of the cross-entropy method to the buffer allocation problem in a simulation-based environment," Ann. Oper. Res., vol. 134, no. 1, pp. 137–151, 2005.
[28] X. Zhao, Y. Guo, X. Chen, Z. Feng, and S. Hu, "Hierarchical cross-entropy optimization for fast on-chip decap budgeting," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 30, no. 11, pp. 1610–1620, Nov. 2011.
[29] P.-T. de Boer, D. P. Kroese, S. Mannor, and R. Y. Rubinstein, "A tutorial on the cross-entropy method," Ann. Oper. Res., vol. 134, no. 1, pp. 19–67, 2005.
[30] T. Wei, X. Chen, and S. Hu, "Reliability-driven energy-efficient task scheduling for multiprocessor real-time systems," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 30, no. 10, pp. 1569–1573, Oct. 2011.
[31] M. D. Mckay, R. J. Beckman, and W. J. Conover, "Comparison of three methods for selecting values of input variables in the analysis of output from a computer code," Technometrics, vol. 21, no. 2, pp. 239–245, 1979.

XIAODAO CHEN received the B.Eng. degree in telecommunication from the Wuhan University of Technology, Wuhan, China, in 2006, the M.Sc. degree in electrical engineering from Michigan Technological University, Houghton, USA, in 2009, and the Ph.D. degree in computer engineering from Michigan Technological University, Houghton, USA, in 2012. He is currently an Associate Professor with the School of Computer Science, China University of Geosciences, Wuhan.

LIZHE WANG (SM'09) received the B.Eng. (Hons.) degree and the M.Eng. degree from Tsinghua University, Beijing, China, and the D.Eng. (magna cum laude) degree in applied computer science from the Karlsruhe Institute of Technology, Karlsruhe, Germany. He is a 100-Talent Program Professor with the Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing, China, and a ChuTian Chair Professor with the School of Computer Science, China University of Geosciences, Wuhan, China. Prof. Wang is a fellow of IET and BCS.

ALBERT Y. ZOMAYA (F'04) is currently the Chair Professor of High Performance Computing and Networking and an Australian Research Council Professorial Fellow with the School of Information Technologies, The University of Sydney. He is also the Director of the Centre for Distributed and High Performance Computing, which was established in late 2009. Prof. Zomaya received the Ph.D. degree from the Department of Automatic Control and Systems Engineering, Sheffield University, U.K. He held the CISCO Systems Chair Professorship of Internetworking from 2002 to 2007, and was also the Head of School for 2006-2007 in the same school. Prior to his current appointment, he was a Full Professor with the School of Electrical, Electronic and Computer Engineering, University of Western Australia, where he also led the Parallel Computing Research Laboratory from 1990 to 2002. He served as an Associate-, a Deputy-, and an Acting-Head in the same department, and held numerous visiting positions and has extensive industry involvement.

LIN LIU is currently pursuing the Ph.D. degree with the Department of Electrical and Computer Engineering, Michigan Technological University, Houghton, USA.

SHIYAN HU (SM'10) received the Ph.D. degree in computer engineering from Texas A&M University, College Station, in 2008. He is currently an Assistant Professor with the Department of Electrical and Computer Engineering, Michigan Technological University, Houghton, where he serves as the Director of the Michigan Tech VLSI CAD Research Laboratory. He was a Visiting Professor with the IBM Austin Research Laboratory, Austin, TX, in 2010. He has over 50 journal and conference publications. His current research interests include computer-aided design for very large-scale integrated circuits on nanoscale interconnect optimization, low-power optimization, and design for manufacturability. Dr. Hu has served as a Technical Program Committee Member of several conferences, such as ICCAD, ISPD, ISQED, ISVLSI, and ISCAS. He received a Best Paper Award Nomination from ICCAD 2009.
