Index Terms — geo-distributed data centers, workload management, peak shaving, net metering, network cost, game theory, Nash equilibrium

• N. Hogade is with the Dept. of Electrical and Computer Engineering, Colorado State University, Fort Collins, CO 80523. E-mail: [email protected].
• S. Pasricha and H.J. Siegel are with the Dept. of Electrical and Computer Engineering, and the Dept. of Computer Science, Colorado State University, Fort Collins, CO 80523. E-mail: {sudeep, hj}@colostate.edu.
1 INTRODUCTION
In recent years, the ever-increasing use of smartphones, wearable devices, portable computers, Internet-of-Things (IoT) devices, etc., has fueled the use of cloud computing. Data-intensive applications in the domains of artificial intelligence, distributed manufacturing systems, smart and connected energy, and autonomous vehicles are beginning to leverage cloud computing extensively [1]. To support these applications, cloud service providers are increasingly deploying new data centers that can bolster the capabilities of their cloud computing offerings. While centralized data centers were common in the past, more recently there has been a trend towards deploying data centers across geographically diverse locations [2], [3]. Distributing data centers geographically across the globe leads to many benefits. It brings them closer to customers, offering better performance (e.g., latency) and lower network costs. It also provides better resilience to unpredictable failures (e.g., environmental hazards) due to the redundancy that distributed data centers enable.

Another strong motivation to geographically distribute data centers is to reduce electricity costs by exploiting time-of-use (TOU) electricity pricing [4]. The cost of electricity varies based on the time of day and follows the TOU electricity pricing model [5]. Electricity prices are higher when the total electrical grid demand is high and fall during periods when electrical grid demand is low [6]. For non-residential or commercial customers, utilities also charge an additional flat-rate (peak demand) charge based on the highest (peak) power consumed at any instant during a billing period, e.g., a month [7]. In some cases, such peak demand charges can be higher than the energy use portion of the electric bill. Moving workloads among geo-distributed data centers is one effective approach to reduce electricity expenditures. This approach allows cloud service providers to allocate workloads to data centers with lower energy costs. This not only reduces operating costs for cloud service providers, but also can reduce cloud computing costs for customers.

Distributing workloads geographically entails an overhead: network costs for transferring workloads and their data between data centers. Many workloads hosted in geographically distributed data centers need to transfer data among data centers for data replication, collection, and synchronization. Such data movement significantly increases cloud networking costs. Even though the price of Internet data transfers continues to decline by approximately 30% per year [8], inter-data center traffic is exploding [9]. Moreover, operating energy-efficient data centers at high capacity can sometimes overwhelm the intra-data center servers and networking equipment, causing a delay in processing the incoming workload. We call this the data center queueing delay, and it reduces cloud provider performance and revenues.

Data centers today are energy intensive, and are estimated to account for around 1% of worldwide electricity use. The total energy used by the world's data centers has doubled over the past decade, and some studies claim that it will triple or even quadruple within the next decade [10]. We define cloud operating costs as the dollar energy costs for all geo-distributed data centers plus the dollar inter-data center data transfer costs.
Minimizing such costs is important as the annual electricity expenditure for powering data centers is growing rapidly: China's data centers alone are on track to use more energy than all of Australia by 2023 [11]. These annual energy expenses can sometimes even surpass the cost of purchasing all of the data center equipment.

It should be noted that geo-distributed data centers have significant heterogeneity in operating costs and performance, because of aspects including: (a) inter-data center network costs, which are affected by the amount of data transferred among sites; (b) queueing delay caused by intra-data center network congestion in data centers executing at high capacity; (c) variable TOU pricing, as data centers are often located in different time zones; (d) the use of on-site green/renewable energy sources, e.g., solar and wind, to various extents across sites; (e) the availability (or absence) of net metering, which is a mechanism that gives renewable energy customers credit on their utility bills for the excess clean energy they sell back to the grid [12]; (f) variable peak demand pricing from utility providers across sites; (g) the availability of diverse resources within a data center, such as cooling infrastructure, and heterogeneity across compute nodes (e.g., from the perspective of power and performance characteristics); and (h) co-location interference, a phenomenon that occurs when multiple cores within the same multicore processor execute tasks simultaneously and compete for shared resources, e.g., last-level caches or DRAM. It is important to factor in this heterogeneity while making decisions about workload management in geo-distributed data centers.

The goal of this work is to design and evaluate a geographical workload distribution solution for geo-distributed data centers that will minimize the cloud operating cost for executing incoming workloads, considering all of the aspects mentioned above. This work is applicable to environments where execution information about the workloads is readily available or can be predicted by some form of a workload prediction technique, e.g., [13], [14]. Examples of such environments in industry include commercial companies (DigitalGlobe, Google), military computing installations (Department of Defense), and government labs (National Center for Atmospheric Research). More specifically, we propose a novel game theory-based workload management framework that distributes workload among data centers over time, while meeting their performance requirements. The novel contributions of our work can be summarized as follows:

• we formulate the cloud workload distribution problem as a non-cooperative game and design a new Nash equilibrium based intelligent game-theoretic workload distribution framework to minimize the cloud operating cost;
• our framework simultaneously considers energy and network cost minimization, while satisfying workload performance goals;
• our decisions leverage detailed models for a comprehensive set of characteristics that impact cloud operating costs and workload performance, including data center compute and cooling power, co-location performance interference, TOU electricity pricing, renewable energy, net metering, peak demand pricing distribution, data center queueing delay, and the costs involved with inter-data center data transfers.

We organize the rest of the paper as follows. In Section 2, we review relevant prior work. We characterize our system model in Section 3. Sections 4 and 5 describe our specific problem and the framework we propose to solve it. The simulation environment is discussed in Section 6. Lastly, we analyze and evaluate the results of our approach in Section 7.

2 RELATED WORK

(a) Data Center Electricity Cost, Renewable Power, and Peak Demand: There have been many recent efforts proposing methods to minimize electricity costs across geo-distributed data centers, with the fundamental decisions of the optimization problem relying on a TOU electricity pricing model [15], [16]. Electricity costs are often much higher during peak hours of the day (typically 8 A.M. to 5 P.M.). The electricity cost models, sometimes in combination with a model that considers renewable energy, motivate the use of optimization techniques to minimize energy cost or, if provided a revenue model, to maximize total profit [17], [18]. A few other papers consider peak demand pricing models, often in conjunction with energy storage devices/batteries, to optimize electricity costs [19], [20].

(b) Data Center Network Management: Many recent papers focus on intelligent workload scheduling among cloud centers to optimize the quality of service (QoS) while increasing cloud operating profits [21], [22]. As data migration among cloud data centers increases network costs, some efforts try to address this issue by proposing techniques that intelligently allocate workloads to reduce overall network costs [23], [24]. In [25] and [26], the authors propose multi-objective frameworks that simultaneously optimize resource wastage and migration costs while meeting QoS requirements.

(c) Game Theory for Data Center Resource Management: A few efforts consider game-theoretic approaches for cost optimization in geo-distributed cloud data centers. In [27], the authors propose a technique to allocate computing resources according to the service subscriber's requirements by using non-cooperative game theory. The authors in [28], [29] propose bandwidth resource management techniques for geo-distributed cloud data centers. In [30], the authors propose an algorithm that tries to reduce data migration costs and maximize the profit of all cloud service providers involved in a federation. In [31], the authors model the cooperative electricity procurement process of data centers as a cooperative game and show the cost-saving benefits of aggregation. In [32], an approach is proposed to address the tradeoff between QoS and energy consumption in cloud data centers. The approach uses a game-theoretic formulation that allows for appropriate loss of QoS while maximizing the cloud service provider's profits. In [33], the authors consider a cloud data center and smart grid utility demand-response program. The work proposes a cooperative game theory-based technique to migrate workloads between data centers and better utilize the benefits provided by demand response schemes over multiple data center locations. In [34] and [35], the authors propose non-cooperative game-theoretic workload management techniques to minimize data center energy costs, while considering TOU electricity pricing information.
Similar to [34] and [35], our work uses information about data center locations like [34]. However, neither [34] nor [35] considers detailed models for data center compute and cooling power, co-location interference, and peak demand pricing distribution. Moreover, these efforts do not consider heterogeneity in workload (applications/task types executing in the data center) and heterogeneity within the data center (types of servers, server rack arrangements, etc.). Our previous work [36] considers many of the above-mentioned aspects. We extend [36] by developing a new inter-data center network cost model and a data center queueing delay model, and by considering these factors in our resource management problem. Moreover, we propose a new game theory-based workload management framework that takes a holistic approach to the cloud operating cost minimization problem, which is shown to be more effective than the multiple resource management strategies presented in [36] (Section 7 presents a detailed analysis to quantify the improvement).

3 SYSTEM MODEL

3.1 Overview
Our framework comprises a geo-distributed-level Cloud Workload Manager (CWM) that distributes incoming workload requests to geographically distributed data centers. Each data center has its own local Data center Workload Manager (DWM) that takes the workload assigned to it by the CWM and maps requests to compute nodes within the data center. We first describe the system model at the geo-distributed level and then provide further details on the models of components at the data center level. We provide a list of abbreviations and notations in the appendix.

3.2 Geo-Distributed Level Model
We consider a rate-based workload management scheme, where the workload arrival rate can be predicted over a decision interval called an epoch [13], [14]. In our work, an epoch length T^e is one hour, and a 24-epoch period represents a full day. Within the short duration of an epoch, the workload arrival rates can be reasonably approximated as constant, e.g., the Argonne National Lab Intrepid log shows mostly constant arrival rates over larger intervals of time [37].

Let D be a set of |D| data centers and let d represent an individual data center. We assume that a cloud infrastructure (Fig. 1) is composed of |D| data centers, that is, d = 1, 2, ..., |D| and d ∈ D. Let I be a set of |I| task types and let i represent an individual task type. We consider |I| task (i.e., workload) types, that is, i = 1, 2, ..., |I| and i ∈ I. A task type i ∈ I is characterized by its arrival rate and its execution rate, i.e., the reciprocal of the estimated time required to complete a task of task type i on each of the heterogeneous compute nodes, in each performance state (P-state). We assume that the beginning of each epoch τ represents a steady-state scheduling problem where the CWM splits the global arrival rate GAR_i(τ) for each task type i into the local data center level arrival rates AR_{i,d}(τ) and assigns them to each data center d ∈ D. That is,

    GAR_i(τ) = Σ_{d∈D} AR_{i,d}(τ).  (1)

The CWM performs this assignment such that the total cloud operating (energy and network) cost across all data centers is minimized, with the constraint that the execution rates of all task types exceed their arrival rates at each data center, i.e., all tasks complete without being dropped or unexecuted. At the start of each epoch, the CWM calculates (as explained in Section 3.3.2) a co-location aware data center maximum execution rate ER_{i,d} for each task type i at each data center d. Later, it splits the global arrival rate GAR_i(τ) for each task type i into local data center level arrival rates AR_{i,d}(τ) such that the data center maximum execution rate ER_{i,d}(τ) exceeds the corresponding arrival rate AR_{i,d}(τ), thus ensuring the workload is completed. That is,

    ER_{i,d}(τ) > AR_{i,d}(τ), ∀i ∈ I, ∀d ∈ D.  (2)

Note that ER_{i,d}(τ) must be strictly greater than AR_{i,d}(τ) to minimize the expected average queueing delay for task type i at data center d, as discussed later in Section 3.3.8.
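To make (1) and (2) concrete, the following sketch (in Python, with illustrative variable names that are not part of the framework) checks whether a candidate split of one task type's global arrival rate is feasible:

    def is_feasible_split(gar_i, ar_i, er_i, tol=1e-9):
        # gar_i: global arrival rate GAR_i(tau) (tasks/s)
        # ar_i:  per data center arrival rates AR_{i,d}(tau), d = 1..|D|
        # er_i:  per data center maximum execution rates ER_{i,d}(tau)
        # (1): the local rates must sum back to the global arrival rate.
        if abs(sum(ar_i) - gar_i) > tol:
            return False
        # (2): each data center must execute strictly faster than tasks
        # arrive, so no task is dropped and queueing delay stays bounded.
        return all(ar < er for ar, er in zip(ar_i, er_i))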
Fig. 1. Cloud workload manager (CWM) performing geo-distributed task (workload) assignment to data centers

3.3 Data Center Level Model

3.3.1 Organization of Each Data Center
Each data center d houses NN_d nodes and a cooling system comprised of computer room air conditioning (CRAC) units. Let NCR_d be the number of CRAC units. Heterogeneity exists across compute nodes, where nodes vary in their execution speeds, power consumption characteristics, and the number of cores. The number of cores in node n is NCN_n, and NT_k is the node type to which core k belongs.

3.3.2 Co-Location Aware Execution Rates
Tasks competing for shared memory in multicore processors can cause severe performance degradation, especially when the competing tasks are memory intensive. The memory-intensity of a task refers to the ratio of last-level cache misses to the total number of instructions executed. We use a linear regression model that combines a set of disparate features based on the current tasks assigned to a multicore processor to predict the execution time of a target task i on core k in the presence of performance degradation due to interference from task co-location [38]. We classify the task types into memory-intensity classes on each of the node types, and calculate the coefficients for each memory-intensity class using the linear regression model to determine a co-located execution rate for task type i on core k, CER_{i,k}^core(τ) [38]. When considering co-location at a data center d, the co-location aware data center execution rate ER_{i,d}(τ) for task type i is derived from these per-core rates.
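A minimal sketch of this two-step model follows; it is illustrative only, assuming a plain linear interference predictor and a simple sum over cores, while the exact features, coefficients, and aggregation are those of Section 3.3.2 and [38]:

    def colocated_execution_rate(base_rate, coeffs, coloc_features):
        # Sketch of the linear regression interference model [38]: the
        # co-located execution time of a target task on a core grows
        # linearly in features of the tasks sharing the processor (e.g.,
        # their memory intensities). base_rate is the interference-free
        # rate; coeffs are assumed learned per memory-intensity class
        # and node type.
        base_time = 1.0 / base_rate
        coloc_time = base_time + sum(c * f for c, f in zip(coeffs, coloc_features))
        return 1.0 / coloc_time  # CER^core_{i,k}(tau)

    def data_center_execution_rate(cer_cores):
        # Illustrative aggregation of the per-core rates CER^core_{i,k}(tau)
        # into a data center level rate ER_{i,d}(tau): here, a simple sum
        # over the cores considered for task type i.
        return sum(cer_cores)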
If the CWM migrates a task from one location to another location, there is a data transfer cost (as discussed in Section 3.3.6) associated with it. The objective of the CWM is to allocate the workload across geo-distributed data centers to minimize the monetary cloud operating (energy and network) cost of the system (the sum of (7) across all data centers) while ensuring that the workload is completed according to the constraint defined by (2).

The complexity of the CWM workload allocation problem makes it NP-hard [42], and therefore we propose a game theory-based workload management heuristic. Here, we model the cloud workload distribution problem as a non-cooperative game. In a non-cooperative game, there can be a finite (or infinite) number of players who aim to maximize/minimize their objectives independently but ultimately reach an equilibrium. For a finite number of players, this equilibrium is called the Nash equilibrium [43].

5 NASH EQUILIBRIUM-BASED HEURISTIC

5.1 Overview
Our proposed Nash equilibrium-based intelligent load distribution (NILD) heuristic is a non-cooperative game-theoretic load balancing approach. The following subsections describe the components of the heuristic in more detail.

5.2 Objective Function
NILD jointly optimizes the estimated data center (energy and network) cost and the estimated delay cost, as discussed in the following subsections.

5.2.1 Estimated Data Center Cost
Let J_d be the set of node types in a data center d and let j represent an individual node type. In a data center d, there are a total of NN_{d,j} nodes of node type j. Let P_j^D be the average peak dynamic power for node type j. It is calculated by averaging (over all task types) the peak power for each task type i executing on node type j. The maximum power dissipation possible at data center d, PD_d^max, is calculated as:

    PD_d^max = (NCR_d · PCR_{d,c}^max + Σ_{j∈J_d} NN_{d,j} · P_j^D) · Eff_d.  (9)

Recall that PR_d(τ) is the total renewable power available at data center d. The estimated power dissipation PD_{i,d}^E(AR, τ) for each task type i at data center d is calculated as:

    PD_{i,d}^E(AR, τ) = (PD_d^max · AR_{i,d}(τ) / ER_{i,d}(τ)) − PR_d(τ).  (10)

The maximum network cost possible for each task type i at data center d, NC_{i,d}^max, is calculated as:

    NC_{i,d}^max = N^price · NN_d · S_i.  (11)

The estimated network cost NC_{i,d}^E(AR, τ) for each task type i at data center d is calculated as:

    NC_{i,d}^E(AR, τ) = NC_{i,d}^max · AR_{i,d}(τ) / ER_{i,d}(τ).  (12)

Observe that, at a data center d with zero PR_d(τ), for each task type i, PD_{i,d}^E(AR, τ) in (10) and NC_{i,d}^E(AR, τ) in (12) will increase to PD_d^max and NC_{i,d}^max, respectively, as the data center arrival rate AR_{i,d}(τ) approaches its maximum execution rate ER_{i,d}(τ). Thus, we can say that both PD_{i,d}^E(AR, τ) and NC_{i,d}^E(AR, τ) are functions of AR and τ.

The estimated data center cost DC_{i,d}^E(AR, τ) incurred by task type i with a data center maximum execution rate ER_{i,d}(τ) at data center d can then be calculated by modifying (7) as:

    DC_{i,d}^E(AR, τ) = E_d^price(τ) · α_d · PD_{i,d}^E(AR, τ) + Δ_d^peak(τ) + NC_{i,d}^E(AR, τ),  (13)

where α_d = 1 if PD_{i,d}^E(AR, τ) is positive and 0 ≤ α_d ≤ 1 otherwise.
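Read together, (9)-(13) amount to the following sketch (illustrative Python; the net-metering rate used for α_d below is an assumed example value, not a value from the paper):

    def estimated_data_center_cost(ar, er, pd_max, pr, e_price, delta_peak, nc_max):
        # Estimated cost DC^E_{i,d}(AR, tau) of (13) for one task type at
        # one data center. ar, er in tasks/s; pd_max, pr in W; delta_peak
        # and nc_max in $.
        pd_e = pd_max * ar / er - pr          # (10): utilization-scaled power
        # alpha_d = 1 when grid power is drawn (PD^E > 0); otherwise
        # 0 <= alpha_d <= 1 models the net-metering sell-back credit.
        alpha = 1.0 if pd_e > 0 else 0.5      # 0.5 is an assumed example rate
        nc_e = nc_max * ar / er               # (12): utilization-scaled network cost
        return e_price * alpha * pd_e + delta_peak + nc_e   # (13)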
5.2.2 Estimated Delay Cost
The estimated delay cost is associated with the data center queueing delay (8) and captures the lost revenue incurred from the delay experienced by the requests. For this loss, we use the model from [35]. NILD considers the queueing delay within the data center. Let β be a constant known as the delay cost factor. As per (8), the estimated delay cost DelC_{i,d}^E(AR, τ) for task type i at data center d can be calculated as:

    DelC_{i,d}^E(AR, τ) = β · AR_{i,d}(τ) · d_{i,d}^queue(τ).  (14)

Substituting the value of d_{i,d}^queue(τ) from (8), we get:

    DelC_{i,d}^E(AR, τ) = β · AR_{i,d}(τ) / (ER_{i,d}(τ) − AR_{i,d}(τ)).  (15)

Observe that DelC_{i,d}^E(AR, τ) will increase (to infinity) as the difference (ER_{i,d}(τ) − AR_{i,d}(τ)) decreases to zero. This shows that data centers running at very high capacity result in a very high estimated delay cost. Because of this, NILD avoids assigning the highest possible arrival rates to data centers. Therefore, with a constant ER_{i,d}(τ) and β, the estimated delay cost becomes a function of AR and τ.
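The near-capacity blow-up of (15) is easy to see numerically (a sketch; β = 0.1 is the value adopted in Section 7.4.2):

    def estimated_delay_cost(beta, ar, er):
        # DelC^E_{i,d}(AR, tau) of (15); grows without bound as the
        # arrival rate AR approaches the maximum execution rate ER,
        # which steers NILD away from fully loading any data center.
        assert ar < er, "constraint (2): AR must stay strictly below ER"
        return beta * ar / (er - ar)

    # With ER = 100 tasks/s and beta = 0.1:
    #   estimated_delay_cost(0.1, 50, 100) -> 0.10
    #   estimated_delay_cost(0.1, 99, 100) -> 9.90  (near-capacity blow-up)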
5.2.3 Estimated Overall Cost Incurred
The goal of NILD is to minimize the overall cost incurred, which has two components: the estimated data center cost, defined in (13), and the estimated delay cost, defined in (15). The estimated overall cost OC_i^E(AR, τ) incurred for task type i can be calculated as:

    OC_i^E(AR, τ) = Σ_{d=1}^{|D|} (DC_{i,d}^E(AR, τ) + DelC_{i,d}^E(AR, τ)).  (16)
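Continuing the sketches above, player i's objective (16) is then a sum over the data centers:

    def estimated_overall_cost(beta, ar_i, er_i, dc_params):
        # OC^E_i(AR, tau) of (16): the objective of player (task type) i.
        # dc_params holds one (pd_max, pr, e_price, delta_peak, nc_max)
        # tuple per data center; ar_i and er_i are the per data center
        # rates, reusing the two functions sketched above.
        return sum(
            estimated_data_center_cost(ar, er, *p) + estimated_delay_cost(beta, ar, er)
            for ar, er, p in zip(ar_i, er_i, dc_params)
        )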
5.3 Load Distribution as a Non-cooperative Game
In our workload distribution problem, the objective of NILD is to split the global arrival rate GAR_i(τ) for each task type i into local data center level arrival rates AR_{i,d}(τ) and assign them to each data center d ∈ D. We model this problem as a non-cooperative game that is played among a set of players. Each player has a set of strategies and the estimated overall cost that results from using each strategy. In this game, each task type i is a player. NILD uses Algorithm 1, as discussed later in Section 5.5, to find the strategy AR_i of each player i.
Then NILD uses Algorithm 2, as discussed later in Section 5.6, to find a feasible cloud workload distribution strategy AR = {AR_1, AR_2, ..., AR_{|I|}} such that OC_i^E(AR, τ) is minimized. The components of our non-cooperative game are:

• Players: a finite set of players, where each task type i is a player; i = 1, 2, ..., |I| (as per Section 3.2)
• Strategy sets: the load distribution strategy of a player, AR_i = {AR_{i,1}, AR_{i,2}, ..., AR_{i,|D|}}
• Cost: each player wants to minimize the overall cost OC_i^E(AR, τ) associated with its strategy.

5.4 Nash Equilibrium
To obtain the workload distribution strategy for the cloud data centers, the game described in Section 5.3 can be solved using a Nash equilibrium, which is the most widely used solution for such games. The Nash equilibrium of our non-cooperative game is a load distribution strategy of the entire cloud, AR = {AR_1, AR_2, ..., AR_{|I|}}, such that for each player i:

    AR_i ∈ argmin_{AR_i} OC_i^E(AR, τ).  (17)

If no player can further minimize the overall cost incurred by unilaterally changing its own strategy, the cloud is at a Nash equilibrium.

5.5 Best-Reply Strategy
When a player computes its best reply to reach the equilibrium, the strategies of the other players are constants. As our optimization framework has the objective of minimizing the estimated operating costs, we try to assign larger workloads to less costly data centers. We determine whether a data center is expensive by calculating the value of a metric called the capacity factor [35]. A lower capacity factor represents a less expensive data center, and vice versa. The capacity factor is simply the estimated data center maximum cost, which is independent of the arrival rate. Let CF_{i,d}^E(τ) be the estimated capacity factor of a data center d for task type i. Modifying (13) we get:

    CF_{i,d}^E(τ) = E_d^price(τ) · α_d · (PD_d^max / ER_{i,d}^av(τ) − PR_d(τ)) + Δ_d^peak(τ) + NC_{i,d}^max / ER_{i,d}^av(τ),  (20)

where α_d = 1 if PD_d^max is positive and 0 ≤ α_d ≤ 1 otherwise.

First, the data centers are sorted based on their estimated capacity factors CF_{i,d}^E(τ). As per [35], let q be the smallest integer (number of data centers) that satisfies (25).
The best reply assigns larger AR_{i,d} to inexpensive data centers, and vice versa. Equation (26) is used to find the best reply strategy AR_i = {AR_{i,1}, AR_{i,2}, ..., AR_{i,|D|}}.

Based on (25) and (26), the following Best-Reply algorithm is formulated to find the optimal load distribution strategy, i.e., the best reply of player i.
Algorithm 1. Pseudo-code for the Best-Reply strategy
inputs:  global arrival rate: GAR_i(τ)
         available execution rates: ER_{i,1}^av(τ), ..., ER_{i,|D|}^av(τ)
         capacity factors: CF_{i,1}^E(τ), ..., CF_{i,|D|}^E(τ)
output:  load distribution strategy for a player:
         AR_i = {AR_{i,1}, ..., AR_{i,|D|}}
1.  sort data centers in ascending order of capacity factors,
    i.e., CF_{i,1}^E(τ) ≤ ... ≤ CF_{i,d}^E(τ) ≤ ... ≤ CF_{i,|D|}^E(τ)
2.  q = |D|
3.  initialize γ and δ using (23) and (24), respectively
4.  while γ > δ
5.      γ = γ − ER_{i,q}^av(τ)
6.      AR_{i,q} = 0
7.      q = q − 1
8.      update δ using (24)
9.  for each data center in {1, 2, ..., q}
10.     calculate AR_{i,d}(τ) using (26)
The Best-Reply algorithm takes the global arrival rate, available execution rates, and capacity factors of player i as inputs. As the output, it determines the best reply strategy AR_i = {AR_{i,1}, AR_{i,2}, ..., AR_{i,|D|}}. First, the data centers are sorted based on their capacity factors (step 1). The heuristic aims to assign larger workloads to less expensive data centers. It initializes the values of q, γ, and δ (steps 2 and 3). The while loop (steps 4-8) finds the smallest index data center that satisfies (25). In the loop, the algorithm does not assign any workload to the last (most expensive) data center, i.e., AR_{i,q} = 0, if the (q−1)th data center's available execution rate is capable of satisfying (25). This while loop updates the values of q, γ, and δ and continues until condition (25) is satisfied, i.e., until γ > δ no longer holds. Finally, the algorithm determines AR_{i,d}(τ) ∈ AR_i for each data center using (26).
5.6 NILD Heuristic
We designed the final NILD algorithm to compute the Nash equilibrium of the non-cooperative game. It uses the Best-Reply algorithm explained in the previous section. The CWM uses Algorithm 2 shown below to assign (i.e., split) the global workload arrival rates into data center level arrival rates for all task types. Each player i computes its optimal Best-Reply strategy (steps 3-8), i.e., it determines AR_{i,d}(τ) for each data center d. The heuristic then calculates the norm by summing the absolute values of the differences in operating costs across all players (task types) between the current l-th iteration and the previous (l−1)-th iteration. This continues until the algorithm comes to an equilibrium, i.e., the difference in the total operating cost across all the players in successive iterations (i.e., norm) is less than the pre-defined stopping criterion (tolerance ε). The value of ε is determined empirically through simulation studies to provide a value that gives the system the best possible performance. Here, each player determines its Best-Reply strategy using the current load distribution strategies of the other players and updates its strategy.

Algorithm 2. Pseudo-code for the NILD heuristic
inputs:  global arrival rates:
         GAR_1(τ), ..., GAR_i(τ), ..., GAR_{|I|}(τ)
         data center maximum execution rates:
         ER_{1,1}(τ), ..., ER_{i,d}(τ), ..., ER_{|I|,|D|}(τ)
output:  cloud load distribution strategy:
         AR = {AR_1, ..., AR_i, ..., AR_{|I|}}
1.  initialize:
        iteration number: l = 0
        load distribution strategy: AR_i = 0
        estimated overall cost incurred: OC_i^E(AR, τ) = 0
        tolerance: ε = 10^-3
        flag = continue
2.  while flag = continue
3.      for each i in {1, 2, ..., |I|}
4.          for each d in {1, 2, ..., |D|}
5.              calculate ER_{i,d}^av(τ) using (18)
6.              calculate CF_{i,d}^E(τ) using (20)
7.          find optimal AR_i using the Best-Reply algorithm
8.          calculate OC_i^E(AR, τ) using (16)
9.      norm = Σ_{i=1}^{|I|} |OC_i^{E(l−1)}(AR, τ) − OC_i^{E(l)}(AR, τ)|
10.     if norm < ε
11.         flag = stop
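The fixed-point structure of Algorithm 2 can be sketched compactly (illustrative Python; the Best-Reply computation depends on (18), (20), and (23)-(26), so it is abstracted here as a callable):

    def nild(gar, best_reply, eps=1e-3, max_iters=1000):
        # Sketch of Algorithm 2: iterate best replies until the summed
        # change in the players' estimated overall costs drops below eps.
        # gar: global arrival rates GAR_i(tau), one per task type (player).
        # best_reply(i, gar_i, strategies): returns (AR_i, OC^E_i) for
        # player i given the other players' current strategies (Algorithm 1).
        num_players = len(gar)
        strategies = [None] * num_players
        prev_costs = [0.0] * num_players
        for _ in range(max_iters):          # guard; Algorithm 2 loops on flag
            costs = []
            for i in range(num_players):
                # Steps 3-8: each player re-optimizes against the others'
                # current load distribution strategies.
                strategies[i], oc_i = best_reply(i, gar[i], strategies)
                costs.append(oc_i)
            # Step 9: norm = sum over players of |cost change between
            # successive iterations|.
            norm = sum(abs(p - c) for p, c in zip(prev_costs, costs))
            prev_costs = costs
            if norm < eps:                  # Steps 10-11: equilibrium reached
                break
        return strategies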
6 SIMULATION ENVIRONMENT

6.1 Comparison Heuristics
We compare our proposed NILD heuristic with (a) co-location aware force-directed load distribution (FDLD) [36], (b) genetic algorithm load distribution (GALD) [36], and (c) Nash equilibrium based simple load distribution (NSLD) [35]. FDLD [36] is a variation of force-directed scheduling [44], frequently used for optimizing semiconductor logic synthesis. FDLD is an iterative heuristic that selectively performs operations to minimize system forces until all constraints are met. The Genitor style [45] GALD has two parts: a genetic algorithm-based CWM and a local data center level greedy heuristic that is used to calculate the fitness value of the genetic algorithm. The local greedy heuristic has information about task-node power dynamic voltage and frequency scaling (DVFS) models [36]. [35] also proposes a game-theoretic formulation for the load distribution problem. However, it addresses a much simpler problem, with homogeneous workloads and data centers, and a simplistic data center cost model. We adapt the approach proposed in [35] to the heterogeneous environment outlined in Section 3 to create NSLD. The simpler models used in NSLD affect the data center and delay costs. They also affect the data center capacity factor and arrival rate calculation, which changes the Best-Reply strategy of a player (task type). In contrast, our proposed NILD framework integrates detailed models for data center compute and cooling power, co-location interference, net metering, peak demand, and network pricing distribution. Moreover, we also consider heterogeneity in workload (applications/task types executing in the data center) and heterogeneity within the data center (types of servers, server rack arrangements, etc.), with more detailed models to calculate the latency and power overhead for computation and communication.

6.2 Experimental Setup
Experiments were conducted for three geo-distributed data center configurations containing four, eight, and sixteen data centers. Locations of the data centers in the three configurations were selected from major cities across the continental United States to provide a variety of wind and solar conditions among sites and at various times of the day (Fig. 2). The sites of each configuration were selected so that each configuration would have an even east coast to west coast distribution to better exploit TOU pricing, peak demand pricing, net metering, and renewable power. Each data center comprises 4,320 nodes arranged in four aisles, and is heterogeneous within itself, having nodes from either two or three of the node types given in Table 1, with most locations having three node types and per-node core counts that range from 4 to 12 cores.

Organizations are heavily using platform as a service (PaaS) offerings from cloud providers. These services mainly involve data warehouse, data analytics, and machine learning/artificial intelligence workloads [49], [50]. Due to the popularity of such workloads among cloud providers, we use data intensive (e.g., offline analytics and artificial intelligence) workloads from the BigDataBench 5.0 [39] benchmark suite. Table 2 summarizes the workloads from this suite that we consider in our work. Task execution times and co-located performance data for tasks of the different memory intensity classes were obtained by running the benchmark applications on the nodes listed in Table 1 [38]. Fig. 3 depicts a synthetic sinusoidal task arrival rate pattern and a flat task arrival rate pattern with line plots. The left y-axis is normalized to the highest total global workload arrival rate. A sinusoidal task arrival rate pattern exists in environments where the workload traffic depends on consumer interaction and follows consumer demand during the day, e.g., Netflix [51], Facebook [52]. However, for environments where continuous computation is needed and the workload pattern is not user/consumer interaction specific, the task arrival rate pattern is usually flat (nearly constant). Examples of such environments exist in military computing installations (Department of Defense) and government research labs (National Center for Atmospheric Research). For reference, Fig. 3 also depicts TOU prices for the New York (east coast) and Los Angeles (west coast) data center locations with bar plots.
Fig. 5. Cloud operating costs for each heuristic over a day, for a configuration with (a) four, (b) eight, and (c) sixteen data centers
Each heuristic was evaluated in four variants: (a) peak shaving, net metering, and network unaware; (b) network (NW) aware; (c) peak shaving and net metering (PS, NM) aware; and (d) peak shaving, net metering, and network (PS, NM, NW) aware. Heuristic variants that are referred to as "peak shaving unaware" or "network unaware" do not include the peak demand pricing factor or the network cost, respectively, in their objective functions, but consider them while calculating the total monthly operating cost at the end of the billing period. This experiment used a data center configuration with four locations running a sinusoidal workload pattern. The cloud operating costs were calculated over a duration of a day. The results are shown in Fig. 4.

For each heuristic, the 'PS, NM, NW aware' variant produced the best results. This validates our consideration of peak shaving, net metering, and network cost during cloud workload distribution to more effectively minimize overall energy costs. It can also be observed that the FDLD heuristic performed the worst, severely over-provisioning nodes and resulting in high operating costs.
The GALD heuristic has information about task-node power (DVFS) models [36], allowing it to make better task placement decisions than FDLD. The NSLD [35] heuristic considers a simplistic view of data center compute and cooling power, while also ignoring co-location interference, net metering, and peak demand pricing distributions; because it was designed for a different system model, it is unable to perform as well as our NILD framework. The performance of the 'PS, NM aware' variant of the NILD heuristic came close to that of its 'PS, NM, NW aware' variant. The 'PS, NM, NW aware' variant of the NILD heuristic outperformed all other approaches. This heuristic minimizes the operating costs for all players/task types independently but ultimately reaches an equilibrium. Because of the non-cooperative nature of this method, each player/task type determines the lowest possible operating cost with its workload allocation strategy while considering the current load distribution strategies of the other players.

7.2 Data Center Scalability Analysis

7.2.1 Cost Reduction Comparison
In this experiment, we analyze heuristic performance for larger problem sizes. Simulations with the sinusoidal workload patterns were analyzed for eight and sixteen data center configurations in addition to the previously discussed four data center configuration. For each configuration, the percentage performance improvement of each heuristic over the 'PS, NM, NW unaware' FDLD variant is given in Table 3.

For GALD, going from 8 to 16 data centers, we notice that the energy cost reduction decreases with the increasing number of data centers. Here, as the number of data centers in the group grows larger, the problem size increases, and the number of GALD generations that can take place within the time limit (one hour by default) decreases, which decreases the performance of GALD. In contrast, FDLD, NSLD, and NILD reach the solution within minutes or seconds. These experiments confirm that the NILD heuristic consistently performs the best for all problem sizes. For the configuration containing sixteen data centers, the 'PS, NM, NW aware' NILD heuristic (our proposed work) achieved 41% and 23% better cost savings than the 'PS, NM aware' FDLD and GALD heuristics (from our previous work [36]), respectively. It also achieved 32% better cost savings than the 'PS, NM, NW unaware' NSLD heuristic (from the previous work [35]).

TABLE 3: Cloud Operating Cost Reduction Comparison

                   heuristic   PS, NM, NW   NW      PS, NM   PS, NM, NW
                               unaware      aware   aware    aware
  4 data centers   FDLD         0.0%         1.3%    6.9%    20.5%
                   NSLD        24.7%        30.8%   31.1%    33.1%
                   GALD        30.3%        39.8%   40.3%    44.8%
                   NILD        37.0%        42.5%   46.5%    47.5%
  8 data centers   FDLD         0.0%         0.6%   23.5%    31.0%
                   NSLD        27.1%        29.1%   29.5%    31.4%
                   GALD        33.7%        32.7%   48.6%    49.2%
                   NILD        47.0%        50.5%   53.8%    54.3%
  16 data centers  FDLD         0.0%         0.4%   22.6%    34.7%
                   NSLD        32.6%        34.5%   34.7%    36.8%
                   GALD        31.7%        22.4%   40.3%    40.4%
                   NILD        46.9%        50.6%   50.5%    54.2%

7.2.2 Epoch based Analysis
For most of our experiments, we analyzed the total system cost for each heuristic over one day. Fig. 5 shows a detailed view of the cloud operating cost at one-hour intervals over a day for four, eight, and sixteen data centers executing a sinusoidal workload. We consider the 'PS, NM, NW aware' variants of all workload management heuristics in this study. The operating cost for each heuristic is very high during the first epoch in the figure because the period for which the results are shown represents the first day of the month, where the initial peak demand cost is added. This effect is present only for the first day and would not appear on other days of the month.
After a few epochs, the performance of the GALD heuristic came close to that of NILD but could not surpass it in general.

We discussed the notion of the data center queueing delay in Section 3.3.8. It is calculated using (8). We also discussed how important it is for cloud service providers to minimize this delay. For this experiment, we considered the 'PS, NM, NW aware' variant of each heuristic for a group of eight data centers executing a sinusoidal workload over a day. In Fig. 6, the colored data points represent the queueing delay (summed over the 8 data centers) values for all heuristics for each epoch over a day. As per (8), the queueing delay increases as the denominator's value (the difference ER_{i,d}(τ) − AR_{i,d}(τ)) decreases. An increase in the queueing delay indicates times during which the data centers are running at very high or nearly maximum capacity (ER_{i,d}(τ)) due to the resource allocation decisions (AR_{i,d}(τ)) made by the heuristics. The results shown in Fig. 6 reveal that the FDLD, NSLD, and GALD heuristics fail to keep the delay small for certain epochs, whereas the NILD heuristic maintains a lower queueing delay (on average) across epochs compared to the other heuristics over the day.

For this experiment, we considered all heuristics for a group of eight data centers executing both sinusoidal and flat workload arrival patterns as shown in Fig. 3. We conducted 10 simulation runs for each workload pattern and analyzed its impact. For each run, the arrival rate values were randomly sampled (with a normal distribution). We used the original arrival rate values shown in Fig. 3 as the mean, and 20% of the mean as the standard deviation. We analyzed the cloud operating cost variation for each heuristic over a day by plotting the mean cost with standard error bars, as shown in Fig. 7. Recall that each task type is characterized by its arrival rate and the estimated time required to complete the task on each of the heterogeneous compute nodes. The heuristics distribute the workload to minimize the total energy cost across all data centers with the constraint that the execution rates of all task types meet their arrival rates (2). Therefore, the workload assignment was altered with the change in the incoming arrival rate pattern, which further affected the system cost. The results shown in Fig. 7(a) and Fig. 7(b) indicate that the geo-distributed system responded differently for sinusoidal and flat arrival rate patterns. The percentage difference between the sinusoidal and flat arrival rate patterns' results for the 'PS, NM, NW aware' variants (yellow bars) of FDLD, NSLD, GALD, and NILD is 16%, 7%, 1%, and 10%, respectively.
Fig. 6. Data center queueing delay comparison among heuristics (FDLD, NSLD, GALD, and NILD) over a day executing a sinusoidal workload for eight data centers; y-axis: queueing delay (seconds), x-axis: UTC time.
7.4.1 Task Data Size
We performed this experiment to analyze the impact of task data size on the cloud operating costs. Recall that, as per (6), two principal components of the network cost are the price per data traffic unit ($/GB), N^price, and the data volume (GB) for the number of tasks migrated (outward). Here, the data volume depends on the amount of data migrated per task. If a task migrates from one data center location to another, it transfers various types and amounts of data for each task type. We used a workload that was a mixture of offline analytics and artificial intelligence task types, as shown in Table 2. This experiment considered all heuristics for a group of eight data centers executing both sinusoidal and flat workloads with different task data sizes. We considered the 'PS, NM, NW aware' variants of all workload management heuristics in this study.

The results in Fig. 8 show that the cloud operating costs increased with the increase in task data size. Here, the network costs increased with an increase in the amount of data transferred. For all task data sizes, our proposed NILD framework performed the best overall. For small task data sizes, both FDLD and NSLD performed the worst, while GALD performed better but not as well as NILD. For large task data sizes, GALD's performance degraded (as compared to its performance for the small data sizes), but NSLD performed the worst.
Fig. 8. Impact of task data size scaling on cloud operating costs among heuristics over a day for (a) sinusoidal and (b) flat workload arrival rate patterns for eight data centers

7.4.2 NILD Delay Cost Factor (β)
As discussed in Section 5.5, NILD uses the Best-Reply algorithm to determine arrival rates. It uses the delay cost factor, β, while calculating the arrival rates (26). In this experiment, we study the impact of β on the cloud operating cost. We considered configurations with four, eight, and sixteen data centers executing both sinusoidal and flat workload arrival patterns. We considered the 'PS, NM, NW unaware' variants of all workload management heuristics in this study.

The results of this experiment in Fig. 9 show that the cloud operating costs for both sinusoidal and flat workloads increased with the increase in β. As per (26), larger values of β make large adjustments in the arrival rate, and vice versa. For the configuration with sixteen data centers, when we increased β beyond 0.2, NILD failed to reach equilibrium, whereas for the other configurations NILD reached equilibrium quickly but the CWM could not make fine adjustments in the arrival rate to further minimize the operating costs. When we decreased β to very low values (smaller than 0.1), NILD reached equilibrium slowly and could not make appropriate adjustments in the arrival rates. The best value of β found from this experiment (0.1) was the one we used for all experiments.

Fig. 9. Impact of the delay cost factor, β, on cloud operating costs among heuristics over a day for both sinusoidal and flat workload arrival rate patterns, for a configuration with (a) four, (b) eight, and (c) sixteen data centers

8 CONCLUSIONS
In this work, we studied the problem of workload distribution among geo-distributed cloud data centers to minimize cloud energy and network costs while ensuring all tasks complete without being dropped. We used a new cloud network model that directly considers real-world data transfer prices and the amount of data migrated out of the cloud data centers. Our framework considers heterogeneity within the data centers and in the task types used in the workload. It is aware of data center cooling power, time-of-use (TOU) electricity pricing, green renewable energy, net metering, peak demand pricing distribution, the inter-data center network, and data center queueing delay. We formulated the cloud workload distribution problem as a non-cooperative game and proposed a game-theoretic workload management technique for minimizing the cloud operating cost. However, to implement our approach in a real system, it needs to be used with some form of a workload prediction technique, e.g., [13], [14], to avoid delays with workload allocation. For our complex (NP-hard) problem and system environment, it should also be noted that the Nash equilibrium-based game-theoretic technique may not always achieve globally optimal solutions. However, the obtained solutions are still superior to those obtained with state-of-the-art frameworks, as demonstrated in our experiments. Apart from data analytics and artificial intelligence workloads, real-world cloud workloads also include web-service, search, mobile services, etc. Such time-critical workloads cannot be transferred without migration penalties, extra network processing, and scheduling costs. However, our framework can be extended for such time-critical workloads, e.g., by using a priority scheduling approach where time-critical workloads are given higher priority for local scheduling at (or near) the source of the request, to minimize latencies. We compared our new game-theoretic NILD technique with three state-of-the-art techniques (FDLD, GALD [36], and NSLD [35]).
We analyzed their performance by comparing the cloud operating cost reduction, performing a scalability assessment, examining sensitivity to task data size, testing system behavior for different task arrival patterns, and comparing data center queueing delays. We demonstrated that NILD performed the best when including information about peak demand charges, net metering policies, network costs, and intra-data center queueing delay. The best performing NILD heuristic from this work achieved, on average, 43%, 16%, and 33% greater cost reductions than FDLD and GALD from [36], and NSLD from [35], respectively. Additionally, the runtime of the NILD heuristic is much lower than that of the FDLD and GALD heuristics, making it suitable for time-critical workload scheduling.

ACKNOWLEDGMENTS
The authors thank Dylan Machovec for his valuable comments on this work. This work is supported by the National Science Foundation (NSF) under grant CCF-1302693.

REFERENCES
[1] IEA, "Digitalisation and Energy," 2017. [Online]. Available: https://www.iea.org/reports/digitalisation-and-energy. [Accessed 1 May 2020].
[2] "Data Center Locations," [Online]. Available: http://www.google.com/about/datacenters/inside/locations/index.html. [Accessed 1 May 2020].
[3] "Global Infrastructure," [Online]. Available: http://aws.amazon.com/about-aws/global-infrastructure/. [Accessed 1 May 2020].
[4] Y. Li, H. Wang, J. Dong, J. Li and S. Cheng, "Operating Cost Reduction for Distributed Internet Data Centers," in 13th IEEE/ACM Int'l Symposium on Cluster, Cloud, and Grid Computing, 2013.
[5] "What is Time-of-Use Pricing and Why is it Important?," [Online]. Available: http://www.energy-exchange.net/time-of-use-pricing/. [Accessed 1 May 2020].
[6] "Dynamic pricing," [Online]. Available: https://www.whatissmartgrid.org/featured-article/what-you-need-to-know-about-dynamic-electricity-pricing. [Accessed 1 May 2020].
[7] "Demand Charges," [Online]. Available: https://www.we-energies.com/business/elec/understand_demand_charges.pdf. [Accessed 1 May 2020].
[8] W. B. Norton, "Internet Transit Prices - Historical and Projected," DrPeering white paper, [Online]. Available: http://drpeering.net/white-papers/Internet-Transit-Pricing-Historical-And-Projected.php.
[9] X. Yi, F. Liu, J. Liu and H. Jin, "Building a network highway for big data: architecture and challenges," IEEE Network, vol. 28, pp. 5-13, 2014.
[10] L. Belkhir and A. Elmeligi, "Assessing ICT global emissions footprint: Trends to 2040 & recommendations," Journal of Cleaner Production, vol. 177, pp. 448-463, 2018.
[11] "Greenpeace: China's Data Centers on Track to Use More Energy than All of Australia," [Online]. Available: https://www.datacenterknowledge.com/asia-pacific/greenpeace-china-s-data-centers-track-use-more-energy-all-australia. [Accessed 1 May 2020].
[12] "Net Metering," [Online]. Available: https://www.seia.org/initiatives/net-metering. [Accessed 1 May 2020].
[13] J. Kumar, D. Saxena, A. K. Singh and A. Mohan, "BiPhase adaptive learning-based neural network model for cloud datacenter workload forecasting," Soft Computing, pp. 1-18, 2020.
[14] Z. Chen, J. Hu, G. Min, A. Y. Zomaya and T. El-Ghazawi, "Towards Accurate Prediction for High-Dimensional and Highly-Variable Cloud Workloads with Deep Learning," IEEE Transactions on Parallel and Distributed Systems, vol. 31, pp. 923-934, 2020.
[15] A. Wierman, Z. Liu, I. Liu and H. Mohsenian-Rad, "Opportunities and challenges for data center demand response," in Int'l Green Computing Conf., 2014.
[16] W. Wu, W. Wang, X. Fang, L. Junzhou and A. V. Vasilakos, "Electricity Price-aware Consolidation Algorithms for Time-sensitive VM Services in Cloud Systems," IEEE Transactions on Services Computing, pp. 1-1, 2019.
[17] M. Jawad, M. B. Qureshi, U. Khan, S. M. Ali, A. Mehmood, B. Khan, X. Wang and S. U. Khan, "A Robust Optimization Technique for Energy Cost Minimization of Cloud Data Centers," IEEE Transactions on Cloud Computing, pp. 1-1, 2018.
[18] H. Yeganeh, A. Salahi and M. A. Pourmina, "A Novel Cost Optimization Method for Mobile Cloud Computing by Capacity Planning of Green Data Center With Dynamic Pricing," Canadian Journal of Electrical and Computer Engineering, vol. 42, pp. 41-51, 2019.
[19] M. Dabbagh, B. Hamdaoui and A. Rayes, "Peak Power Shaving for Reduced Electricity Costs in Cloud Data Centers: Opportunities and Challenges," IEEE Network, pp. 1-6, 2020.
[20] V. Papadopoulos, J. Knockaert, C. Develder and J. Desmet, "Peak Shaving through Battery Storage for Low-Voltage Enterprises with Peak Demand Pricing," Energies, vol. 13, p. 1183, 2020.
[21] Y. Peng, D.-K. Kang, F. Al-Hazemi and C.-H. Youn, "Energy and QoS aware resource allocation for heterogeneous sustainable cloud datacenters," Optical Switching and Networking, vol. 23, pp. 225-240, 2017.
[22] J. Krzywda, V. Meyer, M. G. Xavier, A. Ali-Eldin, P.-O. Östberg, C. A. F. De Rose and E. Elmroth, "Modeling and Simulation of QoS-Aware Power Budgeting in Cloud Data Centers," 2019.
[23] Y. Mansouri and R. Buyya, "Dynamic replication and migration of data objects with hot-spot and cold-spot statuses across storage data centers," Journal of Parallel and Distributed Computing, vol. 126, pp. 121-133, 2019.
[24] A. Hou, C. Q. Wu, R. Qiao, L. Zuo, M. M. Zhu, D. Fang, W. Nie and F. Chen, "QoS provisioning for various types of deadline-constrained bulk data transfers between data centers," Future Generation Computer Systems, vol. 105, pp. 162-174, 2020.
[25] R. Prodan, E. Torre, J. J. Durillo, G. S. Aujla, N. Kummar, H. M. Fard and S. Benedikt, "Dynamic Multi-objective Virtual Machine Placement in Cloud Data Centers," in 45th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), 2019.
[26] M. Najm and V. Tamarapalli, "VM Migration for Profit Maximization in Federated Cloud Data Centers," in 2020 International Conference on COMmunication Systems & NETworkS (COMSNETS), 2020.
[27] H. Zhang, Y. Xiao, S. Bu, R. Yu, D. Niyato and Z. Han, "Distributed Resource Allocation for Data Center Networks: A Hierarchical Game Approach," IEEE Transactions on Cloud Computing, pp. 1-1, 2018.
[28] X. Yuan, G. Min, L. T. Yang, Y. Ding and Q. Fang, "A game theory-based dynamic resource allocation strategy in geo-distributed datacenter clouds," Future Generation Computer Systems, vol. 76, pp. 63-72, 2017.
[29] A. Yassine, A. A. N. Shirehjini and S. Shirmohammadi, "Bandwidth On-demand for Multimedia Big Data Transfer across Geo-Distributed Cloud Data Centers," IEEE Transactions on Cloud Computing, pp. 1-1, 2016.
[30] B. K. Ray, A. Saha and S. Roy, "Migration cost and profit oriented cloud federation formation: hedonic coalition game based approach," Cluster Computing, vol. 21, pp. 1981-1999, 2018.
[31] Z. Yu, Y. Guo and M. Pan, "Coalitional datacenter energy cost optimization in electricity markets," in 8th International Conference on Future Energy Systems, 2017.
[32] S. Xu, L. Liu, L. Cui, X. Chang and H. Li, "Resource scheduling for energy-efficient in cloud-computing data centers," in 2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), 2018.
[33] M. M. Moghaddam, M. H. Manshaei, W. Saad and M. Goudarzi, "On Data Center Demand Response: A Cloud Federation Approach," IEEE Access, vol. 7, pp. 101829-101843, 2019.
[34] A. Ghassemi, P. Goudarzi, M. R. Mirsarraf and T. A. Gulliver, "Game based traffic exchange for green data center networks," Journal of Communications and Networks, vol. 20, pp. 85-92, 2018.
[35] R. Tripathi, S. Vignesh, V. Tamarapalli, A. T. Chronopoulos and H. Siar, "Non-cooperative power and latency aware load balancing in distributed data centers," Journal of Parallel and Distributed Computing, vol. 107, pp. 76-86, 2017.
[36] N. Hogade, S. Pasricha, H. J. Siegel, A. A. Maciejewski, M. A. Oxley and E. Jonardi, "Minimizing Energy Costs for Geographically Distributed Heterogeneous Data Centers," IEEE Transactions on Sustainable Computing, vol. 3, pp. 318-331, 2018.
[37] D. G. Feitelson, D. Tsafrir and D. Krakov, "Experience with using the Parallel Workloads Archive," Journal of Parallel and Distributed Computing, vol. 74, pp. 2967-2982, Oct. 2014.
[38] D. Dauwe, E. Jonardi, R. D. Friese, S. Pasricha, A. A. Maciejewski, D. A. Bader and H. J. Siegel, "HPC node performance and energy modeling with the co-location of applications," The Journal of Supercomputing, vol. 72, pp. 4771-4809, 2016.
[39] "BigDataBench 5.0 benchmark suite," [Online]. Available: http://www.benchcouncil.org/BigDataBench/. [Accessed 1 May 2020].
[40] Z. Liu, M. Lin, A. Wierman, S. Low and L. L. H. Andrew, "Greening geographical load balancing," IEEE/ACM Transactions on Networking, vol. 23, pp. 657-671, 2014.
[41] J. Hamilton, "The Cost of Latency," [Online]. Available: https://perspectives.mvdirona.com/2009/10/the-cost-of-latency/. [Accessed 1 May 2020].
[42] H. Goudarzi and M. Pedram, "Geographical Load Balancing for Online Service Applications in Distributed Datacenters," in IEEE 6th Int'l Conf. on Cloud Computing (CLOUD '13), June 2013.
[43] M. J. Osborne and A. Rubinstein, A Course in Game Theory, MIT Press, 1994.
[44] P. G. Paulin and J. P. Knight, "Force-directed scheduling for the behavioral synthesis of ASICs," IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 8, pp. 661-679, June 1989.
[45] D. Whitley, "The GENITOR algorithm and selective pressure: Why rank-based allocation of reproductive trials is best," in 3rd International Conf. on Genetic Algorithms, June 1989.
[46] NREL, "National Solar Radiation Database," [Online]. Available: https://maps.nrel.gov/nsrdb-viewer. [Accessed 1 May 2020].
[47] "Amazon CloudFront Pricing," [Online]. Available: https://aws.amazon.com/cloudfront/pricing/. [Accessed 1 May 2020].
[48] "Net Metering Policies," [Online]. Available: https://www.dsireusa.org/. [Accessed 1 May 2020].
[49] "Cloud Computing Trends: 2021 State of the Cloud Report," [Online]. Available: https://www.flexera.com/blog/cloud/cloud-computing-trends-2021-state-of-the-cloud-report/. [Accessed 20 May 2021].
[50] "Top cloud providers in 2021: AWS, Microsoft Azure, and Google Cloud, hybrid, SaaS players," [Online]. Available: https://www.zdnet.com/article/the-top-cloud-providers-of-2021-aws-microsoft-azure-google-cloud-hybrid-saas/. [Accessed 20 May 2021].
[51] J. Daniel, Y. Danny and J. Neeraj, "Scryer: Netflix's Predictive Auto Scaling Engine," [Online]. Available: https://netflixtechblog.com/scryer-netflixs-predictive-auto-scaling-engine-a3f8fc922270. [Accessed 1 May 2020].
[52] W. Qiang, "Making Facebook's software infrastructure more energy efficient with Autoscale," [Online]. Available: https://engineering.fb.com/production-engineering/making-facebook-s-software-infrastructure-more-energy-efficient-with-autoscale/. [Accessed 2 May 2020].

Ninad Hogade received his B.E. degree in Electronics Engineering from Vishwakarma Institute of Technology, India, and his M.S. degree in Computer Engineering from Colorado State University, USA. He is currently a Ph.D. student in Computer Engineering at Colorado State University, USA. His research interests include energy aware scheduling of high performance computing systems and data centers.

Sudeep Pasricha received his B.E. degree in Electronics and Communications from Delhi Institute of Technology, India, and his M.S. and Ph.D. degrees in Computer Science from the University of California, Irvine. He is currently a Professor and Chair of Computer Engineering at Colorado State University, where he is also a Professor of Computer Science. He is a Senior Member of the IEEE and ACM. Homepage: http://www.engr.colostate.edu/sudeep.

Howard Jay (H.J.) Siegel is a Professor Emeritus at Colorado State University. From 2001 to 2017, he was the Abell Endowed Chair Distinguished Professor of Electrical and Computer Engineering, and a Professor of Computer Science. He was a professor at Purdue from 1976 to 2001. He is an IEEE Fellow and an ACM Fellow. He received B.S. degrees from MIT, and the M.A., M.S.E., and Ph.D. degrees from Princeton. Homepage: http://www.engr.colostate.edu/hj.