To cite this article: Tinghuai Ma, Ya Chu, Licheng Zhao & Otgonbayar Ankhbayar (2014) Resource Allocation and Scheduling in Cloud Computing: Policy and Algorithm, IETE Technical Review, 31:1, 4-16.
Resource Allocation and Scheduling in Cloud Computing:
Policy and Algorithm
Tinghuai Ma1, Ya Chu2, Licheng Zhao3 and Otgonbayar Ankhbayar2
1 Jiangsu Engineering Centre of Network Monitoring, Nanjing University of Information Science & Technology, Nanjing 210044, China
2 School of Computer and Software, Nanjing University of Information Science & Technology, Nanjing 210044, China
3 National Meteorological Information Center, China Meteorological Administration, Beijing, China
ABSTRACT
Cloud computing is a new distributed commercial computing model that aims at providing computational
resources or services to users over a network in a low-cost manner. Resource allocation and scheduling (RAS)
is the key focus of cloud computing, and its policy and algorithm have a direct effect on cloud performance and
cost. This paper presents five major topics in cloud computing, namely locality-aware task scheduling; reliability-
aware scheduling; energy-aware RAS; Software as a Service (SaaS) layer RAS; and workflow scheduling. These
five topics are then classified into three parts: performance-based RAS; cost-based RAS; and performance- and
cost-based RAS. A number of existing RAS policies and algorithms are then discussed in detail with regard to their given parameters. In addition, a comparative analysis of the five identified problems and their representative algorithms is performed. Finally, some future research directions for cloud RAS are pointed out.
Keywords:
Cloud computing, Data locality, Reliability, Energy efficiency, Resource allocation and scheduling, Software as
a service, Workflow scheduling.
In some algorithms, both static and dynamic scheduling are often used simultaneously. Therefore, in our discussion, we will not distinguish whether an algorithm is static or dynamic.

The rest of the paper is organized as follows. In Section 2, we present five hot topics of cloud RAS and describe them in detail. In Section 3, we provide a detailed classified discussion of the various RAS policies and algorithms according to the five problems. A comparative analysis of the RAS policies and algorithms of the five problems is introduced in Section 4. Finally, we draw a conclusion and discuss future research directions in Section 5.

2. PROBLEMS AND DESCRIPTION

In solving the problem of RAS in cloud computing, researchers have performed a significant number of studies from various aspects. We can sum up five hot topics of cloud RAS from these researches: (1) locality-aware task scheduling; (2) reliability-aware scheduling; (3) energy-aware RAS; (4) SaaS layer RAS; and (5) workflow scheduling. In this section, we discuss the five topics in detail.

Problem 1: Locality-aware task scheduling problem - how to exploit data locality in RAS of the cloud to improve execution efficiency and save network bandwidth.

In cloud computing, one of Hadoop's basic principles is "moving computation is cheaper than moving data" [45], which indicates that migrating computation closer to the data source is better than moving data to where the application is run. Furthermore, network bandwidth is a scarce resource, so in order to minimize network congestion and increase the overall throughput of systems, it is necessary to research the problem of locality-aware task scheduling to enhance the data locality of jobs. In general, good data locality means that a computation task is located near its data source. There are many researchers [4-14] attempting to achieve good data locality as much as possible.

Problem 2: Reliability-aware scheduling problem - how to reduce the failure rate of tasks in RAS of the cloud, to improve the reliability and execution efficiency of the cloud system.

In the cloud environment with thousands of data nodes, resource failures are inevitable, which may lead to execution aborts, data corruption and loss, performance degradation, service level agreement (SLA) violations, and consequently a devastating loss of customers. Therefore, researching the problem of reliability-aware scheduling to improve the reliability of a cloud computing environment is a crucial challenge. For example, MapReduce [46] can automatically handle failures: if a node crashes, MapReduce can re-run its tasks on another node. There are some research works [15-18] that consider reliability.

Problem 3: Energy-aware RAS problem - how to reduce the energy consumption of data centres to minimize the operating costs of cloud providers.

With the demand for high-performance computing infrastructures growing dramatically, the energy consumption of data centres is increasing and has therefore become an important research issue. High energy consumption not only contributes to high operational costs, which will reduce cloud providers' profit, but also emits substantial carbon dioxide (CO2), which is unfriendly to the environment. Therefore, there is an urgent need to research the problem of energy-aware RAS to reduce energy consumption, while maintaining high SLAs between users and cloud platform providers. Many studies [19-30] have been performed to find energy-efficient RAS approaches to save energy and reduce the usage of power in cloud computing.

Problem 4: SaaS layer RAS problem - how to optimize RAS in the SaaS layer to maximize the profits of SaaS providers.

Services in the cloud can be categorized as: Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and SaaS [47]. SaaS is a software delivery model providing software services to customers at a low cost across the Internet, without supplying the underlying hosting infrastructure [48]. SaaS providers obtain profits from the margin between the operational cost of the infrastructure and the revenue generated from customers. Therefore, researching the problem of SaaS layer RAS to minimize the overall infrastructure cost without adversely affecting the customers of SaaS providers has become an urgent problem to be solved in the cloud computing environment. There are some other research works [31-34] that focus on maximizing the profits of SaaS providers.

Problem 5: Workflow scheduling problem - how to optimize workflow scheduling to trade off time and cost.

Workflows constitute a common model for describing a wide range of applications in distributed systems [49]. Their scheduling methods try to minimize the execution time (makespan) of the workflows in distributed systems like Grids. However, in clouds, there is another important parameter besides execution time, that is, economic cost [50]. In general, the faster resources are, the more they cost. Therefore, researching the problem of workflow scheduling to trade off execution time and economic cost is necessary.
A workflow is usually modelled as a directed acyclic graph in which nodes represent tasks and edges represent data dependencies between them; an edge from task ti to task tj, where ti is the parent task of tj and tj is the child task of ti, represents a precedence constraint, which indicates that a child task cannot be executed until all of its parent tasks have been completed. Labels on tasks represent computation costs, for example, the number of instructions; labels on dependencies represent communication costs, for example, the bytes to transmit. An example of a workflow is shown in Figure 1, and the main objective of the workflow scheduler is to decide where each task component of the directed acyclic graph will execute.
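To make the workflow model concrete, the following minimal sketch (our own illustration; the task names, cost values and the simple topological loop are assumptions, not taken from the paper or from Figure 1) encodes a small DAG with computation costs on tasks and communication costs on edges, and enforces the precedence constraint that a task runs only after all of its parents have finished.

```python
# Hypothetical workflow DAG: computation cost per task, communication cost per edge.
comp_cost = {"t1": 10, "t2": 4, "t3": 6, "t4": 8}          # e.g. instruction counts
comm_cost = {("t1", "t2"): 2, ("t1", "t3"): 3, ("t2", "t4"): 1, ("t3", "t4"): 2}  # e.g. bytes

parents = {}
for (u, v) in comm_cost:
    parents.setdefault(v, set()).add(u)

def schedulable(task, finished):
    """A child task can run only when all of its parent tasks have completed."""
    return parents.get(task, set()) <= finished

# Simple topological execution order respecting the precedence constraints.
finished, order = set(), []
while len(finished) < len(comp_cost):
    ready = [t for t in comp_cost if t not in finished and schedulable(t, finished)]
    task = min(ready)              # placeholder policy; a real scheduler also picks a resource
    order.append(task)
    finished.add(task)

print(order)   # e.g. ['t1', 't2', 't3', 't4']
```

A real workflow scheduler would additionally choose a resource for each ready task, which is exactly the mapping problem discussed in the following sections.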
3. RAS POLICIES AND ALGORITHMS

In this section, according to different optimization goals, RAS algorithms and policies are classified into three parts (see Figure 2): performance-based RAS, cost-based RAS, and performance- and cost-based RAS. Performance-based RAS comprises locality-aware task scheduling, which focuses on solving the Problem 1, and reliability-aware scheduling, which focuses on solving the Problem 2.

3.1.1 Locality-aware Task Scheduling

Mass data processing platforms (e.g. MapReduce [46], Dryad [51], Hadoop [52], etc.) of cloud computing require the concurrent execution of data-intensive parallel jobs (each job consists of several subtasks). In general, jobs will compete for computing resources and network bandwidth, so in order to improve execution efficiency by decreasing network transmission during the execution of jobs, some researchers have indicated that tasks should be assigned to the node holding their data as far as possible to improve data locality [4-14].

There are many improved methods, ranging from improving data locality to trading off data locality against fairness, proposed based on the Hadoop schedulers; thus, locality-aware task scheduling policies and algorithms are described from three perspectives: the Hadoop scheduler, improving data locality, and combining data locality and fairness.
3.1.1.1 Perspective of Hadoop Scheduler. Hadoop has several job schedulers, such as the FIFO Scheduler [53], the Hadoop on Demand Scheduler [54], the Fair Scheduler [55, 56] and the Capacity Scheduler [57]. The First In First Out (FIFO) Scheduler, the Hadoop default scheduler, greedily searches for a data-local task in the head-of-line job and allocates it to an idle server. Although the policy is easy to implement, it does not consider global optimization, so it cannot guarantee high data locality and tends to increase the completion time of small jobs. The Hadoop on Demand Scheduler solves the FIFO Scheduler's drawback of poor response times for short jobs in the presence of large jobs by letting users share a common file system while owning private MapReduce clusters. However, it has two problems: poor locality and poor utilization. To solve these problems, the Fair Scheduler gives each user the illusion of owning a private cluster, and dynamically redistributes capacity unused by some users to other users. However, although it tries to minimize wasted computation, it does not consider the job character and data locality, which increases the network traffic and data transfer while rescheduling killed Map/Reduce tasks. The Capacity Scheduler, developed by Yahoo, is similar to the Fair Scheduler. It is designed for large clusters: it maintains multiple job queues sorted by job arrival time, and it provides a minimum capacity guarantee and the sharing of excess capacity among users.

3.1.1.2 Perspective of Improving Data Locality. To further achieve data locality, some researchers proposed improved methods and algorithms [4-10] based on the existing Hadoop scheduling algorithms. To reach global optimization and assign tasks efficiently in Hadoop, Fischer et al. [4] introduced an idealized Hadoop model called the Hadoop Task Assignment (HTA) problem, to find the assignment that minimizes the job completion time, and proved it to be NP-complete. To solve the problem, they designed a flow-based algorithm called MaxCover-BalAssign, which works iteratively to produce a sequence of assignments and outputs the best one. Although the solution has been shown to be near optimal, it takes a long time to deal with a large problem instance owing to its high time complexity. Therefore, Jin et al. [5] proposed a heuristic task scheduling algorithm called Balance-Reduce, which schedules tasks by taking a global view and adjusts task data locality dynamically according to the network state and cluster workload. When the network environment is poor and the cluster is overloaded, Balance-Reduce tries to enhance data locality, and otherwise it relaxes data locality to improve the overall optimization. In addition, taking into account the job character and data locality when killing tasks, which is lacking in the Fair Scheduler, Tao et al. [6] proposed an improved Fair Scheduler, which adopts different policies for I/O-bound and CPU-bound jobs based on data locality, to decrease data movement and speed up the execution of jobs. Furthermore, Seo et al. [7], Zhang et al. [8, 9] and Hammoud et al. [10] tried to improve MapReduce performance by enhancing the data locality of map tasks and reduce tasks, respectively. Seo et al. [7] built the High Performance MapReduce Engine (HPMR), which inspects input splits in the map phase and predicts which key-value pairs will be partitioned to which reducer in the reduce phase. HPMR assigns the expected data to map tasks near the future reducers, which keeps the job execution time from varying with network bandwidth. Zhang et al. [8] proposed a next-k-node scheduling (NKS) method based on a homogeneous computing environment to improve the data locality of map tasks. Afterwards, Zhang et al. [9] introduced a data locality aware scheduling method based on the NKS method for the MapReduce framework in heterogeneous environments. Contrary to HPMR, the Locality-Aware Reduce Task Scheduler proposed by Hammoud et al. [10] attempts to collocate reduce tasks with the maximum amount of their required data, computed after recognizing input data network locations and sizes, to achieve good data locality.
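The common baseline behind these locality-aware methods can be sketched as a greedy, locality-first assignment. The fragment below is a simplified illustration under our own assumptions (node names, slot counts and the random fallback are not taken from any cited algorithm): each task is placed on the node holding its input block when a slot is free there, and only otherwise on a remote node.

```python
import random

# Hypothetical cluster state: which node stores each task's input block,
# and how many free slots each node currently has.
data_location = {"task1": "nodeA", "task2": "nodeA", "task3": "nodeB", "task4": "nodeC"}
free_slots = {"nodeA": 1, "nodeB": 1, "nodeC": 1}

def assign_tasks(pending, data_location, free_slots):
    """Greedy locality-first assignment: place a task on the node holding its
    input data whenever that node has a free slot; otherwise fall back to any
    free node (a remote, non-local assignment)."""
    assignment = {}
    # Pass 1: data-local assignments.
    for task in list(pending):
        node = data_location[task]
        if free_slots.get(node, 0) > 0:
            assignment[task] = node
            free_slots[node] -= 1
            pending.remove(task)
    # Pass 2: remaining tasks go to any node that still has capacity.
    for task in list(pending):
        candidates = [n for n, s in free_slots.items() if s > 0]
        if not candidates:
            break                      # no capacity left; task stays pending
        node = random.choice(candidates)
        assignment[task] = node
        free_slots[node] -= 1
        pending.remove(task)
    return assignment

print(assign_tasks(["task1", "task2", "task3", "task4"], data_location, free_slots))
```

Algorithms such as Balance-Reduce [5] or HPMR [7] refine this baseline with cost models that also weigh network state, cluster load and the placement of future reduce tasks.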
3.1.1.3 Perspective of Combining Data Locality and Fairness. Workload in cloud computing includes two kinds of job: real-time and long-term jobs. A real-time job has the characteristics of few subtasks, short execution time and a sensitive response time. In contrast, a long-term job's characteristics are multiple subtasks and a long execution time. The Fairness Scheduling algorithm can allocate resources to real-time jobs in time to achieve a fast response. However, data locality and fairness are in opposition to each other, and it is difficult to satisfy them simultaneously in a scheduling policy. Therefore, to further improve job execution efficiency, some researchers focus on how to combine data locality and fairness in scheduling policies and algorithms. Zaharia et al. [11] proposed a delay scheduling algorithm that has been used in the Fair Scheduler and is based on max-min fairness [58]. The algorithm requires the job that should be scheduled next according to fairness to wait for a small amount of time if it is unable to launch a data-local task, and lets other jobs launch tasks instead. Waiting for a local task is worthwhile if servers are assumed to become idle quickly enough. However, this assumption is too strict, and because it is based on a static waiting time threshold, delay scheduling cannot adapt to the dynamic load of a data centre and does not work well when servers free up slowly. Moreover, there are two works similar to delay scheduling, Quincy [12] and Good Cache Compute (GCC) [13]. Quincy, which can be used for MapReduce and Dryad, maps the scheduling problem to a graph data structure according to a global cost model, and solves the problem with a well-known minimum-cost flow algorithm. It can achieve better fairness, but it has a negligible effect on improving data locality. GCC sets a utility threshold that is the upper bound of the number of idle servers, and it can skip servers when the number of idle servers is below the utility threshold. To address this issue of delay scheduling, Jin et al. [14] proposed an adaptive waiting time threshold model and designed an adaptive delay scheduling algorithm (ADS). ADS adjusts jobs' waiting time thresholds dynamically to reduce the job response time, according to information on idle servers' arrival intensity, available network bandwidth and job running status.
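The core rule of delay scheduling is small enough to sketch. The code below is a minimal, illustrative rendering of the idea described above, not the Fair Scheduler's actual implementation; the Job class, the heartbeat constant and the tuple it returns are our own assumptions.

```python
SLOT_HEARTBEAT = 1.0   # assumed seconds between scheduling opportunities

class Job:
    """Toy job: a list of pending tasks, each tagged with the node holding its input."""
    def __init__(self, name, task_locations):
        self.name = name
        self.tasks = list(task_locations)   # e.g. [("t1", "nodeA"), ("t2", "nodeB")]
        self.skipped_time = 0.0

    def pop_task(self, node=None):
        for i, (task, loc) in enumerate(self.tasks):
            if node is None or loc == node:
                return self.tasks.pop(i)[0]
        return None

def on_free_slot(node, jobs, wait_threshold):
    """Delay scheduling sketch: 'jobs' is ordered by fairness (most underserved first).
    A job that cannot run a data-local task on 'node' is skipped, but only for a
    bounded waiting time; after that it launches a non-local task anyway."""
    for job in jobs:
        local = job.pop_task(node)
        if local is not None:
            job.skipped_time = 0.0
            return (job.name, local, "local")
        if job.skipped_time < wait_threshold:
            job.skipped_time += SLOT_HEARTBEAT    # keep waiting for locality
            continue
        remote = job.pop_task()                   # give up: run any pending task
        if remote is not None:
            return (job.name, remote, "remote")
    return None

jobs = [Job("J1", [("t1", "nodeB")]), Job("J2", [("t2", "nodeA")])]
print(on_free_slot("nodeA", jobs, wait_threshold=3.0))   # J1 keeps waiting, J2 runs locally
```

ADS [14] keeps this skeleton but replaces the fixed wait_threshold with a per-job threshold adapted to idle-server arrival intensity, available bandwidth and job status.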
A comparison of the policies and algorithms discussed above has been performed from the aspects of data locality, fairness, execution efficiency, utilization, global optimization and so on. Table 1 shows the overview of locality-aware task scheduling policies and algorithms.

3.1.2 Reliability-aware Scheduling

Reliability refers to the probability that no failure appears during run-time (program execution). In the cloud environment with thousands of data nodes, resources are inevitably unreliable, which has a great effect on task execution and scheduling. Therefore, some studies have been performed on the reliability of RAS. For example, the scheduling algorithm of Chen et al. [16] is related to that of Zaharia et al. [15], and shares common characteristics, such as the use of historical information, with that of Kumar et al. [17]. Wang et al. [18] proposed a novel scheduling algorithm based on a kind of trust mechanism. These are described in more detail below.

The performance of the Hadoop scheduler is seriously limited owing to its assumptions of homogeneous cluster nodes and linear progress made by tasks, which it uses to speculate when to re-execute tasks that appear to be stragglers. However, in practice, these assumptions do not hold well in a heterogeneous environment characterized by equipment that varies greatly with respect to computation and communication capacities, architectures, memory and power. Therefore, Zaharia et al. [15] studied a mechanism for discovering abnormal tasks in a heterogeneous environment, and proposed the Longest Approximate Time to End (LATE) scheduling algorithm based upon the speculative execution scheduler in Hadoop. The LATE algorithm uses a Slow Task Threshold to prevent unnecessary speculation: it ranks tasks by estimated time remaining and starts a copy of the highest-ranked task whose progress rate is lower than the Slow Task Threshold. The advantage of the LATE algorithm is its robustness to node heterogeneity, since only some of the slowest (not all) speculative tasks are restarted. However, the method does not compute the remaining time of tasks correctly, and cannot find the really slow tasks in the end. To this end, Chen et al. [16] proposed a Self-Adaptive MapReduce (SAMR) scheduling algorithm, which guarantees robustness and significantly improves the execution efficiency and resource utilization of MapReduce. Although they launch backup tasks for slow tasks, Hadoop and LATE cannot find the appropriate tasks that really delay the overall execution time, because the two schedulers always use a static way of finding slow tasks. In contrast, SAMR incorporates historical information recorded on each node to tune its parameters and dynamically find the slow tasks that really need backup tasks. In addition, Kumar et al. [17] also utilized historical task values such as past success rate, failure rate and previous execution time in their proposed algorithm, which adopts a modified linear programming (transportation) problem based RAS for decentralized dynamic cloud computing.
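The ranking heuristic at the heart of LATE can be written down directly. The sketch below keeps only that core (estimate time to completion from the progress score, rank, and speculate on the slowest tasks below a threshold); the dictionary layout, threshold values and cap are illustrative assumptions, and the real algorithm additionally derives its thresholds from percentiles and avoids launching copies on slow nodes.

```python
def late_candidates(tasks, now, slow_task_threshold, speculative_cap):
    """LATE-style ranking sketch: estimate each running task's time to completion
    from its progress score, then speculate only on the slowest ones.
    'tasks' is a list of dicts {"id", "progress", "start_time"} with progress in (0, 1]."""
    scored = []
    for t in tasks:
        elapsed = now - t["start_time"]
        if elapsed <= 0 or t["progress"] <= 0:
            continue
        progress_rate = t["progress"] / elapsed            # fraction of work per second
        time_left = (1.0 - t["progress"]) / progress_rate  # estimated time to end
        scored.append((time_left, progress_rate, t["id"]))
    # Rank by longest estimated time to end; only tasks whose progress rate falls
    # below the slow-task threshold are eligible for a speculative copy.
    scored.sort(reverse=True)
    eligible = [tid for (tl, rate, tid) in scored if rate < slow_task_threshold]
    return eligible[:speculative_cap]

tasks = [
    {"id": "map-1", "progress": 0.9, "start_time": 0.0},
    {"id": "map-2", "progress": 0.2, "start_time": 0.0},   # straggler
]
print(late_candidates(tasks, now=100.0, slow_task_threshold=0.005, speculative_cap=1))
```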
Furthermore, by evaluating the trustworthiness of machines in a cloud environment, Wang et al. [18] proposed a trust dynamic level scheduling algorithm named Cloud-DLS, which is based on a kind of trust mechanism and integrates the existing DLS algorithm [59, 60] to decrease the failure probability of task assignments.

3.2.1.1 Perspective of Server. The existing techniques for energy savings at a server farm [21] can roughly be divided into two categories: dynamic voltage/frequency scaling (DVFS) [61] inside a server, and dynamic power management [62], i.e. turning servers on/off on demand. The former depends on the setting of hardware components to perform scaling tasks [19, 20], and the latter ensures near-zero electricity consumption by turning off servers [21, 22]. Mezmaz et al. [19] and Huang et al. [20] adopted the DVFS technique for energy-aware scheduling of parallel applications in the cloud.
From the perspective of virtualization, VMs that encapsulate virtualized services can be created, copied, moved and deleted according to scheduling decisions.

Younge et al. [23] described a Green Cloud Framework to optimize energy consumption in a cloud computing environment; it addresses VM scheduling, using a new greed-based algorithm to minimize power consumption, and VM management, using VM imaging and live migration to reduce the number of active hosts. However, it neglects SLA violation penalties when reducing energy consumption. Beloglazov et al. [24-26] defined an architectural framework and principles for energy-efficient cloud computing and divided energy optimization approaches into two sub-phases: "VM Selection" and "VM Placement". Based on this architecture, Beloglazov and Buyya [24] proposed three stages of VM placement optimization: optimizing the utilization of resources to minimize the number of physical nodes in use, optimizing the network to minimize the overhead of data transfer over the network, and optimizing temperature to minimize cooling system operation. Beloglazov et al. [25] then introduced four heuristics that can effectively handle strict SLA requirements for the "VM Selection" sub-phase, namely a Single Threshold policy and three double-threshold policies (the Minimization of Migrations policy, the Highest Potential Growth policy and the Random Choice policy), together with the Modified Best-Fit Decreasing algorithm for the "VM Placement" sub-phase. However, fixed values for the thresholds are not suitable for an environment with dynamic and unpredictable workloads, and thus Beloglazov and Buyya [26] proposed an approach for dynamic consolidation of VMs based on adaptive utilization thresholds, which further reduces SLA violations and VM migrations.
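The threshold-driven consolidation logic discussed above can be illustrated with a small sketch. This is not Beloglazov and Buyya's exact policy; the capacity numbers and the largest-VM-first selection are our own stand-ins for the single/double-threshold and Minimization of Migrations ideas.

```python
def select_vms_to_migrate(host_vms, cpu_capacity, lower, upper):
    """Double-threshold consolidation sketch: host_vms maps VM id -> current CPU demand
    (same units as cpu_capacity). If the host is overloaded (utilisation > upper),
    migrate just enough VMs to fall back under the upper threshold; if it is
    underloaded (utilisation < lower), migrate everything so the host can be switched off."""
    used = sum(host_vms.values())
    util = used / cpu_capacity
    if util > upper:
        to_migrate = []
        # Move the largest VMs first: fewer migrations are needed to cross the threshold.
        for vm, demand in sorted(host_vms.items(), key=lambda kv: kv[1], reverse=True):
            if used / cpu_capacity <= upper:
                break
            to_migrate.append(vm)
            used -= demand
        return to_migrate
    if util < lower:
        return list(host_vms)      # drain the host entirely
    return []                      # utilisation is in the acceptable band

print(select_vms_to_migrate({"vm1": 30, "vm2": 25, "vm3": 10}, cpu_capacity=70,
                            lower=0.3, upper=0.8))   # -> ['vm1']
```

In the adaptive variant [26], lower and upper would themselves be recomputed from recent utilisation statistics instead of being fixed.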
Moreover, Goiri et al. [27] adopted an economic approach to solve the energy-efficient and multifaceted resource management problem. They performed a simple optimization process to maximize the provider's profit based on an economic model merging several factors, such as power consumption, SLA, outsourcing capabilities, virtualization overhead and heterogeneity management, and then VM placements were decided based on the optimization result.

Furthermore, the Credit Scheduler [64], which is the default VM scheduler of Xen, is a kind of fair-share scheduler that distributes minimum processor time to a VM according to the proportion of the selected VM's credit value to the sum of all VM credit values. However, processor time is not a satisfactory criterion for accurately estimating the energy consumed by a VM, and there are some in-processor events that affect processor energy consumption more significantly. Therefore, Kim et al. [28, 29] suggested an energy consumption estimation model that calculates the amount of energy consumed by a VM via monitoring of processor performance counters, and then, by modifying the Credit Scheduler, they proposed an Energy-Credit Scheduler that schedules VMs according to user-defined energy budgets instead of processor time credits, to limit the energy consumption rates of VMs.
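The budgeting idea behind the Energy-Credit Scheduler can be sketched in a few lines. The accounting model below is deliberately simplified and hypothetical: real per-VM energy estimates come from processor performance counters, and the tie-breaking policy here is our own placeholder.

```python
def pick_next_vm(vms, est_energy_per_quantum):
    """Energy-budget scheduling sketch: each VM has an energy budget (joules) for the
    current accounting period and a record of energy already charged to it.
    Only VMs whose next quantum still fits in their budget are runnable; among those,
    pick the one with the most remaining budget (a simple stand-in policy)."""
    runnable = []
    for vm in vms:
        cost = est_energy_per_quantum(vm)          # e.g. estimated from perf counters
        remaining = vm["budget_j"] - vm["used_j"]
        if remaining >= cost:
            runnable.append((remaining, vm))
    if not runnable:
        return None                                # all VMs have exhausted their budgets
    runnable.sort(key=lambda pair: pair[0], reverse=True)
    chosen = runnable[0][1]
    chosen["used_j"] += est_energy_per_quantum(chosen)   # charge the quantum up front
    return chosen["name"]

vms = [{"name": "vm1", "budget_j": 50.0, "used_j": 48.0},
       {"name": "vm2", "budget_j": 50.0, "used_j": 10.0}]
print(pick_next_vm(vms, est_energy_per_quantum=lambda vm: 5.0))   # -> 'vm2'
```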
3.2.1.3 Perspective of Multiple Data Centers. The solutions mentioned above target saving energy within a single server or a single data centre (with many servers) in a single location. Garg et al. [30] proposed some simple, yet effective generic scheduling policies based on various contributing factors such as energy cost, carbon emission rate, workload and CPU power efficiency to achieve energy efficiency across multiple data centre locations.

A comparison of these policies and algorithms discussed above has been made from the aspects of energy-aware RAS, server, virtualization, multiple data centres, application, SLA and QoS and so on.
Table 3 shows the overview of energy-aware RAS policies and algorithms.

3.2.2 SaaS Layer RAS

There are several studies that focus on the SaaS layer, where the SaaS provider (such as Salesforce.com) leases resources from cloud providers (such as Amazon EC2) and provides services to SaaS users, with the goal of minimizing the payment and maximizing the profit while guaranteeing that QoS requirements are met. However, less research has been performed on this topic than on the other four, which implies that there is a large space for further study. Detailed descriptions of SaaS-layer RAS policies and algorithms are given as follows.
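The trade-off these SaaS-layer policies optimise can be made explicit with a toy profit calculation. The formula and the numbers below are illustrative assumptions rather than a model taken from [31-34]: profit is what remains of customer revenue after paying for leased VMs and for SLA penalty charges.

```python
def saas_profit(revenue_per_request, requests, vm_hourly_cost, vm_hours, sla_penalty,
                violated_requests):
    """Toy SaaS-provider profit model: revenue from served customer requests minus the
    cost of VMs leased from the IaaS provider and penalties for requests whose SLA
    (e.g. response-time) was violated."""
    revenue = revenue_per_request * requests
    infrastructure_cost = vm_hourly_cost * vm_hours
    penalties = sla_penalty * violated_requests
    return revenue - infrastructure_cost - penalties

# Serving the same workload with fewer VMs cuts cost but may violate more SLAs.
print(saas_profit(0.01, 100_000, 0.10, 240, 0.05, 0))      # many VMs, no violations: 976.0
print(saas_profit(0.01, 100_000, 0.10, 120, 0.05, 3000))   # fewer VMs, some penalties: 838.0
```

Admission control and scheduling decisions such as those of Wu et al. [31, 32] essentially search for the operating point that keeps this figure as high as possible.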
Wu et al. [31, 32] proposed three cost-effective admission control and scheduling algorithms: minimizing the number of VMs, rescheduling, and exploiting the penalty delay, for SaaS providers to maximize profit while guaranteeing customers' SLAs.

Table 4 gives the overview of SaaS-layer RAS policies and algorithms, comparing the approaches of Wu et al. [31, 32], Li et al. [33] and Espadas et al. [34] in terms of cost-effectiveness, profit of the SaaS provider, profit of the cloud provider, QoS of users, tenant-based allocation and improved workload distribution.

The research about workflow scheduling focuses on solving the Problem 5.

3.3.1 Workflow Scheduling

Workflow scheduling is a kind of global task scheduling, because it focuses on mapping each task to a suitable resource and ordering the tasks on each resource.
Many heuristic and meta-heuristic methods have been proposed for cloud workflow scheduling.

Abrishami et al. [37] proposed two algorithms, the IaaS Cloud Partial Critical Paths algorithm and the IaaS Cloud Partial Critical Paths with Deadline Distribution algorithm, which consider several main features of current commercial clouds, such as on-demand resource provisioning, homogeneous networks and the pay-per-use pricing model. Furthermore, Genez et al. [38] presented an integer linear program (ILP) to solve the workflow scheduling problem in SaaS or PaaS clouds with two levels of SLA, and then presented two heuristics, namely BMT and BMEMT, which find feasible integer solutions over relaxed runs of the proposed ILP in order to minimize the global SaaS infrastructure monetary costs, and hence reach maximum profit, with few SLA violations.

The meta-heuristic-based approach is to develop a scheduling algorithm based on a meta-heuristic method (e.g. Genetic Algorithm (GA), Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO)), which provides a general solution method for developing a specific heuristic to satisfy a particular kind of problem. Pandey et al. [39] presented a scheduling heuristic based on PSO to minimize the overall cost of executing application workflows, including the communication cost between resources and the execution cost on compute resources. Wu et al. [40] proposed a market-oriented hierarchical scheduling approach, including a service-level scheduling stage and a task-level scheduling stage, to satisfy the QoS constraints of each individual task while minimizing the whole cost of the cloud workflow system. The service-level scheduling adopts a package-based random scheduling algorithm to deal with the Task-to-Service assignment, and the task-level scheduling adopts an ACO-based scheduling algorithm, which can be replaced by a GA- or PSO-based scheduling algorithm according to the performance requirement, to deal with the optimization of the Task-to-VM assignments.
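The objective that such meta-heuristics search over can be sketched as a plain cost function on task-to-VM mappings. The tables and prices below are illustrative assumptions in the spirit of the PSO formulation described above, not the actual model of [39].

```python
def workflow_cost(mapping, exec_cost, data_size, transfer_cost):
    """Total monetary cost of one task-to-VM mapping: execution cost of each task on its
    chosen VM, plus the cost of moving data between tasks placed on different VMs."""
    cost = sum(exec_cost[task][vm] for task, vm in mapping.items())
    for (src, dst), size in data_size.items():
        vm_src, vm_dst = mapping[src], mapping[dst]
        if vm_src != vm_dst:
            cost += size * transfer_cost[vm_src][vm_dst]
    return cost

exec_cost = {"t1": {"vm1": 1.2, "vm2": 0.9}, "t2": {"vm1": 0.8, "vm2": 1.5}}
data_size = {("t1", "t2"): 10.0}                       # GB passed from t1 to t2
transfer_cost = {"vm1": {"vm2": 0.05}, "vm2": {"vm1": 0.05}}

# A PSO/GA/ACO search would explore many such mappings; here we just compare two of them.
print(workflow_cost({"t1": "vm1", "t2": "vm1"}, exec_cost, data_size, transfer_cost))  # 2.0
print(workflow_cost({"t1": "vm2", "t2": "vm1"}, exec_cost, data_size, transfer_cost))  # 2.2
```

A PSO, GA or ACO search would repeatedly evaluate candidate mappings with such a function and keep the cheapest ones that still satisfy the QoS constraints.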
Oliveira et al. [42] proposed a provenance-based adaptive scheduling heuristic, which is based on a three-objective weighted cost model considering execution time, reliability and financial cost, for the parallel execution of scientific workflows in the cloud, to meet the deadline and fit the budget provided by scientists. Varalakshmi et al. [43] first adopted a resource discovery algorithm to reduce the flooding of information, and then proposed an optimal workflow scheduling algorithm to schedule task-based clusters to resources within users' multiple QoS parameters, such as execution time, reliability and monetary cost. This achieves a significant improvement in CPU utilization by reducing the slow-down rates and the waiting times.

A comparison of these policies and algorithms discussed above has been made from the aspects of deadline constraint, budget constraint, multiple QoS constraints, heuristic, meta-heuristic, optimized cost and optimized makespan. Table 5 shows an overview of workflow scheduling policies and algorithms.

4. COMPARATIVE ANALYSIS

In this section, a comparative analysis of the five problems discussed above is presented in Table 6 in terms of several criteria, that is, execution efficiency, cost-effectiveness, reliability, resource utilization and QoS. In addition, two examples of RAS policies and algorithms are given for each problem during the comparison. Problem 1 focuses on locality-aware task scheduling to optimize execution efficiency, and takes the MaxCover-BalAssign algorithm proposed by Fischer et al. [4] and the ADS algorithm proposed by Jin et al. [14] as comparative examples. Problem 2 focuses on RAS to improve the reliability of cloud computing and to improve execution efficiency, and takes the SAMR scheduling algorithm proposed by Chen et al. [16] and the Transportation Problem based scheduling algorithm proposed by Kumar et al. [17] as comparative examples.
Problem 3 focuses on energy-aware scheduling to reduce energy consumption and operating costs. Problem 4 addresses SaaS-layer RAS, saving operational cost and satisfying the QoS of users, and Problem 5 not only improves execution efficiency, but also maximizes profits without violating the QoS requirements of users.
resource utility. Third, RAS for the hybrid cloud [70, 71] will attract researchers' attention, with the requirement of deciding whether to allocate applications to the public cloud or to the private cloud. This should optimize the total cost of running all applications, given the different characteristics of the public cloud and the private cloud. Fourth, multi-objective RAS has been a popular topic in the field of academic research. The more parameters are taken into account, the better the performance and the lower the cost will be.

Funding

This work was supported in part by the Special Public Sector Research Program of China (no. GYHY201206030) and the Qing Lan Project of Jiangsu Province, and was also supported by the Priority Academic Program Development (PAPD) of Jiangsu Higher Education Institutions.

REFERENCES

1. J. Z. Luo, J. H. Jin, A. B. Song, and F. Dong, "Cloud computing: Architecture and key technologies", J. Commun., Vol. 32, no. 7, pp. 3-21, Jul. 2011.
2. Q. Zhang, L. Cheng, and R. Boutaba, "Cloud computing: State-of-the-art and research challenges", J. Internet Services Applic., Vol. 1, no. 1, pp. 7-18, May 2010.
3. Y. J. Wang, W. D. Sun, S. Zhou, and X. Y. Li, "Key technologies of distributed storage for cloud computing", J. Software, Vol. 23, no. 4, pp. 962-86, Feb. 2012.
4. M. J. Fischer, X. Y. Su, and Y. T. Yin, "Assigning tasks for efficiency in Hadoop: Extended abstract", in 22nd ACM Symposium on Parallelism in Algorithms and Architectures, New York, 2010, pp. 30-9.
5. J. H. Jin, J. Z. Luo, A. B. Song, F. Dong, and R. Q. Xiong, "Bar: An efficient data locality driven task scheduling algorithm for cloud computing", in 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Newport Beach, USA, May 2011, pp. 295-304.
6. Y. C. Tao, Q. Zhang, L. Shi, and P. H. Chen, "Job scheduling optimization for multi-user MapReduce clusters", in 2011 Fourth International Symposium on Parallel Architectures, Algorithms and Programming, Tianjin, China, Dec. 2011, pp. 213-7.
7. S. Seo, I. Jang, K. Woo, I. Kim, J. Kim, and S. Maeng, "HPMR: Prefetching and pre-shuffling in shared MapReduce computation environment", in IEEE International Conference on Cluster Computing and Workshops (CLUSTER), New Orleans, LA, Sep. 2009, pp. 1-8.
8. X. H. Zhang, Z. Y. Zhong, B. Tu, S. Z. Feng, and J. P. Fan, "Improving data locality of MapReduce by scheduling in homogeneous computing environments", in Proceedings of IEEE 9th International Symposium on Parallel and Distributed Processing with Applications, Busan, Korea, May 2011, pp. 120-6.
9. X. H. Zhang, Y. H. Feng, S. Z. Feng, J. P. Fan, and Z. Ming, "An effective data locality aware task scheduling method for MapReduce framework in heterogeneous environments", in 2011 International Conference on Cloud and Service Computing (CSC), Hong Kong, Dec. 2011, pp. 234-42.
10. M. Hammoud, and M. F. Sakr, "Locality-aware reduce task scheduling for MapReduce", in 2011 IEEE Third International Conference on Cloud Computing Technology and Science (CloudCom), Athens, Nov. 2011, pp. 570-6.
11. M. Zaharia, D. Borthakur, J. S. Sarma, K. Elmeleegy, et al., "Delay scheduling: A simple technique for achieving locality and fairness in cluster scheduling", in Proceedings of 5th European Conference on Computer Systems, New York, Apr. 2010, pp. 265-78.
12. M. Isard, V. Prabhakaran, J. Currey, U. Wieder, K. Talwar, and A. Goldberg, "Quincy: Fair scheduling for distributed computing clusters", in Proceedings of ACM SIGOPS 22nd Symposium on Operating Systems Principles, New York, 2009, pp. 261-76.
13. I. Raicu, I. T. Foster, Y. Zhao, P. Little, et al., "The quest for scalable support of data-intensive workloads in distributed systems", in Proceedings of the 18th ACM International Symposium on High Performance Distributed Computing, New York, Jun. 2009, pp. 207-16.
14. J. H. Jin, J. Z. Luo, A. B. Song, and F. Dong, "Adaptive delay scheduling algorithm based on data center load analysis", J. Commun., Vol. 32, no. 7, pp. 47-56, Jul. 2011.
15. M. Zaharia, A. Konwinski, A. D. Joseph, R. Katz, and I. Stoica, "Improving MapReduce performance in heterogeneous environments", in 8th USENIX Symposium on Operating Systems Design and Implementation (OSDI), San Diego, CA, Dec. 2008, pp. 29-42.
16. Q. Chen, D. Q. Zhang, M. Y. Guo, Q. N. Deng, and S. Guo, "SAMR: A self-adaptive MapReduce scheduling algorithm in heterogeneous environment", in 2010 IEEE 10th International Conference on Computer and Information Technology (CIT), Bradford, June 2010, pp. 2736-43.
17. S. K. S. Kumar and P. Balasubramanie, "Dynamic scheduling for cloud reliability using transportation problem", J. Computer Sci., Vol. 8, no. 10, pp. 1615-26, Oct. 2012.
18. W. Wang, G. S. Zeng, D. Z. Tang, and J. Yao, "Cloud-DLS: Dynamic trusted scheduling for Cloud computing", Expert Systems Applic. Int. J., Vol. 39, no. 3, pp. 2321-9, Feb. 2012.
19. M. Mezmaz, N. Melab, Y. Kessaci, Y. C. Lee, et al., "A parallel bi-objective hybrid metaheuristic for energy-aware scheduling for cloud computing systems", J. Parallel Distributed Comput., Vol. 71, no. 11, pp. 1497-508, Nov. 2011.
20. Q. J. Huang, S. Su, J. Li, P. Xu, K. Shuang, and X. Huang, "Enhanced energy-efficient scheduling for parallel applications in Cloud", in 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Ottawa, May 2012, pp. 781-6.
21. T. V. T. Duy, Y. Sato, and Y. Inoguchi, "Performance evaluation of a green scheduling algorithm for energy savings in cloud computing", in 2010 IEEE International Symposium on Parallel and Distributed Processing, Workshops and PhD Forum (IPDPSW), Atlanta, GA, Apr. 2010, pp. 1-8.
22. M. Mazzucco and D. Dyachuk, "Optimizing cloud providers revenues via energy efficient server allocation", Sustainable Comput.: Informat. Systems, Vol. 2, no. 1, pp. 1-12, Mar. 2012.
23. A. J. Younge, G. von Laszewski, L. Z. Wang, S. Lopez-Alarcon, and W. Carithers, "Efficient resource management for cloud computing environments", in 2010 International Green Computing Conference, Aug. 2010, pp. 357-64.
24. A. Beloglazov and R. Buyya, "Energy efficient resource management in virtualized cloud data centers", in 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing (CCGrid), Melbourne, May 2010, pp. 826-31.
25. A. Beloglazov, J. Abawajy, and R. Buyya, "Energy-aware resource allocation heuristics for efficient management of data centers for Cloud computing", Future Generation Comput. Syst., Vol. 28, no. 5, pp. 755-68, May 2012.
26. A. Beloglazov and R. Buyya, "Adaptive threshold-based approach for energy-efficient consolidation of virtual machines in cloud data centers", in Proceedings of the 8th International Workshop on Middleware for Grids, Clouds and e-Science, Nov. 2010, pp. 1-6.
27. I. Goiri, J. Ll. Berral, J. O. Fitó, R. Nou, J. Guitart, R. Gavaldà, F. Julià, and J. Torres, "Energy-efficient and multifaceted resource management for profit-driven virtualized data centers", Future Generation Comput. Syst., Vol. 28, no. 5, pp. 718-31, May 2012.
28. N. Kim, J. Cho, and E. Seo, "Energy-based accounting and scheduling of virtual machines in a cloud system", in 2011 IEEE/ACM International Conference on Green Computing and Communications (GreenCom), Sichuan, China, Aug. 2011, pp. 176-81.
29. N. Kim, J. Cho, and E. Seo, "Energy-credit scheduler: An energy-aware virtual machine scheduler for cloud systems", Future Generation Comput. Syst., Vol. 32, pp. 128-37, Mar. 2014.
30. S. K. Garg, C. S. Yeo, A. Anandasivam, and R. Buyya, "Environment-conscious scheduling of HPC applications on distributed Cloud-oriented data centers", J. Parallel Distributed Comput., Vol. 71, no. 6, pp. 732-49, June 2011.
31. L. L. Wu, S. K. Garg, and R. Buyya, "SLA-based resource allocation for software as a service provider (SaaS) in cloud computing environments", in 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Newport Beach, USA, May 2011, pp. 195-204.
32. L. L. Wu, S. K. Garg, and R. Buyya, "SLA-based admission control for a Software-as-a-Service provider in Cloud computing environments", J. Comput. Syst. Sci., Vol. 78, no. 5, pp. 1280-99, Sept. 2012.
33. C. L. Li, and L. Y. Li, "Optimal resource provisioning for cloud computing environment", The J. Supercomputing, Vol. 62, no. 2, pp. 989-1022, Apr. 2012.
34. J. Espadas, A. Molina, G. Jiménez, M. Molina, R. Ramírez, and D. Concha, "A tenant-based resource allocation model for scaling Software-as-a-Service applications over cloud computing infrastructures", Future Generation Comput. Syst., Vol. 29, no. 1, pp. 273-86, Jan. 2013.
35. L. F. Bittencourt and E. R. M. Madeira, "HCOC: A cost optimization algorithm for workflow scheduling in hybrid clouds", J. Internet Serv. Applic., Vol. 2, no. 3, pp. 207-27, 2011.
36. S. Abrishami, and M. Naghibzadeh, "Deadline-constrained workflow scheduling in software as a service Cloud", Scientia Iranica, Vol. 19, no. 3, pp. 680-9, June 2012.
37. S. Abrishami, M. Naghibzadeh, and D. H. J. Epema, "Deadline-constrained workflow scheduling algorithms for infrastructure as a service clouds", Future Generation Comput. Syst., Vol. 29, no. 1, pp. 158-69, Jan. 2013.
38. T. A. L. Genez, L. F. Bittencourt, and E. R. M. Madeira, "Workflow scheduling for SaaS/PaaS cloud providers considering two SLA levels", in IEEE Network Operations and Management Symposium (NOMS), Maui, HI, Apr. 2012, pp. 906-12.
39. S. Pandey, L. Wu, S. M. Guru, and R. Buyya, "A particle swarm optimization-based heuristic for scheduling workflow applications in cloud computing environments", in 24th IEEE International Conference on Advanced Information Networking and Applications (AINA), Perth, Apr. 2010, pp. 400-7.
40. Z. J. Wu, X. Liu, Z. W. Ni, D. Yuan, and Y. Yang, "A market-oriented hierarchical scheduling strategy in cloud workflow systems", J. Supercomputing, Vol. 63, no. 1, pp. 256-93, Jan. 2013.
41. L. F. Zeng, B. Veeravalli, and X. L. Li, "ScaleStar: Budget conscious scheduling precedence-constrained many-task workflow applications in cloud", in 2012 IEEE 26th International Conference on Advanced Information Networking and Applications (AINA), Fukuoka, Japan, Mar. 2012, pp. 534-41.
42. D. Oliveira, K. A. C. S. Ocaña, F. Baião, and M. Mattoso, "A provenance-based adaptive scheduling heuristic for parallel scientific workflows in clouds", J. Grid Computing, Vol. 10, no. 3, pp. 521-52, Sept. 2012.
43. P. Varalakshmi, A. Ramaswamy, A. Balasubramanian, and P. Vijaykumar, "An optimal workflow based scheduling and resource allocation in cloud", in Proceedings of First International Conference on Advances in Computing and Communications (ACC), Kochi, India, Jul. 2011, pp. 411-20.
44. E. M. Mocanu, M. Florea, M. I. Andreica, and N. Tapus, "Cloud computing - Task scheduling based on genetic algorithms", in 2012 IEEE International Systems Conference (SysCon), Vancouver, Mar. 2012, pp. 1-6.
45. Moving computation is cheaper than moving data. Available: http://hadoop.apache.org/docs/r0.20.2/hdfs_design.html
46. J. Dean, and S. Ghemawat, "MapReduce: Simplified data processing on large clusters", Commun. ACM, Vol. 51, no. 1, pp. 107-13, Jan. 2008.
47. C. N. Höfer, and G. Karagiannis, "Cloud computing services: Taxonomy and comparison", J. Internet Services Applic., Vol. 2, no. 2, pp. 81-94, Sept. 2011.
48. R. Buyya, C. S. Yeo, S. Venugopal, J. Broberg, and I. Brandic, "Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility", Future Generation Comput. Syst., Vol. 25, no. 6, pp. 599-616, June 2009.
49. E. Deelman, D. Gannon, M. Shields, and I. Taylor, "Workflows and e-science: An overview of workflow system features and capabilities", Future Generation Comput. Syst., Vol. 25, no. 5, pp. 528-40, May 2009.
50. S. Abrishami, M. Naghibzadeh, and D. H. J. Epema, "Deadline-constrained workflow scheduling algorithms for infrastructure as a service clouds", Future Generation Comput. Syst., Vol. 29, no. 1, pp. 158-69, Jan. 2013.
51. M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly, "Dryad: Distributed data-parallel programs from sequential building blocks", in Proceedings of the 2007 European Conference on Computer Systems, Lisbon, Mar. 2007, pp. 59-72.
52. Hadoop. Available: http://hadoop.apache.org/
53. T. White, Hadoop: The Definitive Guide, 1st edn. O'Reilly Media, 2009.
54. Hadoop on Demand Scheduler. Available: http://hadoop.apache.org/docs/r1.1.1/hod_scheduler.html
55. Fair Scheduler. Available: http://hadoop.apache.org/docs/r1.1.1/fair_scheduler.html
56. M. Zaharia, D. Borthakur, J. Sarma, et al., "Job scheduling for multi-user MapReduce clusters", EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2009-2055, 2009.
57. Capacity Scheduler. Available: http://hadoop.apache.org/docs/r1.1.1/capacity_scheduler.html
58. Max-min fairness. Available: http://en.wikipedia.org/wiki/Max-min_fairness
59. A. Dogan and F. Özgüner, "Reliable matching and scheduling of precedence-constrained tasks in heterogeneous distributed computing", in Proceedings of 29th International Conference on Parallel Processing, Toronto, Jun. 2000, pp. 307-14.
60. A. Dogan and F. Özgüner, "Matching and scheduling algorithms for minimizing execution time and failure probability of applications in heterogeneous computing", IEEE Trans. Parallel Distributed Syst., Vol. 13, no. 3, pp. 308-23, Mar. 2002.
61. N. B. Rizvandi, J. Taheri, and A. Y. Zomaya, "Some observations on optimal frequency selection in DVFS-based energy consumption minimization", J. Parallel Distributed Comput., Vol. 71, no. 8, pp. 1154-64, Aug. 2011.
62. L. Benini, A. Bogliolo, and G. D. Micheli, "A survey of design techniques for system-level dynamic power management", IEEE Trans. Very Large Scale Integration (VLSI) Syst., Vol. 8, no. 3, pp. 299-316, Jun. 2000.
63. A. Berl, E. Gelenbe, M. D. Girolamo, G. Giuliani, H. D. Meer, M. Q. Dang, and K. Pentikousis, "Energy-efficient cloud computing", Comput. J., Vol. 53, no. 7, pp. 1045-51, Jul. 2010.
64. E. Ackaouy, "The Xen Credit CPU scheduler", in Proceedings of 2006 Fall Xen Summit, Sep. 2006.
65. J. Yu, R. Buyya, and K. Ramamohanarao, "Workflow scheduling algorithms for grid computing", Metaheuristics Schedul. Distributed Comput. Environ. Stud. Comput. Intell., Vol. 146, pp. 173-214, 2008.
66. M. R. Garey, and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness. Philadelphia, PA: W. H. Freeman, 1979.
67. S. Abrishami, M. Naghibzadeh, and D. H. J. Epema, "Cost-driven scheduling of Grid workflows using partial critical paths", in 2010 11th IEEE/ACM International Conference on Grid Computing (GRID), Brussels, Oct. 2010, pp. 81-8.
68. D. Babu and P. V. Krishna, "Honey bee behavior inspired load balancing of tasks in cloud computing environments", Appl. Soft Comput., Vol. 13, no. 5, pp. 2292-303, May 2013.
69. Y. Q. Fang, F. Wang, and J. W. Ge, "A task scheduling algorithm based on load balancing in cloud computing", in Proceedings of the 2010 International Conference on Web Information Systems and Mining (WISM), Sanya, China, Oct. 2010, pp. 271-7.
70. R. V. D. Bossche, K. Vanmechelen, and J. Broeckhove, "Online cost-efficient scheduling of deadline-constrained workloads on hybrid clouds", Future Generation Comput. Syst., Vol. 29, no. 4, pp. 973-85, June 2013.
71. R. N. Calheiros, C. Vecchiola, D. Karunamoorthy, and R. Buyya, "The Aneka platform and QoS-driven resource provisioning for elastic applications on hybrid Clouds", Future Generation Comput. Syst., Vol. 28, no. 6, pp. 861-70, June 2012.
Authors
Tinghuai Ma is an associate professor in computer sciences at Nanjing University of Information Science & Technology, China. He received his Bachelor's degree (HUST, China, 1997), Master's degree (HUST, China, 2000) and PhD (Chinese Academy of Sciences, 2003), and was a post-doctoral associate (AJOU University, 2004). From November 2007 to July 2008, he visited the China Meteorological Administration. From February 2009 to August 2009, he was a visiting professor in the Ubiquitous Computing Lab, Kyung Hee University. His research interests are in the areas of data mining, privacy protection in ubiquitous systems, grid computing, ubiquitous computing, privacy preserving, etc. He has published more than 70 journal and conference papers. He is the principal investigator of several NSF projects and is a member of IEEE.

E-mail: [email protected]

Ya Chu received her Bachelor's degree in software engineering from Nanjing University of Information Science & Technology, China in 2011. Currently, she is a candidate for a Master's degree in meteorological information technology and security at Nanjing University of Information Science & Technology. Her research interest is cloud computing.

E-mail: [email protected]

Licheng Zhao is the director-general of the National Meteorological Information Center of the China Meteorological Administration (CMA), and is a research-level senior engineer. He received his Bachelor's degree in computer science from Wuhan University, China in 1983, and also minored in MS courses in meteorology at Lanzhou University. From July 1983 to July 2006, he worked in the National Satellite Meteorological Center, where he later served as the deputy director general of the center and deputy chief designer for construction projects on the meteorological satellite ground application system, among other roles. Since assuming his current position in October 2009, he has been engaged in the system design, construction, organization, operation and management of CMA's information networking system. He has won a First National Award for Scientific Advance and a CMA Science Progress Award. He is also the director of the Basic Meteorological Information Technology Standardization Committee, deputy director of the Technical Committee for Standardization of Meteorological Satellites and Space Weather, an executive member of the China Meteorological Society, associate editor-in-chief of the Journal of Applied Meteorology, and a member of the editorial board of the Journal of Remote Sensing.

E-mail: [email protected]

Otgonbayar Ankhbayar received his Bachelor's degree from the School of Mathematics and Computer Science, National University of Mongolia in 2011. Currently, he is a candidate for a Master's degree in computer science and engineering at Nanjing University of Information Science & Technology. His research interest is privacy preservation and data mining.

E-mail: [email protected]