
Received: 10 February 2020 Revised: 8 May 2020 Accepted: 20 May 2020
DOI: 10.1002/cpe.5919

SPECIAL ISSUE PAPER

Deep and reinforcement learning for automated task scheduling in large-scale cloud computing systems

Gaith Rjoub1 Jamal Bentahar1 Omar Abdel Wahab2 Ahmed Saleh Bataineh1

1 Concordia Institute for Information Systems Engineering, Concordia University, Montreal, Quebec, Canada
2 Department of Computer Science and Engineering, Université du Québec en Outaouais, Gatineau, Quebec, Canada

Correspondence
Jamal Bentahar, Concordia Institute for Information Systems Engineering, Concordia University, Sir George Williams Campus, 1455 De Maisonneuve Blvd. W., Montreal, Quebec, Canada.
Email: [email protected]

Funding information
Defence Research and Development Canada (DRDC), Grant/Award Number: Innovation for Defence Excellence and Security (IDEaS); Natural Sciences and Engineering Research Council of Canada (NSERC), Grant/Award Number: Discovery grant

Summary
Cloud computing is undeniably becoming the main computing and storage platform for today's major workloads. From Internet of things and Industry 4.0 workloads to big data analytics and decision-making jobs, cloud systems daily receive a massive number of tasks that need to be simultaneously and efficiently mapped onto the cloud resources. Therefore, deriving an appropriate task scheduling mechanism that can both minimize tasks' execution delay and cloud resources utilization is of prime importance. Recently, the concept of cloud automation has emerged to reduce the manual intervention and improve the resource management in large-scale cloud computing workloads. In this article, we capitalize on this concept and propose four deep and reinforcement learning-based scheduling approaches to automate the process of scheduling large-scale workloads onto cloud computing resources, while reducing both the resource consumption and task waiting time. These approaches are: reinforcement learning (RL), deep Q networks, recurrent neural network long short-term memory (RNN-LSTM), and deep reinforcement learning combined with LSTM (DRL-LSTM). Experiments conducted using real-world datasets from Google Cloud Platform revealed that DRL-LSTM outperforms the other three approaches. The experiments also showed that DRL-LSTM minimizes the CPU usage cost up to 67% compared with the shortest job first (SJF), and up to 35% compared with both the round robin (RR) and improved particle swarm optimization (PSO) approaches. Moreover, our DRL-LSTM solution decreases the RAM memory usage cost up to 72% compared with the SJF, up to 65% compared with the RR, and up to 31.25% compared with the improved PSO.

KEYWORDS
cloud automation, deep learning, reinforcement learning, task scheduling

1 INTRODUCTION

Cloud computing is an Internet-based platform that delivers computing services such as software, databases, servers, storage, analytics, and net-
working to users and companies at a large scale. Cloud computing is mainly praised for its ability to reduce operational costs and its always-on
availability.1 The growing adoption of Internet of things (IoT) in many applications (eg, intelligent transportation systems, healthcare management,
and so on) and the associated generation of unprecedented amounts of data has led to an increased reliance on the cloud technology to store and
analyze such data in a resource- and cost-efficient manner. Cloud computing runs all these applications on a virtual platform in the form of virtual


machines (VMs), where every resource dimension (eg, CPU, memory, bandwidth, and so on) is divided among the different VMs. These applications
need to be executed in a parallel manner in order to efficiently utilize the different cloud resources. Therefore, a scheduling approach to plan and
order the execution of the different applications is needed to guarantee optimal resource utilization and execution performance.2 Due to the tremen-
dous amounts of applications that need to be simultaneously scheduled on the cloud platform, allocating these applications manually becomes an
almost impossible mission.
To address this challenge, cloud automation is an emerging concept that capitalizes on the booming advancements in the artificial intelligence
domain to minimize the manual efforts of scheduling and managing cloud computing workloads.3 It encompasses designing automation techniques
and tools that execute on top of the virtualized cloud environment to take real-time decisions in terms of resource allocation and management.
In this article, we provide a novel contribution toward the concept of cloud automation by designing and comparing four deep and reinforce-
ment learning-based scheduling approaches for cloud computing. The objective is to analyze and identify the most appropriate technique that best
improves the task execution performance, while minimizing the resource cost on the cloud system.

1.1 Problem statement

The topic of task scheduling in cloud computing environments has been extensively addressed in the literature. The current scheduling approaches
can be categorized into two main classes, that is, traditional approaches and intelligent approaches. Traditional approaches focus on tuning and
extending conventional scheduling approaches such as first-in-first-out, shortest job first (SJF), round-robin (RR), min-min, and max-min4-6 to fit
the cloud computing settings. The main limitation of the traditional approaches is that they can support only a limited number of parameters (eg,
makespan) to optimize. This makes them unsuitable for the cloud computing environment in which many parameters such as task makespan and
CPU, memory, and bandwidth costs need to be simultaneously optimized.
Intelligent approaches, on the other hand,7-15 capitalize on artificial intelligence techniques such as fuzzy logic, particle swarm optimization
(PSO), and genetic algorithm (GA) to devise more solid scheduling techniques optimizing several parameters simultaneously. However, similar to
the traditional approaches, the intelligent scheduling approaches operate in an offline fashion through attempting to optimize a series of param-
eters upon the receipt of a certain task. This causes a high execution time, which makes them inefficient for delay-critical tasks such as IoT and big data
analytics tasks.
Recently, many attempts16-22 have been made to leverage the booming advancements in the field of machine learning, especially deep learn-
ing, to automate the resource management process in the cloud system. These approaches are mainly based on the idea of examining historical
resource data from VMs in order to predict the future workload. The objective is to improve the resource management and avoid the under and
over-provisioning cases. In this article, we investigate the application of four deep and reinforcement learning approaches to automate the process
of scheduling tasks over the cloud. In a preliminary version of this work,23 we proposed an intelligent technique that helps cloud providers schedule
tasks in such a way that minimizes the resources utilization and the overall cost. This article builds on and extends our previous work by designing,
developing, and comparing different cloud automation models that enable cloud providers to automatically schedule large-scale workloads on the
available cloud resources, while minimizing the tasks' execution delay and cloud resource utilization.

1.2 Contributions

We propose in this article four machine learning approaches for automated task scheduling in cloud computing environments. We study and com-
pare the performance of these approaches and identify the best one in terms of minimizing the task execution cost and delay. The first scheduling
approach is based on reinforcement learning (RL). The state space of our RL network represents the amounts of available resources on the VMs
that would be hosting the tasks in terms of RAM, CPU, disk storage, and bandwidth. The action space represents the scheduling of the set of tasks
received. The reward function (cost in our case) represents the cost of executing the tasks on the VMs in terms of RAM, CPU, disk storage, band-
width, and waiting time. The second scheduling approach uses deep Q networks (DQN). DQN is a deep reinforcement learning (DRL) approach that
employs neural networks to approximate the Q-value (ie, the expected cumulative reward of each state-action pair). The third scheduling approach
is based on the long short-term memory (LSTM), a recurrent neural network (RNN) architecture. The main idea of RNN-LSTM is to keep track of the
historical long-term dependencies between the resource requirements of the tasks and the available resources on the VMs to extract the impact of
each state-action pair on the resulting execution cost. Technically speaking, our RNN-LSTM unit consists of one cell and three gates. The cell is the
memory part of LSTM in the sense that it keeps track of the dependencies that exist among the components of the input (in our case the resource
requirements of the scheduling tasks and the resource specifications of the VMs). The input gate is responsible for determining the degree to which
a new value is useful enough to be kept in the memory cell. On the other hand, the forget gate determines the degree according to which an exist-
ing value should be kept in the memory cell or discarded. Finally, the output gate determines the degree to which a certain value in the memory
cell should be capitalized on to calculate the output activation function of the LSTM unit. The fourth and last scheduling approach is DRL-LSTM, a

combination of DRL using DQN and LSTM. Specifically, an LSTM unit is added as the first layer of the DRL to help capture the long-term historical
dependencies in the data.
The rest of the article is organized as follows. In Section 2, we conduct a literature review on the existing task scheduling approaches in cloud
computing. We also survey the main deep and reinforcement learning-based resource management approaches. In Section 3, we describe the details
of the proposed task scheduling approaches. In Section 4, we explain the experimental environment, evaluate the performance of our scheduling
solutions, and present an empirical analysis of our results compared with three other existing scheduling approaches. In Section 5, we conclude the
article.

2 RELATED WORK

2.1 Task scheduling in cloud computing

In Reference 7, the trade-off between minimizing the tasks’ makespan and reducing the energy consumption in cloud systems has been considered.
A multiobjective optimization problem is formulated based on the dynamic voltage frequency scaling system. To solve the problem, a nondomi-
nated sorting genetic algorithm (NSGA-II) is proposed, followed by an artificial neural network method to predict the appropriate VMs based on the
tasks’ characteristics. A performance and security-oriented multiagent-based cloud monitoring model has been proposed in Reference 8 with the
aim of both improving the task scheduling process and preventing unauthorized task injection and alteration. The authors of Reference 24 advo-
cate an improved version of the PSO technique to minimize the makespan and improve the resource utilization in cloud computing systems. They
propose to update the particles’ weights with the evolution of the number of iterations and to inject some random weights in the final stages of
the PSO technique. The objective is to avoid local optimum solutions being generated in the final stages of PSO. In Reference 9, the authors aim
to minimize three conflicting objectives, which are the makespan, resource utilization and execution cost. A multiobjective optimization problem
is formulated for this purpose, followed by an epsilon-fuzzy dominance-based composite discrete artificial bee colony technique to derive Pareto
optimal solutions. In Reference 10, the authors take into account the trust levels of cloud resources in the scheduling process through employ-
ing a Bayesian approach. A trust-based dynamic level scheduling algorithm is then presented to minimize the cost and time and to maintain the
security of the task execution process. A two-stage scheduling approach has been proposed in Reference 25 to reduce resource wastage in cloud
data centers during the task allocation process. A Bayesian classifier is proposed in the first stage to cluster tasks based on historical scheduling
data. Dynamic scheduling algorithms are advocated in the second stage to allocate tasks to the appropriate VMs. In Reference 11, the authors
propose a resource allocation framework in which IaaS providers might outsource their tasks to external clouds in case their own resources can-
not keep up with the incoming demands. The main challenge is to derive an allocation strategy that maximizes the providers' profit while ensuring high QoS for users' tasks. The problem is formulated as an integer programming model and solved using a self-adaptive learning particle swarm
optimization-based scheduling approach. The authors of Reference 12 propose an intelligent bio-inspired approach to derive optimal task schedul-
ing solutions for IoT applications in heterogeneous multiprocessor cloud environments. A hybrid algorithm called GAACO is proposed through
combining the GA and ant colony optimization techniques to select only the best combination of tasks at each stage. In Reference 26, by leverag-
ing a game-theoretical model, the authors propose a task scheduling algorithm designed for energy management in cloud computing. They initiate
a balanced task scheduling model for computing nodes by establishing a mathematical model to deal with big data. In Reference 27, the authors
introduce a user-centric game-theoretical framework formulated as a scheduled two-stage game. The first stage uses a Stackelberg game to cap-
ture the user demand preferences, where IaaS providers are leaders and IaaS users are followers. In the second stage, the authors propose a
differential game to enhance the service ratings given by users in order to improve the provider position in the market and increase the future
users’ demand. In Reference 28, the authors present two new scheduled models to form cloud federations using evolutionary game theory and
a GA. To improve the total payoff of the federations from a generation to another, the GA explores the search space and finds a better cloud for-
mation. The genetic model is then improved by an evolutionary game that mainly considers the stability among federations. The main limitation
of these scheduling and scheduling-like approaches is that they work in an offline fashion by trying to optimize a set of parameters each time a
task is received. This entails high execution time and is inefficient for delay-critical tasks such as big data analytics on the cloud that need prompt
responses.

2.2 Resource management using deep and reinforcement learning

In Reference 29, the authors aim to minimize the cost, especially for large task scheduling problems. Therefore, they propose an algorithm based on a DRL architecture (RLTS) to dynamically schedule tasks with precedence relationships to cloud servers so as to minimize the task execution time. Moreover, the authors use DQN, a DRL algorithm, to deal with the problem of complexity and high dimensionality. In Reference 16, a parallel Q-learning approach
is advanced to reduce the time taken to determine optimal resource allocation policies in cloud computing. A state action space formalism is also

proposed to guide Q-learning-based agents with no prior experience with finding appropriate VM allocation policies in IaaS clouds. In Reference 17,
Song et al address the problem of host load prediction to enhance the resource management and consumption in cloud-based systems. The challenge
here is to come up with accurate predictions in the light of the high load variance in cloud environments. To tackle this challenge, an LSTM-based model
is designed to predict the mean host load over consecutive future time intervals. In Reference 18, a two-phase framework is proposed to address
both VM resource allocation on servers and power management on each server. In the first phase, the authors capitalize on DRL to achieve VM
resource allocation on servers. In the second phase, LSTM and weight sharing have been employed for efficient local power management on servers.
In Reference 19, the authors capitalize on deep learning for VM workload prediction. Specifically, a deep belief network (DBN) is constructed using
multilayered restricted Boltzmann machines and regression layers. This DBN is then applied to extract high-level features from VMs’ workload data
and the regression layer is employed to predict the future VMs’ workload. These approaches focus mainly on analyzing historical resource data from
VMs to predict future workload and hence improve the VMs management and allocation processes. Nonetheless, none of them has yet addressed
the problem of automating the process of scheduling tasks in dynamic and large-scale cloud computing systems. In particular, no work has combined
LSTM with DRL for dynamic scheduling as we do in this article.

3 MACHINE LEARNING-BASED TASK SCHEDULING

In this section, we describe our machine learning-based scheduling techniques and explain their details.

3.1 Reinforcement learning

The main idea of RL30 is to teach a certain agent how to behave and adapt to the changes that take place in its environment. Specifically, we employ
the Q-learning algorithm to learn an optimized scheduling policy by considering the future decisions and evaluating the feedback from the cloud
environment. Let E = {e1, e2, … , en} be a set of tasks submitted by a number of users and V = {v1, v2, … , vm} be a set of VMs. Moreover, let st represent the cloud scheduler state at time moment t in the state space S and at be an action from the action space A at time moment t, taken with probability Υ(s̀|s, a) = Pr[st+1 = s̀ | st = s, at = a], where ∑s̀∈S Υ(s̀|s, a) = 1.31 The cloud scheduler policy in our model 𝜋(a|s), which maps states to actions, assigns
every task ei to a VM vj . The immediate reward of taking action at in state st is rt . The objective of the cloud scheduler is to find an optimum scheduling
policy that minimizes the cumulative reward value (ie, cost) for all the considered VMs and tasks. The states, actions, and reward function of our RL
solution are as follows.

• State Space
At each time t, the state st represents the current scheduling of the tasks on the VMs and each VM vj is described in terms of the available
resources (CPU, RAM, bandwidth, and disk storage). A task ei can be assigned to any VM vj which meets the resource constraints that will be
defined later in this section.
• Action Space
The action at represents the scheduling action at time t of all the considered tasks on the available VMs. For each task, the action can be represented as 0 or 1, which means the cloud scheduler either assigns a task ei to a VM vj or not. Technically speaking, when a task ei is assigned to a VM vj, the action space w.r.t that task is represented as a vector of size m, for example, (0,1,0,0, … ,0), which indicates that the task ei is assigned to the second VM.
• Reward
The reward function is used to represent the task scheduling process efficiency. If task ei is assigned to VM vj , we define the execution cost 𝜁i,j as
an immediate individual reward. The overall reward rt at time t is the sum of all these costs. The individual reward is defined in terms of the amounts
of CPU, RAM, bandwidth, and disk storage as follows:

𝜁i,j = (𝜓i,j + 𝜑i,j ) × Pj , (1)

where 𝜑i,j is the waiting time for ei to be assigned to vj , Pj is the unit price of each VM vj , and 𝜓i,j is the execution time of ei on vj .
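
To make this formulation concrete, the following Python sketch shows one way the state vector, the one-hot action of size m, and the cost-based reward of Equation (1) can be encoded. The field names (eg, "cpu", "ram", vm_price) are illustrative and are not taken from the article.

```python
import numpy as np

def encode_state(vms):
    """State s_t: available CPU, RAM, bandwidth, and disk storage of every VM."""
    return np.array([[vm["cpu"], vm["ram"], vm["bw"], vm["ds"]] for vm in vms])

def encode_action(vm_index, num_vms):
    """One-hot action vector of size m: task e_i is assigned to VM vm_index."""
    action = np.zeros(num_vms, dtype=int)
    action[vm_index] = 1
    return action

def task_cost(exec_time, waiting_time, vm_price):
    """Equation (1): zeta_{i,j} = (psi_{i,j} + phi_{i,j}) * P_j."""
    return (exec_time + waiting_time) * vm_price

def step_reward(assignments):
    """Overall reward r_t at time t: the sum of the individual execution costs."""
    return sum(task_cost(a["exec_time"], a["waiting_time"], a["vm_price"])
               for a in assignments)
```

In this encoding, step_reward corresponds to rt, the sum of the individual costs 𝜁i,j at time t.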

In our optimization scheduling model, we employ the Q-learning method to evaluate the feedback from the cloud system environment to opti-
mize future decision-making. After collecting each reward, the mean Q-value of an action a on state s following the policy 𝜋 is denoted Q𝜋 (s, a) and
the optimal value function is:

Q∗ (s, a) = min𝜋 Q𝜋 (s, a). (2)



This optimal value function can be nested within the Bellman optimality equation as follows:


Q∗(s, a) = ∑s̀ Υ(s̀|s, a) [r + 𝛾 minà Q∗(s̀, à)], (3)

where 𝛾 is the discount factor determining to what degree the future reward is affected by the past actions, and Υ denotes the transition probability of going from the current state s to the next state s̀ under action a. We assume that the resource demands of each task are known upon arrival and, for a task
ei to be assigned to VM vj the following conditions should be met:

KiCPU ≤ CPUtj ; KiRAM ≤ RAMtj ; KiBW ≤ BWtj ; KiDS ≤ DStj , (4)

where KiCPU , KiRAM , KiBW , and KiDS are the task CPU, RAM, bandwidth, and disk-storage requirements, respectively, and CPUtj , RAMtj , BWtj , and DStj are
the current amounts of available VM CPU, RAM, bandwidth, and disk-storage specification, respectively. After that, the cloud scheduler evaluates
Q𝜋 (s, a) for the current policy 𝜋 . Then, the policy is updated as follows,

𝜋̀ = arg mina Q𝜋(s, a). (5)

Finally, the main objective of this model is to determine the optimal policy 𝜋∗ that minimizes the reward for any state s:

min𝜋 E[Q𝜋∗(s, a)], ∀s ∈ S. (6)
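
As an illustration of how Equations (2)-(6) translate into an update rule, the following sketch shows a simplified tabular Q-learning step in which the reward is treated as a cost to be minimized and the feasibility check of Equation (4) restricts the candidate VMs. The discrete state indexing, the learning rate alpha, and the epsilon-greedy exploration scheme are simplifications on our part and are not prescribed by the article.

```python
import random
import numpy as np

def feasible(task_req, vm_avail):
    """Constraint (4): the task requirements must fit the VM's available resources."""
    return all(task_req[k] <= vm_avail[k] for k in ("cpu", "ram", "bw", "ds"))

def q_learning_step(Q, s, vms, task_req, cost_fn, next_state_fn,
                    alpha=0.1, gamma=0.9, epsilon=0.1):
    """One tabular Q-learning update where a lower Q value means a lower expected cost."""
    valid = [j for j, vm in enumerate(vms) if feasible(task_req, vm)]
    if random.random() < epsilon:                  # exploration
        a = random.choice(valid)
    else:                                          # exploitation: cheapest valid VM
        a = min(valid, key=lambda j: Q[s, j])
    r = cost_fn(s, a)                              # immediate cost, Equation (1)
    s_next = next_state_fn(s, a)
    # Bellman-style update with a min over the next actions (cost minimization)
    target = r + gamma * np.min(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return s_next
```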

3.2 Deep Q networks

The goal of DRL is to learn the optimal policy that maximizes the total discounted reward through a process of exploration and exploitation. The dis-
counted reward is considered since the aim is to maximize the future reward in the long run, rather than the immediate next reward. In this article,
we use DQN, a specific type of DRL that combines Q-learning and deep learning. In our problem, if we were to memorize all the Q-values in a Q-table, the number of possible states could exceed ten thousand, and the matrix Q(s,a) would be very large. Therefore, we use a deep neural network as an approximation to estimate Q(s,a) instead of computing the Q-value for each state-action pair (s,a). This is important to model large-scale scheduling scenarios that are characterized by a large number of state-action pairs. In our training process, the DRL cloud scheduler selects a random scheduling action
(ie, assigning tasks ei to VMs vj ) with a high probability to explore the effect of the unknown scheduling alternatives and obtain a better strategy.
The cloud scheduler increases the probability of choosing the action with the best (ie, lowest) Q value during the training process, to minimize the expected
cumulative reward (execution cost), which employs the Bellman equation [Equation (3)]. Technically speaking, the cloud scheduler chooses to sched-
ule one or more waiting tasks at each time moment t subject to the conditions in Equation (4). The optimal Q-value function indicates that at time t,
each scheduling policy 𝜋 selects a valid VM from the set V to execute each task from the set E so as to minimize the overall execution cost. As in our
RL model (Section 3.1), the cloud scheduler chooses an action a∈A on a state s ∈ S depending on the behavioral policy 𝜋(a|s). The goal of our model is
to optimize 𝜋 ∗ that minimizes the expected cumulative reward (ie, cost), where the immediate reward at time t of taking action at in state st is rt . We
obtain the actual Q-value of action a, by using the state s as input of the online network, and s̀ as input to the target network to obtain the minimum
Q-value of all actions in the target network. We use the mean square error to define the loss function and the Bellman equation [Equation (3)] to
minimize the loss function Lu(𝛽u) as follows:

Lu(𝛽u) = E(s,a,r,s̀)[(r + 𝛾 minà Q(s̀, à|𝛽̀u) − Q(s, a|𝛽u))2], (7)

where 𝛽 represents the parameters of the online network in the uth iteration, 𝛽̀ represents the parameters of the target network in the uth
iteration,32 and E(s,a,r,̀s) [.] denotes the expected value of the next reward given the current state s and action a, together with the next state s̀ .
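
The article does not name a specific deep learning framework; the following PyTorch sketch only illustrates how the online and target networks interact to form the loss of Equation (7), with the min over next actions reflecting the cost-minimization objective. The network sizes are arbitrary.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Small MLP mapping a state vector to one Q-value (estimated cost) per VM."""
    def __init__(self, state_dim, num_vms):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, num_vms))

    def forward(self, state):
        return self.net(state)

def dqn_loss(online, target, batch, gamma=0.9):
    """Equation (7): MSE between Q(s, a | beta_u) and r + gamma * min_a' Q(s', a' | beta'_u)."""
    s, a, r, s_next = batch                        # a must be a LongTensor of action indices
    q_sa = online(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = target(s_next).min(dim=1).values  # min, because the reward is a cost
    return nn.functional.mse_loss(q_sa, r + gamma * q_next)
```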

3.3 Recurrent neural network long short-term memory

LSTM is an RNN equipped with an input gate, an output gate, and a forget gate, along with three cells: state, output, and input. Let p denote the input
gate, o denote the output gate and f denote the forget gate. Moreover, we denote by C the cell state, h the cell output, and by x the cell input. The
equations that are used to compute the LSTM gates and states are given in the following:

ft = 𝜎(Wf .[ht−1 , xt ] + bf ) (8)



pt = 𝜎(Wp .[ht−1 , xt ] + bp ) (9)

ot = 𝜎(Wo .[ht−1 , xt ] + bo ) (10)

where W are the weights for each of the gates, the b terms denote bias vectors, and 𝜎 is the logistic sigmoid function

C̃t = tanh(WC .[ht−1 , xt ] + bC ), (11)

where C̃ is the updated cell state and tanh is the hyperbolic tangent function. Given the value of the input gate activation pt, the forget gate activation ft, and the candidate state value C̃t, we can compute Ct, the memory cell's new state at time t, as follows:

Ct = ft × Ct−1 + pt × C̃t (12)

ht = ot × tanh Ct (13)
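
Equations (8)-(13) can be transcribed almost literally into code. The following NumPy sketch performs one LSTM step; the weight matrices W and bias vectors b are assumed to be already learned.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM step following Equations (8)-(13). Each W[k] has shape (hidden, hidden + input)."""
    z = np.concatenate([h_prev, x_t])              # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])             # forget gate, Equation (8)
    p_t = sigmoid(W["p"] @ z + b["p"])             # input gate, Equation (9)
    o_t = sigmoid(W["o"] @ z + b["o"])             # output gate, Equation (10)
    C_tilde = np.tanh(W["C"] @ z + b["C"])         # candidate state, Equation (11)
    C_t = f_t * C_prev + p_t * C_tilde             # new cell state, Equation (12)
    h_t = o_t * np.tanh(C_t)                       # cell output, Equation (13)
    return h_t, C_t
```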

The proposed RNN-LSTM prediction method has three parts: data preprocessing, model training, and model prediction. First, the collected data,
that is, (CPUtj, RAMtj, BWtj, and DStj), are resampled to match the time moments. Second, the feature vectors of each resource type are extracted. By training the LSTM RNN, the error between the output ot and the real value is continuously reduced. Moreover, the LSTM unit can store long-term infor-
mation and is suitable for long-term training. In our model, we focus on learning the output of the next state given the current state of the model. Thus,
the model represents the probability distribution of sequences in the most general form unlike other models that assume independence between
outputs at different time steps, given latent variable states.
We use two neural network layers in our model. At each time moment t, the first LSTM-based network layer is employed for cost prediction,
where the outputs are all the VMs predicted states based on the history information. The predicted states are then used as inputs to the second
layer. Next, with the output of the second layer, the cloud scheduler assigns the tasks to the VMs. Thereafter, the selected VMs execute the policy
and transmit their data to the cloud scheduler, along with their current true states, which will be stored into the history information for future pre-
diction usage. The cloud scheduler finally receives rewards. All the VMs complete their tasks and store the cost and the resource usage for future
exploitation. We repeat this process in the next time moments, until the process converges when all the tasks get assigned with a minimum cost.
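
The two-layer prediction network described above can be realized in several ways; the following Keras sketch is one plausible reading, in which the first LSTM layer consumes the resource history of the VMs and the second layer outputs a per-VM cost estimate. The framework choice, the hidden size, and the input dimensions are our assumptions, not specifications from the article.

```python
import tensorflow as tf

def build_cost_predictor(num_vms, history_len, features_per_vm=4, hidden=64):
    """First layer: LSTM over the VMs' resource history; second layer: per-VM cost estimate."""
    inputs = tf.keras.Input(shape=(history_len, num_vms * features_per_vm))
    predicted_states = tf.keras.layers.LSTM(hidden)(inputs)       # first (prediction) layer
    costs = tf.keras.layers.Dense(num_vms)(predicted_states)      # second (decision) layer
    return tf.keras.Model(inputs, costs)

# Hypothetical usage: train against the observed execution costs, then assign each
# task to the VM with the lowest predicted cost.
model = build_cost_predictor(num_vms=10, history_len=30)
model.compile(optimizer="adam", loss="mse")
```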

3.4 DRL-LSTM

In view of the changing usage of resources over time in cloud systems and the long-term state memory ability of LSTM network, we combine LSTM
and DQN (Sections 3.2 and 3.3) to deal with the time-dependent task scheduling problem. We integrate in this work an LSTM layer into our DRL to
store the information of the states from previous time moments and predict the next possible state. This concept of holding long-term information
to control the current action is similar to controlling the historical cumulative value of the error used in a proportional-integral controller, while the specific implementation and practical implication are distinct to some degree. We use the fully combined probability action selection method to make the
LSTM predictor and the DQN cooperate with each other. First, the LSTM predicts the next states and their probabilities Υ(s̀|s, a). Then each state s is passed to the DQN to find the optimum action,

a = argminA (Q × Υ). (14)
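
The combined selection rule of Equation (14) can be sketched as follows, where the LSTM supplies the predicted next-state probabilities Υ and the DQN supplies the Q-values (costs); the array shapes are illustrative.

```python
import numpy as np

def select_action(q_values, state_probs):
    """Equation (14): a = argmin_A (Q x Upsilon).

    q_values:    array of shape (num_states, num_actions), DQN cost estimates
    state_probs: array of shape (num_states,), LSTM-predicted next-state probabilities
    """
    expected_cost = state_probs @ q_values         # expected cost of every action
    return int(np.argmin(expected_cost))
```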

We adopt the prediction loss metric to measure the overall performance of the network, which is the dissimilarity between the predicted VM
execution cost and the true VM cost. In our cost prediction, it is worth mentioning that the cloud scheduler does not know the full information of
the VM states at each time moment for decision making. Moreover, the DRL algorithm explores all the VMs for the received tasks in the long run.
The immediate prediction loss 𝛿tloss at time moment t is given by:

𝛿tloss = √( ∑vj∈V |𝜁jt − 𝜁̀jt|2 ), (15)

where 𝜁̀jt and 𝜁jt are the predicted cost and true cost of VM vj at time moment t, respectively. We use ΓA (𝜃a ) to represent our neural network with the
input as the state s and the output as Q(s,a). Here, 𝜃a denotes the set of network weights which contains: the LSTM layer parameters (w1 ,w2 , … ,wn )
and the fully connected network layer parameters wm , where n is the number of LSTM units. In the learning time moment t, Q(St ,a) is estimated
by ΓA (𝜃a ) and the cloud scheduler selects the maximum Q value and the optimal policy if Q(St ,a) is perfectly predicted. In the exploitation phase,

the cloud scheduler follows the policy, and in the exploration phase, the cloud scheduler takes actions randomly with the aim of discovering better
policies. The goal of the learning algorithm is to obtain the optimal prediction policy 𝜋 to minimize the cumulative discounted loss, which is given by

𝜉𝜋 = E(𝛿𝛾loss | 𝜋), (16)

where 𝛾 ∈ (0, 1) is the discount factor, and 𝛿𝛾loss is the long-term prediction performance, that is, the total discounted prediction loss.
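
For illustration, the per-step prediction loss of Equation (15) and a discounted accumulation in the spirit of Equation (16) can be computed as follows; the explicit discounting loop is our reading of the definition.

```python
import numpy as np

def prediction_loss(true_costs, predicted_costs):
    """Equation (15): root of the summed squared per-VM cost prediction errors at time t."""
    diff = np.asarray(true_costs, dtype=float) - np.asarray(predicted_costs, dtype=float)
    return float(np.sqrt(np.sum(diff ** 2)))

def discounted_loss(losses_per_step, gamma=0.9):
    """Total discounted prediction loss accumulated over time, as used in Equation (16)."""
    return sum((gamma ** t) * loss for t, loss in enumerate(losses_per_step))
```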
Finally, in our solution, we model and formulate our large-scale cloud scheduling problem to adapt each of the aforementioned models. Techni-
cally speaking, we formulate the state space, action space and reward function of the RL, DQN, RNN-LSTM, and DRL-LSTM based on our specific
problem. In fact, the state space represents the current scheduling status of the tasks on the VMs in terms of the available resources (CPU, RAM,
bandwidth, and disk storage). The action space depicts whether or not the cloud scheduler assigns a task to a corresponding VM. Finally, the reward
function quantifies the waiting time, execution time, and execution cost of the tasks.

4 IMPLEMENTATION AND EXPERIMENTS

4.1 Experimental setup

4.1.1 Dataset

To carry out our experiments, we employ the Google cluster dataset, which contains data on the resource requirements and availabilities for both
tasks and VMs. The Google cluster consists of many machines that are connected via a high-speed network. The dataset includes 670 000 traces,
where approximately 40 million task events across over 12 000 machines for 30 days have been recorded.33,34 The dataset consists of the following
features: start time, end time, job ID, machine ID, task index, CPU rate, maximum CPU rate, assigned memory usage, maximum memory usage,
canonical memory usage, unmapped page cache, total page cache, disk I/O time, local disk space usage, maximum disk I/O time, cycles per instruction,
aggregation type, memory accesses per instruction, sample portion, and sampled CPU usage. In the training part of the LSTM, we first initialize the
weights of both the input and output layers as a normal distribution whose mean is 0 and standard deviation is 1. The bias for both layers is set to be
0.1. The multilayer perceptron (MLP) structure consists of three hidden layers and one LSTM layer. The three hidden layers have 512, 256, and 128
output dimensions, respectively. In addition, a rectified linear unit is used for the activation function. The output of the MLP is the input to the LSTM.
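
The following PyTorch sketch mirrors this setup (normal(0, 1) weight initialization and 0.1 bias on the input and output layers, three hidden layers of 512, 256, and 128 units with ReLU, and the MLP output feeding a single LSTM layer). The framework, the LSTM hidden size, and the per-VM output head are assumptions on our part.

```python
import torch
import torch.nn as nn

class SchedulerNet(nn.Module):
    """MLP (512-256-128, ReLU) whose output feeds one LSTM layer, as described in Section 4.1.1."""
    def __init__(self, state_dim, num_vms, lstm_hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(state_dim, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU())
        self.lstm = nn.LSTM(input_size=128, hidden_size=lstm_hidden, batch_first=True)
        self.head = nn.Linear(lstm_hidden, num_vms)      # one estimated cost per VM (assumption)
        self._init_weights()

    def _init_weights(self):
        # Weights ~ N(0, 1) and bias = 0.1 for the input and output layers, as stated in the text
        for layer in (self.mlp[0], self.head):
            nn.init.normal_(layer.weight, mean=0.0, std=1.0)
            nn.init.constant_(layer.bias, 0.1)

    def forward(self, state_seq):
        # state_seq: (batch, time, state_dim)
        features = self.mlp(state_seq)
        out, _ = self.lstm(features)
        return self.head(out[:, -1, :])                  # prediction for the last time step
```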

4.1.2 Validation metrics

In the first set of experiments, we aim to study and compare the training and test accuracy of the different proposed deep and reinforcement
learning-based scheduling approaches (ie, RL, DQN, RNN-LSTM, and DRL-LSTM). In both cases, the accuracy means the accuracy of predicting the
appropriate VMs to host each incoming task. It is computed with regard to a ground truth.
For the reinforcement learning and DRL approaches, our scheduling agent performs actions in the MDP environment populated by the dataset and learns from the obtained reward at each state to select the next one using our policy function. The training is also done on the data generated by the agent. At each state, the agent tries to find the best action that minimizes the reward. The actions that the agent takes at the corresponding states in one learning run will become part of the training dataset for the next run. For the testing, the MDP is populated with new data.
We train the scheduling policy in an episodic setting. In each episode, we consider a varying number of VMs (from 10 to 100), and a fixed number
of tasks (78 597). The tasks arrive and are scheduled based on the policy, as described in Section 3. The episode terminates when all the tasks are
executed. During the training, we simulate a fixed number of episodes (100) for each VM set to explore the probabilistic space of possible actions using
the current policy, and use the resulting data to improve the policy for all the tasks. Technically speaking, we record the state, action, and reward
information for each episode, and use these values to compute the cumulative reward of each episode. We simulate a large number of iterations
(1000 iterations) and then compute the average reward value. The minimum reward value (cost) from the ground truth corresponds to the full
accuracy. The other accuracy values are computed subsequently. For the testing, 20% of new data (from the dataset) is used. During testing, the
agent follows the learned policy by selecting the action with the lowest reward value at each step.
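
Schematically, the evaluation loop described above can be summarized as follows; run_episode is a placeholder for one simulated scheduling episode, and the mapping from the average cost to an accuracy value relative to the ground-truth minimum is one plausible reading, since the article does not give the exact formula.

```python
import numpy as np

def evaluate_policy(run_episode, ground_truth_cost, episodes=100, iterations=1000):
    """Average the cumulative episode cost over many iterations and relate it to the
    minimum (ground-truth) cost, which corresponds to full accuracy."""
    iteration_costs = []
    for _ in range(iterations):
        episode_costs = [run_episode() for _ in range(episodes)]   # cumulative reward per episode
        iteration_costs.append(np.mean(episode_costs))
    avg_cost = float(np.mean(iteration_costs))
    accuracy = ground_truth_cost / avg_cost    # one plausible mapping; not given explicitly
    return avg_cost, accuracy
```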
In the second set of experiments, we measure the execution cost entailed by the different scheduling approaches in terms of CPU and RAM spent
on running the tasks. This perspective is important for both cloud providers and customers in the sense that it enables them to pick the scheduling
approach that reduces their overall monetary costs. In the third series of experiments, we compare the best identified candidate with three other
scheduling approaches, namely, SJF, RR, and improved PSO.
It is worth mentioning that in some cases, variable values of the accuracy and resource utilization metrics are obtained at some simula-
tion rounds. To deal with this problem, we made sure to run each single simulation for a large number of iterations (1000 iterations) and then

average over these iterations to get a stable and representative value for each corresponding metric. Our program is written in Python 3 and RapidMiner Studio version 9.3, and the experiments are performed in a 64-bit Windows 7 environment on a machine equipped with an Intel Core i7-6700 CPU (3.40 GHz) and 16 GB of RAM.

4.2 Experimental results

In Figure 1, we measure the training accuracy of the different studied approaches. The main observation that can be drawn from this figure is that
increasing the number of deployed VMs leads to a modest decrease in the accuracy. This can be justified by the fact that having a larger number
of VMs to select from might increase the probability of mistakenly assigning tasks to some inappropriate VMs. The second observation that can
be drawn is that the DRL-LSTM yields the highest training accuracy (between 94.2% and 96.2%). DQN yields the second highest training accuracy
(between 91% and 96%) followed by RNN-LSTM (between 88% and 96%) and finally RL (between 82.1% and 92%).
In Figure 2, we measure the test accuracy of our four approaches. This metric is of prime importance since it can inform us about the accuracy
of the different approaches on data examples they have not seen yet, thus giving an indication about the overfitting rate of the machine learning
approaches. Again, DRL-LSTM yields the highest test accuracy (between 89.8% and 95.1%). However, different from the case of training accu-
racy, DQN and RNN-LSTM yield similar test accuracy (between 86% and 92%). This indicates that DQN (which recorded higher training accuracy
than RNN-LSTM) was overfitting the training data. Finally, as is the case for the training accuracy, RL yields the lowest test accuracy, varying from 67%
to 86%.
In Figure 3, we present a detailed study on the total utilization cost entailed by our solutions. In this series of experiments, we vary the number of
deployed VMs from 10 to 50, and vary the number of tasks from 50 000 to 1 million. The objective is to study the scalability of the different solutions

FIGURE 1 Training accuracy of the machine learning algorithms (plot: training accuracy (%) versus number of VMs, 10-100; curves: DRL-LSTM, DQN, RNN-LSTM, RL)

FIGURE 2 Test accuracy of the machine learning algorithms (plot: test accuracy (%) versus number of VMs, 10-100; curves: DRL-LSTM, DQN, RNN-LSTM, RL)

FIGURE 3 Total utilization cost: We study in this figure the impact of varying both the number of tasks and number of VMs on the overall usage cost. VMs, virtual machines (panels for 10, 20, 30, 40, and 50 VMs; each plots total utilization cost ($) versus number of tasks (k), 100-1000; curves: DRL-LSTM, DQN, RNN-LSTM, RL)

w.r.t the variation in the number of VMs and number of tasks. We notice from the figure that increasing the number of VMs leads to an increase in
the utilization cost for the different approaches. This is because deploying more VMs leads to consuming higher amounts of resources. Moreover,
we notice that DRL-LSTM shows a better scalability compared with the other solutions, followed by DQN and RNN-LSTM interchangeably and then
finally RL.
We provide in Table 1 the confusion matrix of the DRL-LSTM approach, which scored the best amongst the other reinforcement and deep
learning solutions in terms of training and test accuracy. In the matrix, for each underlying VM, the rows refer to the optimal VM that the tasks should
ideally be assigned to. The columns refer to the VMs chosen (predicted) by DRL-LSTM to receive the tasks. For example, by examining the matrix,
we notice that out of the total number of tasks that should be scheduled on VM1, DRL-LSTM has actually scheduled 94.46% (second column and
second row) of these tasks on that VM, yielding an insignificant error rate of 0.0553% (last column and second row).

TABLE 1 Confusion matrix

VM0 VM1 VM2 VM3 VM4 VM5 VM6 VM7 VM8 VM9 VM10 Error rate

VM0 100% 0 0 0 0 0 0 0 0 0 0 0

VM1 0 94.46% 0.4% 0 1.72% 3.42% 0 0 0 0 0 0.0553%

VM2 0 0.05% 99.95% 0 0 0 0 0 0 0 0 0.0005%

VM3 0 0 0 99.92% 0 0 0.08% 0 0 0 0 0.0008%

VM4 0 0 0 0 94.08% 0 0 0 0.79% 0 5.13% 0.063%

VM5 0 0 0 0 0.39% 95.45% 0 0 0 4.16% 0 0.0477%

VM6 0 0 11.1% 0.77% 0 1.2% 86.93% 0 0 0 0 0.1503%

VM7 0 0 0 0 0 0 0 100% 0 0 0 0

VM8 0.07% 0 0 0 0 0 0 0 99.03% 0 0 0.0007%

VM9 0 0 0 0 0 0 0 0 0 100% 0 0

VM10 0 0 1.67% 0 0 0 0 3.07% 0 0 95.26% 0.0498%



In the next set of experiments, we are interested in comparing our scheduling solutions with the traditional scheduling approaches, namely,
SJF, RR, and PSO. Specifically, in Figures 4 to 7, we measure the average CPU and RAM utilization entailed on the VMs by our solutions as
well as the traditional scheduling approaches. In these figures, we fix the number of VMs to 100 and vary the number of tasks up to 100 000. We
observe that increasing the number of tasks leads to increasing the average CPU and RAM utilization. The second observation that can be taken from

FIGURE 4 Average CPU usage (plot: average CPU usage (%) versus number of tasks, up to 10 × 10^4; curves: SJF, RR, PSO, RL, RNN-LSTM, DQN, DRL-LSTM)

FIGURE 5 CPU usage (MHz) (plot: CPU usage (MHz) versus time (min), 0-6500; curves: DRL-LSTM, PSO, RR, SJF)

FIGURE 6 Average RAM usage (plot: average RAM usage (%) versus number of tasks, up to 10 × 10^4; curves: SJF, RR, PSO, RL, RNN-LSTM, DQN, DRL-LSTM)

FIGURE 7 RAM usage (KB) (plot: RAM usage (KB) versus time (min), 0-6500; curves: DRL-LSTM, PSO, RR, SJF)

FIGURE 8 CPU usage cost: We give in this figure a detailed breakdown of the CPU usage cost of our DRL-LSTM approach compared with PSO, RR, and SJF. DRL, deep reinforcement learning; LSTM, long short-term memory; PSO, particle swarm optimization; RR, round robin; SJF, shortest job first (panels: (A) CPU usage cost by using DRL-LSTM; (B) CPU usage cost by using PSO; (C) CPU usage cost by using RR; (D) CPU usage cost by using SJF; each plots CPU usage cost ($) per task ID and period of time (min) across VM0-VM10)

FIGURE 9 RAM usage cost: We give in this figure a detailed breakdown of the RAM usage cost of our DRL-LSTM approach compared with PSO, RR, and SJF. DRL, deep reinforcement learning; LSTM, long short-term memory; PSO, particle swarm optimization; RR, round robin; SJF, shortest job first (panels: (A) RAM usage cost by using DRL-LSTM; (B) RAM usage cost by using PSO; (C) RAM usage cost by using RR; (D) RAM usage cost by using SJF; each plots RAM usage cost ($) per task ID and period of time (min) across VM0-VM10)

Figures 4 to 6 is that our deep and reinforcement learning-based scheduling solutions reduce the CPU and RAM consumption compared with the
traditional scheduling proposals. The reason is that our learning-based solutions consist of a learning component that intelligently predicts the
appropriate VM for each incoming task while considering a multitude of metrics at a time. On the other hand, traditional scheduling approaches
consider only a small set of metrics (eg, waiting time, and so on) when assigning tasks to VMs, which degrades the quality of their decisions. Moreover,
DRL-LSTM further minimizes the CPU and RAM utilization compared with the rest of our solutions. This outcome is a natural result of the training
and testing accuracy results explained in Figures 1 and 2.
Finally, in Figures 8 and 9, we provide a detailed description of the CPU and RAM monetary costs entailed by DRL-LSTM (the best approach
identified so far) compared with the traditional scheduling approaches (ie, PSO, RR, and SJF). The experiments have been conducted on a sample
of 11 VMs and a number of tasks fixed to 78 597. For instance, we can notice from Figure 8 that the CPU cost varies from 0 to $6.5 in the case of
DRL-LSTM (Figure 8A), from 0 to $10 in the case of RR (Figure 8C) and improved PSO (Figure 8B) and from 0 to $20 in the case of SJF (Figure 8D).
Furthermore, by examining Figure 9, we notice that the RAM cost varies from 0 to $7 in the case of DRL-LSTM (Figure 9A), from 0 to $15 in the
case of the improved PSO (Figure 9B), from 0 to $20 in the case of RR (Figure 9C), and from 0 to $25 in the case of SJF (Figure 9D). Overall, we can
conclude that the DRL-LSTM solution achieves the best results in terms of training and test accuracy, total utilization cost, and CPU and RAM utilization
compared with both the other three learning-based approaches and the traditional scheduling solutions.

5 CONCLUSION

We proposed four automated task scheduling approaches in cloud computing environments using deep and reinforcement learning. The compar-
ison of the results of these approaches revealed that the most efficient approach is the one that combines DRL with LSTM to accurately predict

the appropriate VMs that should host each incoming task. Experiments conducted using real-world datasets from Google Cloud Platform (GCP) pricing and Google cluster
resource and task requirements revealed that this solution minimizes the CPU utilization cost up to 67% compared with the SJF, up to 35% com-
pared with both the RR and improved PSO approaches. Besides, our solution reduces the RAM utilization cost by 72% compared with the SJF, by
65% compared with the RR, and by 31.25% compared with the improved PSO.
Although very promising results are achieved, the work still has some limitations. The first issue to be considered is the high computation time of the DRL-LSTM approach, which is caused by the fact that the LSTM layer needs to go back and check the full history of the states. To address this problem, we plan to investigate the application of distributed and federated learning techniques, which help reduce the time and overhead of applying heavy machine learning algorithms. Moreover, the proposed solution does not take into account the reliability of the selected VMs, which
applying heavy machine learning algorithms. Moreover, the proposed solution does not take into account the reliability of the selected VMs, which
increases the risk of assigning the tasks to malicious or poorly performing nodes. To address this problem, we plan to include the performance and
trust metrics of the VMs in the reward function of the different learning models to better learn the behavior of VMs and avoid selecting the bad ones.

ACKNOWLEDGMENTS
We would like to thank the Natural Sciences and Engineering Research Council of Canada (NSERC), Discovery Grant program and the Department
of National Defence of Canada (DnD), Innovation for Defence Excellence and Security (IDEaS) program for their financial support.

ORCID
Gaith Rjoub https://orcid.org/0000-0002-7282-0687
Jamal Bentahar https://orcid.org/0000-0002-3136-4849

REFERENCES
1. Wahab OA, Bentahar J, Otrok H, Mourad A. Resource-aware detection and defense system against multi-type attacks in the cloud: repeated Bayesian
stackelberg game. IEEE Trans Depend Secure Comput. 2019. https://doi.org/10.1109/TDSC.2019.2907946.
2. Rjoub G, Bentahar J, Wahab OA. BigTrustScheduling: trust-aware big data task scheduling approach in cloud computing environments. Future Generat
Comput Syst. 2020;110:1079–1097. https://doi.org/10.1016/j.future.2019.11.019.
3. Singh S, Chana I. QoS-aware autonomic resource management in cloud computing: a systematic review. ACM Comput Surv. 2016;48(3):42.
4. Ghomi EJ, Rahmani AM, Qader NN. Load-balancing algorithms in cloud computing: a survey. J Netw Comput Appl. 2017;88:50-71.
5. Bhoi U, Ramanuj PN. Enhanced max-min task scheduling algorithm in cloud computing. Int J Appl Innovat Eng Manag. 2013;2(4):259-264.
6. Chen H, Wang F, Helian N, Akanmu G. User-priority guided min-min scheduling algorithm for load balancing in cloud computing. Paper presented at:
Proceedings of the National Conference on Parallel Computing Technologies; 2013:1-8.
7. Sofia AS, GaneshKumar P. Multi-objective task scheduling to minimize energy consumption and makespan of cloud computing using NSGA-II. J Netw Syst
Manag. 2018;26(2):463-485.
8. Grzonka D, Jakobik A, Kołodziej J, Pllana S. Using a multi-agent system and artificial intelligence for monitoring and improving the cloud performance
and security. Future Generat Comput Syst. 2018;86:1106-1117.
9. Gomathi B, Krishnasamy K, Balaji BS. Epsilon-fuzzy dominance sort-based composite discrete artificial bee colony optimisation for multi-objective cloud
task scheduling problem. Int J Bus Intell Data Mining. 2018;13(1-3):247-266.
10. Wang W, Zeng G, Tang D, Yao J. Cloud-DLS: dynamic trusted scheduling for cloud computing. Exp Syst Appl. 2012;39(3):2321-2329.
11. Zuo X, Zhang G, Tan W. Self-adaptive learning PSO-based deadline constrained task scheduling for hybrid IAAS cloud. IEEE Trans Automat Sci Eng.
2014;11(2):564-573.
12. Basu S, Karuppiah M, Selvakumar K, et al. An intelligent/cognitive model of task scheduling for IoT applications in cloud computing environment. Future
Generat Comput Syst. 2018;88:254–261.
13. Bataineh AS, Mizouni R, Bentahar J, El Barachi M. Toward monetizing personal data: a two-sided market analysis. Future Generat Comput Syst.
2020;111:435–459.
14. Rjoub G, Bentahar J. Cloud task scheduling based on swarm intelligence and machine learning. In: Younas M, Aleksy M, Bentahar J, eds. Proceedings of the
5th IEEE International Conference on Future Internet of Things and Cloud, FiCloud. Prague, Czech Republic: IEEE; 2017:272-279.
15. Peng Z, Lin J, Cui D, Li Q, He J. A multi-objective trade-off framework for cloud resource scheduling based on the deep Q-network algorithm. Cluster
Comput. 2020. https://doi.org/10.1007/s10586-019-03042-9.
16. Barrett E, Howley E, Duggan J. Applying reinforcement learning towards automating resource allocation and application scalability in the cloud. Concurr
Comput Pract Exp. 2013;25(12):1656-1674.
17. Song B, Yu Y, Zhou Y, Wang Z, Du S. Host load prediction with long short-term memory in cloud computing. J Supercomput. 2018;74(12):6554-6568.
18. Liu N, Li Z, Xu J, et al. A hierarchical framework of cloud resource allocation and power management using deep reinforcement learning. Paper presented
at: Proceedings of the IEEE 37th International Conference on Distributed Computing Systems; 2017:372-382.
19. Qiu F, Zhang B, Guo J. A deep learning approach for VM workload prediction in the cloud. In: Chen Y, ed. 17th IEEE/ACIS International Conference on
Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD). Shanghai: IEEE/ACIS; 2016:319-324. http://doi.org/10.
1109/SNPD.2016.7515919.
20. Wahab OA, Kara N, Edstrom C, Lemieux Y. MAPLE: a machine learning approach for efficient placement and adjustment of virtual network functions.
J Netw Comput Appl. 2019;142:37-50.
21. Wahab OA, Cohen R, Bentahar J, Otrok H, Mourad A, Rjoub G. An endorsement-based trust bootstrapping approach for newcomer cloud services. Inf
Sci. 2020;527:159–175. https://doi.org/10.1016/j.ins.2020.03.102.
22. Rohoden K, Estrada R, Otrok H, Dziong Z. Stable femtocells cluster formation and resource allocation based on cooperative game theory. Comput
Commun. 2019;134:30-41.

23. Rjoub G, Bentahar J, Wahab OA, Bataineh AS. Deep smart scheduling: a deep learning approach for automated big data scheduling over the cloud.
In: Younas M, Awan I, Hara T, eds. Proceedings of the 7th International Conference on Future Internet of Things and Cloud, FiCloud. Istanbul, Turkey: IEEE;
2019:189-196. https://doi.org/10.1109/FiCloud.2019.00034.
24. Luo F, Yuan Y, Ding W, Lu H. An improved particle swarm optimization algorithm based on adaptive weight for task scheduling in cloud computing. Paper
presented at: Proceedings of the International Conference on Computer Science and Application Engineering; vol 142, 2018:1-5.
25. Zhang PY, Zhou MC. Dynamic cloud task scheduling based on a two-stage strategy. IEEE Trans Automat Sci Eng. 2018;15(2):772-783.
26. Yang J, Jiang B, Lv Z, Choo K-KR. A task scheduling algorithm considering game theory designed for energy management in cloud computing. Future
Generat Comput Syst. 2020;105:985-992.
27. Taghavi M, Bentahar J, Otrok H. Two-stage game theoretical framework for IaaS market share dynamics. Future Generat Comput Syst. 2020;102:173-189.
28. Hammoud A, Mourad A, Otrok H, Wahab OA, Harmanani H. Cloud federation formation using genetic and evolutionary game theoretical models. Future
Generat Comput Syst. 2020;104:92-104.
29. Dong T, Xue F, Xiao C, Li J. Task scheduling based on deep reinforcement learning in a cloud manufacturing environment. Concurr Comput Pract Exp.
2020;32(11):e5654. https://doi.org/10.1002/cpe.5654.
30. Mao H, Alizadeh M, Menache I, Kandula S. Resource management with deep reinforcement learning. In: Ford B, Snoeren AC, Zegura EW, eds. Proceedings of the 15th ACM Workshop on Hot Topics in Networks, HotNets 2016. Atlanta, GA: ACM; 2016:50-56. https://doi.org/10.1145/3005745.3005750.
31. Lu L, Jiang Y, Bennis M, Ding Z, Zheng FC, You X. Distributed edge caching via reinforcement learning in fog radio access networks. Paper presented at:
Proceedings of the IEEE 89th Vehicular Technology Conference (VTC2019-Spring); 2019:1-6.
32. Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning. Nature. 2015;518(7540):529-533.
33. Wilkes J. More Google cluster data Google research blog; 2011. http://googleresearch.blogspot.com/2011/11/more-google-cluster-data.html.
34. Reiss C, Wilkes J, Hellerstein J. Google cluster-usage traces: format + schema. Technical Report. Google Inc., Mountain View, CA; 2011. Revised
2014-11-17 for version 2.1. https://github.com/google/cluster-data.

How to cite this article: Rjoub G, Bentahar J, Abdel Wahab O, Saleh Bataineh A. Deep and reinforcement learning for automated task
scheduling in large-scale cloud computing systems. Concurrency Computat Pract Exper. 2021;33:e5919. https://doi.org/10.1002/cpe.5919
