Future Generation Computer Systems 83 (2018) 14–26
Anubhav Choudhary, Indrajeet Gupta, Vishakha Singh, Prasanta K. Jana
highlights
• Proposed an efficient hybrid scheme of GSA and HEFT, called HGSA, for workflow scheduling.
• Systematic derivation of a fitness function based on makespan and cost.
• A novel and proficient strategy for eliminating inferior agents.
• Demonstration of better performance through simulation results and the ANOVA statistical test.
Article info

Article history:
Received 28 February 2017
Received in revised form 12 October 2017
Accepted 3 January 2018
Available online 8 January 2018

Keywords:
Gravitational Search Algorithm
Workflow scheduling
Cost
Makespan
Cost time equivalence

Abstract

Workflow scheduling in cloud computing has drawn enormous attention due to its wide applications in both scientific and business areas. It is an NP-complete problem. Therefore, many researchers have proposed a number of heuristic as well as meta-heuristic techniques by considering several issues, such as energy conservation, cost and makespan. However, it is still an open area of research, as most of the heuristics and meta-heuristics may not fulfill certain optimum criteria and produce only near-optimal solutions. In this paper, we propose a meta-heuristic based algorithm for workflow scheduling that considers the minimization of makespan and cost. The proposed algorithm is a hybridization of the popular meta-heuristic, the Gravitational Search Algorithm (GSA), and the equally popular heuristic, Heterogeneous Earliest Finish Time (HEFT), to schedule workflow applications. We introduce a new factor called cost time equivalence to make the bi-objective optimization more realistic. We consider the monetary cost ratio (MCR) and the schedule length ratio (SLR) as the performance metrics to compare the performance of the proposed algorithm with existing algorithms. With rigorous experiments over different scientific workflows, we show the effectiveness of the proposed algorithm over the standard GSA, the Hybrid Genetic Algorithm (HGA) and HEFT. We validate the results by the well-known statistical test, Analysis of Variance (ANOVA). In all cases, the simulation results show that the proposed approach outperforms these algorithms.

© 2018 Elsevier B.V. All rights reserved.
https://doi.org/10.1016/j.future.2018.01.005
[2,3] and therefore this has become the recent trend of research in cloud computing.

Heterogeneous Earliest Finish Time (HEFT) [4] is an efficient heuristic proposed for task scheduling in heterogeneous multiprocessors, which is also used for cloud computing [5–7]. This algorithm maps each task, arranged in a priority order, to the VM for which the earliest finish time is minimum. It should be noted that it is essentially a single-objective algorithm which can only optimize the makespan. The Gravitational Search Algorithm (GSA) [8] is a popular meta-heuristic approach which utilizes the law of gravitation to find a near-optimal solution. The algorithm starts with a set of random particles, where each particle represents a solution and the mass of each particle is calculated using a fitness function based on the application. Particles with a higher fitness value have a higher mass and, hence, exert more force to attract other particles towards them. Eventually, all the particles converge towards an optimal point. GSA is capable of obtaining a global optimum faster than other meta-heuristic algorithms and, hence, has a higher convergence rate. Moreover, it provides better results than Central Force Optimization (CFO) and Particle Swarm Optimization (PSO), as demonstrated in [8].

In this paper, we propose a meta-heuristic based algorithm for the workflow scheduling problem which is a hybrid of HEFT and GSA. Specifically, we address the following workflow scheduling problem. Given a workflow consisting of a set of tasks T = {t1, t2, ..., tn} with their computational load and precedence constraints, and also given a set of VMs, V = {v1, v2, ..., vm}, our objective is to map all the tasks to the available VMs so that the entire workflow can be executed in minimum time and at minimum computational cost. The proposed algorithm is presented with an efficient agent representation and a systematic derivation of the fitness function. The algorithm is extensively simulated using scientific workflows of different sizes and is shown to produce better results as compared to other related algorithms such as the Hybrid Genetic Algorithm (HGA), GSA and HEFT. We use ANOVA [9], a statistical test, to validate the simulation results. This test determines whether a given result set is statistically significantly different from other sets of results.

Many algorithms have been proposed for workflow scheduling in cloud computing which are based on meta-heuristic approaches. For instance, Rodriguez et al. [10] have proposed a PSO based algorithm with the objective of minimizing the execution cost while meeting deadline constraints. Similarly, an HGA has been presented in [3] that also has a single objective, i.e., minimizing the makespan. Many other meta-heuristic approaches have been developed for workflow scheduling, a survey of which can be found in [11]. However, our approach is different from all such approaches and has the following novelty. We consider parameters such as communication bandwidth, output data size of each task, VM boot time, VM shutdown time and performance variability of VMs in order to create a more realistic environment for scheduling. Most of these features are absent in the existing works. Moreover, our approach deals with two objectives, in contrast to the single objective of many existing algorithms. The hybridization of GSA with HEFT is also novel in the sense that it fully exploits the benefits of both algorithms.

Our contribution can be summarized as follows.

• A hybrid algorithm based on GSA and HEFT to minimize makespan and total computational cost.
• An efficient agent representation and systematic derivation of the fitness function.
• Introduction of cost time equivalence and a procedure for eradicating the inferior agents.
• Demonstration of better performance of the algorithm through extensive simulation and comparison with other heuristic/meta-heuristic based approaches.
• Validation of the performance through the statistical test ANOVA.

The rest of the paper is organized as follows. Related works are stated in Section 2. Section 3 explains the application and cloud model. Section 4 describes the terminologies used in the paper and the problem statement. Section 5 presents the proposed work with an illustration. Performance metrics, experimental results, and comparison are discussed in Section 6, followed by Section 7 which concludes the paper.

2. Related works

Many heuristic and meta-heuristic based algorithms have been proposed for workflow scheduling in cloud computing. In this section, we present a short review of some of the works that are relevant to our proposed scheme.

HEFT [4] is a popular heuristic which was initially developed for task scheduling in heterogeneous multiprocessor systems. It is well known that HEFT performs better than many other heuristics, such as [12,13], for task scheduling. However, it considers the minimization of makespan only. An extension of HEFT called Pareto Optimal Scheduling Heuristic (POSH) was proposed by Su et al. [5] for workflow scheduling in the cloud, to minimize makespan and cost of execution. POSH produces acceptable solutions. Nevertheless, the solution is derived from a constricted search space and thus it may miss better solutions. An energy-efficient scheduling with deadline constraint for a heterogeneous cloud environment was proposed in [14]. In this work, a new VM scheduler is developed which is shown to reduce energy consumption in the execution of workflows. The authors claimed to achieve up to a 20% reduction in energy requirement and an 8% improvement in processing capacity. Fard et al. [15] proposed another heuristic called multi-objective list scheduling (MOLS), which provides a general framework for multi-objective static workflow scheduling. It supports four objectives, namely makespan, cost, reliability, and energy, and provides an execution plan based on the selected objectives. Abrishami et al. [16] adopted the Partial Critical Path (PCP) for workflow scheduling and designed two algorithms: a one-phase algorithm called IaaS Cloud Partial Critical Paths (IC-PCP) and a two-phase algorithm called IaaS Cloud Partial Critical Paths with Deadline Distribution (IC-PCPD2). Here, a homogeneous cloud environment is assumed.

Recently, Casas et al. [17] have proposed the balanced and file reuse-replication scheduling (BaRRS) algorithm to schedule workflows based on two optimization constraints, i.e., makespan and cost. They have also focused on finding an optimal number of VMs required for a given workflow. However, it has a large computational overhead. Panda et al. [18] have developed a normalization based task scheduling for a heterogeneous multi-cloud environment. This technique provides a way to schedule tasks over multiple cloud providers. In another work [19], they have proposed a modification of the min–min algorithm with an uncertainty parameter for scheduling tasks in a heterogeneous multi-cloud environment. Gupta et al. [20] have also reported a workflow scheduling algorithm for the multi-cloud environment. However, this work focuses more on compute-intensive workflows.

Meta-heuristics are well-known techniques to obtain near-optimal solutions. For instance, Pandey et al. [21] proposed a PSO based workflow scheduling algorithm for cost optimization. It is designed to consider computational cost and data transmission cost to provide an execution plan such that the overall cost is minimized. However, this approach has been tested only on limited workflow applications. Jena et al. [22] proposed a multi-objective nested Particle Swarm Optimization (TSPSO) algorithm for workflow scheduling to optimize energy as well as processing time.
Table 1
Notations and definitions.

Notation   Definition
N          Population size.
n          Number of tasks in a given workflow.
ti         ith task.
m          Number of available VMs.
vj         jth VM.
Xi         The ith agent.
Xbest      The best agent known so far.
α          Weight of makespan and cost for calculating the fitness.
β          Cost makespan equivalence factor. It is a part of the SLA and its value depends on the priority and urgency of the application.
fiti       Fitness value of the ith agent based on its makespan and cost.
Mi         Mass of the ith agent.
σ          A random variable used in the pricing model.
Vcbase     Base price of the slowest VM.
degvj      Performance degradation of VM vj.
γ          A small constant which regulates the declination of the gravitational constant.
δ          Threshold mass for replacing the inferior agents.
with other events. Therefore, makespan can be mathematically formulated as follows.

Makespan = VM_boot_time + max_{i=1..m} (VM_time[i]) + VM_shutdown_time   (2)

Cost Calculation:

In our assumed model, a cloud server consists of VMs with varying computational capacity for different types of workload. ET_{ti}^{vj} is the execution time of the task ti on VM vj as defined in Eq. (1). Let τ be the unit chargeable time for which the charge of execution of any task will be accounted. Let Vcbase be the base price charged for the slowest VM; then, as per the exponential pricing model [5], the cost of execution of the task ti on VM vj is denoted as cost(ti, vj) and is formulated as follows.

cost(ti, vj) = σ × ⌈ET_{ti}^{vj} / τ⌉ × Vcbase × exp(CPU cycles of vj / slowest CPU cycle)   (3)

where σ is a random variable used to generate different combinations of VM pricing and capacity. Let Bi,j be a boolean variable such that

Bi,j = 1 if task ti is assigned to vj, and 0 otherwise   (4)

Therefore, the total cost of execution for a workflow is defined as

Total Cost = Σ_{i=1}^{n} Σ_{j=1}^{m} Bi,j × cost(ti, vj)   (5)

However, the values of makespan and cost may have different scales. So, the value of one attribute may overwhelm the value of the other attribute in the case of agent evaluation. Thus, it is not valid to perform a linear formulation directly using the actual values. One of the intuitive approaches is to normalize both makespan and cost in order to scale these values into the same range. Normalization can be done using any of the well-known methods, such as min–max normalization, Z-score normalization, etc. However, this solution has a major drawback, i.e., if there is a change in the global minimum and maximum values of makespan and cost, it may lead to a variation in the relative rank of agents in two consecutive iterations. To resolve this issue, we use the makespan equivalent of total cost (MEcost), calculated using Eq. (6), instead of the total cost.

MEcost = β × Total Cost   (6)

Problem Statement:

The objective of the proposed work is to minimize makespan and the makespan equivalent of total cost as given in Eqs. (2) and (6) respectively. Therefore, it is wise to minimize their linear combination. The workflow scheduling problem can be formulated as follows.

Minimize z = α × Makespan + (1 − α) × MEcost
subject to (i) Σ_{j=1}^{m} Bi,j = 1,  i = 1, 2, 3, ..., n   (7)
(ii) 0 ≤ α ≤ 1

The constraint (i) indicates that any task of the workflow can be assigned to one and only one VM, and the constraint (ii) limits the range of α, which balances makespan and total cost.

5. Proposed work

As our algorithm is a hybrid of GSA and HEFT, we first provide a brief description of both of these algorithms.

5.1. Overview of heterogeneous earliest finish time (HEFT)

The main idea behind HEFT [4] is to schedule tasks in such a way that the earliest finish time (EFT) is minimized for all the tasks. HEFT is executed in two phases, described as follows.

Phase 1: Calculating priority of tasks. In this phase, the priority of each task is calculated using the average execution time and average communication time. The priorities are calculated in a bottom-up approach. The sequence of tasks is generated from higher to lower priority value, satisfying the precedence constraints of the given workflow. The priority of task ti is given by

pri(ti) = wi + max_{tj ∈ succ(ti)} (ci,j + pri(tj))   (8)

where wi is the average execution time of task ti on the available VMs and ci,j is the average communication time between task ti and task tj.

Phase 2: Mapping tasks to VMs. This is the main phase of the algorithm, where the actual mapping of tasks to VMs is performed according to the priority of the tasks. The task with the highest priority is scheduled first, by calculating the earliest start time (EST) and EFT on all available VMs.
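Phase 1's upward rank is easy to prototype. A minimal sketch of the recursive priority rule over a hypothetical 4-task DAG (the task graph, execution times and communication times below are made up for illustration, not taken from the paper's experiments):

```python
# Hypothetical fork-join DAG: t1 -> {t2, t3} -> t4 (illustrative values only).
succ = {1: [2, 3], 2: [4], 3: [4], 4: []}
w = {1: 10.0, 2: 8.0, 3: 6.0, 4: 4.0}                      # average execution times
c = {(1, 2): 2.0, (1, 3): 3.0, (2, 4): 1.0, (3, 4): 2.0}   # average communication times

def pri(t, memo={}):
    """Upward rank: pri(ti) = wi + max over successors tj of (ci,j + pri(tj))."""
    if t not in memo:
        memo[t] = w[t] + max((c[(t, s)] + pri(s) for s in succ[t]), default=0.0)
    return memo[t]

# Decreasing-priority order respects precedence for this DAG.
order = sorted(succ, key=pri, reverse=True)  # -> [1, 2, 3, 4]
```

Here pri(4) = 4, pri(2) = 8 + 1 + 4 = 13, pri(3) = 6 + 2 + 4 = 12, and pri(1) = 10 + max(2 + 13, 3 + 12) = 25, so t1 is scheduled first.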
Table 2
Example of a mapping / agent.

Task  t1  t2  t3  t4  t5  t6  t7  t8
VM    1   3   2   1   5   1   3   1

where x_i^d represents the VM assigned to the task t_d. Note that x_i^d is an integer which lies in the interval [1, m]. Table 2 shows an example of an agent for 8 tasks on 5 VMs. Here, task t1 is mapped to VM v1, i.e., task t1 will be executed on VM v1, while preserving the precedence constraints.
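In code, an agent is just an integer vector of VM indices, and evaluating it against the bi-objective of Section 4.2 reduces to one line. A sketch using the agent of Table 2 and the makespan/cost of the paper's worked example, with α and β as in Table 3 (the `objective` helper is our own naming, not the paper's):

```python
# Agent from Table 2: task t_d runs on VM number agent[d-1] (8 tasks, 5 VMs).
agent = [1, 3, 2, 1, 5, 1, 3, 1]

ALPHA = 0.5   # weight of makespan vs. cost (Table 3)
BETA = 1.0    # cost-time equivalence factor (Table 3)

def objective(makespan, total_cost, alpha=ALPHA, beta=BETA):
    """z = alpha * Makespan + (1 - alpha) * MEcost, where MEcost = beta * Total Cost."""
    return alpha * makespan + (1 - alpha) * beta * total_cost

# Makespan and cost of the illustration's final schedule (Section 6):
z = objective(32.64, 47.71)
```

The fitness used by HGSA is the reciprocal of this z, so lower z means a fitter agent.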
5.2. Overview of gravitational search algorithm (GSA)

The GSA is based on the law of gravity [28] and was introduced by Rashedi et al. [8]. It is a population-based search algorithm where each agent is considered as a particle (so we use agent and particle interchangeably throughout the paper) and its fitness value is considered as its mass. Each particle represents a solution of a given problem. The main idea is that the heavier particles, i.e., superior solutions, do not move much as compared to the lighter particles, i.e., inferior solutions. All particles apply force on every other particle. As each particle has some mass, its acceleration and velocity can be calculated using the net force. Using the calculated velocity, the new position of the particle can be found. When the algorithm terminates, the particle with the highest mass provides the near-optimal solution. The working of the GSA is depicted in Fig. 2.

To use this algorithm in scheduling, we first identify the search space and particle representation. Then we initialize the population randomly. In each iteration, we calculate the fitness value of all particles using the fitness function defined as per the optimization constraints. Based on the best and worst particles identified, we calculate the mass to update the position of each particle. The gravity constant, which is used to calculate the velocity and the position, is also updated in each iteration. We repeat all the steps till the algorithm attains a certain termination criterion.

Remark. The problem of workflow scheduling is to minimize the linear combination of the makespan and the makespan equivalent of total cost as described in Section 4.2. As our fitness value is the reciprocal of the same, a higher fitness value is desirable.

In the proposed algorithm, we apply min–max normalization in order to scale the fitness values of all the agents in the population into the range [0, 1] to get the mass of each agent. So, the mass of the ith agent is given by

Mass(Mi) = (fiti − min_{j=1..N} fitj) / (max_{j=1..N} fitj − min_{j=1..N} fitj)   (11)

We use Eq. (11) as the fitness for our proposed work, which is used in the simulations.

5.5. Proposed scheduling technique

The proposed HGSA algorithm works in two phases. The first phase focuses only on the optimization of makespan. In the second phase, it attempts to optimize the cost while optimizing the fitness value, which is calculated from both makespan and cost. The result from the first phase guides the particle movement in GSA, which is considered in the second phase of the proposed work. This improves the result as compared to GSA with random initial particles. We use GSA by incorporating HEFT with the following steps.
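The mass computation of Eq. (11) is a plain min-max normalization over the population's fitness values. A minimal sketch (the sample fitness values are made up; a population in which all agents have identical fitness would need a guard against the zero range):

```python
def masses(fitness):
    """Eq. (11): min-max normalize fitness values into [0, 1] to obtain agent masses.
    Assumes at least two distinct fitness values, so the range is nonzero."""
    lo, hi = min(fitness), max(fitness)
    return [(f - lo) / (hi - lo) for f in fitness]

# Made-up fitness values for a population of four agents:
M = masses([0.020, 0.024, 0.022, 0.028])  # roughly [0, 0.5, 0.25, 1]
```

The worst agent always receives mass 0 and the best mass 1, which is what makes the threshold δ of Step 3 meaningful across iterations.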
Step 1: The initial population is seeded with the output of the HEFT algorithm. The HEFT heuristic provides guidance to the GSA that improves the overall performance of the proposed algorithm. It helps to generate better solutions in a fewer number of iterations.

Step 2: The best particle identified from the current generation based on the fitness function is preserved. This is done to ensure that the best agent does not get degraded in a future generation.

Step 3: The agents having mass less than the threshold mass (δ) are removed from the current population, as they have very little or no contribution in updating the population. In place of all the removed agents, new agents generated with the help of the best agent identified so far are added to the population. This improves the overall fitness of the population.

5.6. Position update of particle

Let us consider a system of N agents. We define the position of the ith agent as follows.

Xi = [x_i^1, x_i^2, x_i^3, ..., x_i^n] for i = 1, 2, 3, ..., N   (12)

where x_i^d shows the position of the ith agent in the dth dimension. Let Mi(k) and G(k) be the mass of the ith agent and the gravitational constant, respectively, in the kth iteration. We can define the force acting on the ith agent by the jth agent in the kth iteration as follows.

F_{i,j}^d(k) = G(k) × (Mi(k) × Mj(k)) / (Ri,j(k) + ϵ) × (x_j^d(k) − x_i^d(k))   (13)

where ϵ is a very small constant and Ri,j(k) is the Euclidean distance between the ith and jth agents in the kth iteration. Ri,j(k) is defined as

Ri,j(k) = ∥Xi(k) − Xj(k)∥2   (14)

We suppose that the total force that acts on the ith agent in the dth dimension is a randomly weighted sum of the forces exerted in the dth dimension by the other agents. Then,

F_i^d(k) = Σ_{j=1, j≠i}^{N} randj × F_{i,j}^d(k)   (15)

where randj is a random number that lies in the interval [0, 1]. By the law of motion [28], the acceleration of the ith agent in the dth dimension in the kth iteration is given by

a_i^d(k) = F_i^d(k) / Mi(k)   (16)

Furthermore, the next velocity of the agent is considered as a fraction of its current velocity added to its acceleration. Therefore, its velocity and position can be calculated as follows.

vel_i^d(k + 1) = randi × vel_i^d(k) + a_i^d(k)   (17)

x_i^d(k + 1) = x_i^d(k) + vel_i^d(k + 1)   (18)

5.7. Algorithm

We start by generating the initial population by random mapping of the tasks on VMs in step 1 of the proposed algorithm, followed by seeding of the result of HEFT into the population in step 2. Once the initial population is ready, a set of iterative steps is applied to each agent of the population to get the final result as per step 3 through step 16. The first step of an iteration is to calculate the gravitational constant in step 4. Then, in step 5, we compute the fitness value of each agent using Eq. (10). Note that Eq. (10) requires the values of both cost and makespan; Algorithms 2 and 3 can be utilized for calculating these values. Based on the fitness value, we identify the best and the worst agents in step 6 for calculating the mass of all the agents in step 7. In step 8, we update the position of each agent by calculating the net force, net acceleration and velocity. In the remaining steps, we replace the inferior agents by new agents generated with the help of the best agent known so far. A new agent is generated by mapping one of the tasks to a randomly selected VM, while the rest of the mapping remains the same as the best agent.

Algorithm 1: Proposed Workflow Scheduling Algorithm
Input: Workflow Application (W) and Cloud Server Specification (CSS)
Output: Task mapping with VMs (M)
1   Initialize population X with N randomly generated agents.
2   Replace one of the agents by the mapping generated by HEFT
3   for k = 1 to MAX_ITERATION do
4       Compute gravitational constant G(k) using Eq. (19)
5       Compute fitness value fiti for i = 1, 2, 3, ..., N using Eq. (10)
6       Identify best and worst agent based on the calculated fitness value.
7       Compute mass Mi for i = 1, 2, 3, ..., N using Eq. (11)
8       Update velocity and position of each agent using Eqs. (13) to (18)
9       for i = 1 to N do
10          if Mi < δ then
11              Pos = a random integer from interval [1, n]
12              x_i^d = x_best^d for d = 1, 2, 3, ..., n
13              x_i^Pos = a random integer from interval [1, m]
14          end if
15      end for
16  end for
17  Find M corresponding to the best agent based on fiti for i = 1, 2, 3, ..., N
18  return M

Algorithm 2: Cost-Calculation
Input: Workflow Application (W), Mapping (M), Cloud Server Specification (CSS)
Output: Cost value (Cost)
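One update sweep of Eqs. (13) to (18) can be sketched as follows. Positions are kept real-valued here; mapping coordinates back to integer VM indices in [1, m], as the agent encoding requires, is omitted, and the tiny `eps` used below is an assumption of this sketch rather than the paper's parameter value:

```python
import math
import random

def gsa_step(X, M, vel, G, eps=1e-4):
    """One GSA sweep: pairwise forces (13-15), acceleration (16),
    velocity (17) and position (18) for every agent and dimension."""
    N, n = len(X), len(X[0])
    newX = [row[:] for row in X]
    for i in range(N):
        for d in range(n):
            F = 0.0  # randomly weighted sum of forces in dimension d (Eq. 15)
            for j in range(N):
                if j == i:
                    continue
                R = math.dist(X[i], X[j])  # Euclidean distance (Eq. 14)
                F += random.random() * G * M[i] * M[j] / (R + eps) * (X[j][d] - X[i][d])
            a = F / M[i] if M[i] > 0 else 0.0            # acceleration (Eq. 16)
            vel[i][d] = random.random() * vel[i][d] + a  # velocity (Eq. 17)
            newX[i][d] = X[i][d] + vel[i][d]             # position (Eq. 18)
    return newX

# Demo: two 2-dimensional agents with made-up masses, G = 5 as in Table 3.
X = [[1.0, 2.0], [3.0, 4.0]]
M = [0.2, 0.8]
vel = [[0.0, 0.0], [0.0, 0.0]]
newX = gsa_step(X, M, vel, G=5.0)
```

With only two agents, each is pulled toward the other, so the lighter agent's coordinates can only grow toward the heavier one and vice versa.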
Algorithm 3: Makespan-Calculation
Input: Workflow Application (W), Mapping (M), Cloud Server Specification (CSS)
Output: Makespan value (Makespan)
1   for each vi ∈ V do
2       VM_time[i] = 0
3   end for
4   for each task ti ∈ W in topological order do
5       if ti.ParentCount != 0 then
6           Parent_finishtime = max_{tk ∈ pred(ti)} (Task_actual_finish_time[k])
7       end if
8       if ti.ChildCount != 0 then
9           Transfer_time = 0
10          for each task tj where tj ∈ succ(ti) and M[i] != M[j] do
11              if output data of task ti is not yet transferred to v_{M[j]} then
12                  Transfer_time = Transfer_time + ti.Outputdatasize / Bandwidth
13              end if
14          end for
15      end if
16      Execution_time ET_{ti}^{v_{M[i]}} = Load(ti) / (Capacity(v_{M[i]}) × (1 − deg_{v_{M[i]}}))
17      Actual_start_time = max(Parent_finishtime, VM_time[M[i]])
18      Task_actual_finish_time[i] = Actual_start_time + Execution_time + Transfer_time
19      VM_time[M[i]] = Task_actual_finish_time[i]
20  end for
21  Makespan = VM_boot_time + max_{vi ∈ Cloud} (VM_time[i]) + VM_shutdown_time
22  return Makespan
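Algorithm 3 translates almost line for line into Python. A sketch under assumed inputs: the 3-task workflow, VM capacities and timing constants in the demo are illustrative, not the paper's Cloud Server Specification, and output data is transferred once per distinct target VM as in lines 10-13:

```python
def makespan_calc(tasks, load, out_size, succ, pred, mapping,
                  capacity, deg, bandwidth, boot=0.5, shutdown=0.5):
    """Algorithm 3: tasks must be given in topological order."""
    vm_time = {v: 0.0 for v in capacity}                       # lines 1-3
    finish = {}
    for t in tasks:                                            # line 4
        parent_finish = max((finish[p] for p in pred[t]), default=0.0)   # line 6
        # One transfer per distinct successor VM different from our own (lines 9-14).
        targets = {mapping[s] for s in succ[t] if mapping[s] != mapping[t]}
        transfer = len(targets) * out_size[t] / bandwidth
        v = mapping[t]
        exec_time = load[t] / (capacity[v] * (1 - deg[v]))     # line 16
        start = max(parent_finish, vm_time[v])                 # line 17
        finish[t] = start + exec_time + transfer               # line 18
        vm_time[v] = finish[t]                                 # line 19
    return boot + max(vm_time.values()) + shutdown             # line 21

# Demo: fork t1 -> {t2, t3} on two VMs (all numbers illustrative).
ms = makespan_calc(
    tasks=[1, 2, 3],
    load={1: 10.0, 2: 20.0, 3: 30.0},
    out_size={1: 2.0, 2: 0.0, 3: 0.0},
    succ={1: [2, 3], 2: [], 3: []},
    pred={1: [], 2: [1], 3: [1]},
    mapping={1: 'v1', 2: 'v1', 3: 'v2'},
    capacity={'v1': 2.0, 'v2': 5.0},
    deg={'v1': 0.0, 'v2': 0.0},
    bandwidth=1.0,
)  # -> 18.0: 0.5 boot + 17.0 on the busiest VM + 0.5 shutdown
```

Here t1 finishes at 5 + 2 = 7 (execution plus one transfer to v2), t2 finishes at 17 on v1, t3 at 13 on v2, giving a makespan of 18.0.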
Table 3
Parameters used in the illustration.
Parameter Value
Number of VMs 4
Computational power of all VMs 2.0, 3.5, 4.5 and 5.5 MIPS
Network bandwidth 1 MBps
Boot time and shutdown time of VM 0.5 sec
Performance variance of VM 24%
MAX_ITERATION 10
Population size (N) 100
Gravitational constant (G0 ) 5
Weight of makespan and cost (α ) 0.5
Cost time equivalence (β ) 1
Small constant used in gravity (γ ) 0.3
Mass threshold for inferior agents (δ ) 0.1
Small constant used in force (ϵ ) 10
Table 4
Initial population of agents.
Agent 1 4 1 1 2 3 4 4 2 3 1 2 3 2 1 1 4
Agent 2 2 3 1 1 1 2 3 4 4 1 1 2 1 3 3 3
. . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . .
Agent N 1 2 1 1 1 3 3 4 4 1 2 2 2 2 1 1
Table 5
Task priority and task mapping by HEFT.

Task  Priority  Virtual machine
t1    107.71    3
t2    103.37    2
t3    107.74    4
t4    103.24    4
t5    94.38     2
t6    94.44     3
t7    90.26     2
t8    90.20     4
t9    90.05     3
t10   90.08     2
t11   90.06     4
t12   90.08     4
t13   77.89     4
t14   77.57     4
t15   2.63      4
t16   0.28      4

Table 6
Iteration-wise specification of the best agent.

Iteration  Makespan   Cost       Fitness
1          36.575226  50.313644  2.250 × 10−2
2          32.338966  49.307339  2.391 × 10−2
3          32.536972  48.014904  2.422 × 10−2
4          32.642235  47.711269  2.428 × 10−2
5          32.642235  47.711269  2.428 × 10−2
6          32.642235  47.711269  2.428 × 10−2
7          32.642235  47.711269  2.428 × 10−2
8          32.642235  47.711269  2.428 × 10−2
9          32.642235  47.711269  2.428 × 10−2
10         32.642235  47.711269  2.428 × 10−2

in each iteration based on the calculated fitness value. The resultant schedule is shown in Table 7, having a makespan of 32.64 sec and a cost of $47.71.

6. Experimental results and comparison

This section presents the simulation results of the proposed algorithm and its comparison with three workflow scheduling algorithms: the standard GSA based approach, HEFT and HGA. Note that, for the sake of comparison, we convert the single objective of the HGA (minimization of makespan) into a bi-objective, keeping all the constraints the same as in the proposed algorithm.

6.1. Experimental setup

The simulations were carried out in a C++ coding environment on an Intel(R) Core(TM) i5-2540M CPU at 2.60 GHz with 4 GB RAM running on a Linux platform. The specifications of the cloud environment, as well as the parameters used for evaluation of our proposed algorithm, are given in Table 8.

6.2. Performance metrics

We normalize the makespan and monetary cost similar to that in [5] and call them the schedule length ratio (SLR) and monetary cost ratio (MCR) of tasks, as follows.

SLR = Makespan / Σ_{ti ∈ CP} min_{j=1..m} {ET_{ti}^{vj}}   (20)

MCR = Total Cost / Σ_{ti ∈ CP} min_{j=1..m} {cost(ti, vj)}   (21)

The denominator is the summation of the minimum execution time and monetary cost, respectively, of the tasks on the critical path (CP) without communication cost. For a given task graph, an algorithm that produces a scheduling plan with lower SLR and lower MCR values is more effective.

We also calculate the normalized fitness value for easy comparison and visualization of the overall quality of the results. We use max-normalization to normalize the absolute fitness value as calculated using Eq. (10). After applying normalization, the maximum value is mapped to one and the rest of the values lie in the interval (0, 1]. Mathematically, max-normalization is defined as

x̂i = xi / max_{j=1..N} (xj)   (22)

where x̂i is the normalized value for xi.
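Eqs. (20) to (22) are straightforward ratios. A minimal sketch (the critical-path minima and fitness values below are hypothetical, chosen only to exercise the formulas):

```python
def slr(makespan, min_exec_on_cp):
    """Eq. (20): makespan over the sum of minimum execution times of CP tasks."""
    return makespan / sum(min_exec_on_cp)

def mcr(total_cost, min_cost_on_cp):
    """Eq. (21): total cost over the sum of minimum per-task costs of CP tasks."""
    return total_cost / sum(min_cost_on_cp)

def max_normalize(xs):
    """Eq. (22): divide by the maximum, mapping the best value to 1."""
    m = max(xs)
    return [x / m for x in xs]

# Hypothetical per-task minima on the critical path:
s = slr(32.64, [5.0, 4.0, 7.0])          # 32.64 / 16 = 2.04
c = mcr(47.71, [10.0, 12.0, 25.0])       # 47.71 / 47 ≈ 1.015
norm = max_normalize([2.0, 4.0, 5.0])    # roughly [0.4, 0.8, 1.0]
```

A schedule that achieved every CP task's minimum execution time with no communication would score SLR = 1, so values near 1 are better.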
Table 7
Resultant schedule of montage with 16 tasks.
Task t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16
VM 3 1 4 4 2 3 1 1 3 2 4 4 4 4 4 4
Fig. 5. (a) CyberShake (b) Epigenomics (c) Inspiral (d) Montage (e) SIPHT.
(a) Normalized fitness for small sized workflows. (b) Normalized fitness for medium sized workflows.
Table 9
Category of the workflow based on the number of tasks.

Number of tasks  Category
24 to 60         Small
100 to 400       Medium
800 to 2000      Large

The test was performed to compare the standard GSA, the hybrid GA and the hybrid GSA. In order to do this experiment, all three algorithms were executed 10 times for each of the five scientific workflows of various sizes. Tables 10–14 show the results for each workflow of 2000 tasks.

As we can see, for all workflows we have F statistical > F critical. Thus, we can reject the null hypothesis; the means of all the groups are significantly different. This implies that the performance of HGSA is better and more consistent than that of HGA and GSA.

7. Conclusion

In this paper, we have presented a hybrid gravitational search algorithm for scheduling workflows, with the basic objective of reducing the makespan as well as the cost of execution. The efficiency
Table 10
ANOVA using CyberShake workflow of 2000 tasks.
(a) Summary of input
Group Count Sum Average Variance
HGSA 10 3.51E−05 3.51E−06 3.44E−16
GSA 10 2.96E−05 2.96E−06 5.10E−17
HGA 10 3.17E−05 3.17E−06 2.01E−16
Table 11
ANOVA using Epigenomics workflow of 2000 tasks.
(a) Summary of input
Group Count Sum Average Variance
HGSA 10 3.60E−07 3.60E−08 5.90E−20
GSA 10 3.00E−07 3.00E−08 1.30E−20
HGA 10 3.10E−07 3.10E−08 5.90E−20
Table 12
ANOVA using Inspiral workflow of 2000 tasks.
(a) Summary of input
Group Count Sum Average Variance
HGSA 10 3.90E−06 3.90E−07 3.70E−18
GSA 10 3.50E−06 3.50E−07 4.50E−19
HGA 10 3.60E−06 3.60E−07 2.30E−18
Table 13
ANOVA using Montage workflow of 2000 tasks.
(a) Summary of input
Group Count Sum Average Variance
HGSA 10 8.30E−05 8.30E−06 6.10E−16
GSA 10 5.80E−05 5.80E−06 8.80E−17
HGA 10 7.80E−05 7.80E−06 7.90E−15
Table 14
ANOVA using SIPHT workflow of 2000 tasks.
(a) Summary of input
Group Count Sum Average Variance
HGSA 10 5.10E−06 5.10E−07 1.90E−17
GSA 10 4.30E−06 4.30E−07 2.60E−18
HGA 10 4.80E−06 4.80E−07 1.50E−17
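The F statistic behind Tables 10-14 can be reproduced from scratch. A minimal one-way ANOVA sketch (the run values below are made-up samples, not the paper's data, and only five runs per group are shown instead of the paper's ten):

```python
def anova_f(groups):
    """One-way ANOVA: F = between-group mean square / within-group mean square."""
    k = len(groups)                                 # number of groups
    n = sum(len(g) for g in groups)                 # total observations
    grand = sum(sum(g) for g in groups) / n         # grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Made-up normalized-fitness samples for the three algorithms:
hgsa = [3.5, 3.6, 3.4, 3.5, 3.6]
gsa  = [3.0, 2.9, 3.1, 3.0, 2.9]
hga  = [3.2, 3.1, 3.3, 3.2, 3.1]
F = anova_f([hgsa, gsa, hga])  # reject H0 when F exceeds the critical value
```

When all group means coincide, the between-group sum of squares is zero and F = 0; well-separated means with small within-group variance, as above, drive F far past typical critical values.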
and usefulness of our algorithm have been demonstrated through various simulation runs. The results show that HGSA performs better than HGA by 14%, GSA by 20% and HEFT by 35% with respect to the fitness value. This clearly shows the superiority of our scheme over its contemporaries. Moreover, the consistent performance of HGSA has been validated through the statistical test ANOVA.

However, in the proposed approach, we have assumed that the bandwidth between the VMs is fixed and all the tasks of the workflow consist of simple programming instructions. Therefore, we would like to extend our approach by considering the scenario where the nature of the available bandwidth is variable. Moreover, we would like to apply the proposed approach to the scheduling of multiple workflows. It should be noted that in the proposed work the execution time of a task is calculated based on the number of instructions. However, for complex tasks, this may not work accurately. Thus, in future work, we would like to support complex tasks as well. For the sake of simplicity, we assumed that there is no VM failure during task execution. This opens a possible future extension of this work. Our future efforts will be made towards addressing all these issues, accompanied by a newly proposed scheduling algorithm with its simulation on a federated multi-cloud environment.
References

[1] R. Buyya, C.S. Yeo, S. Venugopal, J. Broberg, I. Brandic, Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility, Future Gener. Comput. Syst. 25 (6) (2009) 599–616.
[2] F. Tao, Y. Feng, L. Zhang, T. Liao, CLPS-GA: A case library and Pareto solution-based hybrid genetic algorithm for energy-aware cloud service scheduling, Appl. Soft Comput. 19 (2014) 264–279.
[3] S.G. Ahmad, C.S. Liew, E.U. Munir, T.F. Ang, S.U. Khan, A hybrid genetic algorithm for optimization of scheduling workflow applications in heterogeneous computing systems, J. Parallel Distrib. Comput. 87 (2016) 80–90.
[4] H. Topcuoglu, S. Hariri, M.Y. Wu, Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Trans. Parallel Distrib. Syst. 13 (3) (2002) 260–274.
[5] S. Su, J. Li, Q. Huang, X. Huang, K. Shuang, J. Wang, Cost-efficient task scheduling for executing large programs in the cloud, Parallel Comput. 39 (4) (2013) 177–188.
[6] M.A. Vasile, F. Pop, R.-I. Tutueanu, V. Cristea, J. Kołodziej, Resource-aware hybrid scheduling algorithm in heterogeneous distributed computing, Future Gener. Comput. Syst. 51 (2015) 61–71.
[7] A. Verma, S. Kaushal, Cost-time efficient scheduling plan for executing workflows in the cloud, J. Grid Comput. 13 (4) (2015) 495–506.
[8] E. Rashedi, H. Nezamabadi-pour, S. Saryazdi, GSA: A gravitational search algorithm, Inf. Sci. 179 (13) (2009) 2232–2248.
[9] K.E. Muller, B.A. Fetterman, Regression and ANOVA: An Integrated Approach Using SAS Software, SAS Institute, 2002.
[10] M.A. Rodriguez, R. Buyya, Deadline based resource provisioning and scheduling
[27] J. Schad, J. Dittrich, J.A. Quiané-Ruiz, Runtime measurements in the cloud: Observing, analyzing, and reducing variance, Proc. VLDB Endow. 3 (1–2) (2010) 460–471.
[28] J. Walker, D. Halliday, R. Resnick, Fundamentals of Physics, Wiley, Hoboken, NJ, 2008.
[29] S. Bharathi, A. Chervenak, E. Deelman, G. Mehta, M.H. Su, K. Vahi, Characterization of scientific workflows, in: 2008 Third Workshop on Workflows in Support of Large-Scale Science, IEEE, 2008, pp. 1–10.
[30] G. Juve, A. Chervenak, E. Deelman, S. Bharathi, G. Mehta, K. Vahi, Characterizing and profiling scientific workflows, Future Gener. Comput. Syst. 29 (3) (2013) 682–692.
[31] Pegasus workflow generator (accessed on December 24th, 2016). https://github.com/pegasus-isi/WorkflowGenerator.

Anubhav Choudhary completed his M.Tech in Computer Science and Engineering from Indian Institute of Technology (ISM), Dhanbad, India. He is currently working as a software engineer in Sandvine Technologies, Bengaluru, India. His research interests include Cloud Computing and Machine Learning. He completed his bachelor's degree from Kalinga Institute of Industrial Technology, Bhubaneswar, India.
algorithm for scientific workflows on clouds, IEEE Trans. Cloud Comput. 2 (2)
(2014) 222–235. Indrajeet Gupta is working as a senior research scholar
[11] F. Fakhfakh, H.H. Kacem, A.H. Kacem, Workflow scheduling in cloud comput- in the Department of CSE at IIT (ISM), Dhanbad, India. He
ing: A survey, in: 2014 IEEE 18th International Enterprise Distributed Object received M. Tech. degree from NIT, Rourkela, India. He
Computing Conference Workshops and Demonstrations (EDOCW), IEEE, 2014, has published more than ten papers in reputed journals
pp. 372–378. and conferences. His current research interests include
[12] J.G. Barbosa, B. Moreira, Dynamic scheduling of a batch of parallel task jobs on workflow scheduling and load balancing in cloud comput-
heterogeneous clusters, Parallel Comput. 37 (8) (2011) 428–438. ing. He acted as reviewers in many reputed journals and
[13] L.F. Bittencourt, R. Sakellariou, E.R. Madeira, Dag scheduling using a lookahead conferences.
variant of the heterogeneous earliest finish time algorithm, in: 2010 18th
Euromicro International Conference on Parallel, Distributed and Network-
Based Processing (PDP), IEEE, 2010, pp. 27–34.
[14] Y. Ding, X. Qin, L. Liu, T. Wang, Energy efficient scheduling of virtual machines
in cloud with deadline constraint, Future Gener. Comput. Syst. 50 (2015) 62–
74.
[15] H.M. Fard, R. Prodan, J.J.D. Barrionuevo, T. Fahringer, A multi-objective ap-
Vishakha Singh received M. Tech. in computer science
proach for workflow scheduling in heterogeneous environments, in: Procee
and engineering from Indian Institute of Technology (ISM),
dings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud
Dhanbad, India. Her research interests include energy and
and Grid Computing (Ccgrid 2012), IEEE Computer Society, 2012, pp. 300–309.
cost-efficient workflow scheduling in cloud computing.
[16] S. Abrishami, M. Naghibzadeh, D.H. Epema, Deadline-constrained workflow
She obtained her bachelor’s degree from Indian institute
scheduling algorithms for infrastructure as a service clouds, Future Gener.
of Information Technology, Jabalpur, India. Apart from her
Comput. Syst. 29 (1) (2013) 158–169.
Masters topic, her expertise also covers wireless sensor
[17] I. Casas, J. Taheri, R. Ranjan, L. Wang, A.Y. Zomaya, A balanced scheduler
networks field.
with data reuse and replication for scientific workflows in cloud computing
systems, Future Gener. Comput. Syst. (2016).
[18] S.K. Panda, P.K. Jana, Normalization-based task scheduling algorithms for
heterogeneous multi-cloud environment, Inf. Syst. Front. (2016) 1–27.
[19] S.K. Panda, P.K. Jana, Uncertainty-based qos min–min algorithm for heteroge-
neous multi-cloud environment, Arab. J. Sci. Eng. 41 (8) (2016) 3003–3025.
[20] I. Gupta, M.S. Kumar, P.K. Jana, Compute-intensive workflow scheduling in
multi-cloud environment, in: 2016 International Conference on Advances in Prasanta K. Jana received M.Tech. degree in Computer
Computing, Communications and Informatics (ICACCI), IEEE, 2016, pp. 315– Science from University of Calcutta, in 1988 and Ph.D. from
321. Jadavpur University in 2000. Currently he is a Professor
[21] S. Pandey, L. Wu, S.M. Guru, R. Buyya, A particle swarm optimization-based in the department of Computer Science and Engineering,
heuristic for scheduling workflow applications in cloud computing environ- IIT (ISM), Dhanbad, India. He has contributed 150 research
ments, in: 2010 24th IEEE International Conference on Advanced Information publications in his credit and co-authored six books and
Networking and Applications, IEEE, 2010, pp. 400–407. two book chapters. He has also produced seven Ph.Ds. As
[22] R. Jena, Multi objective task scheduling in cloud environment using nested pso a recognition of his outstanding research contributions, he
framework, Procedia Comput. Sci. 57 (2015) 1219–1227. has been awarded Senior Member of IEEE (The Institute
[23] S.K. Garg, P. Konugurthi, R. Buyya, A linear programming-driven genetic al- of Electrical and Electronics Engineers, USA) in 2010 and
gorithm for meta-scheduling on utility grids, Int. J. Parallel Emergent Distrib. Canara Bank Research Publication Award in 2015. He is
Syst. 26 (6) (2011) 493–517. in the editorial board of two international journals and acted as referees in many
[24] X. Wang, C.S. Yeo, R. Buyya, J. Su, Optimizing the makespan and reliability for reputed international journals. Dr. Jana has acted as the General Chair of the In-
workflow applications with reputation and a look-ahead genetic algorithm, ternational Conference RAIT-2012, Co-chair of National Conference RAIT-2009 and
Future Gener. Comput. Syst. 27 (8) (2011) 1124–1134. Convener of the workshop WPDC-2008. He has also served as advisory committee
[25] M. Isard, M. Budiu, Y. Yu, A. Birrell, D. Fetterly, Dryad: Distributed data-parallel and programme committee members of several International Conferences. His
programs from sequential building blocks, in: ACM SIGOPS Operating Systems current research interest includes wireless sensor networks, cloud computing and
Review, Vol.41, ACM, 2007, pp. 59–72. big data clustering. He visited University of Aizu in 2003, Japan, Las Vegas, USA in
[26] Y. Xu, K. Li, L. He, T.K. Truong, A dag scheduling scheme on heterogeneous 2008, Imperial college of London, UK in 2011, University of Macau, Macau in 2012
computing systems using double molecular structure-based chemical reaction and Hong Kong in 2015 for academic purpose.
optimization, J. Parallel Distrib. Comput. 73 (9) (2013) 1306–1322.