(2019) (23) - Performability Evaluation and Optimization of Workflow Applications in Cloud Environments
https://doi.org/10.1007/s10723-019-09476-0
Received: 13 April 2018 / Accepted: 6 January 2019 / Published online: 17 January 2019
© Springer Nature B.V. 2019
Abstract Given the characteristics of dynamic provisioning and the illusion of unlimited resources, clouds are becoming a popular alternative for running scientific workflows. In a cloud system for processing workflow applications, the system's performance is heavily influenced by two factors: the scheduling strategy and the failure of components. Failures in a cloud system can simultaneously affect several users and reduce the number of available computing resources. A bad scheduling strategy can increase the expected makespan and the idle time of physical machines. In this paper, we propose an optimization method for the scheduling of scientific workflows on cloud systems. The method comprises a meta-heuristic algorithm coupled to a performability model that provides the fitness values of explored solutions. To represent the combined effect of scheduling and component failures, we adopted discrete event simulation for the performability model. Experimental results show the effectiveness of the hybrid simulation-optimization approach for optimizing the number of allocated virtual machines and the scheduling of tasks regarding performability.

Keywords Scientific workflows · Performability · Stochastic Petri nets · Optimization

D. Oliveira · N. Rosa · P. Maciel
Federal University of Pernambuco, Informatics Center, Recife, Brazil
e-mail: [email protected]

Nelson Rosa
e-mail: [email protected]

Paulo Maciel
e-mail: [email protected]

A. Brinkmann
Data Processing Center (ZDV), Johannes Gutenberg University, Mainz, Germany
e-mail: [email protected]

1 Introduction

In the past decades, computers became an invaluable asset for scientists in many fields of human knowledge. Simulation models are useful when experiments in the real world are too difficult or costly to execute, or when the phenomenon of interest is impossible to reproduce (for instance, in studies about the origin of the universe). Such models are often computationally intensive and require an execution environment composed of many processing units. Many computational scientific applications can be expressed as workflows, i.e., a set of subtasks with data and control-flow dependencies between them. In such applications, the scheduling of tasks on the processing units plays a vital role in the system's performance, but finding an optimal schedule is an NP-Hard problem [8].

Nowadays, cloud computing has been attracting attention as a platform for running scientific applications [25, 27, 55]. The pay-per-use model eliminates the need for upfront investment in a cluster/supercomputer. Moreover, cloud users do not need
to worry about managing the underlying hardware infrastructure. While this model makes things more convenient for the user, this task becomes a severe issue for cloud providers, who need to guarantee the reliability and performance levels specified by a Service Level Agreement (SLA).

The complexity of cloud infrastructures (i.e., the large number of hardware and software components and their interdependence relationships) raises the need for performance evaluation methods that consider the failure of components. A failure in a single physical server can bring down several virtual machines of different users. Likewise, the unavailability of the cloud manager (the head node of a cloud infrastructure) can provoke the unavailability of the whole system. A classic performance study (one that disregards reliability aspects) may not give accurate results, since failures of physical and virtual machines can increase waiting times and decrease throughput [43]. Assessing the performance degradation caused by failures in a cloud environment via measurement-based evaluation is often prohibitive in practice. Even using fault injectors to provoke failures in the system's components, the associated costs and time constraints for such experiments are high, especially when testing multiple configurations in a sensitivity analysis (i.e., a systematic study of the impact of system parameters on the system's performance/reliability [24]). For that reason, state-space based models (e.g., Markov chains [7], Stochastic Petri Nets (SPN) [38], Stochastic Automata Networks [42]) are the most employed technique for performability evaluation of cloud and grid systems [19, 43, 45, 57]. Besides the occurrence of failures, inefficient scheduling strategies can harm the overall system performance by increasing the job makespan and reducing the utilization of processing units.

Evaluating the effects of both scheduling and hardware/virtual machine failures on the performability of workflow applications in cloud environments is a challenging task for state-space based models due to the high number of states to be considered. Moreover, the exponential distribution may not be a good fit for computation times in workflows. Given this intrinsic limitation of state-space models, many existing research efforts towards the modeling of cloud applications employ discrete event simulators. CloudSim [13] is the most widely adopted simulation framework in the cloud computing literature. Thanks to its adaptable architecture, many extensions were proposed to address aspects not originally implemented by CloudSim. Some of the extensions feature auto-scaling [11], federated clouds [30], fault tolerance mechanisms [1, 16, 65], and workflow applications [10, 14, 35]. Nevertheless, covering multiple real-world characteristics of workflow applications in the same cohesive model is still a gap in the literature. Such characteristics include non-determinism, multi-tenancy, and the combined effect of hardware and virtual machine failures.

In this work, we propose an optimization method for the scheduling of scientific workflows running in a cloud environment. The throughput of workflow jobs is the problem's objective function. Each request processed by the cloud system is defined by a graph of subtasks with precedence constraints between them. A single job demands the provisioning of a certain number of virtual machines for running the subtasks in parallel. The model used to compute the objective function can measure the impact of inefficient scheduling and failures of components on the system's throughput. Given the presence of stochastic components in the proposed model, our method applies discrete-event simulation for evaluating the objective function. We developed a Stochastic Petri net generator algorithm for creating performance and reliability models of cloud workflow applications. Our experimental results demonstrate the effectiveness of the hybrid simulation/optimization approach for optimizing the considered objective function. We also present a sensitivity analysis study of the effects of hardware and virtual machine failures on system throughput.

This work is structured as follows. Section 2 covers the theoretical background and Section 3 presents related work. Section 4 describes the proposed optimization method and Section 5 presents the performability model used for the objective function. In Section 6, we show the evaluation results for the proposed method. Finally, Section 7 makes some final considerations and points out further research directions.
2.1 Simulation/Optimization Hybrid Heuristics

Combinatorial optimization problems (COP) arise in many areas of human activity and computer science as well. A single-objective COP can be described in terms of [6]:

– a set of variables X = {x1, x2, ..., xn}, where xi belongs to a domain Di;
– a set of constraints c1, c2, ..., cm;
– an objective function f : D1 × D2 × ... × Dn → IR.

The optimization procedure should give a solution s from the state space S = D1 × D2 × ... × Dn that either minimizes or maximizes the objective function f(s) and satisfies all constraints.

The characteristics of the objective function and constraints define the approach used to solve the problem. If the objective function and constraints have linear relationships, the underlying problem can be solved by the efficient Simplex algorithm [39]. However, since many essential COP problems are NP-Hard, a conventional approach to solving them is using approximate algorithms [4]. They can find good enough - but not necessarily optimal - solutions from a potentially large solution space. Metaheuristics are approximate algorithms not tied to any particular problem/domain and can be adapted for solving many different combinatorial problems [53].

COP problems with stochastic components in either the objective function or the constraints can adopt simulation models to represent the random behavior. Hybrid Simulation-Optimization (Sim-Opt) methods deal with the issues involved in using simulation models in conjunction with optimization algorithms. Swisher et al. [51] define Sim-Opt methods as "a structured approach to determine optimal settings for input parameters (i.e., the best system design), where optimality is measured by a (steady-state or transient) function of output variables associated with a simulation model".

A simulation-optimization problem can be viewed as the selection of the best design among a (potentially large) set of possible designs with respect to some output response variable given by a simulation model. A random distribution defines the response variable for each element of this set. The optimization procedure selects the design that corresponds to the highest or lowest expected value of this distribution by using a sample for each design. Swisher et al. [51] provide a survey of different Sim-Opt approaches and present guidelines to help the selection of the most suited approach given the particular characteristics of the problem (e.g., all designs have the same variance, the variance is known/unknown, and so on).

Using simulation instead of deterministic models makes it possible to avoid the simplifications needed by deterministic models. This expressiveness, however, has a drawback: the time to solve a simulation model can be significantly prolonged. This fact is even more problematic in simulations used in conjunction with meta-heuristic algorithms, which need to compute the objective function of a large number of solutions. One alternative is to use a surrogate model [44], a simplification of a more complex simulation model that can be evaluated more quickly. A typical approach is to use an Artificial Neural Network (ANN) as a surrogate model [21], along with a simulation model to train the network. The training phase is computationally intensive, but once completed, it generates results very quickly.

2.2 Scheduling of Scientific Workflows on Cloud Systems

A scientific workflow consists of a set of computing- and IO-intensive tasks with precedence constraints between them. Scientific workflows can be represented by a directed acyclic graph (DAG). A DAG G is defined by a tuple {T, E}, where T = {T1, T2, ..., Tn} is the set of tasks and E = {e1, e2, ..., em} is the set of precedence constraints. Each tuple ei = (Ta, Tb) denotes that task Tb starts to execute after Ta finishes and sends some input data to Tb. The node weights are the computing times, and the edge weights are the communication times, i.e., the time for sending the results needed by a dependent task running on a different processing node.

A scheduler should map tasks/jobs to a set of processors according to some predefined goals such as utilization of resources, makespan (the total length of a schedule), throughput, meeting of deadlines, etc. A particular scheduling of tasks for some DAG can be defined by a mapping of ordered task lists to the processing units. A scheduling algorithm can either require the number of processing units as input or try to find an optimal/near-optimal number of processing units in conjunction with the mapping of tasks. The latter case is harder since it leads to an increased
search space. Additionally, increasing the number of processors can shorten the makespan of an individual job, but it may increase the idle time of processors due to the precedence constraints between tasks [48].

Since many practical scheduling problems are either NP-Hard or NP-Complete, a lot of effort is necessary to apply and adapt meta-heuristic algorithms to schedule tasks in cloud computing environments. The following list summarizes the major contributions made by the literature concerning different aspects of cloud workflow scheduling:

– Devising heuristic algorithms able to provide near-optimal solutions under certain constraints [32];
– Adapting nature-inspired and evolutionary algorithms to this problem, such as genetic algorithms [22, 60, 64], honey bee colony [5, 31], ant colony [15, 52], and fish swarm [61];
– Dealing with heterogeneous systems (processors with different computing power) [41];
– Dealing with conflicting aspects of a multi-cost objective function (e.g., energy versus makespan) [37].

3 Related Work

3.1 Performability Modeling of Cloud and Grid Environments

Performability is the study of system performance when subjected to the effect of failures on its subcomponents [36]. The performance of a system is said to be degradable if failure events may affect it negatively. For instance, a mesh network of routers can tolerate a certain number of failures, but the overall performance will be affected as some routers may be subject to overheads. Similarly, failures of worker nodes in cloud and grid environments can diminish the number of available processing resources, thereby increasing queueing times and decreasing the throughput of jobs.

Due to the large number of components of cloud and grid environments, we can expect a significant failure rate even if the mean time to failure of individual components is high. Thus, neglecting the impact of failures in performance studies of such systems can lead to misleading results. State-space based modeling (Stochastic Petri Nets and Markov Chains) is the most adopted method for joint performance/availability evaluation of cloud and grid systems. Dealing with state-space explosion is a recurrent problem handled by every work in this category.

Ramakrishnan and Reed [46] propose a qualitative framework for the performability evaluation of scientific workflows running on grid systems. The framework encompasses a Markov Reward Chain model [7] and simulations calibrated with data from real grid applications. Xia et al. [58] describe a queuing network model for evaluating the estimated service time and request rejection probability of an Infrastructure-as-a-Service cloud. This model represents features such as request handling, job creation, job execution, job rejection due to insufficient queue capacity, and failure and repair events of physical machines configured in hot/warm standby mode. Ever et al. [18] propose a set of equations obtained from queuing theory for evaluating the performability of clouds with large numbers of servers. Since the underlying state-space model does not need to be generated, this approach can represent a large number of servers and simultaneous requests.

A strategy to avoid state-space explosion is to adopt small models rather than one big monolithic model. For combining the results of the submodels, iteration methods can be used [34]. Ghosh [19] uses interacting homogeneous-time Markov chains to perform end-to-end performability analysis of cloud services. The proposed model is used to evaluate two essential metrics: service availability and response time. Raei et al. [45] developed Stochastic Reward Net models representing a public cloud and a cloudlet providing virtual machines for mobile applications. To avoid state-space explosion when modeling both performance and availability aspects of the considered system, the authors divided the public cloud and cloudlet parts into two separate models and used the fixed-point iteration method to obtain a joint result.

The performability models cited in this subsection are based on Markov Chains [7, 46], Stochastic Petri nets (with Markov chain generation) [19, 34, 45], and queuing theory [18, 58]. By contrast, our work adopts a discrete simulation approach based on Stochastic Petri net components and automatic generation of models. The advantage of our model over the works mentioned above is the ability to model DAGs as job requests and the relationship between VM and hardware failures. Incorporating these features into
state-space based models would lead to a state-space explosion problem. Also, using the exponential distribution to represent job times can introduce distortions when modeling the makespan of a stochastic DAG (as we demonstrate in Section 6.2.2).

3.2 Simulation of Workflow Execution on Cloud Environments

Simulation is a commonly used approach for evaluating the performance of load-balancing algorithms, allocation policies, and scheduling strategies in cloud systems considering dynamic workload patterns. CloudSim [13] is the most adopted cloud simulation software in the literature. The CloudSim simulator allows the representation of data-center infrastructures, VM allocation policies, user-level workloads, and coordination between multiple cloud environments through a cloud broker service. After being released as open source software, CloudSim was extended in many different ways by the research community. Fault tolerance capabilities were introduced in [65]. FederatedCloudSim [30] extended CloudSim to represent SLA policies in federated clouds. FailureSim [16] introduced failure prediction of cloud nodes based on ANNs. Performance and usage levels (bandwidth, number of tasks running, the quantity of available millions of instructions per second per node) are used as predictors for training the network. Alwabel et al. proposed DesktopCloudSim [1], a CloudSim extension with a layer of failure injection for the physical nodes. Like our work, DesktopCloudSim allows the investigation of the effect of failure events on system throughput.

CloudSim does not offer, by default, classes for representing workflows modeled as DAGs. Given the importance of this application category, some extensions to CloudSim were proposed for representing workflows. WorkflowSim [14] is a CloudSim extension that includes support for workflow representation and management. It also provides task aggregation capabilities and a fault generator at job/task level. It can generate recoverable transient failures that can be handled by task re-execution, and permanent job failures that cannot be recovered. DynamicCloudSim [10] is a CloudSim-based simulator that includes workflow execution considering VM inhomogeneity and failure of tasks at runtime. Malawski et al. [35] developed a cloud workflow simulator for evaluating task scheduling and resource provisioning algorithms for optimizing the execution of workflow ensembles under deadline constraints in IaaS clouds. ElasticCloudSim [11] is a CloudSim extension for evaluating workflow applications which supports auto-scaling capabilities and considers non-deterministic (stochastic) workflows.

Our work differs from WorkflowSim and DynamicCloudSim by modeling hardware and virtual machine failures instead of representing transient/permanent failures of tasks. FailureSim is able to model hardware failures, and DesktopCloudSim can represent both hardware and VM failures. However, DesktopCloudSim and FailureSim do not target workflow applications. We opted not to create another CloudSim extension, as the employed SPN-based simulator presents some advantages. The proposed SPN models can be used separately for obtaining metrics other than performability (e.g., availability, reliability, and expected makespan). Using this simulation environment also allows us to reuse existing SPN models from the literature for representing reliability and performance aspects of our system.

3.3 Cloud Workflow Optimization

The execution of workflow applications on clouds brings the need for new modeling strategies, scheduling algorithms, and optimization metrics. The reason for this need is the particular aspects of cloud systems when contrasted to traditional grid/cluster environments. Kliazovich et al. [29] demonstrated how existing workflow models fail to address the communication patterns typically found in cloud workflow applications. They proposed CA-DAG (Communication-Aware DAG), a workflow model which represents communication processes as vertices instead of edges. Arabnejad and Barbosa [3] developed a Heterogeneous Budget Constrained Scheduling (HBCS) algorithm for minimizing the makespan and rental cost of cloud workflow applications. The HBCS algorithm is able to reduce the execution time by up to 30% while maintaining the same budget level.

Many works in the scheduling literature consider deterministic computation and communication times. However, using a deterministic objective function does not match the non-deterministic nature of real-world applications [2]. In this sense, Zheng et al. [62] proposed a Monte Carlo based scheduling
method for cloud/grid workflows which considers non-deterministic computing and communication times. The method is not dependent on a particular heuristic algorithm, and HEFT is adopted in the evaluation. In [63], a randomized version of HEFT was proposed. The algorithm consists in running a deterministic HEFT for random predictions of the stochastic DAG, generating a list of potential candidates for the best scheduling. The scheduling from the list with the smallest expected makespan is selected. Cai et al. [12] present a dynamic algorithm for minimizing the rental cost (of VMs in a cloud) of bag-of-tasks workflows with non-deterministic times. The Cloud Workflow Scheduling Algorithm (CWSA) [47] aims to optimize the scheduling of workflows in a multi-tenant cloud environment. This algorithm considers non-deterministic task computation times.

The presence of failures in data centers can pose a threat to workflow applications with strict deadlines. In [56], an original fault-tolerant scheduling algorithm named FESTAL was proposed. It employs a primary/backup redundancy model and VM migration to achieve high availability and load balancing in a cloud workflow application. Vinay et al. [54] present a new heuristic for cloud scheduling named CHEFT (Cluster-based Heterogeneous Earliest Finish Time). It uses the idle time of the processors for resubmitting failed tasks as a means to achieve fault tolerance. FASTER [66] is another algorithm that employs the primary/backup redundancy model for providing a fault-tolerant scheduling mechanism for cloud applications. Performability was first considered as an objective function in [17]. The authors developed a performability model of a grid resource based on a stochastic reward net model and the universal generating function. The proposed model is connected to a genetic algorithm which aims to optimize the scheduling of a DAG onto a set of grid resources.

Our work aims to contribute to the research line opened by Entezari et al. [17] - workflow scheduling optimization from a performability viewpoint. To the best of our knowledge, no existing method simultaneously covers performability as the objective function, multi-tenancy, non-deterministic computing/communication times, and failures of hosts and VMs. Table 1 shows a comparison of our work to the state of the art.

4 Problem Definition and Proposed Optimization Method

This work aims to solve the multiprocessor scheduling problem of scientific workflows running in a cloud environment, using a performability metric as the objective function. Given a workflow described by a DAG G = {T, E}, our objective is to find a scheduling S of the tasks in T on m virtual machines that maximizes the throughput of jobs. The number m is not fixed and must be determined by the optimization method.

The flow diagram of Fig. 1 presents a high-level overview of the proposed optimization method. The workflow DAG has a list of tasks and their dependencies, the processing time of each task, and the communication time between dependent tasks running on different processors. The computing and communication times of a DAG can either be deterministic or follow a specific random distribution (normal, exponential, Erlang, and so on). The cloud infrastructure parameters define the number of physical servers, the maximum number of virtual machines that each host can provide, and the failure/repair/switchover rates of the physical/virtual machines. The simulation/optimization parameters configure the simulation engine (e.g., the number of replications for an individual simulation) and the optimization algorithm (e.g., population size, number of elite chromosomes, number of generations, and so on).

The optimization algorithm explores the solution space for the input DAG until the stopping condition, defined by the control parameters, is reached. Then, a near-optimal scheduling solution is provided by the algorithm. The user can perform further analysis of the obtained solution and evaluate additional performance/reliability metrics, such as the average waiting/response time, the discard rate of tasks/jobs, and the probability of completing a job. The method can be used interactively, i.e., the user can modify the cloud/control parameters and repeat the process, obtaining new scheduling and performability metrics.

The remainder of this section explains each part that composes the proposed method.

4.1 Genetic Algorithm with Stochastic Fitness Function

The activity diagram of Fig. 2 describes the optimization algorithm adopted in this paper. It is a genetic optimization procedure that uses a simulation model for computing the fitness value of explored chromosomes. The chromosome representation consists of a pair of vectors representing the ordering of tasks and
the mapping of tasks to the processors, as illustrated in Fig. 3. The partially-mapped crossover (PMX) operator [20] was adopted to generate the offspring of a population. Two chromosomes are randomly selected for creating a pair of children (parents with higher fitness values are more likely to be selected). The process to generate the children is defined as follows. First, it creates a copy of the parents. A paired subinterval is randomly selected and switched among the children. Then, a mapping function is applied to convert the repeated alleles (i.e., the units of information that compose the chromosome) outside the random subinterval. Figure 4 shows an example of the crossover operator. The mutation operator modifies a chromosome with a random operation by swapping
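The PMX steps described above (copy the parents, switch a paired subinterval, then remap repeated alleles outside it) can be sketched as follows. This is a minimal illustration of the standard operator [20], not the authors' implementation; names are ours:

```python
import random

def pmx(parent_a, parent_b, lo=None, hi=None):
    """Partially-mapped crossover on two permutations (task orderings)."""
    n = len(parent_a)
    if lo is None or hi is None:
        lo, hi = sorted(random.sample(range(n + 1), 2))

    def child_of(donor, receiver):
        # 1) copy the receiver, 2) switch in the donor's subinterval
        child = list(receiver)
        child[lo:hi] = donor[lo:hi]
        # 3) remap repeated alleles outside the subinterval
        mapping = {donor[i]: receiver[i] for i in range(lo, hi)}
        segment = set(donor[lo:hi])
        for i in list(range(lo)) + list(range(hi, n)):
            allele = receiver[i]
            while allele in segment:
                allele = mapping[allele]
            child[i] = allele
        return child

    return child_of(parent_a, parent_b), child_of(parent_b, parent_a)
```

Both children remain valid permutations, which keeps every chromosome a feasible task ordering.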
establishes the scheduling of the subtasks. This strategy determines the number of virtual machines allocated from the cloud provider and the tasks executed on each of them. The scheduling can be generated by some heuristic (First Come-First Served, Shortest Job First, Heterogeneous Earliest Finish Time, etc.) or meta-heuristic algorithm (taboo search, genetic algorithm, ant colony, etc.).

Figure 8 depicts the discrete event simulation model as an open queue accepting a job submission influx with rate equal to λ. Each job will be promptly executed in case there are available resources in the cloud. Otherwise, it will be enqueued or discarded (if the queue is full). A job will also be discarded if the cloud manager is unavailable, since in this case it is not possible to allocate the cloud resources for the job execution. Another possibility for the failure of a job submission is when some virtual machine failure prevents the execution of one or more subtasks. A virtual machine can fail due to software (operating system or hypervisor) or hardware faults.

The numbers of completed and failed jobs are recorded by the simulation model. The annual throughput is obtained by dividing the total number of completed jobs by the simulation time in years. The job failure ratio is defined by (1). The annual throughput and the job failure ratio are the metrics adopted in the analysis of Section 6.

Job failure ratio = Discarded jobs / (Completed jobs + Discarded jobs)    (1)
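In code, the two metrics reduce to simple counter arithmetic on the simulation output. The helper below is illustrative; the function and variable names are ours, not part of the authors' tooling:

```python
def annual_metrics(completed_jobs, discarded_jobs, sim_time_years):
    """Annual throughput (completed jobs per simulated year) and the
    job failure ratio of Eq. (1), computed from the counters recorded
    by the simulation model."""
    throughput = completed_jobs / sim_time_years
    failure_ratio = discarded_jobs / (completed_jobs + discarded_jobs)
    return throughput, failure_ratio
```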
Fig. 12 Model generated with structural parameters set to number of servers = 2, vms per server = 2
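To give a feel for how the structural parameters of Fig. 12 could drive automatic model generation, the sketch below enumerates the places of a server/VM net for given parameter values. It is a hypothetical fragment with names of our choosing; the authors' SPN generator also emits transitions and arcs, which are omitted here:

```python
def generate_places(num_servers, vms_per_server):
    """Enumerate up/down places for each server and its hosted VMs."""
    places = []
    for s in range(num_servers):
        places += [f"server{s}_up", f"server{s}_down"]
        for v in range(vms_per_server):
            places += [f"server{s}_vm{v}_up", f"server{s}_vm{v}_down"]
    return places
```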
idle component is assumed to be lower than the failure rate of a component serving requests [23]. The activate spare transition represents the switchover event, i.e., the process of configuring the standby server to handle the incoming workload. Due to the inhibitor arc, this transition is enabled only when a failure occurs on the primary server. After the primary server is repaired, the standby server is sent back to the idle state.

5.3 Simulation Environment

In this subsection, we go into details about the top-level simulation model and the simulation engine. Figure 14 displays an overview of the top-level simulation model. As a discrete event simulator, it has a global clock for the simulation time and an ordered list of events which is processed and updated as the simulator runs. The simulation engine also maintains a list of running Petri nets. Petri nets can be configured at the beginning of the simulation, and new Petri nets can be created or destroyed during the simulation run time. The simulation routines are software modules (implemented in the Java programming language) that are invoked according to specific simulation events. These routines can modify the simulation state, schedule/cancel simulation events, and start/destroy Petri nets. The Petri nets generate firing events and timed
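The engine described above (a global clock plus an ordered event list, processed and updated as the run proceeds) can be reduced to the following sketch. The paper's engine is written in Java and adds the Petri-net layer, which is omitted here; this Python fragment only illustrates the event loop:

```python
import heapq

class Engine:
    """Minimal discrete event simulator: global clock + ordered event list."""

    def __init__(self):
        self.clock = 0.0
        self._events = []  # heap of (time, sequence number, routine)
        self._seq = 0

    def schedule(self, delay, routine):
        heapq.heappush(self._events, (self.clock + delay, self._seq, routine))
        self._seq += 1

    def run(self, until):
        # Pop events in time order; a routine may schedule further events.
        while self._events and self._events[0][0] <= until:
            self.clock, _, routine = heapq.heappop(self._events)
            routine(self)
        self.clock = until
```

A simulation routine is any callable taking the engine as argument; it can inspect the state and schedule further events, mirroring the role of the simulation routines above.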
Fig. 16 Average and maximum fitness value (number of processed jobs per year) of each generation

Fig. 17 Sensitivity analysis - one factor at a time (95% confidence interval)
presents the results of the sensitivity analysis, considering the best scheduling found by brute force. For the collected metrics, we indicate the 95% confidence intervals alongside the average values. For each point on the plots, we obtained 500 samples from the simulation model. Figure 17a and b show the impact of the hardware and virtual machine MTTFs on the number of processed jobs per year, and Fig. 17c and d show the sensitivity analysis of the failure ratio of jobs. The analysis reveals that the system throughput and the job failure ratio are sensitive to VM failures. It can also be noticed that, as the hardware/virtual machine MTTFs increase, the differences between adjacent points in the plots become less pronounced and the confidence intervals overlap.

6.2 Optimization of a LIGO Workflow Application

In the second case study, we used two scientific workflows: the LIGO Inspiral Analysis workflow [9] (Fig. 18a) and a randomly generated DAG (Fig. 18b). The LIGO workflow was created with the Pegasus Workflow Generator available in [26]. This workflow generator creates synthetic workflows based on traces collected from real-world scientific workflows. The random DAG was created by an ad-hoc algorithm. Computing and communication times for the random DAG were generated using a Uniform distribution over the intervals [1 min, 20 min] and [1 min, 10 min], respectively.

Table 4 shows the updated parameters for the second case study.

Table 4 Model/configuration parameters - second case study

Parameter                                    Value
Number of workers                            50
VMs per worker                               4
Number of replications for the simulation    10
Generations                                  25
Population size                              40
Number of elite chromosomes                  3

Unfortunately, increasing the cloud scale too much leads to a huge computational effort to solve the simulation model. The reason for this limitation is the presence of stiffness in performability models. For capturing failure events in the performance model, it is necessary to employ a long simulation run (longer than a year). The high number of events to be processed in a single simulation run leads to a long simulation runtime. SRIP (Single Replication in Parallel) techniques can be used to simulate a larger cloud infrastructure, i.e., a cloud having hundreds or thousands of physical servers.

Figure 19a and b summarize the results of the genetic algorithm. They are displayed as boxplots for each generation produced by the optimization algorithm. Since we are using elitism, the best solution (represented by the top horizontal bar in each box plot) is kept until better solutions are obtained. In contrast to the previous case study, we noticed a more accentuated non-monotonic growth in the average fitness value (i.e., the average fitness value for the ith generation being smaller than the value for the (i − 1)th generation). However, the algorithm can increase both the average and maximum fitness value in the long term.

The presented case studies confirm the ability of the proposed simulation-based optimization method to solve the workflow scheduling problem from a performability viewpoint. Our method enables the optimization process to treat aspects that would be impossible to capture with a deterministic function, namely:

– Modeling non-deterministic and non-exponential computation/communication times;
– Capturing the failure relationships between servers and virtual machines;
– Modeling the provisioning of cloud resources to multiple users concurrently;
– Representing the influence of the cloud controller on the overall system's performance.

Using such a complex non-deterministic objective function (a discrete simulation model) did not cause the optimization algorithm to misbehave. The results for the LIGO and random workflows show the effectiveness of the generic operators (mutation and crossover) in avoiding getting stuck at a local maximum. New elite chromosomes were found multiple times in both scenarios.

6.2.1 Performance and Reliability Analysis

For evaluating the impact of failures on the second case study, we performed a sensitivity analysis on the effect of hardware and VM failures in the adopted
workflows. In this study, we consider the best scheduling obtained with the optimization method. The results are shown in Fig. 20. We confirm the same pattern visualized in the previous section for the small DAG: the impact of hardware and VM failures diminishes as the reliability of these components reaches a certain level.

Figure 21 shows the impact of cloud manager failures on the system throughput. It also allows us to evaluate the effectiveness of the warm-standby redundancy mechanism when contrasted with a single-node cloud manager (without redundancy). Figure 21 indicates that, for a small MTTF of a server node, there is a substantial increase in the number of processed jobs per year when using a redundant cloud manager. For a large MTTF, however, the difference between the mean numbers of processed jobs is minimal, and the confidence intervals overlap. These results indicate a negligible impact of the cloud manager on system throughput when this subsystem is highly available.

6.2.2 Random Makespan Kernel Density Estimation (LIGO Workflow)

Considering non-deterministic communication and computation times for a workflow scheduling algorithm means that the makespan will be defined by a random variable instead of being a fixed value. We performed a kernel density estimation for the random makespan of the best scheduling found for the LIGO workflow.
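A kernel density estimate can be computed directly from makespan samples collected over simulation replications. The sketch below uses a Gaussian kernel with Silverman's rule-of-thumb bandwidth; the synthetic samples and all names are illustrative assumptions, not the data behind our figures:

```java
import java.util.Random;

// Gaussian kernel density estimation over sampled makespans.
public class MakespanKde {
    // Evaluate the KDE at point x for the given samples and bandwidth h.
    static double density(double[] samples, double h, double x) {
        double sum = 0.0;
        for (double s : samples) {
            double u = (x - s) / h;
            sum += Math.exp(-0.5 * u * u) / Math.sqrt(2.0 * Math.PI);
        }
        return sum / (samples.length * h);
    }

    // Silverman's rule of thumb: h = 1.06 * sd * n^(-1/5).
    static double bandwidth(double[] samples) {
        double mean = 0.0, var = 0.0;
        for (double s : samples) mean += s;
        mean /= samples.length;
        for (double s : samples) var += (s - mean) * (s - mean);
        double sd = Math.sqrt(var / (samples.length - 1));
        return 1.06 * sd * Math.pow(samples.length, -0.2);
    }

    public static void main(String[] args) {
        // Synthetic makespans (minutes) standing in for simulation replications.
        Random rng = new Random(7);
        double[] makespans = new double[500];
        for (int i = 0; i < makespans.length; i++)
            makespans[i] = 120.0 + 15.0 * rng.nextGaussian();
        double h = bandwidth(makespans);
        for (double x = 80.0; x <= 160.0; x += 20.0)
            System.out.printf("f(%.0f) = %.5f%n", x, density(makespans, h, x));
    }
}
```

Evaluating the estimated density on a grid of candidate makespans yields the kind of density curve a scheduler can use to reason about, for example, the probability of exceeding a deadline rather than a single point estimate.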
7 Conclusions
References
4. Bianchi, L., Dorigo, M., Gambardella, L.M., Gutjahr, W.J.: A survey on metaheuristics for stochastic combinatorial optimization. Nat. Comput. 8(2), 239–287 (2009)
5. Bitam, S.: Bees life algorithm for job scheduling in cloud computing. In: Proceedings of the Third International Conference on Communications and Information Technology, pp. 186–191 (2012)
6. Blum, C., Roli, A.: Metaheuristics in combinatorial optimization: overview and conceptual comparison. ACM Comput. Surv. (CSUR) 35(3), 268–308 (2003)
7. Bolch, G., Greiner, S., de Meer, H., Trivedi, K.S.: Queueing Networks and Markov Chains: Modeling and Performance Evaluation with Computer Science Applications. Wiley, Hoboken (2006)
8. Book, R.V. et al.: Michael R. Garey and David S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness. Bulletin (New Series) of the American Mathematical Society 3(2), 898–904 (1980)
9. Brown, D.A., Brady, P.R., Dietz, A., Cao, J., Johnson, B., McNabb, J.: A case study on the use of workflow technologies for scientific analysis: gravitational wave data analysis. In: Workflows for E-Science, pp. 39–59. Springer (2007)
10. Bux, M., Leser, U.: DynamicCloudSim: simulating heterogeneity in computational clouds. Futur. Gener. Comput. Syst. 46, 85–99 (2015)
11. Cai, Z., Li, Q., Li, X.: ElasticSim: a toolkit for simulating workflows with cloud resource runtime auto-scaling and stochastic task execution times. J. Grid Comput. 15(2), 257–272 (2017)
12. Cai, Z., Li, X., Ruiz, R., Li, Q.: A delay-based dynamic scheduling algorithm for bag-of-task workflows with stochastic task execution times in clouds. Futur. Gener. Comput. Syst. 71, 57–72 (2017)
13. Calheiros, R.N., Ranjan, R., Beloglazov, A., De Rose, C.A., Buyya, R.: CloudSim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms. Softw. Pract. Exp. 41(1), 23–50 (2011)
14. Chen, W., Deelman, E.: WorkflowSim: a toolkit for simulating scientific workflows in distributed environments. In: 2012 IEEE 8th International Conference on E-Science (E-Science), pp. 1–8. IEEE (2012)
15. Chen, W.N., Zhang, J.: Ant colony optimization for software project scheduling and staffing with an event-based scheduler. IEEE Trans. Softw. Eng. 39(1), 1–17 (2013)
16. Davis, N.A., Rezgui, A., Soliman, H., Manzanares, S., Coates, M.: FailureSim: a system for predicting hardware failures in cloud data centers using neural networks. In: 2017 IEEE 10th International Conference on Cloud Computing (CLOUD), pp. 544–551. IEEE (2017)
17. Entezari-Maleki, R., Trivedi, K.S., Sousa, L., Movaghar, A.: Performability-based workflow scheduling in grids. The Computer Journal (2018)
18. Ever, E.: Performability analysis of cloud computing centers with large numbers of servers. J. Supercomput. 73(5), 2130–2156 (2017)
19. Ghosh, R., Trivedi, K.S., Naik, V.K., Kim, D.S.: End-to-end performability analysis for infrastructure-as-a-service cloud: an interacting stochastic models approach. In: 2010 IEEE 16th Pacific Rim International Symposium on Dependable Computing (PRDC), pp. 125–132. IEEE (2010)
20. Goldberg, D.E., Lingle, R., et al.: Alleles, loci, and the traveling salesman problem. In: Proceedings of an International Conference on Genetic Algorithms and their Applications, vol. 154, pp. 154–159. Lawrence Erlbaum, Hillsdale (1985)
21. Gorissen, D., Couckuyt, I., Demeester, P., Dhaene, T., Crombecq, K.: A surrogate modeling and adaptive sampling toolbox for computer based design. J. Mach. Learn. Res. 11, 2051–2055 (2010)
22. Gu, J., Hu, J., Zhao, T., Sun, G.: A new resource scheduling strategy based on genetic algorithm in cloud computing environment. J. Comput. 7(1), 42–52 (2012)
23. Guimarães, A.P., Maciel, P.R., Matias, R.: An analytical modeling framework to evaluate converged networks through business-oriented metrics. Reliab. Eng. Syst. Saf. 118, 81–92 (2013)
24. Hamby, D.: A review of techniques for parameter sensitivity analysis of environmental models. Environ. Monit. Assess. 32(2), 135–154 (1994)
25. Hoffa, C., Mehta, G., Freeman, T., Deelman, E., Keahey, K., Berriman, B., Good, J.: On the use of cloud computing for scientific workflows. In: 2008 IEEE Fourth International Conference on eScience (eScience '08), pp. 640–645. IEEE (2008)
26. Juve, G., Bharathi, S.: Pegasus synthetic workflow generator. https://confluence.pegasus.isi.edu/display/pegasus/WorkflowGenerator (2014)
27. Juve, G., Deelman, E., Vahi, K., Mehta, G., Berriman, B., Berman, B.P., Maechling, P.: Scientific workflow applications on Amazon EC2. In: 2009 5th IEEE International Conference on E-Science Workshops, pp. 59–66. IEEE (2009)
28. Kim, D.S., Machida, F., Trivedi, K.S.: Availability modeling and analysis of a virtualized system. In: 2009 15th IEEE Pacific Rim International Symposium on Dependable Computing (PRDC '09), pp. 365–371. IEEE (2009)
29. Kliazovich, D., Pecero, J.E., Tchernykh, A., Bouvry, P., Khan, S.U., Zomaya, A.Y.: CA-DAG: modeling communication-aware applications for scheduling in cloud computing. J. Grid Comput. 14(1), 23–39 (2016)
30. Kohne, A., Spohr, M., Nagel, L., Spinczyk, O.: FederatedCloudSim: an SLA-aware federated cloud simulation framework. In: Proceedings of the 2nd International Workshop on CrossCloud Systems, p. 3. ACM (2014)
31. LD, D.B., Krishna, P.V.: Honey bee behavior inspired load balancing of tasks in cloud computing environments. Appl. Soft Comput. 13(5), 2292–2303 (2013)
32. Lin, W., Wu, W., Wang, J.Z.: A heuristic task scheduling algorithm for heterogeneous virtual clusters. Sci. Program. 2016, Article ID 7040276 (2016)
33. Maciel, P., Matos, R., Silva, B., Figueiredo, J., Oliveira, D., Fé, I., Maciel, R., Dantas, J.: Mercury: performance and dependability evaluation of systems with exponential, expolynomial, and general distributions. In: 2017 IEEE 22nd Pacific Rim International Symposium on Dependable Computing (PRDC), pp. 50–57. IEEE (2017)
34. Mainkar, V., Trivedi, K.S.: Sufficient conditions for existence of a fixed point in stochastic reward net-based iterative models. IEEE Trans. Softw. Eng. 22(9), 640–653 (1996)
35. Malawski, M., Juve, G., Deelman, E., Nabrzyski, J.: Algorithms for cost- and deadline-constrained provisioning for scientific workflow ensembles in IaaS clouds. Futur. Gener. Comput. Syst. 48, 1–18 (2015)
36. Meyer, J.F.: On evaluating the performability of degradable computing systems. IEEE Trans. Comput. C-29(8), 720–731 (1980)
37. Mezmaz, M., Melab, N., Kessaci, Y., Lee, Y.C., Talbi, E.G., Zomaya, A.Y., Tuyttens, D.: A parallel bi-objective hybrid metaheuristic for energy-aware scheduling for cloud computing systems. J. Parallel Distrib. Comput. 71(11), 1497–1508 (2011)
38. Molloy, M.K.: Performance analysis using stochastic Petri nets. IEEE Trans. Comput. 31(9), 913–917 (1982)
39. Nelder, J.A., Mead, R.: A simplex method for function minimization. Comput. J. 7(4), 308–313 (1965)
40. Oliveira, D., Matos, R., Dantas, J., Ferreira, J., Silva, B., Callou, G., Maciel, P., Brinkmann, A.: Advanced stochastic Petri net modeling with the Mercury scripting language. In: ValueTools 2017, 11th EAI International Conference on Performance Evaluation Methodologies and Tools, Venice, Italy. Elsevier (2017)
41. Panda, S.K., Jana, P.K.: Efficient task scheduling algorithms for heterogeneous multi-cloud environment. J. Supercomput. 71(4), 1505–1533 (2015)
42. Plateau, B., Atif, K.: Stochastic automata network of modeling parallel systems. IEEE Trans. Softw. Eng. 17(10), 1093–1108 (1991)
43. Qiu, X., Sun, P., Guo, X., Xiang, Y.: Performability analysis of a cloud system. In: 2015 IEEE 34th International Performance Computing and Communications Conference (IPCCC), pp. 1–6. IEEE (2015)
44. Queipo, N.V., Haftka, R.T., Shyy, W., Goel, T., Vaidyanathan, R., Tucker, P.K.: Surrogate-based analysis and optimization. Prog. Aerosp. Sci. 41(1), 1–28 (2005)
45. Raei, H., Yazdani, N.: Performability analysis of cloudlet in mobile cloud computing. Inform. Sci. 388, 99–117 (2017)
46. Ramakrishnan, L., Reed, D.A.: Performability modeling for scheduling and fault tolerance strategies for scientific workflows. In: Proceedings of the 17th International Symposium on High Performance Distributed Computing, pp. 23–34. ACM (2008)
47. Rimal, B.P., Maier, M.: Workflow scheduling in multi-tenant cloud computing environments. IEEE Trans. Parallel Distrib. Syst. 28(1), 290–304 (2017)
48. Rodriguez, M.A., Buyya, R.: A taxonomy and survey on scheduling algorithms for scientific workflows in IaaS cloud computing environments. Concurr. Comput. Pract. Exp. 29(8), e4041 (2017)
49. Sousa, E., Lins, F., Tavares, E., Cunha, P., Maciel, P.: A modeling approach for cloud infrastructure planning considering dependability and cost requirements. IEEE Trans. Syst. Man Cybern. Syst. Hum. 45(4), 549–558 (2015)
50. Sousa, E., Lins, F., Tavares, E., Maciel, P.: Cloud infrastructure planning considering different redundancy mechanisms. Computing 99(9), 841–864 (2017)
51. Swisher, J.R., Hyden, P.D., Jacobson, S.H., Schruben, L.W.: A survey of simulation optimization techniques and procedures. In: Proceedings of the 2000 Winter Simulation Conference, vol. 1, pp. 119–128. IEEE (2000)
52. Tawfeek, M.A., El-Sisi, A., Keshk, A.E., Torkey, F.A.: Cloud task scheduling based on ant colony optimization. In: 2013 8th International Conference on Computer Engineering & Systems (ICCES), pp. 64–69. IEEE (2013)
53. Tsai, C.W., Rodrigues, J.J.: Metaheuristic scheduling for cloud: a survey. IEEE Syst. J. 8(1), 279–291 (2014)
54. Vinay, K., Kumar, S.D.: Fault-tolerant scheduling for scientific workflows in cloud environments. In: 2017 IEEE 7th International Advance Computing Conference (IACC), pp. 150–155. IEEE (2017)
55. Vöckler, J.S., Juve, G., Deelman, E., Rynge, M., Berriman, B.: Experiences using cloud computing for a scientific workflow application. In: Proceedings of the 2nd International Workshop on Scientific Cloud Computing, pp. 15–24. ACM (2011)
56. Wang, J., Bao, W., Zhu, X., Yang, L.T., Xiang, Y.: FESTAL: fault-tolerant elastic scheduling algorithm for real-time tasks in virtualized clouds. IEEE Trans. Comput. 64(9), 2545–2558 (2015)
57. Wang, T., Chang, X., Liu, B.: Performability analysis for IaaS cloud data center. In: 2016 17th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), pp. 91–94. IEEE (2016)
58. Xia, Y., Zhou, M., Luo, X., Zhu, Q., Li, J., Huang, Y.: Stochastic modeling and quality evaluation of infrastructure-as-a-service clouds. IEEE Trans. Autom. Sci. Eng. 12(1), 162–170 (2015)
59. Xu, Y., Li, K., He, L., Zhang, L., Li, K.: A hybrid chemical reaction optimization scheme for task scheduling on heterogeneous computing systems. IEEE Trans. Parallel Distrib. Syst. 26(12), 3208–3222 (2015)
60. Zhao, C., Zhang, S., Liu, Q., Xie, J., Hu, J.: Independent tasks scheduling based on genetic algorithm in cloud computing. In: 2009 5th International Conference on Wireless Communications, Networking and Mobile Computing (WiCom '09), pp. 1–4. IEEE (2009)
61. Zhao, H.W., Tian, L.W.: Resource schedule algorithm based on artificial fish swarm in cloud computing environment. In: Applied Mechanics and Materials, vol. 635, pp. 1614–1617. Trans Tech Publications (2014)
62. Zheng, W., Sakellariou, R.: Stochastic DAG scheduling using a Monte Carlo approach. J. Parallel Distrib. Comput. 73(12), 1673–1689 (2013)
63. Zheng, W., Wang, C., Zhang, D.: A randomization approach for stochastic workflow scheduling in clouds. Sci. Program. 2016, Article ID 9136107 (2016)
64. Zheng, Z., Wang, R., Zhong, H., Zhang, X.: An approach for cloud resource scheduling based on parallel genetic algorithm. In: 2011 3rd International Conference on Computer Research and Development (ICCRD), vol. 2, pp. 444–447. IEEE (2011)
65. Zhou, A., Wang, S., Sun, Q., Zou, H., Yang, F.: FTCloudSim: a simulation tool for cloud service reliability enhancement mechanisms. In: Proceedings Demo & Poster Track of ACM/IFIP/USENIX International Middleware Conference, p. 2. ACM (2013)
66. Zhu, X., Wang, J., Guo, H., Zhu, D., Yang, L.T., Liu, L.: Fault-tolerant scheduling for real-time scientific workflows with elastic resource provisioning in virtualized clouds. IEEE Trans. Parallel Distrib. Syst. 27(12), 3501–3517 (2016)