
AI-driven Prediction based Energy-aware Fault-tolerant Scheduling Scheme (PEFS) for Cloud Data Center
ABSTRACT:
Cloud data centers (CDCs) have become increasingly popular and widespread in recent years with the growing adoption of cloud computing and high-performance computing. Due to the multi-step computation of data streams and heterogeneous task dependencies, task failure often occurs, resulting in a poor user experience and extra energy consumption. To reduce task execution failure as well as energy consumption, we propose a novel AI-driven energy-aware proactive fault-tolerant scheduling scheme for CDCs in this paper. First, a prediction model based on a machine learning approach is trained to classify the arriving tasks into "failure-prone tasks" and "non-failure-prone tasks" according to the predicted failure rate. Then, efficient scheduling mechanisms are proposed to allocate the two types of tasks to the most suitable hosts in a CDC. A vector reconstruction method is developed to construct super tasks from failure-prone tasks and to separately schedule these super tasks and the non-failure-prone tasks to the most appropriate physical hosts. All tasks are scheduled in an earliest-deadline-first manner. Our evaluation results show that the proposed scheme can intelligently predict task failure, and that it achieves better fault tolerance and reduces total energy consumption more than the existing schemes.
KEYWORDS: Cloud data centers, cloud computing, scheduling, fault tolerance, energy efficiency, task failure, failure-prone tasks, non-failure-prone tasks, prediction, deep neural network
INTRODUCTION:
Cloud data centers (CDCs) offer a wide variety of facilities to end-users in different domains such as health care, scientific computing, smart grid, e-commerce, and nuclear science. Task and resource failures are inevitable due to the growing number of CDC resources, which provide ICT infrastructures. However, reliable on-demand resources are needed for service providers to satisfy the service level agreement (SLA) of customers. Therefore, it is of great importance to ensure the reliability and availability of such systems. Modeling and analyzing dynamic fault-tolerant strategies for virtualized CDCs is challenging. First, cloud applications are typically large-scale and include an enormous number of distributed computing nodes. The complex structure and fault-tolerant behavior of heterogeneous CDC applications are hard to describe. Second, CDC applications dynamically adjust the virtual machine (VM) configurations to satisfy user requests, which require multi-dimensional resources (processor, memory, disk storage). The concurrency and uncertainty of requests increase the difficulty of model validation and verification in the scheduling. Proactive fault-tolerant techniques are widely adopted in CDCs.
However, efficient use of proactive fault tolerance relies heavily on the prior
knowledge of the failed tasks. Generally, a task fails due to over-utilization or unavailability of resources, hardware failures, improperly installed libraries, execution cost or execution time exceeding a threshold value, the system running out of memory or disk space, and so on. Previous studies have adopted several fault-tolerance mechanisms, such as load balancing, migration, replication, retry, and task resubmission. Data now often have very complex structures and parameters, so simple statistical approaches may not be able to capture the patterns of complex data. Some studies have used a machine learning approach to predict task failure, but they have not leveraged the prediction to facilitate task scheduling; moreover, the additional resources used in proactive fault-tolerant schemes result in additional energy consumption, and very few fault-tolerant studies have taken optimization of energy consumption into consideration. In this paper, our aim is to predict a task failure according to the
requested resources before the actual failure occurs, and leverage the prediction
to design a task scheduling scheme, thus reducing the task execution failure and
total energy consumption. To this end, a novel AI-driven Prediction based
Energy-aware Fault-tolerant Scheduling scheme (PEFS) is proposed. The
scheme involves two stages: 1) prediction of task failure probability, and 2) task
scheduling. In the first stage, task parameters (involving the requested
resources, actually allocated resources, and whether failure occurred) are
gathered from a historical data set (HDS). Then, all the task parameters are fed into TensorFlow as inputs. Using a deep neural network (DNN)
approach, a model is trained to predict the failure rate of each arriving task. In
this way, all the arriving tasks can be classified into failure-prone tasks and non-
failure-prone tasks based on model outputs. In the second stage, a scheduling
algorithm is developed based on vector bin packing to schedule the two types of
tasks efficiently. The main difference between these two scheduling schemes is
that, for the failure-prone tasks, super tasks are first generated based on an elegant vector reconstruction method for fault-tolerance purposes. A replication strategy is applied to replicate only the failure-prone tasks, which are then arranged into super tasks. The vector reconstruction method constructs the super tasks so that different copies of a task execute on different hosts; the super tasks are uniquely constructed so that sub-tasks do not overlap and redundant execution is avoided. The main contributions of this work
include: (1) a DNN model is proposed to predict the possibility of failure of incoming tasks, so that a further scheduling strategy can be developed based on the prediction; (2) different resource allocation and scheduling strategies are developed for failure-prone and non-failure-prone tasks to reduce both task failure and energy consumption; (3) a unique fault-tolerant mechanism is developed to schedule failure-prone tasks by constructing super tasks based on the vector reconstruction method. Extensive experiments using the CloudSim toolkit have been conducted to evaluate our scheme. The results validate that
our scheme outperforms the state-of-the-art in terms of failure ratio, resource
utilization, and energy consumption. The remainder of the paper is organized as
follows: Related works are introduced in Section . The structure of the Prediction based Energy-aware Fault-tolerant Scheduling scheme (PEFS) system, as well as the models for resources, tasks, energy consumption, and fault-tolerance, are described in Section . Resource allocation and task scheduling schemes are developed in Section . The experimental setting is introduced, and quantitative analysis of the correlations of different parameters is presented in Section . The results of the experiments are presented and
analyzed in Section . Finally, some concluding remarks are presented in the last section.

Data centers are becoming increasingly popular for the provisioning of computing resources. The cost and operational expenses of data centers have skyrocketed with the increase in computing capacity [24]. Energy consumption is a growing concern for data center operators; it is becoming one of the main entries on a data center's operational expenses (OPEX) bill. The Gartner Group estimates energy consumption to account for up to 10% of the current OPEX, and this estimate is projected to rise to 50% in the next few years. A slice of roughly 40% is related to the energy consumed by information technology (IT) equipment, which includes the energy consumed by the computing servers as well as the data center network hardware used for interconnection. In fact, about one-third of the total IT energy is consumed by communication links, switching, and aggregation elements, while the remaining two-thirds are allocated to computing servers. Other systems contributing to data center energy consumption are the cooling and power distribution systems, which account for 45% and 15% of total energy consumption, respectively. The first data center energy-saving solutions operated on a distributed basis and focused on making the data center hardware energy efficient; a popular technique for power savings in computing systems is the Dynamic Voltage and Frequency Scaling (DVFS) technology.
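To make the first stage concrete, the failure prediction and classification step described above can be sketched as follows. This is a minimal illustration in plain numpy standing in for the paper's TensorFlow DNN; the feature set, the toy weights, and the 0.5 threshold are assumptions for illustration, not the authors' configuration.

```python
import numpy as np

def predict_failure_rate(task_features, W, b):
    """Predict a failure probability for each task.

    task_features: (n_tasks, n_features) array of requested/allocated
    resources gathered from the historical data set (HDS).
    W, b: model parameters (in the paper these come from a trained DNN;
    here a single logistic layer stands in for it).
    """
    z = task_features @ W + b
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid -> failure rate in (0, 1)

def classify_tasks(task_features, W, b, threshold=0.5):
    """Split arriving tasks into failure-prone and non-failure-prone queues."""
    rates = predict_failure_rate(task_features, W, b)
    failure_prone = [i for i, r in enumerate(rates) if r >= threshold]
    non_failure_prone = [i for i, r in enumerate(rates) if r < threshold]
    return failure_prone, non_failure_prone

# Example: 4 tasks described by (cpu_request, mem_request, prior_failures)
tasks = np.array([
    [0.9, 0.8, 3.0],   # heavily loaded, failed before -> likely failure-prone
    [0.1, 0.2, 0.0],
    [0.8, 0.9, 2.0],
    [0.2, 0.1, 0.0],
])
W = np.array([2.0, 2.0, 1.0])  # toy weights; a trained model would supply these
b = -3.0
fp, nfp = classify_tasks(tasks, W, b)
print(fp, nfp)  # -> [0, 2] [1, 3]
```

Tasks in `fp` would be routed to the failure-prone queue for super-task construction, while those in `nfp` go straight to the normal scheduling path.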
1.1 PROBLEM DEFINITION:
The Prediction based Energy-aware Fault-tolerant Scheduling scheme (PEFS)
proposed in this paper involves two stages: 1) failure prediction and 2) task scheduling, as illustrated. The predictor is designed based on a DNN trained and tested on the tasks in the HDS. This predictor is used to predict the probability of task failure, and based on this probability, tasks are classified into failure-prone and non-failure-prone tasks. Failure-prone and non-failure-prone tasks are organized into a failure-prone task queue and a non-failure-prone task queue, respectively, and the two types of tasks are then scheduled separately. The power model is adopted to address the energy-saving concern of the CDC.
Similarly, the fault model is used to design a fault-tolerant mechanism for the
failure-prone tasks. The task failures may occur due to the unavailability of
resources, hardware failures, execution cost and time exceeding a threshold
value, system running out of memory or disk space, over-utilization of
resources, improper installation of required libraries, and so on. These faults can
be transient or permanent and are assumed to be independent. A fault-tolerant scheduling scheme therefore needs to guarantee that the deadlines of all tasks in the system are met, even when faults occur under the worst-case scenario. The replication strategy is widely used for fault tolerance; it generally replicates each task into two or more copies and then schedules them on different hosts, which creates more possibilities for wastage of resources and an increase in energy consumption. Thus, in this paper, only failure-prone tasks are replicated. First, three consecutive tasks are taken from the failure-prone task queue, and each task is replicated into three copies. Then, the vector reconstruction method is used to reconstruct super tasks from the replica copies. Reconstructed super tasks are mapped to the most suitable hosts, allocated resources, and then scheduled on different hosts separately. The sequence of replica copies within the super tasks is designed so that different copies of the same task execute on different hosts without overlap, in order to avoid redundant execution.
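The replica arrangement described above can be illustrated with a small sketch. The cyclic (Latin-square-style) rotation below is one plausible way to realize the stated property, namely that each super task holds one copy of every task and no host ever receives two copies of the same task; the paper's exact vector reconstruction may differ.

```python
def build_super_tasks(tasks, n_copies=3):
    """Arrange replicas of consecutive failure-prone tasks into super tasks.

    Takes n_copies consecutive tasks from the failure-prone queue,
    replicates each into n_copies copies, and rotates them so that
    super task j holds one copy of every task and no two super tasks
    place the same task at the same slot (host).
    """
    assert len(tasks) == n_copies
    # super_tasks[j][k] is the task whose copy runs at slot k of super task j
    return [[tasks[(j + k) % n_copies] for k in range(n_copies)]
            for j in range(n_copies)]

super_tasks = build_super_tasks(["t1", "t2", "t3"])
for st in super_tasks:
    print(st)
# Each slot (column) sees each task exactly once, so scheduling the three
# super tasks on three different hosts never co-locates two copies of a task.
```

With this layout, losing any single host destroys at most one copy of each task, while no host wastes cycles executing the same task twice.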
1.2 SCOPE OF THE PROJECT
Firstly, a prediction model based on the machine learning approach is trained to classify the arriving tasks into "failure-prone tasks" and "non-failure-prone tasks" according to the predicted failure rate. Then, two efficient scheduling mechanisms are proposed to allocate the two types of tasks to the most appropriate hosts in a CDC. The vector reconstruction method is developed to construct super tasks from failure-prone tasks and to separately schedule these super tasks and the non-failure-prone tasks to the most suitable physical hosts. When multiple task instances from different applications start to execute on numerous hosts, some of the hosts may fail accidentally, resulting in a fault in the system. This phenomenon is usually avoided by a fault-tolerance mechanism. Various factors can lead to a host failure, and a failure event usually stimulates another fault event. These failures may include operating system crashes, network partitions, hardware malfunctions, power outages, abrupt software failures, etc.
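The scheduling step described above, with tasks dispatched earliest-deadline-first (as stated in the abstract) and packed onto hosts by their multi-dimensional resource demands, might be sketched as below. The tightest-fit dot-product heuristic is an assumed placeholder for the paper's exact vector-bin-packing host-selection rule.

```python
def schedule_edf(tasks, hosts):
    """Schedule tasks earliest-deadline-first onto multi-dimensional hosts.

    tasks: list of (deadline, (cpu, mem)) tuples.
    hosts: list of [cpu_free, mem_free] capacities (mutated in place).
    Returns a list of (task_index, host_index) assignments.
    """
    order = sorted(range(len(tasks)), key=lambda i: tasks[i][0])  # EDF order
    placement = []
    for i in order:
        _, demand = tasks[i]
        best, best_score = None, None
        for h, free in enumerate(hosts):
            if all(f >= d for f, d in zip(free, demand)):         # capacity check
                score = sum(f * d for f, d in zip(free, demand))  # dot-product fit
                if best_score is None or score < best_score:      # tightest fit
                    best, best_score = h, score
        if best is None:
            continue                                              # task rejected
        for k, d in enumerate(demand):
            hosts[best][k] -= d
        placement.append((i, best))
    return placement

tasks = [(10, (0.5, 0.5)), (5, (0.4, 0.2)), (8, (0.3, 0.3))]  # (deadline, demand)
hosts = [[1.0, 1.0], [0.5, 0.5]]
print(schedule_edf(tasks, hosts))  # the deadline-5 task is placed first
```

In the full scheme, super tasks and non-failure-prone tasks would each pass through such a loop against their own candidate host sets.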
2. BACKGROUND
2.1 DOMAIN
2.2 SUB DOMAIN
3. LITERATURE SURVEY
1. TOPIC: Quantitative comparisons of the state-of-the-art data center
architectures
AUTHOR: K. Bilal, S. U. Khan, L. Zhang, H. Li, K. Hayat, S. A. Madani, N.
Min-Allah, L. Wang, D. Chen, M. Iqbal, C.-Z. Xu, and A. Y. Zomaya
YEAR: 2012
Data centers are experiencing a remarkable growth in the number of
interconnected servers. Being one of the foremost data center design concerns,
network infrastructure plays a pivotal role in the initial capital investment and
ascertaining the performance parameters for the data center. Legacy data center
network (DCN) infrastructure lacks the inherent capability to meet the data
centers growth trend and aggregate bandwidth demands. Deployment of even
the highest-end enterprise network equipment only delivers around 50% of the
aggregate bandwidth at the edge of network. The vital challenges faced by the
legacy DCN architecture trigger the need for new DCN architectures, to
accommodate the growing demands of the ‘cloud computing’ paradigm. We
have implemented and simulated the state of the art DCN models in this paper,
namely: (a) legacy DCN architecture, (b) switch-based, and (c) hybrid models,
and compared their effectiveness by monitoring the network: (a) throughput and
(b) average packet delay. The presented analysis may be perceived as a
background benchmarking study for the further research on the simulation and
implementation of the DCN-customized topologies and customized addressing
protocols in the large-scale data centers. We have performed extensive
simulations under various network traffic patterns to ascertain the strengths and
inadequacies of the different DCN architectures. Moreover, we provide a firm
foundation for further research and enhancement in DCN architectures. 
2. TOPIC: DENS: Data Center Energy-Efficient Network-Aware Scheduling
AUTHOR: D. Kliazovich, P. Bouvry, and S. U. Khan,
YEAR: 2013
In modern data centers, energy consumption accounts for a considerably large slice of operational expenses. The state of the art in data center energy optimization is focusing only on job distribution between computing servers based on workload or thermal profiles. This paper underlines the role of communication fabric in data center energy consumption and presents a scheduling approach that combines energy efficiency and network awareness, termed DENS. The DENS methodology balances the energy consumption of a data center, individual job performance, and traffic demands. The proposed approach optimizes the tradeoff between job consolidation (to minimize the amount of computing servers) and distribution of traffic patterns (to avoid hotspots in the data center network).
3. TOPIC: Energy-efficient data centers
AUTHOR: J. Shuja, S. A. Madani, K. Bilal, K. Hayat, S. U. Khan, and S.
Sarwar
YEAR: 2012
Energy consumption of the Information and Communication Technology (ICT)
sector has grown exponentially in recent years. A major component of today's ICT is constituted by data centers, which have recently experienced unprecedented growth in their size and population. Internet giants like Google, IBM and Microsoft house large data centers for cloud computing and application hosting. Many studies on the energy consumption of data centers point out the need to evolve strategies for energy efficiency. Due to the large-scale carbon dioxide emissions of electricity production, ICT facilities are indirectly responsible for considerable amounts of greenhouse gas emissions. Heat generated by these densely populated data centers needs large
cooling units to keep temperatures within the operational range. These cooling
units, obviously, escalate the total energy consumption and have their own
carbon footprint. In this survey, we discuss various aspects of the energy
efficiency in data centers with the added emphasis on its motivation for data
centers. In addition, we discuss various research ideas, industry adopted
techniques and the issues that need our immediate attention in the context of
energy efficiency in data centers.
4. TOPIC: Using proactive fault-tolerance approach to enhance cloud service
reliability
AUTHOR: J. Liu, S. Wang, A. Zhou, S. Kumar, F. Yang, and R. Buyya
YEAR: 2018
The large-scale utilization of cloud computing services for hosting
industrial/enterprise applications has led to the emergence of cloud service
reliability as an important issue for both cloud service providers and users. To
enhance cloud service reliability, two types of fault tolerance schemes, reactive
and proactive, have been proposed. Existing schemes rarely consider the
problem of coordination among multiple virtual machines (VMs) that jointly
complete a parallel application. Without VM coordination, the parallel
application execution results will be incorrect. To overcome this problem, we
first propose an initial virtual cluster allocation algorithm according to the VM
characteristics to reduce the total network resource consumption and total
energy consumption in the data center. Then, we model CPU temperature to
anticipate a deteriorating physical machine (PM). We migrate VMs from a
detected deteriorating PM to some optimal PMs. Finally, the selection of the
optimal target PMs is modeled as an optimization problem that is solved using
an improved particle swarm optimization algorithm. We evaluate our approach
against five related approaches in terms of the overall transmission overhead,
overall network resource consumption, and total execution time while executing
a set of parallel applications. Experimental results demonstrate the efficiency
and effectiveness of our approach.
5. TOPIC: Probabilistic model for evaluating a proactive fault tolerance
approach in the cloud
AUTHOR: O. Hannache and M. Batouche
YEAR: 2015
Cloud computing is an emerging paradigm where computing services are
provided across the web. Virtualization powers the cloud by mutualizing
physical resources thus ensuring flexibility and high availability of the cloud.
Fault-tolerance mechanisms such as load balancing aim to foster availability, but classic reactive fault-tolerance techniques prove to be greedy in terms of memory and recovery time. Elsewhere, proactive fault tolerance is possible through preemptive virtual machine migration, which requires a strong and accurate failure predictor. In quest of an effective approach for proactive fault tolerance, we introduce in this paper a probabilistic model of the
cloud with a failure generator for evaluating a proposed approach based on three
scenarios of virtual machine migration.
6. TOPIC: Failover strategy for fault tolerance in cloud computing environment
AUTHOR: B. Mohammed, M. Kiran, M. Kabiru, and I.-U. Awan
YEAR: 2015
Cloud fault tolerance is an important issue in cloud computing platforms and
applications. In the event of an unexpected system failure or malfunction, a
robust fault-tolerant design may allow the cloud to continue functioning
correctly possibly at a reduced level instead of failing completely. To ensure
high availability of critical cloud services, the application execution, and
hardware performance, various fault-tolerant techniques exist for building self-
autonomous cloud systems. In comparison with current approaches, this paper
proposes a more robust and reliable architecture using optimal checkpointing
strategy to ensure high system availability and reduced system task service
finish time. Using pass rates and virtualized mechanisms, the proposed smart
failover strategy (SFS) scheme uses components such as cloud fault manager,
cloud controller, cloud load balancer, and a selection mechanism, providing
fault tolerance via redundancy, optimized selection, and checkpointing. In our
approach, the cloud fault manager repairs faults generated before the task time
deadline is reached, blocking unrecoverable faulty nodes as well as their virtual
nodes. This scheme is also able to remove temporary software faults from
recoverable faulty nodes, thereby making them available for future request. We
argue that the proposed SFS algorithm makes the system highly fault tolerant by
considering forward and backward recovery using diverse software tools.
Compared with existing approaches, preliminary experiment of the SFS
algorithm indicates an increase in pass rates and a consequent decrease in
failure rates, showing an overall good performance in task allocations. We
present these results using experimental validation tools with comparison with
other techniques, laying a foundation for a fully fault-tolerant infrastructure as a
service cloud environment.
7. TOPIC: Elastic reliability optimization through peer-to-peer checkpointing in
cloud computing
AUTHOR: J. Zhao, Y. Xiang, T. Lan, H. H. Huang, and S. Subramaniam
YEAR: 2017
Modern day data centers coordinate hundreds of thousands of heterogeneous
tasks and aim at delivering highly reliable cloud computing services. Although
offering equal reliability to all users benefits everyone at the same time, users
may find such an approach either inadequate or too expensive to fit their
individual requirements, which may vary dramatically. In this paper, we
propose a novel method for providing elastic reliability optimization in cloud
computing. Our scheme makes use of peer-to-peer checkpointing and allows
user reliability levels to be jointly optimized based on an assessment of their
individual requirements and total available resources in the data center. We
show that the joint optimization can be efficiently solved by a distributed
algorithm using dual decomposition. The solution improves resource utilization
and presents an additional source of revenue to data center operators. Our
validation results suggest a significant improvement of reliability over existing
schemes.
8. TOPIC: A comparative study into distributed load balancing algorithms for
cloud computing
AUTHOR: M. Randles, D. Lamb, and A. Taleb-Bendiab
YEAR: 2010
The anticipated uptake of Cloud computing, built on well-established research
in Web Services, networks, utility computing, distributed computing and
virtualisation, will bring many advantages in cost, flexibility and availability for
service users. These benefits are expected to further drive the demand for Cloud
services, increasing both the Cloud's customer base and the scale of Cloud
installations. This has implications for many technical issues in Service
Oriented Architectures and Internet of Services (IoS)-type applications;
including fault tolerance, high availability and scalability. Central to these
issues is the establishment of effective load balancing techniques. It is clear the
scale and complexity of these systems makes centralized assignment of jobs to
specific servers infeasible; requiring an effective distributed solution. This paper
investigates three possible distributed solutions proposed for load balancing;
approaches inspired by Honeybee Foraging Behaviour, Biased Random
Sampling and Active Clustering.
9. TOPIC: Energy-aware fault-tolerant dynamic task scheduling scheme for virtualized cloud data center
AUTHOR: A. Marahatta, Y.-S. Wang, F. Zhang, A. K. Sangaiah, S. K. Sah
Tyagi, and Z. Liu
YEAR: 2021
Resource scheduling is a challenging job in multi-cloud environments. The
multi-cloud technology has attracted much research aimed at solving the problems of vendor lock-in, reliability, interoperability, etc. The uncertainty in multi-cloud environments with heterogeneous user demands makes it challenging to dispense resources on the demand of the user. Researchers are still focused on deriving efficient, optimized resource allocation management from the existing resource allocation policies in multi-
cloud environments. The research aims to provide a broad systematic literature
analysis of resource management in the area of multi-cloud environments. The
numbers of optimization techniques have been discussed among the open
issues and future challenges in consideration due to flexibility and reliability
in present environments. To analyses the literature work, it is necessary to cover
the existing homogenous/ heterogeneous user demands and cloud
applications, and algorithms to manage it in multi-clouds. In this paper, we
present the definition and classification of resource allocation techniques in
multi-clouds and generalized taxonomy for resource management in cloud
environments. In the last, we explore the open challenges and future
directions of resource management in a multi-cloud environment.
10. TOPIC: Network failure-aware redundant virtual machine placement in a
cloud data center
AUTHOR: A. Zhou, S. Wang, C.-H. Hsu, M. H. Kim, and K. S. Wong
YEAR: 2017
Cloud has become a very popular infrastructure for many smart city
applications. A growing number of smart city applications from all over the
world are deployed on clouds. However, node failure events in a cloud data center have a negative impact on the performance of smart city applications. Survivable virtual machine placement has been proposed by researchers to enhance service reliability. Because they ignore switch failures, current survivable virtual machine placement approaches cannot achieve the best effect. In this paper, we study how to enhance service reliability by
designing a novel network failure–aware redundant virtual machine placement
approach in a cloud data center. Firstly, we formulate the network failure–aware
redundant virtual machine placement problem as an integer nonlinear
programming problem and prove that the problem is NP-hard. Secondly, we
propose a heuristic algorithm to solve the problem. Finally, extensive simulation
results show the effectiveness of our algorithm.
4. SYSTEM ANALYSIS
4.1 EXISTING SYSTEM
Our evaluation results show that the proposed scheme can intelligently predict task failure, achieves better fault tolerance, and reduces total energy consumption compared with the existing schemes.
The existing fault-tolerant techniques in CDCs include replication, check-point,
job migration, retry, task resubmission, etc. Some studies introduced methods
based on certain principles, such as retry, resubmission, replication, renovation
of software, screening, and migration, to harmonize the fault-tolerant
mechanism with CDC task scheduling. However, for parallel and distributed
computing systems, the most widely adopted and acknowledged method is to
replicate data to multiple hosts. A rearrangement-based improved fault-tolerant scheduling algorithm (RFTR) has been presented to deal with the dynamic scheduling of tasks in cloud systems. A primary-backup model is adopted to realize fault tolerance in this method: the corresponding backup copy is released after the primary replica completes, in order to free the resources it occupies, and waiting tasks can be rearranged to utilize the released resources. In most other schemes, by contrast, the execution sequence is fixed and cannot be changed once a task is sent to the waiting queue of a virtual machine.
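The primary-backup behaviour described above, where the backup copy's reserved resources are released once the primary replica completes so that waiting tasks can reuse them, can be sketched roughly as follows. The `Host` class and function names are illustrative, not taken from the RFTR paper.

```python
class Host:
    """A host with a single scalar capacity, for illustration."""
    def __init__(self, capacity):
        self.capacity = capacity

    def reserve(self, amount):
        if amount > self.capacity:
            raise RuntimeError("insufficient capacity")
        self.capacity -= amount

    def release(self, amount):
        self.capacity += amount

def run_primary_backup(task_demand, primary, backup, primary_succeeds):
    """Reserve a task on two hosts; free the backup if the primary completes.

    Resources held by the executing copy are freed when it finishes, which
    is not modeled here; the point is the early release of the backup.
    """
    primary.reserve(task_demand)
    backup.reserve(task_demand)
    if primary_succeeds:
        backup.release(task_demand)  # backup released -> reusable by waiting tasks
        return "primary"
    primary.release(task_demand)     # primary failed; the backup copy takes over
    return "backup"

p, b = Host(10), Host(10)
print(run_primary_backup(4, p, b, primary_succeeds=True), p.capacity, b.capacity)
```

The released backup capacity is exactly what RFTR's rearrangement step hands to the waiting tasks.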
In addition, the performance of the proposed scheduling scheme, i.e. PEFS, is compared with some existing techniques: the real-time fault-tolerant scheduling algorithm with rearrangement (RFTR), the dynamic fault-tolerant scheduling mechanism (DFTS), and modified breadth-first search (MBFS), as all of them are designed for fault-tolerant scheduling, whereas in most existing algorithms the execution sequence is settled after sending tasks to the waiting queue of a VM. Experiments on the Internet Data Set and the Eular Data Set are conducted, and the experimental results validate the merits of the proposed scheme in comparison with existing techniques.

4.2 PROPOSED SYSTEM


In this paper, our aim is to predict a task failure according to the
requested resources before the actual failure occurs, and leverage the prediction
to design a task scheduling scheme, thus reducing the task execution failure and
total energy consumption. To this end, a novel AI-driven Prediction based
Energy-aware Fault-tolerant Scheduling scheme (PEFS) is proposed. In the
experiments, the predicted task failure is compared with the actual task
failure to validate the failure prediction. In addition, the performance of
the proposed scheduling scheme, i.e. PEFS, is compared with some existing
techniques, namely the real-time fault-tolerant scheduling algorithm with
rearrangement (RFTR) [19], the dynamic fault-tolerant scheduling mechanism
(DFTS) [20] and the modified breadth-first search (MBFS) [21], as all of them
are designed for fault-tolerant scheduling. The proposed PEFS reduces the
task failure ratio by approximately 23.529%, 18.75%, 16.13% and 29.73%
relative to PEFS0, RFTR, DFTS and MBFS, respectively, on IDS, and by
28.125%, 25.806%, 20.689% and 34.285% relative to PEFS0, RFTR, DFTS and
MBFS, respectively, on EDS. The proposed scheme also reduces total energy
consumption markedly.
[Figure: Total Energy Consumption (KWh) versus Task Count (10^3) for PEFS]
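The prediction step that drives PEFS can be illustrated with a small stand-in classifier. The sketch below replaces the paper's trained DNN with a tiny logistic scorer; the features (requested minus allocated resources), the weights, and the 0.5 threshold are all illustrative assumptions, not values from the paper.

```python
# Sketch of the failure-prediction step. A simple sigmoid scorer stands in
# for the DNN-based prediction model; the feature choice and threshold are
# hypothetical illustrations only.
import math

def failure_score(requested_cpu, allocated_cpu, requested_mem, allocated_mem):
    # Over-commitment (requesting more than was actually allocated) is the
    # kind of signal the historical task data would expose to the model.
    x = (requested_cpu - allocated_cpu) + (requested_mem - allocated_mem)
    return 1.0 / (1.0 + math.exp(-x))     # sigmoid maps to (0, 1)

def classify(tasks, threshold=0.5):
    """Split arriving tasks into failure-prone and non-failure-prone lists."""
    prone, safe = [], []
    for t in tasks:
        score = failure_score(t["req_cpu"], t["alloc_cpu"],
                              t["req_mem"], t["alloc_mem"])
        (prone if score > threshold else safe).append(t["id"])
    return prone, safe

tasks = [
    {"id": "t1", "req_cpu": 4, "alloc_cpu": 2, "req_mem": 8, "alloc_mem": 8},
    {"id": "t2", "req_cpu": 2, "alloc_cpu": 2, "req_mem": 4, "alloc_mem": 4},
]
print(classify(tasks))  # (['t1'], ['t2'])
```

In the actual scheme this split is what routes each arriving task to one of the two scheduling processes: failure-prone tasks go through super-task construction and replication, while the rest are scheduled directly.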
4.3 SYSTEM SPECIFICATION
4.3.1 HARDWARE REQUIREMENT
4.3.2 SOFTWARE REQUIREMENT
5. SYSTEM DESIGN
5.1 SYSTEM ARCHITECTURE
5.2 SYSTEM FLOW DRAWING
6. CONCLUSION
This paper presents an AI-driven Prediction based Energy-aware Fault-tolerant
Scheduling scheme (PEFS) for the cloud data center. First, task parameters
(the requested resources, the actually allocated resources, and whether a
failure occurred) are gathered from the historical data set, and a DNN-based
prediction model is trained to predict the failure rate of each arriving
task. In this way, all arriving tasks can be classified into failure-prone
tasks and non-failure-prone tasks based on the model outputs. Second, a
scheduling algorithm based on vector bin packing is proposed to schedule the
two types of tasks efficiently. The main difference between the two
scheduling processes is that, for the failure-prone tasks, super tasks are
first generated based on a vector reconstruction method for fault tolerance.
A replication strategy replicates only the failure-prone tasks; the replicas
are then arranged into super tasks by vector reconstruction so that the
execution of different copies of a task on different hosts does not overlap,
and redundant execution is thereby avoided. Experiments on the Internet Data
Set and the Eular Data Set are conducted, and the experimental results
validate the merits of the proposed scheme in comparison with existing
techniques. Future work includes larger-scale simulations to scrutinize the
performance of the heuristics, as well as further improvements to address
reliability and consistency requirements.
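The placement step summarized above can be sketched as a first-fit vector bin packing with a distinct-host rule for replicas. The two-dimensional (CPU, memory) demand vectors, the host capacities, and the fixed replica count of two are illustrative assumptions; the paper's actual super-task construction is more elaborate.

```python
# Sketch of vector-bin-packing placement with replication for failure-prone
# tasks. First-fit heuristic; the 2-D (cpu, mem) demands, host capacities,
# and two-copy replication are illustrative assumptions only.

def fits(host, demand):
    return host["cpu"] >= demand[0] and host["mem"] >= demand[1]

def place(host, demand):
    host["cpu"] -= demand[0]
    host["mem"] -= demand[1]

def schedule(tasks, hosts):
    """Place each task; failure-prone tasks get a replica on a different host."""
    placement = {}
    for t in tasks:
        copies = 2 if t["failure_prone"] else 1
        used = []                          # indices of hosts holding a copy
        for _ in range(copies):
            for i, h in enumerate(hosts):
                # Distinct-host rule: copies of one task never share a host,
                # so their executions cannot overlap on the same machine.
                if i not in used and fits(h, t["demand"]):
                    place(h, t["demand"])
                    used.append(i)
                    break
        placement[t["id"]] = used
    return placement

hosts = [{"cpu": 4, "mem": 8}, {"cpu": 4, "mem": 8}]
tasks = [
    {"id": "t1", "demand": (2, 4), "failure_prone": True},   # two copies
    {"id": "t2", "demand": (2, 4), "failure_prone": False},  # one copy
]
print(schedule(tasks, hosts))  # {'t1': [0, 1], 't2': [0]}
```

The distinct-host constraint is the part that matters for fault tolerance: a single host failure can take down at most one copy of a failure-prone task, while non-failure-prone tasks pay no replication cost.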
7. REFERENCE
[1] K. Bilal, S. U. Khan, L. Zhang, H. Li, K. Hayat, S. A. Madani, N. Min-
Allah, L. Wang, D. Chen, M. Iqbal, C.-Z. Xu, and A. Y. Zomaya, “Quantitative
comparisons of the state-of-the-art data center architectures,” Concurrency
Computation: Practice and Experience, vol. 25, no. 12, 2012.
[2] D. Kliazovich, P. Bouvry, and S. U. Khan, “Dens: data center energy-
efficient network-aware scheduling,” Cluster Computing, vol. 16, no. 1, p. 65–
75, 2013.
[3] J. Shuja, S. A. Madani, K. Bilal, K. Hayat, S. U. Khan, and S. Sarwar,
“Energy-efficient data centers,” Computing, vol. 94, no. 12, p. 973–994, 2012.
[4] J. Liu, S. Wang, A. Zhou, S. Kumar, F. Yang, and R. Buyya, “Using
proactive fault-tolerance approach to enhance cloud service reliability,” IEEE
Transactions on Cloud Computing, no. 99, 2017.
[5] O. Hannache and M. Batouche, “Probabilistic model for evaluating a
proactive fault tolerance approach in the cloud,” IEEE International Conference
on Service Operations And Logistics, And Informatics, pp. 94–99, 2015.
[6] B. Mohammed, M. Kiran, M. Kabiru, and I.-U. Awan, “Failover strategy for
fault tolerance in cloud computing environment,” Software: Practice and
Experience, vol. 47, no. 9, pp. 1243–1274, 2017.
[7] J. Zhao, Y. Xiang, T. Lan, H. H. Huang, and S. Subramaniam, “Elastic
reliability optimization through peer-to-peer checkpointing in cloud
computing,” IEEE Transactions on Parallel and Distributed Systems, vol. 28,
no. 2, pp. 491–502, 2017.
[8] M. Randles, D. Lamb, and A. Taleb-Bendiab, “A comparative study into
distributed load balancing algorithms for cloud computing,” IEEE International
Conference on Advanced Information Networking and Applications
Workshops, pp. 551–556, 2010.
[9] A. Marahatta, Y.-S. Wang, F. Zhang, A. K. Sangaiah, S. K. Sah Tyagi, and
Z. Liu, “Energy-aware fault-tolerant dynamic task scheduling scheme for
virtualized cloud data center,” Mobile Networks and Applications, 2018.
[10] A. Zhou, S. Wang, C.-H. Hsu, M. H. Kim, and K. S. Wong, “Network
failure-aware redundant virtual machine placement in a cloud data center,”
Concurrency
[11] C. Wang, L. Xing, H. Wang, Z. Zhang, and Y. Dai, “Processing time
analysis of cloud services with retrying fault-tolerance technique,” IEEE
International Conference on Communications in China, pp. 63– 67, 2012.
[12] G. Ramalingam and K. Vaswani, “Fault tolerance via idempotence,”
SIGPLAN Not., vol. 48, no. 1, pp. 249–262, 2013.
[13] K. Plankensteiner, R. Prodan, and T. Fahringer, “A new fault tolerance
heuristic for scientific workflows in highly distributed environments based on
resubmission impact,” IEEE International Conference on e-Science, pp. 313–
320, 2009.
[14] M. A. Mukwevho and T. Celik, “Toward a smart cloud: A review of fault-
tolerance methods in cloud systems,” IEEE Transactions on Services
Computing, 2018.
[15] J. Wu, P. Zhang, and C. Liu, “A novel multiagent reinforcement learning
approach for job scheduling in grid computing,” Future Generation Computer
Systems, vol. 27, no. 5, p. 430–439, 2011.
[16] M. A. Shafii, L. M. Shafie Abd, and B. M. Bakri, “On-demand grid
provisioning using cloud infrastructures and related virtualization tools : A
survey and taxonomy,” International Journal of Advanced Studies in Computer
Science and Engineering IJASCSE, vol. 3, no. 1, pp. 49–59, 2014.
[17] V. S. Kushwah, S. K. Goyal, and P. Narwariya, “A survey on various fault
tolerant approaches for cloud environment during load balancing,” IJCNWMC,
vol. 4, no. 6, pp. 25–34, 2014.
[18] P. Kassian, P. Radu, F. Thomas, K. Attila, and P. Kacsuk, “Fault tolerant
behavior in state-of-the-art grid workflow management systems,” Institute for
Computer Science University of Innsbruck Attila Kert Core GRID Technical
Report Number TR-0091, 2007.
[19] P. Guo and Z. Xue, “Real-time fault-tolerant scheduling algorithm with
rearrangement in cloud systems,” 2017 IEEE 2nd Information Technology,
Networking , Electronic and Automation Control Conference (ITNEC), pp.
399–402, 2017.
[20] J. Soniya, J. Angela, J. Sujana, and T. Revathi, “Dynamic fault tolerant
scheduling mechanism for real time tasks in cloud computing,” International
Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT),
2016.
[21] R. K. Yadav and V. Kushwaha, “An energy preserving and fault tolerant
task scheduler in cloud computing,” IEEE ICAETR, 2014.
[22] M. Amoon, “A fault-tolerant scheduling system for computational grids,”
Comput. Elect. Eng., vol. 38, no. 2, p. 399–412, 2012.
[23] P. Keerthika and S. P, “A multiconstrained grid scheduling algorithm with
load balancing and fault tolerance,” Sci. World J., 2015.
[24] H. Duan, C. Chen, G. Min, and Y. Wu, “Energy-aware scheduling of
virtual machines in heterogeneous cloud computing systems,” Future
Generation Computer Systems, vol. 74, pp. 142–150, 2017.
[25] C. Ghribi, M. Hadji, and D. Zeghlache, “Energy efficient vm scheduling
for cloud data centers: Exact allocation and migration algorithms,” 2013 13th
IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing,
2013.
[26] B. Nazir, K. Qureshi, and P. Manuel, “Adaptive check pointing strategy to
tolerate faults in economy based grid,” Journal of Supercomputing, vol. 50, no.
1, pp. 1–18, 2009.
[27] A. Marahatta, C. Chi, F. Zhang, and Z. Liu, “Energy-aware fault-tolerant
scheduling scheme based on intelligent prediction model for cloud data center,”
The 9th International Green and Sustainable Computing Conference, Pittsburgh,
USA, 2018.
[28] X. Fan, W.-D. Weber, and L. A. Barroso, “Power provisioning for a
warehouse-sized computer,” ISCA, pp. 13–23, 2007.
[29] V. V. Vazirani, “Approximation algorithms,” Springer-Verlag, New York,
Inc., 2001.
[30] R. Panigrahy, K. Talwar, L. Uyeda, and U. Wieder, “Heuristics for vector
bin packing,” Microsoft Research, Tech. Rep., 2011.
[31] R. N. Calheiros, R. Ranjan, A. Beloglazov, C. De Rose, and R. Buyya,
“CloudSim: a toolkit for modeling and simulation of cloud computing
environments and evaluation of resource provisioning algorithms,” Software:
Practice and Experience, vol. 41, no. 1, pp. 23–50, 2011.
[32] “Eular data set.”
[33] “Internet data set.” [Online]. Available:
https://github.com/somec001/InternetData
[34] “Google cluster data set.” [Online]. Available:
https://github.com/google/cluster-data/blob/master/ClusterData2011_2.md
