Model For Fault Tolerance and Checkpoints in Cloud Computing Environment

Uploaded by

Anteyi Benedict

MODEL FOR FAULT TOLERANCE AND CHECKPOINTS IN CLOUD COMPUTING ENVIRONMENT

Name: Anteyi Benedict O.

Matric No: CSC/13/4987

1.1 Background of Study

Fault tolerance is the ability of a system to continue performing its intended functions in the presence of faults. In a broad sense, fault tolerance is associated with reliability, with successful operation, and with the absence of breakdowns. A fault-tolerant system should be able to handle faults in individual hardware or software components, power failures, or other kinds of unexpected problems and still meet its specification. Fault tolerance is necessary because it is practically impossible to build a perfect system. The fundamental problem is that, as the complexity of a system grows, its reliability decreases drastically unless compensatory measures are taken. For example, if each component has a reliability of 99.99% and components fail independently, a system of 100 non-redundant components has a reliability of 0.9999^100, or about 99.00%, whereas a system of 10,000 non-redundant components has a reliability of just 0.9999^10000, or about 36.79%. Such a low reliability is unacceptable in most applications.
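The reliability figures in the example above can be checked with a short calculation. The sketch below assumes that components fail independently and that the system works only if every component works (a series system), so the system reliability is the component reliability raised to the number of components. The function name `series_reliability` is illustrative and not part of the original proposal.

```python
def series_reliability(component_reliability: float, n_components: int) -> float:
    """Reliability of n independent, non-redundant components in series.

    Assumption: the system fails as soon as any single component fails.
    """
    return component_reliability ** n_components


if __name__ == "__main__":
    # Component reliability of 99.99%, as in the example in the text.
    for n in (100, 10_000):
        print(f"{n:>6} components: {series_reliability(0.9999, n):.4%}")
```

The steep drop between the two system sizes is what motivates adding redundancy or recovery mechanisms rather than relying on component quality alone.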

Checkpoint/restart is a fault tolerance strategy in which the state of the system is saved periodically during failure-free execution. If a failure occurs while the system is running, the system rolls back to the latest checkpoint and restarts from there, which bounds the amount of lost work that must be recomputed. The price is overhead: saving checkpoints increases the wall clock time of the application and therefore its execution cost.
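A minimal sketch of checkpoint/restart follows, under stated assumptions: the "application" is a toy loop that sums integers, its state is a small dictionary saved as JSON, and a failure is simulated by raising an exception. The names `save_checkpoint`, `load_checkpoint`, `run`, and the `fail_at` parameter are hypothetical and chosen only for illustration; a real HPC checkpointing system would save far richer state.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write the state to a temporary file, then atomically rename it.

    A crash in the middle of a write therefore cannot corrupt the
    previous checkpoint.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the last saved state, or a fresh state if none exists."""
    if not os.path.exists(path):
        return {"step": 0, "total": 0}
    with open(path) as f:
        return json.load(f)


def run(path, n_steps, checkpoint_every, fail_at=None):
    """Sum 0..n_steps-1, saving a checkpoint every `checkpoint_every` steps.

    `fail_at` simulates a failure just before the given step is executed.
    """
    state = load_checkpoint(path)  # roll back to the latest checkpoint
    for step in range(state["step"], n_steps):
        if fail_at is not None and step == fail_at:
            raise RuntimeError(f"simulated failure at step {step}")
        state["total"] += step
        state["step"] = step + 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state["total"]
```

If `run(path, 10, 5, fail_at=7)` fails, the work done in steps 5 and 6 is lost, but a second call `run(path, 10, 5)` resumes from step 5 rather than from step 0. A smaller `checkpoint_every` reduces the work lost per failure at the cost of more frequent writes, which is the overhead trade-off described above.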

Cloud computing, the long-held dream of computing as a utility, has opened up a new era of computing, transformed a large part of the IT industry, and reshaped the purchase and use of IT software and hardware. Cloud computing is a large-scale distributed computing paradigm driven by economies of scale, in which a pool of abstracted, virtualized, dynamically scalable, highly available, and configurable and reconfigurable computing resources (e.g., networks, servers, storage, applications, and data) can be rapidly provisioned and released with minimal management effort in the data centers. Services are delivered on demand to external customers over high-speed Internet with the “X as a service (XaaS)” computing architecture, which is broken down into three segments: “applications”, “platforms”, and “infrastructure”. Its aims are to provide users with more flexible services in a transparent manner and with ever cheaper and more powerful processors.

Cloud computing offers new capacity and flexibility to high performance computing (HPC) applications by provisioning a large number of virtual machines for computationally intensive applications. Fault tolerance allows HPC systems running on a cloud with multiple nodes to complete the execution of computationally intensive applications in the presence of faults. The most commonly used fault tolerance techniques for HPC are checkpoint/restart and replication. The focus of this research is to present a model for fault tolerance and checkpoints in a cloud computing environment.

1.2 Motivation

Cloud computing has the ability and flexibility to provision computing resources at scale for HPC (High Performance Computing) scientific purposes. However, fault tolerance is one of the challenges facing HPC applications on the cloud. As the number of processors in today's HPC systems grows, and as these systems are provisioned on the cloud with virtual instances, communication links, and integrated circuit environments running on virtual machines (VMs), fault tolerance (FT) must ensure that computationally intensive applications run smoothly and concurrently, with reduced overhead and with visibility of the environment. Achieving high fault tolerance is one of the major obstacles to a new era of high-serviceability cloud computing, because sockets as well as processors are prone to failure. It has been predicted that a system with 100,000 processors will experience a processor failure every few minutes. Fault-tolerant service is therefore an essential part of Service Level Objectives (SLOs), and fault tolerance and checkpoint functions should be considered in clouds.

When users move their critical systems to the cloud, they always ask whether cloud serviceability can achieve 100% uptime. Unfortunately, cloud serviceability and fault tolerance are still far from perfect. Failures are normal rather than exceptional in cloud computing environments, for several reasons: clouds support large-scale, time-critical data; cloud platforms usually run on voluntary, much cheaper, less powerful, virtual computing nodes; cloud nodes are usually connected by unpredictable communication links, so communication failures such as timeouts greatly affect the serviceability of clouds; and malicious behavior can occur because nodes are contributed by users. Demands for high fault tolerance and high serviceability are becoming unprecedentedly strong, so building a cloud with high fault tolerance and high serviceability is a critical, challenging, and urgently required task. Hence, in this project, an optimized model for fault tolerance and checkpoint serviceability will be developed.
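The prediction of a processor failure every few minutes on a 100,000-processor system can be made plausible with a back-of-the-envelope calculation. The sketch below assumes independent, exponentially distributed processor failures, in which case the mean time between failures (MTBF) of the whole system is the per-processor MTBF divided by the number of processors. The per-processor MTBF of 10,000 hours is a hypothetical value chosen for illustration, not a figure from the source.

```python
def system_mtbf_minutes(processor_mtbf_hours: float, n_processors: int) -> float:
    """System-level mean time between failures, in minutes.

    Assumption: processors fail independently with exponentially
    distributed lifetimes, so failure rates add up across processors.
    """
    return processor_mtbf_hours * 60.0 / n_processors


if __name__ == "__main__":
    # Hypothetical per-processor MTBF of 10,000 hours (more than a year).
    print(system_mtbf_minutes(10_000, 100_000), "minutes between failures")
```

Even with individually reliable processors, the expected interval between failures shrinks in proportion to the system size.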
1.3 Objective of Study

i. To design an optimized model for fault tolerance and checkpoints in a cloud computing environment using linear programming.

ii. To implement the model in (i).

1.4 Methodology

The goal of the optimization problem is to minimize objective functions such as error, faults, failure, fault tolerance degree, fault tolerance overhead, response time, and other factors that affect the computation of processes in HPC, while maximizing the reliability of computational operations and processes, subject to the operational constraints of the cloud computing environment. In this work, the main goal is to minimize the objective function for fault tolerance, FŦ, subject to the constraints imposed by the units of the cloud computing environment.

The following steps are taken to develop the optimized model for fault tolerance in a cloud computing environment.

i. Literature review: A detailed review of relevant literature on fault tolerance in cloud computing was carried out.

ii. The strengths and limitations of the reviewed works were highlighted.

iii. A mathematical model using linear programming optimization will be implemented. The model will be built by formulating a linear programming problem from generation cost coefficients. Because the initial coefficients are non-linear in nature, they will be linearized by finding the incremental linear approximation. The problem that will be formulated will be of the form,
Minimize c^T x

Subject to Ax ≤ b

and x ≥ 0

where c^T x is the objective function, Ax ≤ b are the inequality constraints, and x ≥ 0 is the non-negativity constraint.
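To make the standard form above concrete, the following is a minimal sketch of a solver for the special case of exactly two decision variables. It is not the proposal's model: it uses vertex enumeration (checking every intersection of constraint boundaries) instead of a production method such as simplex, and it assumes the problem is bounded. The function name `solve_lp_2d` and all numbers in the usage example are hypothetical.

```python
from itertools import combinations


def solve_lp_2d(c, A, b, tol=1e-9):
    """Minimize c^T x subject to A x <= b and x >= 0, for two variables.

    Enumerates the intersections of pairs of constraint boundaries
    (candidate vertices) and returns (cost, (x1, x2)) for the cheapest
    feasible one, or None if no vertex is feasible.
    Assumption: the LP is bounded, so an optimum lies at a vertex.
    """
    # Append the non-negativity constraints as -x1 <= 0 and -x2 <= 0.
    rows = [list(r) for r in A] + [[-1.0, 0.0], [0.0, -1.0]]
    rhs = list(b) + [0.0, 0.0]
    best = None
    for i, j in combinations(range(len(rows)), 2):
        (a1, a2), (a3, a4) = rows[i], rows[j]
        det = a1 * a4 - a2 * a3
        if abs(det) < tol:
            continue  # parallel boundaries never intersect in a point
        # Cramer's rule for the 2x2 system of the two boundary equations.
        x1 = (rhs[i] * a4 - a2 * rhs[j]) / det
        x2 = (a1 * rhs[j] - a3 * rhs[i]) / det
        feasible = all(r[0] * x1 + r[1] * x2 <= h + tol
                       for r, h in zip(rows, rhs))
        if feasible:
            cost = c[0] * x1 + c[1] * x2
            if best is None or cost < best[0]:
                best = (cost, (x1, x2))
    return best


if __name__ == "__main__":
    # Hypothetical example: x1 and x2 are the amounts of two fault
    # tolerance measures with unit overheads 2 and 3. The requirement
    # x1 + 2*x2 >= 4 is written as -x1 - 2*x2 <= -4 to fit A x <= b.
    c = [2.0, 3.0]
    A = [[-1.0, -2.0], [1.0, 0.0], [0.0, 1.0]]
    b = [-4.0, 3.0, 3.0]
    print(solve_lp_2d(c, A, b))
```

Note that a "greater than or equal" requirement is expressed in the Ax ≤ b form by negating both sides. For problems with more than two variables, an established solver such as SciPy's `scipy.optimize.linprog` would be the more practical choice.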

1.5 Expected Contribution to Knowledge

At the end of this research, an optimized model for fault tolerance and checkpoints in a cloud computing environment will have been developed, contributing to the existing body of work on cloud computing systems.
