Workflow Chapter1
INTRODUCTION
1.1 INTRODUCTION
Cloud computing has attracted considerable attention from industry and research centres over the past couple of years. It essentially presents virtualized computing resources, logically integrated under a scheduler, to manage data-centre assets and to serve client-submitted task requirements at the highest possible utilization rate. Indeed, even the U.S. government adopted cloud-based systems as the default option for federal agencies in 2012 [JAN13]. Cloud computing is a growing paradigm that serves the client's requests and necessities. A cloud comprises a collection of virtual machines that provide both computational and storage facilities. Its main aim is to let users create applications and utilize existing services to build programs, drawing on processing capability, storage, networking and the wider information technology infrastructure.
Today, anybody with a credit card can subscribe to cloud services, deploy and configure servers for an application in hours, grow and shrink the infrastructure serving the application according to demand, and pay only for the time these resources have been used. Cloud computing is still maturing, and it offers many benefits under the pay-as-you-use model. Because of the elastic nature of the cloud, resources can be provisioned on demand. Such computation is needed for processing and handling huge volumes of information, with additional storage facilities allowing resources to be shared among users.
For existing resource pools, the best means of distributing computing resources is on demand, as in scientific computing. This chapter offers a brief overview of the cloud computing concepts, types and services, and it also discusses workflow scheduling in the cloud.

Cloud computing provides straightforward access to remote processing sites over the Internet. The term refers to computing resources that are commonly pooled and made available in a remote location. A cloud can deliver services over public or private networks (i.e., a Wide Area Network (WAN), a Local Area Network (LAN) or a Virtual Private Network (VPN)). Cloud computing uses the web to deliver software running on someone else's infrastructure, without a large upfront investment. It has opened the way for delivering practical computing power over the Internet, including through virtual private networks. From the viewpoint of a cloud advocate, cloud services minimize the capital cost of computing and replace it with an operating cost.
Through cloud computing, ready-to-use software is delivered over the Internet in the browser without any installation; one can host an application on the Internet, set up one's own remote file storage and database system, and more. Cloud computing lets us consume applications as utilities over the Internet. It enables us to build, design, and modify applications on the network. Cloud computing provides online data storage, infrastructure and applications, as shown in figure 1.1.
Several multinational companies offer cloud services; some of them are listed below:
· Google - A private cloud that provides services such as email access and document storage and sharing.
· Microsoft - Allows users of business intelligence tools to move into the cloud.
The components of cloud computing are given below and shown in figure 1.2.
Figure 1.2 Components of Cloud Computing (clients, application, storage)
· Clients - Computer hardware or software that relies on the cloud for application delivery.
· Services - A cloud service includes “products, services and solutions that are rendered and consumed in real-time over the Internet”, for example Web services.
· Application - Cloud application services deliver software over the Internet, often eliminating the need to install and run the application on the customer’s own computer.
· Platform - Delivers a computing platform for the deployment of applications without the cost and complexity of buying and managing the underlying hardware and software environment, as a service.
· Front End - In charge of the communication between the clients and the servers. It consists of the interfaces and applications that are required to access the cloud computing platform, with different APIs exposing the actual storage.
· Storage Layer - The storage logic layer handles a variety of features and is in charge of storing and replicating data across the cloud.
· Back End - Refers to the cloud itself. It consists of all the resources required to provide the cloud services, managed with protocols such as the GFS (Google File System).
Figure 1.3 Architecture of Cloud Computing (Internet, application, service, management, cloud runtime and security layers, with storage and infrastructure in the back end)
1.3.1 Characteristics of Cloud Computing
Cloud computing allows for the sharing and scalable deployment of services, as needed, from virtually any location, with the customer billed for the computing capabilities actually used, such as server processing time and network storage.
· Broad Network Access - Cloud capabilities are available over the network and accessed through various platforms (e.g., mobile phones, laptops, and tablets) [GAM13].
· Measured Service - Resource usage is monitored and reported transparently to both the provider and the consumer of the utilized service.
· Rapid Elasticity - Capabilities can be scaled out and back in as needed. This dynamic scaling needs to be done while maintaining high levels of performance and availability.
· Network Access - Services need to be accessible across the Internet from a wide range of devices such as PCs, laptops, and mobile devices, using standards-based APIs.
· Managed Metering - Uses metering to manage and optimize the service and to provide reporting and billing information. In this way, consumers are billed only for the services they have actually used during the billing period.
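To make the metering idea concrete, here is a minimal pay-per-use billing sketch; the resource names, unit rates, and usage figures are invented for illustration and do not come from any real provider's price list.

```python
# Hypothetical pay-per-use billing: rates and units are invented for
# illustration, not taken from any real provider's price list.
RATES = {"cpu_hours": 0.05, "storage_gb_months": 0.02, "network_gb": 0.01}

def bill(usage: dict) -> float:
    """Sum metered usage multiplied by the unit rate over the billing period."""
    return round(sum(RATES[item] * qty for item, qty in usage.items()), 2)

# One month of metered consumption for a single consumer.
monthly_usage = {"cpu_hours": 720, "storage_gb_months": 100, "network_gb": 50}
print(bill(monthly_usage))  # 720*0.05 + 100*0.02 + 50*0.01 = 38.5
```

The consumer pays only for the quantities actually recorded by the meter, which is exactly the billing model described above.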
The cloud makes it feasible to access one's data from anywhere. Instead of requiring the user to be at the location of the data storage device, the cloud removes that step: the user need not be in the same place as the hardware that stores the data. The cloud provider can both own and house the hardware and software necessary to run one's home or business applications. Small organizations can store their data in the cloud, removing the cost of purchasing and housing storage devices. Some of the benefits of cloud computing are shown in figure 1.4.
Figure 1.4 Benefits of Cloud Computing
The following are some of the possible benefits for those who adopt cloud computing:
· Cost Savings - Companies can reduce their capital expenditure and pay operational expenditure instead. The cloud presents a lower barrier to entry and also requires fewer in-house IT resources to provide system support.
· Scalability/Flexibility - Companies can start with a small deployment, grow to a large deployment fairly rapidly, and then scale back, if necessary. Also, the flexibility of the cloud allows companies to use extra resources at peak times.
· Maintenance - Cloud service providers manage the system maintenance, and access is through APIs that do not require application installations onto PCs, so upgrades consume no local resources.
There are services and models working to make cloud computing possible and reachable to end users. The following are the working models of cloud computing:
· Service Models
· Deployment Models
· Virtualization
Cloud services can be sorted into software services and infrastructure services, with software services sitting at a higher level of abstraction than the equipment in the cloud. Once a cloud is established, how its computing services are deployed in terms of business models can vary depending on requirements, as shown in figure 1.5. The primary service models are defined as follows.
Figure 1.5 Service Model of Cloud Computing
· Software as a Service (SaaS) - In this model, an entire instance of the service runs on the cloud and serves multiple end users [HUT11]. On the customer's side there is no need for upfront investment in servers or software licenses, while for the provider the costs are lowered, since just a single application has to be hosted and maintained. Examples include Google Docs, Zoho, etc.
· Platform as a Service (PaaS) - A layer of software or a development environment is encapsulated and offered as a service, on which higher levels of service can be built; it allows the customer to build his own applications on the cloud infrastructure. The PaaS offerings typically integrate the OS and application servers, such as the LAMP platform (Linux, Apache, MySql and PHP), restricted J2EE, Ruby etc., to address scalability and to manage workloads over the network. The customer builds his own applications on top of this platform.
The deployment model is chosen according to the organization's requirements and characteristics, and it determines how the cloud supports its users and service requirements. The following four deployment models, which differ in various ways, are described below.
Public Cloud
A public cloud is offered on a pay-per-use basis over the Internet, through web applications/web services hosted at an off-site third-party premises. The cloud infrastructure is accessible to the general public or a large industry group. Examples include Amazon EC2 and Google AppEngine.
Community Cloud
A community cloud is shared by several organizations that have comparable requirements and intend to share infrastructure in order to realize many of the benefits of cloud computing. With the costs spread over fewer clients than a public cloud (but more than a sole tenant), this option is costlier; yet it may offer a larger degree of privacy and policy compliance.
Hybrid Cloud
The term “hybrid cloud” has been used to mean either two separate clouds joined together, or virtualized cloud instances used together with actual physical hardware. The most precise meaning of the term is the combined use of physical hardware and virtualized cloud instances. Two clouds that have been consolidated are more properly called a “combined cloud”. A hybrid storage cloud uses a combination of public and private storage clouds. Hybrid storage clouds are often valuable for archiving and backup functions.
Private Cloud
A private cloud is a distinct and secure cloud-based environment in which only the designated customer can operate. As with the other cloud models, private clouds provide computing power as a pooled resource. However, under the private cloud model, the cloud (the pool of resources) is accessible only by a single organization, providing that organization with greater control and privacy.
1.4.3 Virtualization
The basic virtualized model consists of the cloud users, the service models, the virtualization layer, and the host OS and its hardware. Virtualization is the process of creating a virtual version of a device or resource, which divides the resource into one or more execution environments, such as a server, storage device, network or operating system. Devices, applications and human users can all interact with the virtual resource. For instance, partitioning a single hard drive into two separate logical drives is also a form of virtualization. Virtualization has become a key enabling technology for cloud computing.
· OS-Level Virtualization - A type of server virtualization technology in which multiple isolated user-space instances share a single operating system kernel, so each instance runs as an ordinary process.
Data Protection
Enterprises fear losing data to competitors and compromising the data confidentiality of consumers. In the current environment, assessing how a provider protects data is mandatory. Operational guarantees are therefore essential in service level agreements covering the runtime of applications, and they support processes for data recovery and availability.
They are:
Ø Data Replication
Ø Disaster Recovery
Management Capabilities
The management of cloud platforms and infrastructure is still in its early stages. Features such as auto-scaling, for instance, are a pivotal prerequisite for many enterprises, and in that respect management tooling still needs richer features.

Some government regulations do not allow customers' personal information and other data to be physically located outside the state or nation. In order to meet such requirements, cloud providers need to set up a data centre or a storage site inside the country to comply with the regulations. Having such an infrastructure may not be feasible for every provider.
1.5.1 Advantages of Cloud Computing
· Lower Hardware Requirements - Most cloud computing is web based, so it runs on desktops, PCs, laptops and tablets without demanding much local memory space or processing power.
· Improved Performance - With fewer programs and processes loaded into memory, client computers reboot and run faster in a cloud environment.
· Unlimited Data Storage Capacity and Increased Data Reliability - Cloud installations have the ability to store hundreds of petabytes of data, and the crash of a single cloud computer does not affect the storage of data.
· Universal Access - Data and applications can be reached from any device with an Internet connection.
The sharing and management of resources make cloud computing attractive in a wide range of application domains, some of which are described below.
· E-Learning - E-Learning is an emerging and attractive environment for students, faculty members and researchers, allowing them to connect to their own institutions' resources from anywhere.
· ERP - An organization can improve its business through the cloud. Installing ERP in the cloud covers functions such as accounting, inventory and payroll, and can be achieved by paying a monthly fee, so that customers receive the same facilities as in the office, in the style of DaaS (Desktop as a Service) or SaaS.
· E-Governance - A government can use scalable and customized cloud services to serve its citizens and institutions. A prominent example is the issuing of the UID, one single Id used instead of a Voter ID card or address proofs like an electricity bill or ration card, to access services such as obtaining a bank account, passport, driving license and so on.
· Healthcare - Cloud-hosted medical databases help in finding the nearby hospital location and in getting advice from doctors remotely.
· Agriculture - Workers in agriculture and the handicraft industry face difficulty selling their own products in the market, which leads to exploitation of the farmers; cloud-based platforms can help them reach markets directly.
Workflows provide a framework for composing various applications [DEE03]. The idea of cloud computing keeps spreading broadly, as it has been widely accepted of late, and it enables the use of different cloud services to facilitate workflow execution. The aim of workflow scheduling is to automate the procedures, particularly those engaged in passing data and records between the participants of the cloud, while maintaining the required constraints. The multi-objective nature of the scheduling problem in clouds makes it hard to solve, particularly for large workflows.
A workflow enables the organization of applications in a directed acyclic graph (DAG) form, where the nodes represent the constituent tasks and the edges represent the inter-task dependencies of the application. A single workflow generally comprises a set of such tasks together with their dependencies. A workflow is generally engaged in the automation of procedures over shared resources that are not directly under its control. A workflow is evaluated along measurements such as time, cost, fidelity, reliability and security [KUM13].
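The DAG structure described above can be made concrete with a short sketch; the task names and dependencies are illustrative, not taken from any benchmark workflow. Each task maps to the set of its parents, and any valid execution order is a topological order of the graph.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Illustrative workflow DAG: each task maps to its parent (predecessor) tasks.
# T1 has no parents (entry task); T5 has no children (exit task).
workflow = {
    "T1": set(),
    "T2": {"T1"},
    "T3": {"T1"},
    "T4": {"T2", "T3"},
    "T5": {"T4"},
}

# A child task becomes ready only after all of its parents have finished,
# so any valid execution order is a topological order of the DAG.
order = list(TopologicalSorter(workflow).static_order())
print(order)  # e.g. ['T1', 'T2', 'T3', 'T4', 'T5'] (T2/T3 may swap)
```

T2 and T3 are independent, so a scheduler is free to run them in either order or in parallel on different resources.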
The significance of the cloud lies in its application adaptability or elasticity. This nature of the cloud facilitates changes of resources and their attributes at run time, and this capacity enables workflow management systems to promptly meet the demands of the applications, as shown in figure 1.7. This model is characterized as a workflow management system together with its most critical interfaces. Figure 1.8 demonstrates the workflow reference model, whose main components are:
· Workflow Engine - Software that provides the run-time environment and the automated control of process instances.
· Process Definition Tools - Used to define and model the processes that are executed by workflow systems.
· Workflow Client Applications and Invoked Applications - Interact with the engine to handle work items and to execute tasks under the defined conditions.
Figure 1.8 Workflow Reference Model (process definition tools, workflow engine(s), administration and monitoring tools, workflow client applications and invoked applications, connected through Interfaces 1-5)
1.6.1 Workflow Scheduling Problem
A workflow is modelled as a DAG in which the graph nodes represent the tasks and the edges represent the data dependencies among the tasks, with weights on the nodes representing computational complexity and weights on the edges representing communication volume [RAH13]. Although the DAG scheduling problem is NP-complete in general, schedule quality is usually measured by the schedule length or makespan, and the goal of workflow scheduling techniques is to minimize it. Formally, assume a workflow W(T, E) consists of a set of tasks T = {T1, T2, …, Tx, …, Ty, …, Tn} and a set of dependencies among the tasks E = {<Ta, Tb>, …, <Tx, Ty>}, where Tx is the parent task of Ty. Let R = {R1, R2, …, Rm} be the set of available resources. The scheduling problem is to map the workflow tasks onto the resources (T → R) so that the makespan M is minimized.
In general, a workflow task is a set of instructions that can be executed on a single resource. An entry task does not have any parent task, and an exit task does not have any child tasks. Also, a child task cannot be executed until all of its parent tasks are finished. In terms of scheduling, a task all of whose parent tasks have completed is called a ready task.
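Under the notation above, the makespan of a given mapping T → R can be computed by walking the tasks in topological order; the tiny DAG, execution costs, and mapping below are invented for illustration, and communication costs are omitted for brevity.

```python
# Minimal makespan computation for a fixed task-to-resource mapping (T -> R).
# The DAG, execution costs, and mapping are invented for illustration.
deps = {"T1": [], "T2": ["T1"], "T3": ["T1"], "T4": ["T2", "T3"]}
cost = {"T1": 4, "T2": 3, "T3": 5, "T4": 2}                  # execution times
mapping = {"T1": "R1", "T2": "R1", "T3": "R2", "T4": "R1"}   # given schedule

def makespan(deps, cost, mapping):
    finish = {}         # finish time of each task
    resource_free = {}  # time at which each resource becomes idle again
    # Visit tasks in a topological order (parents before children);
    # hard-coded here because the toy DAG is so small.
    for t in ["T1", "T2", "T3", "T4"]:
        ready = max((finish[p] for p in deps[t]), default=0)   # parents done
        start = max(ready, resource_free.get(mapping[t], 0))   # resource idle
        finish[t] = start + cost[t]
        resource_free[mapping[t]] = finish[t]
    return max(finish.values())

print(makespan(deps, cost, mapping))  # 11
```

A scheduling heuristic explores different mappings and keeps the one whose computed makespan M is smallest.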
1.7 SCHEDULING IN CLOUD COMPUTING
Scheduling is a major research issue in distributed and cloud computing. Numerous researchers have proposed different algorithms for assigning, planning and scaling resources efficiently in the cloud. A scheduler discovers the resources present in the system and gathers status information related to them. Scheduling is difficult for a server, because the number of requesting jobs is large and each requires several resources to execute. The schedule produced by the server must therefore be near optimal, so that each client request gets a response in time and each task/job gets appropriate resources for its execution. In the cloud, there are distinct servers which take customers' requests and, after execution, respond to the customers. A workflow scheduling example for such a framework is given in figure 1.9.
Figure 1.9 Example of Job Scheduling
Virtual machines take input from clients and respond after execution [SIN13]. The servers are regarded as VMs, and these VMs react to the customers' requests as follows:
Step-1 All incoming tasks are grouped according to a certain behaviour or attribute (like being deadline constrained or having a low-cost requirement).
Step-2 After grouping, the tasks are prioritized. A task can be prioritized on the basis of its attributes, such as deadline, execution time, etc.
Step-3 The virtual machine with the minimum response time that is capable of executing the selected task is assigned the task for execution.
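The three steps can be sketched as follows; the task attributes, priority rule (earliest deadline first), and VM response times are illustrative placeholders rather than the thesis's actual algorithm.

```python
from collections import defaultdict

# Hypothetical task list: (name, group attribute, deadline) -- illustrative only.
tasks = [("A", "deadline", 10), ("B", "low_cost", 50), ("C", "deadline", 5)]
# Hypothetical VMs with current response times (lower = preferred).
vm_response = {"VM1": 2.0, "VM2": 1.0}

assignment = {}
groups = defaultdict(list)
for name, kind, deadline in tasks:              # Step-1: group by attribute
    groups[kind].append((name, deadline))
for kind, members in groups.items():
    members.sort(key=lambda t: t[1])            # Step-2: earliest deadline first
    for name, _ in members:
        # Step-3: pick the VM with the minimum current response time.
        vm = min(vm_response, key=vm_response.get)
        assignment[name] = vm
        vm_response[vm] += 1.0                  # crude update: the VM gets busier
print(assignment)
```

Task C (tightest deadline) is placed first on the fastest-responding VM; the response-time update then spreads the remaining tasks across the VMs.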
1.7.1 Limitations of Existing Workflow and Scheduling Mechanisms
Static scheduling maps resources to each individual task before workflow execution, and it plans the whole workflow execution. However, it relies upon the information held by the workflow manager, which validates the task dependencies. A DAG task enters the queue only when its dependence requirements are met, and it is not isolated there from other independent jobs. For example, in the Condor system, DAGMan checks the task dependencies and submits only the tasks ready to execute into the queue administered by Condor, which schedules the jobs in the queue in First Come First Served (FCFS) fashion. Concerning static heuristics for DAG scheduling in a heterogeneous environment, two groups are more common than others: list scheduling heuristics and clustering heuristics. A list scheduling heuristic first works out the ranks (priorities) of all tasks, and then picks the best resource for each task in rank order, according to a defined cost objective. A clustering heuristic proceeds in two strides. The first is the clustering phase, where tasks are gathered into groups on virtual processors according to particular criteria, for example with the objective of avoiding unnecessary data transfers. The second is the mapping phase, wherein all the member tasks of the same cluster are allocated to the same processor. Dominant Sequence Clustering (DSC) and the Clustering And Scheduling System (CASS) also belong to this group.
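As a toy member of the list-scheduling family (a simplified sketch, not any specific published heuristic such as HEFT), the code below ranks each task by the length of its longest path to an exit task and then, in rank order, assigns it to the resource with the earliest finish time. The DAG, costs, and speed factors are invented, and data-ready (precedence) times are ignored for brevity.

```python
from functools import lru_cache

# Invented DAG: children[t] lists the tasks that depend on t.
children = {"T1": ["T2", "T3"], "T2": ["T4"], "T3": ["T4"], "T4": []}
cost = {"T1": 2, "T2": 3, "T3": 1, "T4": 2}   # average execution times
resources = {"R1": 1.0, "R2": 2.0}            # speed factor (runtime = cost * factor)

@lru_cache(maxsize=None)
def rank(t):
    """Upward rank: task cost plus the largest rank among its children."""
    return cost[t] + max((rank(c) for c in children[t]), default=0)

free = {r: 0.0 for r in resources}            # time each resource becomes idle
schedule = {}
for t in sorted(children, key=rank, reverse=True):   # highest rank first
    # Pick the resource giving the earliest finish time for this task
    # (precedence-ready times are ignored to keep the sketch short).
    r = min(resources, key=lambda r: free[r] + cost[t] * resources[r])
    free[r] += cost[t] * resources[r]
    schedule[t] = r
print(schedule)
```

The rank phase corresponds to "working out the positions of all tasks", and the greedy placement corresponds to "picking the best resource" under the chosen cost objective.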
Wide comparative studies have been performed, and they testify that the static approach can perform close to optimal and proves practical for some real-world workflow applications. The simulation work also suggests that static approaches still perform better than dynamic ones for data-intensive workflow applications, even with inaccurate information about future conditions. Nevertheless, it is extremely difficult to estimate execution performance accurately, since the execution environment may change considerably after resource mapping. Recent work shows that planning through resource reservation before execution can guarantee resource availability during execution and theoretically makes the grid more predictable. However, these approaches do not resolve all of the issues. Others endeavour to make the static approach more adaptable to changes occurring at execution time, where rescheduling is normally triggered by contract violation. In any case, those attempts are largely driven by iterative applications, enabling the system to switch from one member of a family of schedules to the next when the execution of one scheduling graph fails; however, all of the plans are made without knowledge of future environmental change.
1.8 OPTIMIZATION TECHNIQUES
User applications may incur huge data retrieval and execution costs when they are scheduled considering only the execution time. In addition to optimizing the execution time, the cost arising from data transfers between resources must be taken into account. Heuristic-based algorithms have been developed and have proved to be appropriate for workflow scheduling. Optimization techniques such as Grey Wolf Optimization and African Buffalo Optimization are applied to the schedule, minimizing one or more functions subject to any applicable constraints. An optimization problem can be classified in light of the kind of constraints, the nature of the design variables, the physical structure of the problem, the permissible values of the design variables, the separability of the functions, and the number of objective functions. The main elements of an optimization problem are described below.
Objective Function
The objective function is the quantity to be minimized or maximized, defined over one or more variables. A problem may have a single objective function or several objective functions whose distinct goals are not compatible: the variable values that optimize one objective may be far from optimal for the others. Such conflicting goals are handled by multi-objective optimization.
Variables
The set of unknowns is called the variables; they may be continuous, discrete or Boolean, and they are used to define the objective function and the constraints. A design variable cannot be chosen arbitrarily, because it has to satisfy certain functional and other requirements.
Constraints
Constraints restrict the unknowns to certain allowable values; they express the conditions that any feasible solution must satisfy.
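A minimal sketch tying the three elements together, with an invented cost model: the discrete design variable is the VM type, the objective is a weighted sum of time and monetary cost, and a budget constraint restricts the feasible choices.

```python
# Variables: which VM type to rent (a discrete design variable).
# Objective: weighted sum of completion time and monetary cost (weights invented).
# Constraint: total cost must not exceed the budget.
vm_types = {"small": (10.0, 1.0), "medium": (6.0, 2.5), "large": (3.0, 6.0)}  # (time, cost)
BUDGET = 5.0
W_TIME, W_COST = 0.7, 0.3

def objective(time, cost):
    """Scalarized objective: smaller is better."""
    return W_TIME * time + W_COST * cost

# Apply the constraint first, then minimize the objective over what remains.
feasible = {name: tc for name, tc in vm_types.items() if tc[1] <= BUDGET}
best = min(feasible, key=lambda n: objective(*feasible[n]))
print(best)  # "large" is fastest but violates the budget, so "medium" wins
```

Metaheuristics such as Grey Wolf Optimization explore exactly this kind of objective landscape, only over far larger variable spaces than a brute-force search can cover.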
Several algorithms have been discussed, but problems remain: when applying data mining, the efficiency, accuracy and speed should all be considered. In some cases the problem can be rectified on unique, static datasets, but on real-time datasets it is difficult to find a solution; the techniques need to be enhanced to detect process failures and to improve learning algorithm efficiency. Across the various methods and current platforms, the challenging problems in dynamic resource scheduling on cloud computing are to reduce the user request's response time, to balance the workloads, and to improve the availability and reliability of the resources.
1.10 MOTIVATION
Scheduling is driven by the demands of fast response time, high throughput and cost effectiveness, and it defines the strategies used to allocate the processors. The scheduling and management of resources in a cloud environment are therefore complex and risky, and the demand for sophisticated tools is high. There is a need for an algorithm that improves the availability and reliability of workflow scheduling based on cloud computing.
1.11 OBJECTIVES
· To develop an efficient technique for clustering data.
· To improve workflow scheduling in cloud computing.
· To analyse data that has not been carefully screened and to represent the quality of the data.
· To discover structures and user patterns in real-time, high-dimensional data sets effectively.
· To set certain limitations on each resource, so that resource allocation and scheduling remain balanced.
1.12 ORGANIZATION OF THE WORK
Chapter 1 deals with the introduction of cloud computing and workflow scheduling
techniques.
Chapter 2 reviews the previous work done in the areas of cloud workflow
scheduling techniques classified as Best effort based scheduling and QoS based
scheduling.
Chapter 3 deals with the brief description of the “Research Methodology” of the
proposed approaches.
The works of several researchers are quoted and used as evidence to support the concepts explained in the thesis. All such evidence is listed in the References section.
1.13 SUMMARY
This chapter has illustrated the cloud computing concepts and workflow scheduling techniques. The problem statement, objectives and organization of the thesis were also briefly summarized. The work done by various researchers is studied, analysed and reviewed in the next chapter.