
Module 3

Cloud Computing: Applications and Paradigm

3.1 Challenges for Cloud Computing


3.2 Architectural Styles for Cloud Applications
3.3 Workflows: Coordination of Multiple Activities
3.4 Coordination Based on a State Machine Model: The ZooKeeper
3.5 The MapReduce Programming Model
3.6 A Case Study: The GrepTheWeb Application
3.7 Clouds for Science and Engineering
3.8 High-Performance Computing on a Cloud
3.9 Cloud Computing for Biology Research
3.10 Social Computing, Digital Content, and Cloud Computing
3.1 Challenges for Cloud Computing

Cloud computing, while transformative, is accompanied by a range of challenges that stem from its fundamental reliance on resource sharing, resource virtualization, and the utility computing model. These challenges differ across the three cloud delivery models: Software-as-a-Service (SaaS), Infrastructure-as-a-Service (IaaS), and Platform-as-a-Service (PaaS). Here, we explore the major challenges in greater detail.

1. Security: The Most Significant Challenge

Security is the foremost concern in cloud computing. Gaining the trust of users, especially for sensitive applications, is crucial. While public clouds are popular for their scalability and cost efficiency, sensitive applications are better served by private or hybrid clouds, which provide more secure alternatives.

Data Vulnerability:

Data at rest and data in transit are particularly vulnerable to unauthorized access
and breaches. While encryption protects data in storage, it must be decrypted for
processing, which exposes it to potential attacks.

Data replication, essential for fault tolerance and service continuity, increases the
risk of unauthorized access or compromise, especially if proper safeguards are not
in place.

Threat Landscape:

o Cloud services are susceptible to attacks such as Distributed Denial of Service (DDoS), which can disrupt operations for both providers and users.
o The potential presence of malicious insiders who can exploit privileged access further complicates security efforts.
2. Challenges Specific to Cloud Delivery Models

The three cloud delivery models face distinct challenges due to their unique
characteristics.

SaaS (Software-as-a-Service):

o Users interact with cloud services via predefined interfaces, which limit direct system access and reduce some attack vectors. However, challenges such as data breaches, phishing attacks targeting credentials, and DoS attacks persist.
o The centralization of user data in SaaS environments makes storage servers a prime target for attackers.

IaaS (Infrastructure-as-a-Service):

 Users have extensive control over the infrastructure, which increases the potential for misuse.
 Virtualization introduces risks, as infected virtual machines (VMs) can persist and re-infect systems even after cleanups.
 Virtual environments require secure management of the trusted computing base (TCB), which includes the hardware, the hypervisor, and the management OS.

PaaS (Platform-as-a-Service):

o Developers rely on PaaS for building and deploying applications, but the abstraction layers can obscure underlying vulnerabilities.
o Integrating third-party libraries or APIs into PaaS applications introduces risks, as attackers may exploit weaknesses in external dependencies.

3. Resource Management Challenges

Efficient resource management is critical for cloud performance and user satisfaction. However, managing resources in a cloud environment with millions of servers distributed globally is a monumental task.

Policy Implementation: Controllers must implement policies for admission control, capacity allocation, load balancing, and energy optimization while ensuring quality-of-service (QoS) guarantees.
Self-Management and Automation: To cope with this scale and complexity, cloud systems increasingly rely on self-management and self-organization. However, this approach complicates logging, auditing, and identifying the root causes of actions that lead to security breaches.

4. Interoperability and Standardization

Interoperability between cloud providers remains a pressing concern.

Vendor Lock-In: Users often find themselves tied to a specific cloud provider due to proprietary APIs, data formats, and integration tools. This dependence can hinder flexibility, especially if a service critical to operations becomes unavailable.

Challenges in Standardization: While standardization promotes interoperability and portability, introducing standards prematurely can stifle innovation. Cloud computing is still evolving, and balancing standardization with innovation is a delicate task.

5. Social and Economic Implications

Beyond technical challenges, cloud computing has significant social and economic
implications:

Government & Education: Adoption challenges due to security and compliance concerns.

Sustainability: Need for energy-efficient practices to reduce environmental impact.

3.3 Workflows: Coordination of Multiple Activities


Workflow: Cloud applications often involve multiple interdependent tasks, which together form a workflow.
A workflow model is an abstraction that defines key properties of entities within a workflow management system.

The core unit in workflow modeling is a task, which has several attributes:
 Name (unique identifier)
 Description (natural-language explanation)
 Actions (changes caused by task execution)
 Preconditions (conditions that must be true before execution)
 Post-conditions (conditions that must be true after execution)
 Attributes (resource requirements, security needs, reversibility, etc.)
 Exceptions (error-handling mechanisms)
Tasks can be primitive (indivisible) or composite (composed of multiple tasks with a defined execution order).
A routing task manages the flow between tasks, enabling sequential, concurrent, or iterative execution.

Different routing types include:
 Fork tasks (splitting a large task into smaller tasks that execute concurrently)
 Join tasks (combining the results of multiple tasks)
 Iterative tasks (repeating execution between a fork and a join)

A process description (or workflow schema) outlines task execution order and is often
written in a Workflow Definition Language (WFDL).
Workflow descriptions resemble flowcharts, supporting branching, concurrency, and
iteration.

Key properties of a workflow:

 Safety – Prevents errors or "bad" states.


 Liveness – Ensures that progress is always possible.

Errors like deadlocks (when tasks block each other due to resource contention) can
occur. One way to prevent deadlocks is to acquire all necessary resources at once,
though this may reduce resource utilization.
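As a rough illustration of this all-at-once strategy (an assumption added here, not an example from the source), the sketch below tries to acquire every lock a task needs up front and backs off completely if any lock is unavailable, trading lower resource utilization for freedom from circular waits.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class AllOrNothingLocking {

    // Try to acquire every lock; if any is unavailable, release what we hold and report failure.
    static boolean acquireAll(List<Lock> locks) {
        List<Lock> held = new ArrayList<>();
        for (Lock lock : locks) {
            if (lock.tryLock()) {
                held.add(lock);
            } else {
                held.forEach(Lock::unlock); // back off completely, avoiding circular wait
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Lock> resources = List.of(new ReentrantLock(), new ReentrantLock());
        if (acquireAll(resources)) {
            try {
                System.out.println("All resources acquired; the task can run without deadlock risk.");
            } finally {
                resources.forEach(Lock::unlock);
            }
        }
    }
}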

Workflow Life Cycle


The life cycle of a workflow consists of four phases:
1.Creation – Defining the workflow.
2.Definition – Writing the workflow using a workflow description language (similar to
programming).
3.Verification – Checking for logical errors, such as deadlocks and lack of progress.
4.Enactment – Executing the workflow using an enactment engine (like running a
program).

Workflow Patterns
Workflow patterns define the relationships between tasks:
1.Sequence Pattern – Tasks are executed one after another.
2.AND Split Pattern – Multiple tasks are triggered concurrently.
3.Synchronization Pattern – A task starts only after multiple preceding tasks
complete.
4.XOR Split Pattern – A decision determines which of two tasks will execute.
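As a rough mapping (an illustration assumed here, not part of the source text), the sequence, AND split, synchronization, and XOR split patterns correspond closely to operations on Java's CompletableFuture; the task bodies below are placeholders.

import java.util.concurrent.CompletableFuture;

public class WorkflowPatterns {
    public static void main(String[] args) {
        // Sequence pattern: task B starts only after task A completes.
        CompletableFuture<String> a = CompletableFuture.supplyAsync(() -> "A done");
        CompletableFuture<String> b = a.thenApply(prev -> prev + ", B done");

        // AND split pattern: tasks C and D are triggered concurrently.
        CompletableFuture<String> c = CompletableFuture.supplyAsync(() -> "C");
        CompletableFuture<String> d = CompletableFuture.supplyAsync(() -> "D");

        // Synchronization pattern: task E starts only after both C and D complete.
        CompletableFuture<String> e = c.thenCombine(d, (rc, rd) -> rc + "+" + rd + " -> E");

        // XOR split pattern: a decision determines which of two tasks executes.
        boolean condition = System.currentTimeMillis() % 2 == 0;
        CompletableFuture<String> f = condition
                ? CompletableFuture.supplyAsync(() -> "F1 chosen")
                : CompletableFuture.supplyAsync(() -> "F2 chosen");

        System.out.println(b.join());
        System.out.println(e.join());
        System.out.println(f.join());
    }
}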
3.4 Coordination Based on a State Machine Model: The ZooKeeper
What is a Distributed Coordination Service?
A Distributed Coordination Service (DCS) allows multiple nodes in a distributed system to operate and coordinate with one another. It provides tools to manage all nodes and keep them integrated, ensuring consistency and preventing errors.
A popular example of a distributed coordination service is Apache ZooKeeper.
ZooKeeper is a popular open-source tool designed to handle this challenge by providing services such as leader election, distributed locking, and configuration management.

It helps maintain synchronization and consistency across distributed systems, making it easier to manage shared resources and handle failures.

It follows a state machine model, in which multiple servers work together and elect a leader for coordination. Clients can connect to any server via TCP to send requests and receive responses.

Key Features of ZooKeeper


1.High Availability – Multiple servers run ZooKeeper, and clients can connect to any
available server.
2.Leader Election – Servers elect a leader responsible for managing updates.
3.Atomic Transactions – Updates are either fully completed or not applied at all.
4.Sequential Consistency – Updates occur in the exact order received.
5.Single System Image – Clients receive identical responses from any server.
6.Persistence – Data updates remain until explicitly changed.
7.Reliability – The system functions as long as most servers remain operational.

ZooKeeper Service Guarantees in Simple Words


Atomicity – A task either completes fully or doesn’t happen at all, preventing errors.
Order of Updates – Changes happen in the exact order they are made.
Same View for All – No matter which server a client connects to, they see the same
data.
Permanent Changes – Once data is updated, it stays the same until changed again.
Reliable System – The system keeps working as long as most servers are running.
This ensures ZooKeeper is safe, consistent, and reliable for managing distributed
systems.

ZooKeeper's Data Model and API


ZooKeeper organizes data into a hierarchical namespace similar to a file system, using
znodes. Each znode stores state data, which is updated atomically.

The application programming interface (API) to the ZooKeeper service is very simple and consists of seven operations:
•create – add a node at a given location on the tree.
•delete – delete a node.
•exists – test whether a node exists at a given location.
•get data – read data from a node.
•set data – write data to a node.
•get children – retrieve a list of the children of the node.
•synch – wait for updates to propagate.
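A minimal sketch of these operations using the official ZooKeeper Java client (org.apache.zookeeper.ZooKeeper) follows; the connection string, paths, and data are assumptions made for illustration, not values from the text.

import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class ZooKeeperApiSketch {
    public static void main(String[] args) throws Exception {
        // Connect to any server of the ensemble over TCP (address assumed for illustration).
        ZooKeeper zk = new ZooKeeper("localhost:2181", 3000, event -> {});

        // create: add a znode at a given location on the tree.
        zk.create("/app", "config-v1".getBytes(),
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        // exists / get data: read the state data stored in the znode.
        Stat stat = zk.exists("/app", false);
        byte[] data = zk.getData("/app", false, stat);
        System.out.println("Read: " + new String(data));

        // set data: write data to the znode atomically (-1 skips the version check).
        zk.setData("/app", "config-v2".getBytes(), -1);

        // get children: retrieve the list of children of a znode.
        List<String> children = zk.getChildren("/", false);
        System.out.println("Children of /: " + children);

        // An ephemeral znode is removed automatically when the client session ends.
        zk.create("/app/worker-1", new byte[0],
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

        // delete: remove znodes (children before parents).
        zk.delete("/app/worker-1", -1);
        zk.delete("/app", -1);
        zk.close();
    }
}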
The system also supports the creation of ephemeral nodes, which exist only for the duration of the client session that created them.

How ZooKeeper Works
1.Clients connect via TCP, sending requests and receiving updates.
2.Reads are handled by local replicas, while writes go through the leader.
3.The atomic broadcast protocol (Zab, a Paxos-like protocol) ensures consistency by requiring quorum agreement for updates.
4.ZooKeeper supports ephemeral nodes, which exist only for the duration of a client
session.
Use Cases
ZooKeeper is widely used in distributed systems for service coordination,
synchronization, leader election, and group membership. Notable users include
Yahoo!’s Message Broker and various cloud applications.

3.6 A Case Study: The GrepTheWeb Application


 An application called GrepTheWeb is in production at Amazon.
 We use it to illustrate the power and appeal of cloud computing.
 The application allows a user to define a regular expression and search the Web for records that match it.
 GrepTheWeb is analogous to the Unix grep command used to search a file for a given regular expression.

This application performs a search of a very large set of records, attempting to identify records that satisfy a regular expression.
 The source of this search is a collection of document URLs produced by the Alexa Web Search, a software system that crawls the Web every night.
 The inputs to the application are a regular expression and the large data set produced by the Web-crawling software.
 The output is the set of records that satisfy the expression.
 The user is able to interact with the application and get its current status.
 The application uses message passing to trigger the activities of multiple controller threads that launch the application, initiate processing, shut down the system, and create billing records.

GrepTheWeb uses Hadoop MapReduce, an open-source software package that splits a large data set into chunks, distributes them across multiple systems, launches the processing, and, when the processing is complete, aggregates the outputs from different systems into a final result.
Apache Hadoop is a software library for distributed processing of large data sets across clusters of computers using a simple programming model.

The details of the workflow of GrepTheWeb are captured in Figure 4.7(b) and
consist of the following steps:
1.The startup phase. Creates several queues – launch, monitor, billing, and
shutdown queues. Starts the corresponding controller threads. Each thread
periodically polls its input queue and, when a message is available, retrieves the
message, parses it, and takes the required actions.

2.The processing phase. This phase is triggered by a StartGrep user request; then a
launch message is enqueued in the launch queue. The launch controller thread picks
up the message and executes the launch task; then, it updates the status and time
stamps in the Amazon Simple DB domain.
Finally, it enqueues a message in the monitor queue and deletes the message from the
launch queue.
The processing phase consists of the following steps:
a. The launch task starts Amazon EC2 instances. It uses an Amazon Machine Image (AMI) with the Java Runtime Environment preinstalled, deploys the required Hadoop libraries, and starts a Hadoop job (which runs the Map/Reduce tasks).
b. Hadoop runs map tasks on Amazon EC2 slave nodes in parallel. A map task takes files from Amazon S3, runs the regular expression against them, and writes the match results locally, along with a description of up to five matches. Then the combine/reduce task combines and sorts the results and consolidates the output.
c. Final results are stored on Amazon S3 in the output bucket.

3.The monitoring phase. The monitor controller thread retrieves the message left at
the beginning of the processing phase, validates the status/error in Amazon Simple
DB, and executes the monitor task.
It updates the status in the Amazon Simple DB domain and enqueues messages in
the shutdown and billing queues.
The monitor task checks for the Hadoop status periodically and updates the Simple
DB items with status/error and the Amazon S3 output file.
Finally, it deletes the message from the monitor queue when the processing is
completed.

4.The shutdown phase. The shutdown controller thread retrieves the message from
the shutdown queue and executes the shutdown task, which updates the status and
time stamps in the Amazon Simple DB domain. Finally, it deletes the message from
the shutdown queue after processing.
The shutdown phase consists of the following steps:

a.The shutdown task kills the Hadoop processes, terminates the EC2 instances after
getting EC2 topology information from Amazon Simple DB, and disposes of the
infrastructure.

b.The billing task gets the EC2 topology information, Simple DB usage, and S3 file
and query input, calculates the charges, and passes the information to the billing
service.

5.The cleanup phase. Archives the Simple DB data with user info.

6.User interactions with the system. Get the status and output results. The
GetStatus is applied to the service endpoint to get the status of the overall system (all
controllers and Hadoop) and download the filtered results from Amazon S3 after
completion.

To optimize the end-to-end transfer rates in the S3 storage system, multiple files are
bundled up and stored as S3 objects. Another performance optimization is to run a
script and sort the keys and the URL pointers and upload them in sorted order to S3.
In addition, multiple fetch threads are started in order to fetch the objects.

This application illustrates the means to create an on-demand infrastructure and run it
on a massively distributed system in a manner that allows it to run in parallel and
scale up and down based on the number of users and the problem size.
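The controller threads created in the startup phase all follow the same poll, parse, and act loop over their queues. The sketch below illustrates that pattern using java.util.concurrent.BlockingQueue as a stand-in for the Amazon SQS queues; the class, queue, and message names are assumptions for illustration, not part of the GrepTheWeb code.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class LaunchController implements Runnable {
    // Stand-in for the launch queue; in GrepTheWeb this would be an SQS queue.
    private final BlockingQueue<String> launchQueue;

    public LaunchController(BlockingQueue<String> launchQueue) {
        this.launchQueue = launchQueue;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                // Periodically poll the input queue for a message.
                String message = launchQueue.poll(5, TimeUnit.SECONDS);
                if (message == null) {
                    continue; // nothing to do this round
                }
                // Parse the message and take the required action
                // (here we only print; the real launch task starts EC2 instances and Hadoop).
                System.out.println("Launch task triggered by message: " + message);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Thread controller = new Thread(new LaunchController(queue));
        controller.start();
        queue.put("StartGrep: regex=cloud.*computing");
        Thread.sleep(1000);
        controller.interrupt();
    }
}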
3.7 Clouds for Science and Engineering:
In a talk delivered and posted on his Web site shortly before he went missing in January 2007, computer scientist Jim Gray discussed eScience as a transformative scientific method. Today, eScience unifies experiment, theory, and simulation; data captured from measuring instruments or generated by simulations are processed by software systems, and data and knowledge are stored by computer systems and analyzed using statistical packages.

The generic problems in virtually all areas of science are:


1.Collecting experimental data.
2.Managing very large volumes of data.
3.Building and executing models.
4.Integrating data and literature.
5.Documenting experiments.
6.Sharing the data with others.
7.Preserving data for long periods of time.

All these activities require powerful computing systems.


A typical example of a problem faced by agencies and research groups is data
discovery in large scientific data sets.
Examples of such large collections are the biomedical and genomic data at NCBI, the
astrophysics data at NASA, or the atmospheric data at NOAA and NCAR.

The process of online data discovery can be viewed as an ensemble of several phases:
(i) recognition of the information problem;
(ii) generation of search queries using one or more search engines;
(iii) evaluation of the search results;
(iv) evaluation of the Web documents; and
(v) comparison of information from different sources.

The Web search technology allows scientists to discover text documents related to such
data, but the binary encoding of many of the documents poses serious challenges.
Metadata is used to describe digital data and provides an invaluable aid for discovering
useful information in a scientific data set.
A recent paper describes a system for data discovery that supports automated fine-
grained metadata extraction and summarization schemes for browsing large data sets
and is extensible to different scientific domains.

The system, called Glean, is designed to run on a computer cluster or on a cloud; its
run-time system supports two computational models, one based on MapReduce and the
other on graph-based orchestration.

3.8 High-Performance Computing on a Cloud


High-Performance Computing (HPC) is essential for advanced scientific research,
including climate modeling, fusion research, quantum mechanics, astrophysics,
and particle physics.

Traditionally, supercomputers have been the primary choice for these computations
due to their high processing power and efficient interconnects. However, cloud
computing is emerging as a flexible and cost-effective alternative.

A study conducted at the National Energy Research Scientific Computing Center (NERSC) compared the performance of Amazon EC2 with three supercomputers (Carver, Franklin, and Lawrencium) to evaluate the feasibility of using cloud infrastructure for scientific applications.

The study tested a range of computational applications, including the Community Atmosphere Model (CAM) for climate modeling, GAMESS for quantum chemistry, and GTC for fusion research, among others.

Each of these applications has different computational demands, with some being more
dependent on raw processing power while others require high-speed communication
between nodes.

The results showed that while EC2 performed well for compute-intensive tasks, it struggled with applications requiring frequent inter-node communication.
Carver, Franklin, and Lawrencium demonstrated superior performance, particularly for workloads that involve extensive data exchange.

However, cloud computing offers significant advantages, including on-demand scalability, remote accessibility, and lower upfront costs.
NOTE: Compute-intensive tasks require a large amount of computing power and memory.

One of the key challenges of using cloud computing for HPC is its high network
latency and lower communication bandwidth, which make it inefficient for parallel-
processing applications that rely on fast interconnects.
Additionally, cloud platforms often suffer from performance variability, as resources
are shared among multiple users.
Despite these limitations, cloud computing remains a viable option for independent, compute-heavy tasks and can complement supercomputers in a hybrid HPC model, where cloud infrastructure is used for data preprocessing and storage while high-performance systems handle complex computations.
In conclusion, while cloud computing is not yet a full replacement for supercomputers,
it provides a scalable and cost-effective solution for certain scientific workloads.

The results in Table 4.1 give us some idea of the characteristics of scientific applications likely to run efficiently on the cloud. Communication-intensive applications will be affected by the increased latency (more than 70 times larger than Carver's) and lower bandwidth (more than 70 times smaller than Carver's).
3.9 Cloud Computing for Biology Research
Biology, a field requiring vast computational power, has been an early adopter of cloud
computing to handle large-scale data processing.
[NOTE: computational power is the ability of a computer to process data and perform tasks]

Two major computational challenges in biology include molecular dynamics simulations (CPU-intensive) and protein sequence alignment (data-intensive).

Microsoft Research conducted an experiment demonstrating the power of cloud computing by running AzureBLAST, a modified version of the BLAST algorithm, on Microsoft Azure to compare 10 million protein sequences (4.2 GB) from the NCBI nonredundant protein database.

To complete the computation efficiently, the team allocated 3,700 weighted instances
across three data centers, using 475 extra-large VMs (each with 8-core CPUs, 14GB
RAM, and 2TB storage). The computation, which would have taken 6–7 CPU-years,
was completed in 14 days, producing 260GB of compressed output across 400,000 files.

Observations & Challenges Identified


Post-experiment analysis highlighted several inefficiencies in cloud computing for
scientific applications:
1.Repeated Computation Issues – leading to unnecessary duplicate computations.
2.Resource Utilization Imbalance – Many VMs remained idle for long periods due to
inefficient job scheduling.
3.Instance Failures – About 50% of active instances lost connection to the storage service.
4.System Updates Causing Failures – leading to data loss and job interruptions.
5.Workflow Complexity – Scientific computing requires multiple binaries and scripts
to execute correctly, which can be challenging for domain scientists to manage.

To address workflow complexity, researchers developed a general-purpose platform for running legacy Windows applications on Azure.

Azure Infrastructure for Scientific Computing


The platform consists of:
Web roles – Handle front-end tasks and user interactions.
Worker roles – Perform computational tasks and provide APIs for execution.

Azure Storage Services:


Blob Storage (ABS) – Stores large raw datasets.
Table Storage (ATS) – Manages semi-structured data.
Queue Storage (AQS) – Manages message queues for task execution.
To improve efficiency, researchers implemented BigJob, a system that decouples
resource allocation from resource binding, reducing the burden on applications.

3.10 Social Computing, Digital Content, and Cloud Computing


Social computing can be defined as follows:
"Social Computing" refers to systems that support the gathering, representation,
processing, use, and dissemination of information that is distributed across social
collectivities such as teams, communities, organizations, and markets.

Social networks have grown in both size and functionality, making large-scale data
analysis crucial.
Cloud computing enables efficient distribution of computational workloads for
evaluating social closeness, which is highly resource-intensive.

Traditional methods like sampling and surveying are inadequate for large networks.
Social intelligence, involving knowledge discovery and pattern recognition, benefits
from cloud resources.
Case-based reasoning (CBR) is a preferred approach for large-scale recommendation
systems, as it handles data accumulation better than rule-based systems.
The BetterLife 2.0 system demonstrates CBR in social computing. It consists of a cloud
layer, a CBR engine, and an API. Using MapReduce, the system computes pairwise
social closeness, retrieving similar cases efficiently.

CBR follows a reasoning cycle in which cases are retrieved, reused, revised, and retained for future problem-solving. The cycle has four key steps:
1.Retrieve – Identify and retrieve the most relevant past cases from memory to solve
the new problem.
2.Reuse – Adapt the solution from the retrieved case to fit the new problem.
3.Revise – Test the proposed solution in the real world or a simulation, making necessary modifications.
4.Retain – If the solution is effective, store it as a new case for future reference.

This iterative process allows CBR systems to improve over time by learning from past
experiences.
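A minimal sketch of this retrieve/reuse/revise/retain loop is shown below; the Case record, similarity measure, and adaptation step are illustrative assumptions, not the actual BetterLife 2.0 or MapReduce implementation.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SimpleCBR {
    // A case pairs a problem description (feature vector) with a known solution.
    record Case(double[] features, String solution) {}

    private final List<Case> caseBase = new ArrayList<>();

    // Retrieve: find the stored case most similar to the new problem.
    Case retrieve(double[] query) {
        return caseBase.stream()
                .min(Comparator.comparingDouble((Case c) -> distance(c.features(), query)))
                .orElse(null);
    }

    // Reuse: adapt the retrieved solution to the new problem (trivially reused here).
    String reuse(Case retrieved) {
        return retrieved == null ? "no prior case" : retrieved.solution();
    }

    // Revise: test the proposed solution and correct it if needed (accepted as-is here).
    String revise(String proposed) {
        return proposed;
    }

    // Retain: store the solved problem as a new case for future reference.
    void retain(double[] features, String solution) {
        caseBase.add(new Case(features, solution));
    }

    private static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        SimpleCBR cbr = new SimpleCBR();
        cbr.retain(new double[]{1.0, 0.0}, "recommend group A");
        double[] newProblem = {0.9, 0.1};
        String solution = cbr.revise(cbr.reuse(cbr.retrieve(newProblem)));
        System.out.println(solution); // reuse of the closest past case
        cbr.retain(newProblem, solution);
    }
}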

In the past, social networks have been constructed for a specific application domain like
biology (MyExperiment) and nanoscience (nanoHub), enabling researchers to share
workflows.

Another form of social computing is volunteer computing, which allows users to contribute computing power to projects such as SETI@Home and Folding@Home; however, such platforms lack Service Level Agreements (SLAs).

To address this, credit-based models such as PlanetLab and middleware solutions such as BOINC provide accountability.
In a credit-based system, users earn credits by contributing resources and then spend those credits when using other resources.

Example: An architecture designed as a Facebook application for a social cloud. Methods to get a range of data, including friends, events, groups, application users, profile information, and photos, are available through the Facebook API.

It uses Facebook Markup Language (FBML) and Facebook JavaScript (FBJS) to create a distributed infrastructure. Many cloud applications, including Facebook apps, are hosted on Amazon Web Services (AWS).

Digital content: Cloud computing provides a flexible and scalable infrastructure for managing digital content, enabling efficient storage, distribution, and delivery of various media formats, including documents, images, audio, and video.

The new technologies supported by cloud computing favor the creation of digital
content.
Data mashups or composite services combine data extracted from different sources.
[A data mashup is the process of combining data from multiple sources into a single data source.]
Event-driven mashups, also called Svc, interact through events rather than the traditional request/response model.
To improve reliability, the mashup system uses the Java Message Service (JMS) for asynchronous communication. Fault tolerance is achieved through VMware vSphere, where primary and secondary virtual machines (VMs) run simultaneously. If one fails, the other seamlessly takes over, ensuring continuous operation.
Social computing, cloud technology, and digital content are deeply interconnected.
Cloud computing enhances data analysis, recommendation systems, volunteer
computing, and social media scalability. It also ensures fault tolerance and service
reliability, making it a crucial backbone of modern digital applications.

3.2 Architectural Styles for Cloud Applications


3.5 The MapReduce Programming Model
MapReduce is a distributed computing model inspired by the Map and Reduce
functions in LISP. It is designed to process large datasets on computing
clusters. The model transforms an input set of <key, value> pairs into an
output set of <key, value> pairs.

map(String key, String value):
    // key: document name; value: document contents
    for each word w in value:
        EmitIntermediate(w, "1");

reduce(String key, Iterator values):
    // key: a word; values: a list of counts
    int result = 0;
    for each v in values:
        result += ParseInt(v);
    Emit(AsString(result));
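A hedged, runnable Hadoop equivalent of this word-count pseudocode, written against the standard org.apache.hadoop.mapreduce API, is sketched below; the class names are assumptions for illustration, and the job setup (input/output paths, driver) is omitted.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    // Map: emit an intermediate <word, 1> pair for every word in the input split.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Reduce: sum the counts associated with each unique word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}

Hadoop handles the shuffle-and-sort step between these two classes, partitioning intermediate keys across the R reduce tasks (by default with a hash of the key).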

The key steps involved in its execution are:

1.Task Initialization

The input data is split into M chunks (typically 16–64 MB each).

N worker nodes are assigned, with one acting as the master.

The master assigns Map and Reduce tasks to available workers.

2.Map Phase

Each Map worker reads its assigned input split and processes it using the
user-defined Map function.

The intermediate <key, value> pairs generated are buffered in memory and
then partitioned into R regions, stored locally on disk.

3.Shuffle and Sort Phase

The master informs the Reduce workers where the intermediate data is stored.

Reduce workers retrieve this data via remote procedure calls (RPCs).

The data is sorted by key before processing.


4.Reduce Phase

The Reduce function processes each unique key and its associated values.

The final output is written to an output file.

5.Fault Tolerance Mechanism:

The master tracks the state of tasks and pings workers periodically.

If a worker fails, its tasks are reassigned.

The master periodically writes checkpoints of its control data to ensure recovery in case of failure.

Once all tasks are completed, the master signals the user program that execution is finished.
The system ensures efficient scheduling and fault tolerance, making it robust for processing large-scale data.

[Figure: MapReduce execution. An application's master instance coordinates the run; map instances 1..M process input segments 1..M and write intermediate results to local disks; reduce instances 1..R read those results and write the final output to shared storage (input data, map phase, reduce phase).]
