
UNIT IV PROGRAMMING MODEL

Programming Models in Cloud Computing – Characteristics and Strategies of Cloud Services

Programming Models in Cloud Computing: Everyone has heard of cloud computing in today's world, but what exactly is it and why is it important? Broadly, cloud computing is the delivery of computing services such as storage, networking, analytics, software, databases, and servers over the internet. Cloud programming models are what make it possible to access these services efficiently, which is why cloud computing skills are so valuable today.

What are Programming Models in Cloud Computing?
Usually, there are certain strategies for recognizing where cloud services will be used and how they achieve business goals. In particular, business goals can be reached through Infrastructure as a Service (IaaS) and Platform as a Service (PaaS), private and public clouds, testing and development, big data analytics, file storage, disaster recovery, and cloud backup. Through the underlying servers and the programming of the cloud, it is possible to access all the facilities offered by this computing technology efficiently. Additionally, cloud development lets users bring their applications to market quickly. Because backups are kept in the network, hardware failures do not cause data loss. It also saves the organization money and makes use of remote servers.

Software for Cloud Computing


Many people have misconceptions about the software behind the cloud. They assume that creating a cloud service requires only certain platforms rather than software. In fact, cloud computing uses software to create all of its services; Ubuntu is one example. Ubuntu provides a solution for building a group of machines based on Ubuntu Server, which is useful for hosting private or public cloud services.
Characteristics of Programming Models in Cloud Computing
Customarily, there are five main characteristics of the cloud:

 On-demand self-service

 Broad Network Access

 Resource Pooling

 Rapid Elasticity

 Measured Service

1. On-demand self-service
With this feature, a consumer can automatically provision computing capabilities such as server time and network storage as needed, for example in Oracle Database Cloud Service, without requiring human interaction with the service provider.

2. Broad Network Access

Capabilities are available over the network and accessed through standard mechanisms that promote use by various thin or thick client platforms, for instance mobile phones, tablets, laptops, and workstations.

3. Resource Pooling
To serve multiple consumers, the provider's computing resources are pooled using a multi-tenant model. Physical and virtual resources are dynamically assigned and reassigned according to consumer demand. Examples of such resources include memory, storage, processing, and network bandwidth.

4. Rapid Elasticity
Capabilities can be elastically provisioned and released, in some cases automatically, to scale rapidly outward and inward with demand. To the consumer, the capabilities available for provisioning often appear to be limitless and can be appropriated in any quantity at any time.

5. Measured Service
Cloud systems automatically control and optimize resource use through metering at a level of abstraction appropriate to the type of service, such as processing, storage, bandwidth, and active user accounts. Resource usage is monitored, controlled, and reported, providing transparency for both the provider and the consumer of the utilized service.
Programming Languages of Cloud Computing
The cloud is more an operating-system-level concept than a language-level concept, but cloud programming relies on several programming languages to access and operate the cloud, such as Perl, JavaScript, AJAX, ASP, Java, PHP, and MySQL. Additionally, any new programming language designed specifically for the cloud should be optimized for cloud computing and should be easy to learn, efficient, modern, fast, and powerful. In short, cloud computing programming is a valuable skill for your career and is used to access and operate the cloud.

HALOOP

HaLoop was developed by Yingyi Bu, Bill Howe, and Magda Balazinska at the University of Washington. HaLoop is an extension of Hadoop that, along with ordinary data processing, provides an efficient way to perform iterative computation on the data.

Architecture
HaLoop is a useful extension of Hadoop because it provides support for iterative applications. To meet this requirement, the following changes were made in Hadoop to efficiently support iterative data analysis:
1.) A new application programming interface to simplify the expression of iterative programs.
2.) Automatic generation of MapReduce programs by the master node, using a loop control module, until the loop condition is met.
3.) A new task scheduler that exploits data locality in these applications in order to perform iterative operations efficiently.
4.) The task scheduler and task trackers are modified to manage not only execution but also cache indices on the slave nodes.

Some of the important features of HaLoop that make all this feasible are:

1.) Inter-iteration Locality: The major goal of HaLoop is to place the map and reduce tasks that use the same data in different iterations on the same machine. The data can then be easily cached and reused across iterations.

2.) Reducer Input Cache:  HaLoop will cache reducer inputs across all
reducers and create a local index for the cached data. Additionally, the
reducer inputs are cached before each reduce function invocation, so that
tuples in the reducer input cache are sorted and grouped by reducer input key.

3.) Reducer Output Cache: The reducer output cache stores and indexes the most recent local output on each reducer node. This cache is used to reduce the cost of evaluating fixpoint termination conditions. That is, if the application must test the convergence condition by comparing the current iteration output with the previous iteration output, the reducer output cache enables the framework to perform the comparison in a distributed fashion.

4.) Mapper Input Cache: HaLoop's mapper input cache aims to avoid non-local data reads in mappers during non-initial iterations. In the first iteration, if a mapper performs a non-local read on an input split, the split will be cached on the local disk of the mapper's physical node. Then, with loop-aware task scheduling, in later iterations, all mappers read data only from local disks, either from HDFS or from the local file system.

Program on HaLoop:
To write a HaLoop program, we need to specify the following:
1.) The loop body (as one or more MapReduce pairs).
2.) The termination condition and loop-invariant data (optional).
3.) A map function to convert an input key-value pair into intermediate (in_key, in_value) pairs.
4.) A reduce function to produce (out_key, out_value) pairs.

To make the termination decision, HaLoop introduces a Fixpoint operator into MapReduce. The Fixpoint operator signals when the distributed computation should end. The functions provided for working with these fixed points are listed here, followed by a short sketch:

1.) SetFixedPointThreshold: sets a bound on the distance between one iteration and the next.
2.) ResultDistance: the distance between two successive iteration outputs, below which the iteration should stop.
3.) SetMaxNumOfIterations: an additional loop control; HaLoop terminates the job if the number of iterations exceeds SetMaxNumOfIterations.
4.) SetIterationInput: associates an input source with each iteration.
5.) AddStepInput: associates an additional input source with an intermediate map-reduce pair in the loop body.
6.) AddInvariantTable: specifies an input HDFS file which is loop invariant.
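Below is a minimal, framework-free sketch in plain Java of the fixpoint logic that these calls express. The step function, the L1 distance measure, the threshold, and the toy values are illustrative assumptions; this is not HaLoop's actual API or implementation.

```java
// Framework-free sketch of fixpoint termination: iterate a step function
// (standing in for one MapReduce pass) until the distance between successive
// outputs falls below a threshold (SetFixedPointThreshold / ResultDistance)
// or an iteration cap is hit (SetMaxNumOfIterations).
import java.util.Arrays;
import java.util.function.UnaryOperator;

public class FixpointLoop {

    public static double[] iterate(double[] initial,
                                   UnaryOperator<double[]> step,   // loop body (one MR pair)
                                   double threshold,               // SetFixedPointThreshold
                                   int maxIterations) {            // SetMaxNumOfIterations
        double[] current = initial;
        for (int i = 0; i < maxIterations; i++) {
            double[] next = step.apply(current);          // run the loop body once
            if (resultDistance(current, next) < threshold) {
                return next;                              // fixpoint reached
            }
            current = next;                               // next round reads this output
        }
        return current;                                   // stopped by the iteration cap
    }

    // ResultDistance: L1 distance between two successive outputs.
    static double resultDistance(double[] prev, double[] curr) {
        double d = 0.0;
        for (int i = 0; i < prev.length; i++) d += Math.abs(prev[i] - curr[i]);
        return d;
    }

    public static void main(String[] args) {
        // Toy "loop body": damp each value toward 1.0, as a stand-in for a real
        // per-iteration MapReduce computation such as a PageRank update.
        double[] result = iterate(new double[] {0.0, 0.5, 2.0},
                v -> {
                    double[] out = v.clone();
                    for (int i = 0; i < out.length; i++) out[i] = 0.5 * out[i] + 0.5;
                    return out;
                },
                1e-6, 100);
        System.out.println(Arrays.toString(result));      // converges toward [1.0, 1.0, 1.0]
    }
}
```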

PERFORMANCE EVALUATION

The figures compare the performance of HaLoop and Hadoop on iterative algorithms.

Figure 1 shows that HaLoop need not shuffle the constant input from mappers to reducers at every iteration, which explains the significant savings.
Figure 2 shows the overall performance, by iteration, of HaLoop vs. Hadoop on the Billion Triple dataset. The figure shows that HaLoop scales better than Hadoop since it can cache intermediate results from iteration to iteration.

TWISTER

As discussed, MapReduce is a robust framework for managing large amounts of data, but the framework involves a lot of overhead when dealing with iterative MapReduce. Twister is a framework designed to perform iterative MapReduce efficiently.

Additional functionality:

1.) Static and variable data: Any iterative algorithm works with both static and variable data. The variable data are computed together with the static data (usually the larger of the two) to generate a new set of variable data, and the process is repeated until a given condition or constraint is met. In a normal MapReduce job using Hadoop or DryadLINQ, the static data are needlessly reloaded every time the computation is performed, even though they remain fixed throughout the computation. This is extra overhead.

Twister introduces a "config" phase for both map and reduce to load any static data that is required. Loading static data only once also helps when running long-running Map/Reduce tasks.

2.) Fat Map tasks: To reduce the cost of accessing large amounts of data, the map side is provided with a configurable map task that can access large blocks of data or files. This makes it easy to place heavy computational weight on the map side.

3.) Combine operation: Unlike standard MapReduce, where the outputs of the reducers are stored in separate files (for example on GFS), Twister adds a new phase after map and reduce, called combine, which collectively merges the outputs coming from all the reducers.

4.) Programming extensions: Some of the additional functions that support Twister's iterative functionality are listed here, followed by a short sketch:

i) mapReduceBCast(Value value), for sending a single value to all map tasks. For example, the "Value" can be a set of parameters, a resource (file or executable) name, or even a block of data.

ii) configureMaps(Value[] values) and configureReduce(Value[] values), to configure the map and reduce tasks with additional static data.
                                               
TWISTER ARCHITECTURE

Twister is designed to efficiently support iterative MapReduce computations. To achieve this, it reads data from the local disks of the worker nodes and handles the intermediate data in the distributed memory of the worker nodes.

The messaging infrastructure in Twister is called the broker network, and it is responsible for performing data transfers using publish/subscribe messaging.

Twister has three main entities:

1. A client-side driver responsible for driving the entire MapReduce computation.
2. A Twister daemon running on every worker node.
3. The broker network.

Access Data
1. To access input data for a map task, Twister either reads data from the local disks of the worker nodes,
2. or receives data directly via the broker network.
All data are kept as files, and having data as native files allows Twister to pass data directly to any executable. Additionally, the tools support typical file operations such as:
(i) create directories, (ii) delete directories, (iii) distribute input files across worker nodes, (iv) copy a set of resources/input files to all worker nodes, (v) collect output files from the worker nodes to a given location, and (vi) create a partition file for a given set of data that is distributed across the worker nodes.

Intermediate Data
The intermediate data are stored in the distributed memory of the worker nodes. Keeping the map output in distributed memory speeds up the computation, since the map output is sent from this memory directly to the reducers.

Messaging
The use of a publish/subscribe messaging infrastructure improves the efficiency of the Twister runtime. Twister uses the scalable NaradaBrokering messaging infrastructure to connect the different brokers in the network and to reduce the load on any one of them.

Fault Tolerance
There are three assumptions behind fault tolerance for iterative MapReduce:
(i) failure of the master node is rare, and no support is provided for it;
(ii) independently of the Twister runtime, the communication network can be made fault tolerant;
(iii) the data are replicated among the nodes of the computation infrastructure.
Based on these assumptions, Twister handles failures of map/reduce tasks, daemons, and worker nodes.
PERFORMANCE EVALUATION

Figure 1 compares DryadLINQ, Hadoop, and Twister while running an iterative SW-G calculation (SW-G is Smith-Waterman-Gotoh, which calculates the distance between each pair of genes in a given gene collection). The running time of Twister is clearly the lowest. For the performance analysis, a 768-CPU-core cluster with 32 nodes was used.
Figure 2 shows that, except for the smallest data set, which ran for 343 iterations, Twister performs better than the MPI implementation in the other two cases. Although the tests were performed on the same hardware using the same number of CPU cores, different software stacks were used. The MPI implementation was run on Windows Server 2008 with MPI.NET, while the Twister implementation was run on Red Hat Enterprise Linux 5.4 with JDK 1.6.0_18. A simple comparison of Java and C# performance for matrix multiplication revealed that, on the hardware used, the Windows/C# version is 14% slower than the Linux/Java version.

SPINNER
Although HaLoop and Twister handle many iterative algorithms efficiently, many machine learning and graph algorithms still perform poorly on them, due to those systems' inability to exploit the (sparse) computational dependencies present in these tasks.
Spinner refers to the recomputed solution as the partial solution, and it distinguishes bulk iteration, where each iteration produces a completely new partial solution from the previous iteration's result, from incremental iteration, where each iteration's result differs only partially from the previous result.

Existing dataflow systems support bulk iteration, where the whole result is consumed to produce a new result, but an incremental iteration evolves the result by adding or updating some data points. This implies adding a mutable state that is carried over to the next iteration.

Comparing the two kinds of iteration:

1.) A bulk iteration is carried out until a termination condition is met, whereas in an incremental iteration the partial solution s is a set of data points and the algorithm does not fully recompute s(i+1) from s(i), but rather updates s(i) by adding or updating some of its data points.
2.) In terms of performance, bulk and incremental iterations may differ significantly: an incremental iteration touches only some of the data points instead of processing the whole data set each time.
Spinner supports both bulk and incremental iteration for computing iterative algorithms, which allows the computation to be performed more selectively and efficiently.

An incremental iteration can be expressed using bulk iterations with two data sets (S and W) for the partial solution and the workset, together with a step function u. The step function reads both data sets and computes a new version of S and W. However, recall that the primary motivation for incremental iterations is to avoid creating a completely new version of the partial solution and to apply point updates instead. The updated partial solution should be implicitly carried over to the next iteration. A minimal sketch contrasting the two styles follows.
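The following is a minimal Java sketch contrasting the two styles on a toy partial solution; it assumes no particular framework, and the update function and convergence test are illustrative.

```java
// Minimal sketch contrasting bulk and incremental iteration on a toy
// partial solution S (a map from id to value) with a workset W.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class BulkVsIncremental {

    // Bulk iteration: every step recomputes a completely new partial solution
    // s(i+1) from s(i), even if most entries did not change.
    static Map<Integer, Double> bulkStep(Map<Integer, Double> s) {
        Map<Integer, Double> next = new HashMap<>();
        for (Map.Entry<Integer, Double> e : s.entrySet()) {
            next.put(e.getKey(), update(e.getValue()));   // touches every data point
        }
        return next;
    }

    // Incremental iteration: the workset W names the data points whose values
    // may change; only those entries of S are updated in place.
    static Set<Integer> incrementalStep(Map<Integer, Double> s, Set<Integer> workset) {
        Set<Integer> nextWorkset = new HashSet<>();
        for (Integer key : workset) {
            double old = s.get(key);
            double updated = update(old);
            if (Math.abs(updated - old) > 1e-9) {         // value actually changed
                s.put(key, updated);                      // point update, carried implicitly
                nextWorkset.add(key);                     // may trigger further work
            }
        }
        return nextWorkset;                               // empty workset => converged
    }

    static double update(double v) {
        return 0.5 * v;                                   // toy step function
    }

    public static void main(String[] args) {
        Map<Integer, Double> s = new HashMap<>(Map.of(1, 8.0, 2, 0.0, 3, 2.0));
        Set<Integer> w = new HashSet<>(s.keySet());       // initially, every point may change
        while (!w.isEmpty()) {
            w = incrementalStep(s, w);                    // only "hot" points are touched
        }
        System.out.println(s);                            // all values driven toward 0
    }
}
```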

PERFORMANCE EVALUATION
Figure 1 compares Spinner (Stratosphere) with other frameworks such as Spark and Giraph.

Figure 2 compares the execution time of running the algorithm on various datasets. The runtime of the algorithm is similar in Spark, Giraph, and Stratosphere for the small Wikipedia dataset. Spark and Giraph could not be used with the large datasets, because the number of messages created exceeds the heap size on each node.

Description:
With the enormous increase in data, there is an urgent need to process these data effectively. These frameworks take huge blocks of data, convert them into simple key-value pairs, and make them easy and modular to analyze. The interesting part of their implementation is that the framework can perform these tasks in parallel on multiple nodes, balancing the load to reduce the overhead on any single node.

Why Iterative?
MapReduce frameworks like Hadoop and Dryad have been very successful in fulfilling the need to analyze huge files and compute data-intensive problems. Although they take care of many problems, many data analysis techniques require iterative computations, including PageRank, HITS (Hypertext-Induced Topic Search), recursive relational queries, clustering, neural-network analysis, social network analysis, and network traffic analysis.

These techniques have a common trait: data are processed iteratively until the computation satisfies a convergence or stopping condition. Each round takes the output of the previous iteration and operates on it again to generate the next result. This type of program terminates only when a fixed output is reached, i.e., the result does not change from one iteration to the next.

The MapReduce framework does not directly support these iterative data analysis applications. Instead, programmers must implement iterative programs by manually issuing multiple MapReduce jobs and orchestrating their execution using a driver program, even on platforms in which the data flow takes the form of a directed acyclic graph of operators; these platforms lack built-in support for iterative programs. A sketch of the driver-program pattern follows.
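Below is a hedged sketch of that driver-program pattern using the standard Hadoop Job API: each iteration is submitted as a separate job, and the output directory of iteration i becomes the input of iteration i+1. The RankMapper/RankReducer placeholders, the paths, and the fixed iteration cap are assumptions for illustration.

```java
// Driver-program pattern: one Hadoop job per iteration, chained by directories.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class IterativeDriver {

    // Placeholder loop body: pass "page<TAB>rank" lines through; a real job
    // would apply a PageRank-style update here.
    public static class RankMapper extends Mapper<Object, Text, Text, DoubleWritable> {
        @Override
        protected void map(Object key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String[] parts = value.toString().split("\t");
            if (parts.length == 2) {
                ctx.write(new Text(parts[0]), new DoubleWritable(Double.parseDouble(parts[1])));
            }
        }
    }

    public static class RankReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
        @Override
        protected void reduce(Text key, Iterable<DoubleWritable> values, Context ctx)
                throws IOException, InterruptedException {
            double sum = 0.0;                         // placeholder aggregation
            for (DoubleWritable v : values) {
                sum += v.get();
            }
            ctx.write(key, new DoubleWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String input = "/data/ranks-0";               // initial data (assumed path)

        for (int i = 1; i <= 10; i++) {               // fixed cap; a real driver would
            String output = "/data/ranks-" + i;       // also test a convergence condition

            Job job = Job.getInstance(conf, "rank-iteration-" + i);
            job.setJarByClass(IterativeDriver.class);
            job.setMapperClass(RankMapper.class);
            job.setReducerClass(RankReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(DoubleWritable.class);
            FileInputFormat.addInputPath(job, new Path(input));
            FileOutputFormat.setOutputPath(job, new Path(output));

            if (!job.waitForCompletion(true)) {       // run this pass to completion
                System.exit(1);
            }
            input = output;                           // chain into the next iteration
        }
    }
}
```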

Hadoop Library from Apache

Hadoop is an Apache open-source framework written in Java that allows distributed processing of large datasets across clusters of computers using simple programming models. The Hadoop framework application works in an environment that provides distributed storage and computation across clusters of computers. Hadoop is designed to scale up from a single server to thousands of machines, each offering local computation and storage.

Hadoop Architecture
At its core, Hadoop has two major layers namely −

 Processing/Computation layer (MapReduce), and


 Storage layer (Hadoop Distributed File System).

MapReduce
MapReduce is a parallel programming model for writing distributed applications
devised at Google for efficient processing of large amounts of data (multi-terabyte
data-sets), on large clusters (thousands of nodes) of commodity hardware in a reliable,
fault-tolerant manner. The MapReduce program runs on Hadoop which is an Apache
open-source framework.
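As a concrete illustration of the model, below is the classic word-count job written against the Hadoop MapReduce Java API; the input and output paths are taken from the command line.

```java
// Word count: map emits (word, 1); reduce sums the counts per word.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context ctx)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                ctx.write(word, ONE);                 // emit (word, 1)
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();                       // sum all counts for this word
            }
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);    // local pre-aggregation
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```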
Hadoop Distributed File System
The Hadoop Distributed File System (HDFS) is based on the Google File System
(GFS) and provides a distributed file system that is designed to run on commodity
hardware. It has many similarities with existing distributed file systems. However, the
differences from other distributed file systems are significant. It is highly fault-tolerant
and is designed to be deployed on low-cost hardware. It provides high throughput
access to application data and is suitable for applications having large datasets.
Apart from the above-mentioned two core components, Hadoop framework also
includes the following two modules −
 Hadoop Common − These are Java libraries and utilities required by other
Hadoop modules.
 Hadoop YARN − This is a framework for job scheduling and cluster resource
management.

How Does Hadoop Work?

It is quite expensive to build bigger servers with heavy configurations that handle large-scale processing. As an alternative, you can tie together many commodity computers, each with a single CPU, into a single functional distributed system; in practice, the clustered machines can read the dataset in parallel and provide much higher throughput. Moreover, it is cheaper than one high-end server. This is the first motivation for using Hadoop: it runs across clustered, low-cost machines.
Hadoop runs code across a cluster of computers. This process includes the following core tasks that Hadoop performs (a small client-side sketch follows this list) −
 Data is initially divided into directories and files. Files are divided into uniform
sized blocks of 128M and 64M (preferably 128M).
 These files are then distributed across various cluster nodes for further
processing.
 HDFS, being on top of the local file system, supervises the processing.
 Blocks are replicated for handling hardware failure.
 Checking that the code was executed successfully.
 Performing the sort that takes place between the map and reduce stages.
 Sending the sorted data to a certain computer.
 Writing the debugging logs for each job.
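To make the storage side concrete, below is a hedged sketch of a Java client writing a file into HDFS; the namenode address, the path, and the explicit 128 MB block-size setting are assumptions for illustration.

```java
// Writing a small file into HDFS; the file is split into blocks and
// replicated across datanodes by HDFS itself.
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000");   // assumed cluster address
        conf.set("dfs.blocksize", "134217728");              // 128 MB blocks, as noted above

        try (FileSystem fs = FileSystem.get(conf);
             FSDataOutputStream out = fs.create(new Path("/data/input/sample.txt"))) {
            out.write("hello hadoop\n".getBytes(StandardCharsets.UTF_8));
        }
    }
}
```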
Advantages of Hadoop
 Hadoop framework allows the user to quickly write and test distributed systems.
It is efficient, and it automatically distributes the data and work across the machines
and, in turn, utilizes the underlying parallelism of the CPU cores.
 Hadoop does not rely on hardware to provide fault-tolerance and high availability
(FTHA), rather Hadoop library itself has been designed to detect and handle
failures at the application layer.
 Servers can be added or removed from the cluster dynamically and Hadoop
continues to operate without interruption.
 Another big advantage of Hadoop is that apart from being open source, it is
compatible on all the platforms since it is Java based.

Application Mapping: Concepts & Best Practices for the Enterprise
Modern enterprise IT environments comprise complex system
architecture. IT workloads, apps, and hardware system components
interact and depend on each other to deliver the necessary functionality.
The infrastructure and apps evolve as more lines of code are added to the
codebase that scales IT apps and services to a growing user base, running
on software-defined, virtualized, or cloud-based IT environments.

Discovering these dependencies is critical to understanding how the IT


environment behaves, which touches many aspects of an IT-driven
operational environment. So, in this article, we’ll discuss key concepts in
application mapping. We’ll take a look at:

 Application mapping
 How it works
 Its role in DevOps use cases
 Best practices on mapping application dependencies
What is application mapping?
Application mapping is the process of discovering and identifying the
interactions and interdependencies between application components and
their underlying hardware infrastructure.
To ensure that apps perform optimally, it's important to discover and map the underlying dependencies. The technology that enables this capability is commonly called "application mapping", though Application Discovery and Dependency Mapping (ADDM) is another term for it. An application mapping solution is a management solution that discovers the relationships between app components and the underlying components and maps them to deliver comprehensive insight into the resources running in the IT infrastructure and their dependencies.
(See how IT discovery & service mapping work.)

How application mapping works


Application mapping can be implemented in several ways, ranging from
brainstorming and manual element polling to automated discovery of the
entire IT ecosystem. The techniques can be agent-based or
agentless monitoring, as described below:
Sweep and poll

The agentless monitoring technique is the traditional way of discovering IT


assets: by pinging IP addresses and identifying the responding devices.
The technique identifies app components, devices, and server systems
based on information such as discovery ping rate and device group
information, established based on known device data.
The sweep and poll method is lightweight and allows users to sweep the
entire network from a single connected node location.

The process may be slow for large-scale data centers, which is a drawback


considering that dependencies can change during the process and leave
critical assets undiscovered. Furthermore, discovering app components,
particularly in dynamic virtualized and cloud-based IT environments with
limited visibility and control, may be particularly challenging.

Network monitoring

Network monitoring looks at real-time packet information and captures


accurate data on application dependencies. The Netflow protocol contains
IP traffic information such as:
 Volume
 Path
 Source and destination nodes
 Other IP flow attributes

Using protocols such as Netflow for traffic monitoring has its


disadvantages.

Netflow implementation can impact the performance of devices given its


large bandwidth requirements. To address this issue, the data is sampled at intervals, which reduces bandwidth consumption but also captures less packet information, since the unsampled data is not monitored.

Additionally, Netflow containing IP address and TCP port data cannot


differentiate application-level dependencies. As an alternative, data
packets can be captured but still provide only limited information
collected at the time of probing.

Agent on server

With agent-based monitoring, a software component is installed on the


client server to collect data.
A monitoring station at the server side requests the data at periodic
intervals based on a predefined policy. The agent monitors incoming and
outgoing traffic in real-time and can identify the topology changes in
application dependency as they happen.

This capability is particularly suitable for the


dynamic virtualized infrastructure environments. Additionally, they can
differentiate between apps running on the same server instances and the
overall cost of running agents is less than using individual hardware
devices to collect packet data.
However, agents must be installed on every server to ensure complete
visibility. This can ultimately cause the monitoring agents to consume too
much of the computing resources, impacting the server infrastructure’s:

 Overall cost
 Operating performance

Orchestration-level application mapping

Automation and orchestration platforms are becoming increasingly


popular to manage IT environments. These enable IT to provision
resources for specific IT workloads efficiently using advanced software
solutions. These solutions combine multiple automated tasks and
configurations across app components that must utilize the necessary IT
resources. In doing so, the orchestration technologies also keep track of
the app components and the underlying server resources.
Together with Application Performance Monitoring (APM), AIOps, and
other agentless or agent-based technologies, a hybrid discovery and
dependency mapping can both:
 Enable accurate reporting
 Maintain optimal cost and performance of the monitoring systems

Even with the right application monitoring information available,


organizations must be prepared to proactively manage risks and prevent
performance bottlenecks. Three key aspects should be considered in
devising a playbook guideline to achieve these goals:

 Apps and systems. Obtain comprehensive information about the


apps and systems that can be potentially impacted. Continuously
update existing dependency mapping documentation as new features,
app components, and infrastructure resources are deployed.
 Risks and mitigation. Evaluate and prioritize the risks involved.
Analyze vulnerabilities from a security, performance, and cost
perspective. Provide specific guidelines and identify roles and
responsibilities among team members as part of an effective risk
mitigation plan.
 Feedback and iterate. The IT environment is constantly evolving as
new software components are installed and new, scalable
infrastructure resources are provisioned. As organizations shift
workloads from legacy on-premise systems to advanced cloud-based
infrastructure, the IT environment becomes more complex and
dynamic. It is therefore important to follow up with team members,
improve the knowledge base, and invest in the right technology
resources and practices for app discovery and dependency mapping.

DevOps infrastructure challenges


With this understanding of application mapping, let’s take a look at it
inside a DevOps environment. There, application mapping is a crucial
activity.

Let’s review common infrastructure challenges in DevOps environments.

Server consolidation & virtualization

DevOps teams require rapid provisioning and management of IT


resources in a complex mix of hybrid and multi-cloud environments. In
order to consolidate server resources and manage dynamic workloads, IT
needs a comprehensive view of all application dependencies.
According to a BMC-Forrester survey, 56% of the responding IT managers
cited inadequate view into application dependencies as a key challenge to
server consolidation.
In DevOps environments, this means that the administrative burden, cost,
and time spent on server consolidation efforts remains far from optimal
and inadequate to guarantee effective CI/CD practices, a fundamental
DevOps activity.

Incident management challenge

High availability is a critical DevOps goal—one that requires


comprehensive planning and careful system design.
An important characteristic of a highly available system design is the end-
to-end mapping of the incident management practice. Application
discovery and mapping corresponds to this practice, aiming to find
dependencies between software components exposed to risks attributed
to various incidents.
When you know these dependencies, you and your DevOps team can
proactively isolate and contain damages in the event of an incident.
Resource capacity and redundancy can be planned based on the risk
factor.

Another aspect of managing IT incidents within a DevOps organization is


to maintain full control over the infrastructure and empower DevOps
teams to move multi-tier IT environments without having to discover,
map, and rediscover IT environments. However, automation of such
processes is only possible when you already have full visibility into
application dependencies.
Rework & waste processes

DevOps is all about cutting down on waste processes. According to


research, more than 75% of the responding organizations cite rework
rates of 11% or higher. This trend suggests a huge disconnect
between operations and infrastructure processes and teams.
DevOps teams can take advantage of application mapping to guide
decisions based on the value created in the process of mapping
application components. The process requires teams to understand the
implications of interactions taking place at various levels of the
infrastructure.

For instance, operating a hybrid mix of private and public cloud and data
center deployments makes sense if your goal is to minimize cost.
However, adding too many high availability nodes may prevent
standardization in node configurations. Managing change controls in such
environments could be more expensive and add to the administrative
burden—exactly what you were trying to avoid in the first place.

With application mapping, you’re better suited to handle this challenge. By


automating dependency discovery, application mapping helps you
understand how various infrastructure components interact with the
software running on top of them.

Regulatory compliance

Proactive and accurate identification of compliance issues is only possible


as long as the interactions with application components are both:

 Discoverable
 In compliance with regulatory policy frameworks adopted for
sensitive IT workloads
Compliance is an ongoing effort that also requires organizations to
identify gaps, prioritize risk, and track compliance progress with every
change. A change control management practice would traditionally
require IT to track dependency changes and apply the necessary changes.
A correct standard operating procedure would involve automated
triggering of configuration changes in adherence to a compliance policy
that remains consistent regardless of the application changes.
According to research, however, more than a third of organizations cannot
track assets or resort to manual asset tracking capabilities—risking
compliance failure.

Application mapping best practices: reduce dependencies


The following best practices can help DevOps teams manage challenges
associated with application mapping by—yes—avoiding dependencies in
the first place:

 Account for various forms of dependencies. Dependencies aren’t


limited to technology. Take various forms of dependencies into
account, including hardware, software, versions, and especially in
DevOps teams, soft dependencies such as business cost and culture.
 Develop agnostic, defensible code. Avoid interfaces that are
expected to deprecate in the near future versions. Encourage and
develop with code that’s agnostic to operating systems.
 Skip proprietary dependencies when possible. Avoid
dependencies on libraries and functions limited to specific versions of
proprietary technologies.
 Ensure that all dependencies are testable. Integrations with
internal and external software should be visible and controllable.
 Ensure the infrastructure can be tested at all levels. Third-party
cloud vendors often offer limited visibility into resources running their
SaaS applications.
 Enhance observability and information flow across the entire IT
infrastructure. This should include both the testing and production
environments.
 Use advanced application mapping technologies. Common
mapping tools include BMC Helix
Discovery, SolarWinds, Dynatrace and AppDynamics, among others.
Most cloud vendors including AWS, Azure and Google Cloud also offer
technology capabilities that enable application dependency mapping.

BMC supports application mapping


BMC supports application mapping in enterprise environments. BMC Helix
Discovery is a SaaS-based, cloud-native discovery and dependency
modeling system that provides instant visibility into hardware, software,
and service dependencies across multi-cloud, hybrid, and on-premises
environments.

Programming Support for Google App Engine

Google App Engine (GAE) is a Platform as a Service (PaaS) cloud-based Web hosting
service on Google's infrastructure. For an application to run on GAE, it must comply with
Google's platform standards, which narrows the range of applications that can be run
and severely limits those applications' portability.
GAE supports the following major features:
1. Dynamic Web services based on common standards
2. Automatic scaling and load balancing
3. Authentication using Google's Accounts API
4. Persistent storage, with query access sorting and transaction management
features
5. Task queues and task scheduling
6. A client-side development environment for simulating GAE on your local system
7. A choice of two runtime environments: Java or Python (see the sketch below)
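As a minimal illustration of the Java runtime, the sketch below shows a plain servlet of the kind GAE's Java environment hosts; the URL mapping and the omitted deployment descriptors (e.g., appengine-web.xml) are assumptions for illustration.

```java
// A minimal request handler as hosted by the GAE Java runtime, which serves
// standard Java servlets. Project/deployment configuration is omitted.
import java.io.IOException;

import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet("/hello")                                 // assumed URL mapping
public class HelloAppEngine extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        resp.setContentType("text/plain");
        resp.getWriter().println("Hello from Google App Engine");
    }
}
```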
Google File System:
 Abbreviated as GFS, the Google File System is a scalable distributed file system that enables a cluster of commodity machines to simultaneously store and serve very large files shared between them.
 Applications read and write through GFS much as they would through a local file system, while GFS coordinates their I/O to maintain file system consistency.
 With GFS, any changes made to the file system from one computer are immediately visible on all other computers in that cluster.
 GFS provides fault tolerance, reliability, scalability, availability and performance
to large networks and connected nodes. GFS is made up of several storage
systems built from low-cost commodity hardware components.
 It is optimized to accommodate Google's different data use and storage needs,
such as its search engine, which generates huge amounts of data that must be
stored.
Bigtable and Google's NoSQL System:
 Google Cloud Bigtable is a productized version of the NoSQL database that
stores Google's bits and bytes.
 The big selling point is it doesn't require the maintenance traditionally needed for
compatible on-prem NoSQL solutions.
 Bigtable is a compressed, high performance, and proprietary data storage
system built on Google File System, Chubby Lock Service and a few other
Google technologies.
 Bigtable maps two arbitrary string values (row key and column key) and
timestamp (hence three-dimensional mapping) into an associated arbitrary byte
array.
 It is not a relational database and can be better defined as a sparse, distributed, multi-dimensional sorted map (a conceptual sketch of this data model follows this list).
 Bigtable is designed to scale into the petabyte range across "hundreds or
thousands of machines, and to make it easy to add more machines [to] the
system and automatically start taking advantage of those resources without any
reconfiguration".
Google’s Distributed Lock Service (Chubby):
 Chubby is a distributed lock service intended for coarse-grained synchronization
of activities within Google's distributed systems.
 Chubby has become Google's primary internal name service; it is a common
rendezvous mechanism for systems such as MapReduce; the storage systems
GFS and Bigtable use Chubby to elect a primary from redundant replicas; and it
is a standard repository for files that require high availability, such as access
control lists.
 Chubby is a relatively heavy-weight system intended for coarse-grained locks,
locks held for "hours or days", not "seconds or less."

Amazon AWS

What is AWS? Amazon Cloud (Web) Services Tutorial

What is Cloud Computing?


Cloud computing refers to storing and accessing data over the internet. It doesn't store any data on the hard disk of your personal computer; in cloud computing, you access data from a remote server.

What is AWS?
The full form of AWS is Amazon Web Services. It is a platform that offers
flexible, reliable, scalable, easy-to-use and, cost-effective cloud computing
solutions.

AWS is a comprehensive, easy-to-use computing platform offered by Amazon. The platform is built from a combination of infrastructure as a service (IaaS), platform as a service (PaaS), and packaged software as a service (SaaS) offerings.

In this tutorial, you will learn:

 What is Cloud Computing?


 What is AWS?
 History of AWS
 Important AWS Services
 Applications of AWS services
 Companies using AWS
 Advantages of AWS
 Disadvantages of AWS
 Best practices of AWS

History of AWS
 2002- AWS services launched
 2006- Launched its cloud products
 2012- Holds first customer event
 2015- Reveals revenues achieved of $4.6 billion
 2016- Surpassed $10 billon revenue target
 2016- Release snowball and snowmobile
 2019- Offers nearly 100 cloud services
 2021- AWS comprises over 200 products and services

Important AWS Services

Amazon Web Services offers a wide range of global cloud-based products for different business purposes. The products include storage, databases, analytics, networking, mobile, development tools, and enterprise applications, with a pay-as-you-go pricing model.

Here are the essential AWS services.

AWS Compute Services

Here are the cloud compute services offered by Amazon:

1. EC2 (Elastic Compute Cloud)- EC2 is a virtual machine in the cloud on which you have OS-level control. You can run this cloud server whenever you want.
2. LightSail- This cloud computing tool automatically deploys and manages the compute, storage, and networking capabilities required to run your applications.
3. Elastic Beanstalk- The tool offers automated deployment and provisioning of resources, such as a highly scalable production website.
4. EKS (Elastic Container Service for Kubernetes)- The tool allows you to run Kubernetes on the Amazon cloud environment without installation.
5. AWS Lambda- This AWS service allows you to run functions in the cloud. The tool is a big cost saver because you pay only while your functions execute (see the sketch below).
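As a small illustration of item 5, below is a hedged sketch of a Java Lambda handler using the aws-lambda-java-core RequestHandler interface; the event shape (a map with a "name" key) is an assumption for illustration.

```java
// A minimal AWS Lambda handler in Java; billing applies only while
// handleRequest executes.
import java.util.Map;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

public class HelloLambda implements RequestHandler<Map<String, String>, String> {

    @Override
    public String handleRequest(Map<String, String> event, Context context) {
        String name = event.getOrDefault("name", "world");   // assumed event field
        context.getLogger().log("Handling request for " + name);
        return "Hello, " + name;
    }
}
```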

Migration
Migration services are used to transfer data physically between your datacenter and AWS.

1. DMS (Database Migration Service)– DMS service can be used to migrate on-
site databases to AWS. It helps you to migrate from one type of database to
another — for example, Oracle to MySQL.
2. SMS (Server Migration Service)– SMS allows you to migrate on-site servers to AWS easily and quickly.
3. Snowball— Snowball is a data-transport appliance which allows you to transfer terabytes of data into and out of the AWS environment.

Storage
1. Amazon Glacier- It is an extremely low-cost storage service. It offers secure
and fast storage for data archiving and backup.
2. Amazon Elastic Block Store (EBS)- It provides block-level storage to use
with Amazon EC2 instances. Amazon Elastic Block Store volumes are
network-attached and remain independent from the life of an instance.
3. AWS Storage Gateway- This AWS service connects on-premises software applications with cloud-based storage. It offers secure integration between the company's on-premises infrastructure and AWS's storage infrastructure.

Security Services
1. IAM (Identity and Access Management)— IAM is a secure cloud security
service which helps you to manage users, assign policies, form groups to
manage multiple users.
2. Inspector— It is an agent that you can install on your virtual machines, which
reports any security vulnerabilities.
3. Certificate Manager— The service offers free SSL certificates for your
domains that are managed by Route53.
4. WAF (Web Application Firewall)— WAF security service offers application-
level protection and allows you to block SQL injection and helps you to block
cross-site scripting attacks.
5. Cloud Directory— This service allows you to create flexible, cloud-native
directories for managing hierarchies of data along multiple dimensions.
6. KMS (Key Management Service)— It is a managed service. This security
service helps you to create and control the encryption keys which allows you to
encrypt your data.
7. Organizations— You can create groups of AWS accounts using this service to manage security and automation settings.
8. Shield— Shield is a managed DDoS (Distributed Denial of Service) protection service. It safeguards web applications running on AWS.
9. Macie— It offers a data visibility security service which helps classify and
protect your sensitive critical content.
10.GuardDuty— It offers threat detection to protect your AWS accounts and
workloads.

Database Services
1. Amazon RDS- This AWS database service makes it easy to set up, operate, and scale a relational database in the cloud.
2. Amazon DynamoDB- It is a fast, fully managed NoSQL database service. It is a simple service which allows cost-effective storage and retrieval of data, and it can serve any level of request traffic.
3. Amazon ElastiCache- It is a web service which makes it easy to deploy,
operate, and scale an in-memory cache in the cloud.
4. Neptune- It is a fast, reliable and scalable graph database service.
5. Amazon RedShift- It is Amazon’s data warehousing solution which you can
use to perform complex OLAP queries.
Analytics
1. Athena— This analytics service allows you to run SQL queries on data in your S3 bucket.
2. CloudSearch— You should use this AWS service to create a fully managed
search engine for your website.
3. ElasticSearch— It is similar to CloudSearch. However, it offers more features
like application monitoring.
4. Kinesis— This AWS analytics service helps you to stream and analyze real-time data at massive scale.
5. QuickSight— It is a business analytics tool. It helps you to create
visualizations in a dashboard for data in Amazon Web Services. For example,
S3, DynamoDB, etc.
6. EMR (Elastic Map Reduce)— This AWS analytics service is mainly used for big data processing with frameworks like Spark, Splunk, Hadoop, etc.
7. Data Pipeline— Allows you to move data from one place to another. For
example from DynamoDB to S3.

Management Services
1. CloudWatch— CloudWatch helps you to monitor AWS environments like EC2 and RDS instances and metrics such as CPU utilization. It also triggers alarms depending on various metrics.
2. CloudFormation— It is a way of defining your cloud infrastructure as code. You can use templates to provision a whole production environment in minutes.
3. CloudTrail— It offers an easy method of auditing AWS resources. It helps
you to log all changes.
4. OpsWorks— The service allows you to automate Chef/Puppet deployments on the AWS environment.
5. Config— This AWS service monitors your environment. The tool sends alerts
about changes when you break certain defined configurations.
6. Service Catalog— This service helps large enterprises to authorize which services users may use and which they may not.
7. AWS Auto Scaling— The service allows you to automatically scale your
resources up and down based on given CloudWatch metrics.
8. Systems Manager— This AWS service allows you to group your resources. It
allows you to identify issues and act on them.
9. Managed Services— It offers management of your AWS infrastructure which
allows you to focus on your applications.

Internet of Things
1. IoT Core— It is a managed cloud AWS service. The service allows connected devices, like cars, light bulbs, and sensor grids, to securely interact with cloud applications and other devices.
2. IoT Device Management— It allows you to manage your IoT devices at any
scale.
3. IoT Analytics— This AWS IOT service is helpful to perform analysis on data
collected by your IoT devices.
4. Amazon FreeRTOS— This real-time operating system for microcontrollers
helps you to connect IoT devices in the local server or into the cloud.

Application Services
1. Step Functions— It is a way of visualizing what's going on inside your application and which different microservices it is using.
2. SWF (Simple Workflow Service)— The service helps you to coordinate both
automated tasks and human-led tasks.
3. SNS (Simple Notification Service)— You can use this service to send you
notifications in the form of email and SMS based on given AWS services.
4. SQS (Simple Queue Service)— Use this AWS service to decouple your
applications. It is a pull-based service.
5. Elastic Transcoder— This AWS service tool helps you to change a video's format and resolution to support various devices, such as tablets, smartphones, and laptops, with different resolutions.

Deployment and Management


1. AWS CloudTrail: The service records AWS API calls and sends log files to you.
2. Amazon CloudWatch: The tool monitors AWS resources like Amazon EC2 and Amazon RDS DB instances. It also allows you to monitor custom metrics created by users' applications and services.
3. AWS CloudHSM: This AWS service helps you meet corporate, regulatory, and contractual compliance requirements for maintaining data security by using Hardware Security Module (HSM) appliances inside the AWS environment.

Developer Tools
1. CodeStar— Codestar is a cloud-based service for creating, managing, and
working with various software development projects on AWS.
2. CodeCommit— It is AWS’s version control service which allows you to store
your code and other assets privately in the cloud.
3. CodeBuild— This Amazon developer service helps you automate the process of building and compiling your code.
4. CodeDeploy— It is a way of deploying your code in EC2 instances
automatically.
5. CodePipeline— It helps you create a deployment pipeline with stages such as building, testing, and deployment to development and production environments.
6. Cloud9— It is an Integrated Development Environment for writing, running,
and debugging code in the cloud.

Mobile Services
1. Mobile Hub— Allows you to add, configure and design features for mobile
apps.
2. Cognito— Allows users to sign up using their social identities.
3. Device Farm— Device farm helps you to improve the quality of apps by
quickly testing hundreds of mobile devices.
4. AWS AppSync— It is a fully managed GraphQL service that offers real-time
data synchronization and offline programming features.

Business Productivity
1. Alexa for Business— It empowers your organization with voice, using Alexa, and helps you build custom voice skills for your organization.
2. Chime— Can be used for online meetings and video conferencing.
3. WorkDocs— Helps you store documents in the cloud.
4. WorkMail— Allows you to send and receive business emails.
Desktop & App Streaming
1. WorkSpaces— Workspace is a VDI (Virtual Desktop Infrastructure). It allows
you to use remote desktops in the cloud.
2. AppStream— A way of streaming desktop applications to your users in the
web browser. For example, using MS Word in Google Chrome.

Artificial Intelligence
1. Lex— Lex tool helps you to build chatbots quickly.
2. Polly— It is AWS's text-to-speech service; it allows you to create audio versions of your notes.
3. Rekognition— It is AWS's face recognition service. This AWS service helps you to recognize faces and objects in images and videos.
4. SageMaker— Sagemaker allows you to build, train, and deploy machine
learning models at any scale.
5. Transcribe— It is AWS’s speech-to-text service that offers high-quality and
affordable transcriptions.
6. Translate— It is a very similar tool to Google Translate which allows you to
translate text in one language to another.

AR & VR (Augmented Reality & Virtual Reality)


1. Sumerian— Sumerian is a set of tools for offering high-quality virtual reality (VR) experiences on the web. The service allows you to create interactive 3D scenes and publish them as a website for users to access.

Customer Engagement
1. Amazon Connect— Amazon Connect allows you to create your customer care
center in the cloud.
2. Pinpoint— Pinpoint helps you to understand your users and engage with them.
3. SES (Simple Email Service)— Helps you to send bulk emails to your
customers at a relatively cost-effective price.

Game Development
1. GameLift– It is a service which is managed by AWS. You can use this service
to host dedicated game servers. It allows you to scale seamlessly without taking
your game offline.
Applications of AWS services
Amazon Web services are widely used for various computing purposes like:

 Web site hosting


 Application hosting/SaaS hosting
 Media Sharing (Image/ Video)
 Mobile and Social Applications
 Content delivery and Media Distribution
 Storage, backup, and disaster recovery
 Development and test environments
 Academic Computing
 Search Engines
 Social Networking

Companies using AWS


 Instagram
 Netflix
 Twitch
 LinkedIn
 Facebook
 Turner Broadcasting
 Zoopla
 Smugmug
 Pinterest
 Dropbox

Advantages of AWS
Following are the pros of using AWS services:
 AWS allows organizations to use the already familiar programming models,
operating systems, databases, and architectures.
 It is a cost-effective service that allows you to pay only for what you use,
without any up-front or long-term commitments.
 You will not need to spend money on running and maintaining data centers.
 Offers fast deployments
 You can easily add or remove capacity.
 You get quick access to the cloud with virtually limitless capacity.
 Total Cost of Ownership is very low compared to any private/dedicated servers.
 Offers Centralized Billing and management
 Offers Hybrid Capabilities
 Allows you to deploy your application in multiple regions around the world
with just a few clicks

Disadvantages of AWS
 If you need more immediate or intensive assistance, you’ll have to opt for paid
support packages.
 Amazon Web Services may have some common cloud computing issues when
you move to a cloud. For example, downtime, limited control, and backup
protection.
 AWS sets default limits on resources which differ from region to region. These
resources consist of images, volumes, and snapshots.
 Hardware-level changes happen beneath your application, which may not give the best performance and usage for your applications.

Best practices of AWS


 You need to design for failure so that nothing will fail.
 It’s important to decouple all your components before using AWS services.
 You need to keep dynamic data closer to compute and static data closer to the
user.
 It’s important to know security and performance tradeoffs.
 Pay for computing capacity by the hour.
 Make a habit of a one-time payment for each instance you want to reserve, in order to receive a significant discount on the hourly charge.

Cloud Environment

Introduction to the Cloud


Derrick Rountree, Ileana Castrillo, in The Basics of Cloud Computing, 2014
Cost
Cloud environments can be a source of reduced cost. One of the biggest cost savings is the
transition from capital expense to operational expense. When setting up a traditional
environment, the infrastructure and equipment have to be purchased ahead of time. This
equipment is usually purchased as part of an organization’s capital budget. In a cloud
environment, you don’t have to worry about purchasing the equipment; you only pay for the
service. The cost of the service will usually count against an organization’s operational budget.
Generally, it’s easier to get operational expenses approved than to get capital expenses approved.
In addition, traditional cloud environments are built using utility storage and utility computing.
These are generally cheaper than more specialized components.

Emerging Security Challenges in Cloud Computing, from Infrastructure-Based


Security to Proposed Provisioned Cloud Infrastructure
Mohammad Reza Movahedisefat, ... Davud Mohammadpur, in Emerging Trends in ICT
Security, 2014
Introduction
In cloud environments, one of the most pervasive and fundamental challenges for organizations
in demonstrating policy compliance is proving that the physical and virtual infrastructure of the
cloud can be trusted—particularly when those infrastructure components are owned and
managed by external service providers.
For many business functions commonly run in the cloud—hosting websites and wikis, for
example—it’s often sufficient to have a cloud provider vouch for the security of the underlying
infrastructure. For business-critical processes and sensitive data, however, third-party attestations
usually aren’t enough. In such cases, it’s absolutely essential for organizations to be able to
verify for themselves that the underlying cloud infrastructure is secure.
The next frontier in cloud security and compliance will be to create transparency at the bottom-
most layers of the cloud by developing the standards, tools, and linkages to monitor and prove
that the cloud’s physical and virtual machines are actually performing as they should. Verifying
what’s happening at the foundational levels of the cloud is important for the simple reason that,
if organizations can’t trust the safety of their computing infrastructure, the security of all the
data, software, and services running on top of that infrastructure falls into doubt. There’s
currently no easy way for organizations to monitor actual conditions and operating states within
the hardware, hypervisors, and virtual machines comprising their clouds. At those depths, we go
dark.
Cloud providers and the IT community are already preparing to address this problem. Groups of
technology companies have banded together to develop a new, interoperable, and highly secure
computing infrastructure for the cloud based on a “hardware root of trust,” which provides
tamperproof measurements of every physical and virtual component in the entire computing
stack, including the hypervisor. Members of the IT community are exploring ways to use these
measurements to improve visibility, control, and compliance in the cloud.
They’re collaborating on a conceptual IT framework to integrate the secure measurements
provided by a hardware root of trust into adjoining hypervisors and virtualization
management software. The resulting infrastructure stack would be tied into data analysis tools
and a governance, risk, and compliance (GRC) console, which would contextualize conditions in
the cloud’s hardware and virtualization layers to present a reliable assessment of an
organization’s overall security and compliance posture. This type of integrated hardware-
software framework would make the lowest levels of the cloud’s infrastructure as inspectable,
analyzable, and reportable for compliance as the cloud’s top-most application services layer.
As we mentioned above, many communities have already adopted the “cloud,” a flexible
computational platform allowing scalability and a service-based provision model.
Unfortunately, there are currently significant limitations when using a cloud infrastructure to
perform security-critical computations and/or store sensitive data. Specifically, at the moment,
there is no way to guarantee the trustworthiness of a Virtual Machine (VM) in terms of its origin
and identity and the trustworthiness of the data uploaded and managed by the Elastic Block
Storage or the Simple Storage Service (S3). These limitations led us to propose a macro-level
solution for identified common infrastructure security requirements and design a hybrid model
for on-demand infrastructure services provisioning.
Because of these limitations, public cloud computing uptake by business-critical communities is
limited. A number of communities whose emerging information models appear otherwise well-
suited to cloud computing are forced to either avoid the pay-per-use model of service provision
or deploy a private cloud infrastructure. Deploying a private cloud is rarely a desirable solution.
It requires an extended time frame and significant investment in hardware, management, and
software resources. These limitations also apply to the deployment of a private cloud based
on open source software because, while licensing costs are eliminated, the bulk of the investment
in hardware and support resources is still required.
This chapter presents recent results of the ongoing research on developing an architecture and a
framework for dynamically provisioned security services as part of the provisioned on-demand
cloud-based infrastructure services. It shows that the proposed model, with a number of emerged
patterns, can be applied to the infrastructure aspect of cloud computing as a proposed shared
security approach in a system development life cycle focusing on the plan-build-run scope. Some
of this information was adopted from Cloud Security and Privacy [1].

Legal
John Sammons, in The Basics of Digital Forensics (Second Edition), 2015
International e-Discovery
With the cloud environment and data regularly flying across borders, international electronic
discovery is becoming an issue. Not every country has the same views on privacy or the same
legal standards and procedures for discovery. As a result, gaining access to data in a foreign
country is very complex. The Sedona Conference’s Framework for Analysis of Cross-Border
Discovery Conflicts: A Practical Guide to Navigating the Competing Currents of International
Data Privacy and e-Discovery is an excellent introduction to the complexities involved in
international e-Discovery. You can download it for free
from http://www.thesedonaconference.org/.

The Cloud Threat Landscape


Raj Samani, ... Jim Reavis, in CSA Guide to Cloud Computing, 2015
Insecure Interfaces and APIs
APIs within cloud environments are used to offer end customers software interfaces to interact
with their provisioned services. There are multitudes of APIs available within a cloud
environment; these can include provisioning new hardware and monitoring the cloud services, as
just two examples. According to the API management company Mashery, there exist three
categories of Cloud APIs26; these are

Control APIs: APIs that allow the end customer to configure their cloud provisioned
service. Amazon EC2 provides a multitude of APIs that allow customers to configure
their services, as defined within the Amazon Elastic Compute Cloud: API
Reference.27 Examples include the allocation of internet protocol (IP) addresses,
creating/editing of access control lists, or monitoring of specific instances.

Data APIs: APIs within which data may flow into or out of the provisioned service. Such
data flows can also be into alternate cloud providers, so that data can flow from one
provider and into the provisioned service provided by an alternate provider.

Application functionality APIs: Although the earlier APIs provide the ability to transfer
data between alternate providers, or indeed management of the overall solution, the
application functionality APIs can provide considerably more functionality that the end
customer can interact with, ranging from the simple availability of shopping baskets to
integration with social networking solutions, and considerably more in between.
While the flexibility of cloud APIs is not in question, and is indeed, depending on the source,
considered one of the driving forces behind the widespread adoption of cloud computing, there
remain considerable security considerations.
Indeed, these security considerations may not even be malicious, whereby an administrator may
inadvertently invoke an action that may have significant repercussions. Consider the command
available for EC2 customers entitled ec2-terminate-instances. As you can likely guess, this
command will terminate an EC2 instance; the implication of this action is that the data stored
within the instance will also be deleted.
In order to reduce the risk of such an action being inadvertently carried out, there is an
opportunity to implement a safeguard to prevent inadvertent deletion using a feature available
through the AWS console, command line interface, or API. Such a feature provides protection
against termination with the DisableApiTermination attribute; this controls whether an instance
can indeed be terminated using the console, Command Line Interface, or an API.
While such a feature, or rather attribute, is an important step in preventing accidental deletion of
a particular instance, it is only one example of where an accidental action can have significant
repercussions. A simple error such as mistyping the IP address for an instance is equally likely to
result in the unavailability of the provisioned service, and does not have the luxury of an attribute
to protect against the error. While of course the latter example is a simpler fix than the deletion
of an instance, these examples do demonstrate some of the challenges facing the use of cloud
APIs.
Other challenges facing cloud end customers, and their use of APIs, are also malicious attempts
to circumvent authorized process. In a recent article published by DarkReading,28 author Rob
Lemos presents the security risks API keys present to their end customers. Such keys are utilized
to identify applications utilizing provisioned services; however, should such keys fall into the
hands of malicious actors they can be used to capture confidential data or rack up fees for the end
customer. The issue has arisen not due to a weakness in the keys themselves, but rather the
manner in which they are managed, whereby in particular implementations they are used to
identify users, and as such are not protected by developers as assets that are critical to the
business, with examples of keys being e-mailed or stored on desktop hard drives.
Recently, the CSA chapter Switzerland (https://chapters.cloudsecurityalliance.org/switzerland)
held a chapter meeting focusing entirely on service orientated architecture as it relates to cloud
computing in which coauthor Raj Samani recently spoke. This meeting focused on the security
challenges relating to APIs within a cloud environment and presented emerging research within
this field. Emerging areas of research include the use of technology to enforce access policy, and
governance rules as they pertain to the use of APIs. It is therefore recommended for the reader to
coordinate with the chapter should they wish to get more detailed information about this very
important (and sadly not hugely researched) topic.

Revisiting VM performance and optimization challenges for big data


Muhammad Ziad Nayyer, ... Syed Asad Hussain, in Advances in Computers, 2019
2.2.3 Shared resources
In the cloud environment, the services are provided in a shared model. Resources such as CPU
cache, disk I/O and buffer, network I/O and memory bandwidth are shared among VMs.
However, the nature of these shared resources in a data center differs from that of a single
server. The most critical shared resources in a data center are network bandwidth and I/O. The
challenge of network bandwidth and I/O contention arises mostly at the network storage level due
to multiple VMs sending requests simultaneously. The I/O contention depends on the volume of
data coming into and going out of the storage device. In the case of big data, where data are
stored on multiple shared devices, I/O contention occurs at every device, resulting in
delayed operations. The latest statistics on Amazon EC2 confirm that the throughput variation
faced by Standard Medium Instances (SMIs) can be up to 65% of the total network I/O [33], and
the write I/O bandwidth can vary by as much as 50% from the mean [34].

How Elasticity Property Plays an Important Role in the Cloud


M.A.N. Bikas, ... M. Grechanik, in Advances in Computers, 2016
Abstract
In a cloud environment, consumers can deploy and run their software applications on a
sophisticated infrastructure that is owned and managed by a cloud provider (eg, Amazon Web
Services, Microsoft Azure, and Google Cloud Platform). Cloud users can acquire resources for
their applications on demand, and they have to pay only for the consumed resources. In order to
take this advantage of cloud computing, it is vital for a consumer to determine if the cloud
infrastructure can rapidly change the type and quantity of resources allocated to an application in
the cloud according to the application's demand. This property of the cloud is known as
elasticity. Ideally, a cloud platform should be perfectly elastic; ie, the resources allocated to an
application exactly match the demand. This allocation should occur as the load to the application
increases, with no degradation of application's response time, and a consumer should pay only
for the resources used by the application. However, in reality, clouds are not perfectly elastic.
One reason is that it is difficult to predict the elasticity requirements of a given application
and its workload in advance, and optimally match resources with the applications’ needs. In this
chapter, we investigate the elasticity problem in the cloud. We explain why it is still a
challenging problem to solve and consider what services current cloud service providers are
offering to maintain the elasticity in the cloud. Finally, we discuss the existing research that can
be used to improve elasticity in the cloud.

Literature Review and Problem Analysis


Xiao Liu, ... Jinjun Chen, in Temporal QOS Management in Scientific Cloud Workflow
Systems, 2012
2.1 Workflow Temporal QoS
In a cloud environment, there are a large number of similar or equivalent resources provided by
different service providers. Cloud service users can select suitable resources and deploy them for
cloud workflow applications. These resources may provide the same functionality but optimise
different QoS measures [86]. Meanwhile, different service users or applications may have
different expectations and requirements. Therefore, it is insufficient for a scientific cloud
workflow system to consider only functional characteristics of workflow applications. QoS
requirements such as time limits (temporal constraints) and budget limits (cost constraints) for
cloud workflow execution also need to be managed by scientific cloud workflow systems.
Service users must be able to specify their QoS requirements of scientific cloud workflows at
build time. Then, the actions taken by the cloud workflow systems at run-time must be chosen
according to the original QoS requirements.
Generally speaking, there are five basic dimensions for cloud workflow QoS, namely time, cost,
fidelity, reliability and security [51,103]. Time is a basic measurement of system
performance [2,3,54]. For workflow systems, the makespan often refers to the total
time overheads required for completing the execution of a workflow. The total cost often refers
to the monetary cost associated with the execution of workflows including such factors as the
cost of managing workflow systems and the usage charge of cloud resources for processing
workflow activities. Fidelity refers to the measurement related to the quality of the output of
workflow execution. Reliability is related to the number of failures of workflows. Security refers
to confidentiality of the execution of workflow tasks and the trustworthiness of resources.
Among them, time, as a basic measurement of performance and general non-functional
requirement, has attracted most of the attention from researchers and practitioners in such areas
as Software Engineering [89], Parallel and Distributed Computing [48,76] and Service-
Orientated Architectures [38]. In this book, we focus on time, i.e. we investigate the support of
high temporal QoS in scientific cloud workflow systems.
In the real world, most scientific processes are assigned specific deadlines in order to achieve
their scientific targets on time. For those processes with deterministic process structures and fully
controlled underlying resources, individual activity durations, i.e. the completion time of each
activity, are predictable and stable. Therefore, process deadlines can normally be satisfied
through a build-time static scheduling process with resource reservation in advance [39,105].
However, stochastic processes such as scientific workflows are characterised by dynamically
changing process structures due to the nature of scientific investigation. Furthermore,
with a vast number of data and computation intensive activities, complex workflow applications
are usually deployed on dynamic high-performance computing infrastructures, e.g. cluster, peer-
to-peer, grid and cloud computing [5,70,97,100]. Therefore, ensuring cloud
workflow applications are finished within specific deadlines is a challenging issue. In fact, this is
why temporal QoS is emphasised more in large-scale distributed workflow applications
compared with traditional centralised workflow applications [2].
In the following sections, we will introduce the current work related to temporal QoS in cloud
and conventional workflow systems.

Application Management in the Cloud


Rick Sturm, ... Julie Craig, in Application Performance Management (APM) in the Digital
Enterprise, 2017
Summary
The use of public and private cloud environments by organizations around the world continues to
grow at a rapid pace and shows no signs of abating in the near future. The financial benefits of
cloud computing will continue to drive its adoption. However, there is a tradeoff for those
savings. In the SaaS and PaaS environments, the customers have only limited ability to monitor
and control the environment and the applications running in it. Cloud computing brings unique
challenges for managing the applications that run in those environments.
Application management in cloud environments is multifaceted. There are at least four types of
cloud environments: private cloud, public cloud (SaaS), public cloud (PaaS), and public cloud
(IaaS). In each of the public cloud environments there is a user (i.e., the customer’s IT
department) dimension to the question of application management. There is also a service
provider aspect to application management. The capabilities and responsibilities are different in
each instance. Unless the service provider offers some reporting, SaaS or PaaS environments are
essentially black boxes. In IaaS and private cloud environments, the customer’s IT department is
responsible for the active management of the applications and is able to install the appropriate
tools that allow them to do that.
Environment   Manage Application Performance   Manage Application Availability   Monitor End User Experience

Private       Customer                         Customer                          Customer

SaaS          Service provider                 Service provider                  Customer

PaaS          Service provider                 Service provider                  Customer

IaaS          Customer                         Customer                          Customer


Network Isolation
Zonghua Zhang, Ahmed Meddahi, in Security in Network Functions Virtualization, 2017
Abstract:
Virtual datacenters in the cloud environment have become increasingly popular and are widely used
for many types of business service. In particular, such a datacenter leverages standardization and consolidation of
commodity hardware to allow effective and safe sharing of pooled resources. Through a
hypervisor-based mechanism, it is able to isolate the compute resources between the tenants that
are co-located on the same end host. However, resource sharing brings new challenges and
security issues, mainly because the tenants do not have full control over either the
underlying infrastructure or the physical and virtual network resources. Thus, malicious attackers are
given opportunities to get the information of the tenants of interest by intentionally or
unintentionally consuming a large part of the network, intrusively trapping their data and further
performing illegal operations through side-channel attacks or DoS attacks. One of the important
solutions is network isolation, which has been taken as an essential building block for improving
security level as well as ensuring security control in resource sharing and data communication.

Application Migration
Tom Laszewski, Prakash Nauduri, in Migrating to the Cloud, 2012
Amazon and Oracle Cloud Templates
Deploying ADF in a cloud environment is most feasible when a cloud service provider such as
Amazon, Terremark, or Savvis makes available the templates already created by Oracle for that
purpose. These templates provide preinstalled applications such as Oracle WebLogic, Oracle
Enterprise Linux, and Oracle JRockit.
TIP
The most straightforward template to use can be downloaded
from https://edelivery.oracle.com/oraclevm if you are using your own Oracle Virtual
Server instance. When creating an Amazon instance, follow the instructions
at www.oracle.com/technetwork/middleware/weblogic/wlsovm-ref-133104.pdf. This document
also contains instructions for configuring and starting Oracle WebLogic.
Templates should provide a Linux environment with the following software:
 Oracle WebLogic Server 10.3.0.0
 JRockit JDK 6.0 R27.6
 Oracle Enterprise Linux 5.2, JeOS-1.0.1-6, a secure, headless (command-line control
instead of UI-managed operating system) and minimized version of the Oracle Enterprise
Linux OS
It is important to note that a headless Linux operating system requires that command-line
utilities be used instead of the more familiar GUIs. A jumpstart utility in the template is
automatically invoked after the first login. Details about the configuration of the VM are
included in the referenced PDF document
at www.oracle.com/technetwork/middleware/weblogic/wlsovm-ref-133104.pdf. The jumpstart
tool will help configure the Oracle WebLogic server.
WARNING
The most difficult part of working with Oracle WebLogic 10.3 on Amazon's cloud is the ability
to work with the instance from within JDeveloper.
When a developer tries to deploy to an external IP address, such as ec2-18?-??-???-???.compute-
1.amazonaws.com (the question marks would have your specific IP address numbers), the
internal IP address is different, such as 10.???.??.???, and the subsequent deployment is rejected.
The fix for this problem is as follows (at least on Amazon EC2):
1. Start the Oracle WebLogic console (with an edit session).
2. Go to [Your domain name] | Environment | Servers and select your SOA deployment server.
3. On the Configuration | General tab, select the Advanced section at the bottom of the page.
4. Enter your AMI IP (e.g., ec2-18?-??-???-???.compute-1.amazonaws.com) in the External Listen Address field.
5. Restart your server(s).
All in all, deploying the template and configuration should take well under an hour.
If you wish to install and configure your own Oracle VM locally, start here:

 http://blogs.oracle.com/alison/2008/04/installing_oracle_vm.html
NOTE
Deploying ADF applications in the cloud is relatively straightforward given that Oracle provides
a lot of information to configure the environment. Take the time to try out different
configurations using the templates and you will find success based upon your organization's
cloud strategy.

What is CloudSim?
Cloud Computing  is one of the hottest topics in town. It has completely
transformed how modern-day applications are developed and maintained with
high scalability and low latency.
CloudSim is an open-source framework, which is used to simulate cloud
computing infrastructure and services. It is developed by the CLOUDS Lab
organization and is written entirely in Java. It is used for modelling and
simulating a cloud computing environment as a means for evaluating a
hypothesis prior to software development in order to reproduce tests and
results.
For example, if you were to deploy an application or a website on the cloud and
wanted to test the services and load that your product can handle and also tune
its performance to overcome bottlenecks before risking deployment, then such
evaluations could be performed by simply coding a simulation of that
environment with the help of various flexible and scalable classes provided by
the CloudSim package, free of cost.  

Benefits of Simulation over the Actual Deployment:

Following are the benefits of CloudSim:


 No capital investment involved. With a simulation tool like CloudSim
there is no installation or maintenance cost.
 Easy to use and Scalable. You can change the requirements such as
adding or deleting resources by changing just a few lines of code.
 Risks can be evaluated at an earlier stage. In Cloud Computing
utilization of real testbeds limits the experiments to the scale of the testbed
and makes the reproduction of results an extremely difficult undertaking.
With simulation, you can test your product against test cases and resolve
issues before actual deployment without any limitations.
 No need for try-and-error approaches. Instead of relying on theoretical
and imprecise evaluations which can lead to inefficient service performance
and revenue generation, you can test your services in a repeatable and
controlled environment free of cost with CloudSim.
Why use CloudSim?

Below are a few reasons to opt for CloudSim: 


 Open source and free of cost, so it favours researchers/developers
working in the field.
 Easy to download and set-up.
 It is more generalized and extensible to support modelling and
experimentation.
 Does not require any high-specs computer to work on.
 Provides pre-defined allocation policies and utilization models for
managing resources, and allows implementation of user-defined algorithms
as well.
 The documentation provides pre-coded examples  for new developers to
get familiar with the basic classes and functions.
 Tackle bottlenecks before deployment to reduce risk, lower costs,
increase performance, and raise revenue.
CloudSim Architecture:

CloudSim Layered Architecture

CloudSim Core Simulation Engine provides interfaces for the management of
resources such as VM, memory and bandwidth of virtualized Datacenters.
CloudSim layer manages the creation and execution of core entities such as
VMs, Cloudlets, Hosts etc. It also handles network-related execution along with
the provisioning of resources and their execution and management.
User Code is the layer controlled by the user. The developer can write the
requirements of the hardware specifications in this layer according to the
scenario.
Some of the most common classes used during simulation are listed below; a minimal
simulation sketch that wires them together follows the list:
 Datacenter: used for modelling the foundational hardware equipment of
any cloud environment, that is the Datacenter. This class provides methods
to specify the functional requirements of the Datacenter as well as methods
to set the allocation policies of the VMs etc.
 Host: this class executes actions related to management of virtual
machines. It also defines policies for provisioning memory and bandwidth to
the virtual machines, as well as allocating CPU cores to the virtual machines.
 VM: this class represents a virtual machine by providing data members
defining a VM’s bandwidth, RAM, mips (million instructions per second), size
while also providing setter and getter methods for these parameters.
 Cloudlet: a cloudlet class represents any task that is run on a VM, like a
processing task, or a memory access task, or a file updating task etc. It
stores parameters defining the characteristics of a task such as its length,
size, mi (million instructions) and provides methods similarly to VM class
while also providing methods that define a task’s execution time, status, cost
and history.
 DatacenterBroker: is an entity acting on behalf of the user/customer. It is
responsible for functioning of VMs, including VM creation, management,
destruction and submission of cloudlets to the VM.
 CloudSim: this is the class responsible for initializing and starting the
simulation environment after all the necessary cloud entities have been
defined and later stopping after all the entities have been destroyed.
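
The following is a minimal, self-contained sketch modelled on the CloudSimExample1 program
bundled with CloudSim 3.0.3, showing how the classes above fit together: one Datacenter
containing a single Host, one VM, and one Cloudlet submitted through a DatacenterBroker. The
class name MinimalCloudSimSketch and all numeric parameters (MIPS, RAM, bandwidth, storage,
cloudlet length, cost factors) are illustrative assumptions, not recommended values; the
constructor signatures follow the 3.0.3 release, so check them against the bundled examples if
you use a different version.

import java.util.ArrayList;
import java.util.Calendar;
import java.util.LinkedList;
import java.util.List;

import org.cloudbus.cloudsim.Cloudlet;
import org.cloudbus.cloudsim.CloudletSchedulerTimeShared;
import org.cloudbus.cloudsim.Datacenter;
import org.cloudbus.cloudsim.DatacenterBroker;
import org.cloudbus.cloudsim.DatacenterCharacteristics;
import org.cloudbus.cloudsim.Host;
import org.cloudbus.cloudsim.Log;
import org.cloudbus.cloudsim.Pe;
import org.cloudbus.cloudsim.Storage;
import org.cloudbus.cloudsim.UtilizationModelFull;
import org.cloudbus.cloudsim.Vm;
import org.cloudbus.cloudsim.VmAllocationPolicySimple;
import org.cloudbus.cloudsim.VmSchedulerTimeShared;
import org.cloudbus.cloudsim.core.CloudSim;
import org.cloudbus.cloudsim.provisioners.BwProvisionerSimple;
import org.cloudbus.cloudsim.provisioners.PeProvisionerSimple;
import org.cloudbus.cloudsim.provisioners.RamProvisionerSimple;

public class MinimalCloudSimSketch {

    public static void main(String[] args) throws Exception {
        // Initialise the simulation engine: one cloud user, no trace events.
        CloudSim.init(1, Calendar.getInstance(), false);

        // Model one physical host with a single 1000-MIPS processing element (PE).
        List<Pe> peList = new ArrayList<Pe>();
        peList.add(new Pe(0, new PeProvisionerSimple(1000)));

        List<Host> hostList = new ArrayList<Host>();
        hostList.add(new Host(0,
                new RamProvisionerSimple(2048),     // RAM (MB)
                new BwProvisionerSimple(10000),     // bandwidth
                1000000,                            // storage
                peList,
                new VmSchedulerTimeShared(peList)));

        // Wrap the host in a Datacenter that uses the simple VM allocation policy.
        DatacenterCharacteristics characteristics = new DatacenterCharacteristics(
                "x86", "Linux", "Xen", hostList, 10.0, 3.0, 0.05, 0.001, 0.0);
        new Datacenter("Datacenter_0", characteristics,
                new VmAllocationPolicySimple(hostList),
                new LinkedList<Storage>(), 0);

        // The broker acts on behalf of the user: it submits VMs and cloudlets.
        DatacenterBroker broker = new DatacenterBroker("Broker_0");
        int brokerId = broker.getId();

        // One VM and one cloudlet (task) to run on it.
        Vm vm = new Vm(0, brokerId, 1000, 1, 512, 1000, 10000, "Xen",
                new CloudletSchedulerTimeShared());
        Cloudlet cloudlet = new Cloudlet(0, 400000, 1, 300, 300,
                new UtilizationModelFull(), new UtilizationModelFull(),
                new UtilizationModelFull());
        cloudlet.setUserId(brokerId);

        List<Vm> vmList = new ArrayList<Vm>();
        vmList.add(vm);
        List<Cloudlet> cloudletList = new ArrayList<Cloudlet>();
        cloudletList.add(cloudlet);
        broker.submitVmList(vmList);
        broker.submitCloudletList(cloudletList);

        // Run the simulation and report the finished cloudlet.
        CloudSim.startSimulation();
        CloudSim.stopSimulation();
        for (Cloudlet received : broker.getCloudletReceivedList()) {
            Log.printLine("Cloudlet " + received.getCloudletId()
                    + " finished at " + received.getFinishTime());
        }
    }
}

Running this class from the Eclipse project set up in the Installation section below should
report one finished cloudlet; scaling the experiment up is largely a matter of adding more
hosts, VMs, and cloudlets to the respective lists.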

Features of CloudSim:

CloudSim provides support for simulation and modelling of:


1. Large scale virtualized Datacenters, servers and hosts.
2. Customizable policies for provisioning host to virtual machines.
3. Energy-aware computational resources.
4. Application containers and federated clouds (joining and management of
multiple public clouds).
5. Datacenter network topologies and message-passing applications.
6. Dynamic insertion of simulation entities with stop and resume of
simulation.
7. User-defined allocation and provisioning policies (see the sketch after this list).
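
As an illustration of the last feature, the sketch below shows what a simple user-defined
allocation policy could look like: a first-fit policy that places each VM on the first host that
accepts it, written against the abstract VmAllocationPolicy class of CloudSim 3.0.3. The class
name FirstFitVmAllocationPolicy is our own, the overridden method signatures reflect the 3.0.3
API as we understand it, and the policy deliberately performs no run-time optimisation or
migration, so verify it against the source bundled with your CloudSim download before relying
on it.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.cloudbus.cloudsim.Host;
import org.cloudbus.cloudsim.Vm;
import org.cloudbus.cloudsim.VmAllocationPolicy;

// A user-defined policy: place each VM on the first host that accepts it,
// instead of the "most free PEs" heuristic used by VmAllocationPolicySimple.
public class FirstFitVmAllocationPolicy extends VmAllocationPolicy {

    // Remembers which host each VM (identified by its uid) was placed on.
    private final Map<String, Host> vmTable = new HashMap<String, Host>();

    public FirstFitVmAllocationPolicy(List<? extends Host> hostList) {
        super(hostList);
    }

    @Override
    public boolean allocateHostForVm(Vm vm) {
        for (Host host : getHostList()) {
            if (host.vmCreate(vm)) {            // host accepted the VM
                vmTable.put(vm.getUid(), host);
                return true;
            }
        }
        return false;                           // no host could take the VM
    }

    @Override
    public boolean allocateHostForVm(Vm vm, Host host) {
        if (host.vmCreate(vm)) {
            vmTable.put(vm.getUid(), host);
            return true;
        }
        return false;
    }

    @Override
    public void deallocateHostForVm(Vm vm) {
        Host host = vmTable.remove(vm.getUid());
        if (host != null) {
            host.vmDestroy(vm);
        }
    }

    @Override
    public Host getHost(Vm vm) {
        return vmTable.get(vm.getUid());
    }

    @Override
    public Host getHost(int vmId, int userId) {
        return vmTable.get(Vm.getUid(userId, vmId));
    }

    @Override
    public List<Map<String, Object>> optimizeAllocation(List<? extends Vm> vmList) {
        return null;                            // no run-time migration in this sketch
    }
}

To use such a policy, pass an instance of it to the Datacenter constructor in place of
VmAllocationPolicySimple.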
Installation:
Prerequisites:
 Knowledge of Core Java language features such
as OOP and Collections. 
 Basics of Cloud Computing .
 CloudSim is available for download here.
 For this tutorial, we have downloaded zip file of CloudSim 3.0.3.
Note: CloudSim also uses some utilities of Apache’s commons-math3 library.
Download its Binaries zip file from here. 

Step 1: From the zip file, extract cloudsim-3.0.3 into a folder. Also, extract
the commons-math3-3.6.1 jar into the same folder.

Step 2: Open Eclipse IDE and go to File -> New -> Java Project.
Step 3: Enter any name for your project and then uncheck the Use default
location box just under it and click on Browse.
Browse to the folder where you extracted your files and select the cloudsim-
3.0.3 folder.
Don’t click on Finish yet, because we need to add a jar file to our project.
Step 4: Click Next and go to Libraries -> Add External JARs. Now browse to
the same folder where you extracted your commons-math3 jar file and Open it.
Step 5: Finally, click on Finish and wait for the project to build. After the project
has been built, from the Project Explorer you can click on your project and from
the dropdown go-to examples -> org.cloudbus.cloudsim.examples where you
can find pre-written sample codes and try to run them. 

Scope

With the flexibility and generalizability of the CloudSim framework, it is easy to
model heavy cloud environments which would otherwise require
experimentation on paid computing infrastructures. Extensible capabilities of
scaling the infrastructure and resources to fit any scenario helps in fast and
efficient research of several topics in cloud computing.
CloudSim has been used in several areas of research such as:
 Task Scheduling
 Green Computing
 Resource Provisioning
 Secure Log Forensics.
