Mod 5 PPT
MODULE 5
Content
Amazon EC2 allows deploying servers in the form of virtual machines created
as instances of a specific image.
Images come with a preinstalled operating system and a software stack, and
instances can be configured for memory, number of processors, and storage.
Users are provided with credentials to remotely access the instance
and further configure or install software if needed.
2. EC2 instances
The use of EC2 Compute Units (ECUs) helps give users a consistent view of the performance
offered by EC2 instances.
One ECU is defined as giving the same performance as a 1.0 - 1.2 GHz
2007 Opteron or 2007 Xeon processor.
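As a worked example of the ECU definition above, the helper below converts an ECU rating into the nominal range of 2007-era reference-processor capacity it stands for. It is a minimal sketch: the 4-ECU rating in the usage line is hypothetical, and the result is only the equivalence implied by the definition, not a benchmark.

```python
# Nominal ECU definition from the text: 1 ECU ~ one 1.0-1.2 GHz
# 2007 Opteron or 2007 Xeon processor.
ECU_GHZ_LOW = 1.0
ECU_GHZ_HIGH = 1.2


def ecu_equivalent_ghz(ecus: float) -> tuple:
    """Return the (low, high) reference-processor capacity, in GHz,
    that a rating of `ecus` EC2 Compute Units corresponds to."""
    if ecus < 0:
        raise ValueError("ECU rating must be non-negative")
    return (ecus * ECU_GHZ_LOW, ecus * ECU_GHZ_HIGH)


if __name__ == "__main__":
    # Hypothetical instance rated at 4 ECUs.
    low, high = ecu_equivalent_ghz(4)
    print(f"4 ECUs ~ {low:.1f}-{high:.1f} GHz of 2007-era Opteron/Xeon capacity")
```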
3. EC2 environment
Elastic IPs allow instances running in EC2 to act as servers reachable from the
Internet.
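Each instance is given a domain name of the form ec2-xxx-xxx-xxx-xxx.compute-x.amazonaws.com,
where xxx-xxx-xxx-xxx normally represents the four parts of the external IP address
separated by dashes, and compute-x gives information about the availability zone
where the instance is deployed.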
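The naming pattern described above, with the four parts of the external IP address joined by dashes, can be reproduced in a few lines. This sketch follows the pattern exactly as the text states it; real AWS host names vary by region, so the `compute-1` default and the function name are illustrative assumptions. 203.0.113.25 is a documentation-only address.

```python
import ipaddress


def ec2_public_dns(public_ip: str, zone_label: str = "compute-1") -> str:
    """Build a name following the pattern ec2-a-b-c-d.<zone_label>.amazonaws.com.

    `zone_label` plays the role of the 'compute-x' part described in the text;
    the default value is an illustrative assumption.
    """
    ip = ipaddress.IPv4Address(public_ip)   # raises ValueError on malformed input
    dashed = str(ip).replace(".", "-")      # 203.0.113.25 -> 203-0-113-25
    return f"ec2-{dashed}.{zone_label}.amazonaws.com"


if __name__ == "__main__":
    print(ec2_public_dns("203.0.113.25"))
```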
4. Advanced compute services
AWS CloudFormation constitutes an extension of the simple
deployment model that characterizes EC2 instances.
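CloudFormation describes the resources of a deployment in a JSON (or YAML) template instead of launching instances one by one. The sketch below builds the smallest useful template, a single EC2 instance, as a Python dictionary and serializes it to JSON. `AWSTemplateFormatVersion`, `AWS::EC2::Instance`, `ImageId`, and `InstanceType` belong to the template format; the logical name `WebServer`, the AMI ID, and the instance type are placeholders chosen for illustration.

```python
import json


def single_instance_template(image_id: str, instance_type: str) -> dict:
    """Return a minimal CloudFormation template declaring one EC2 instance."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Single EC2 instance (illustrative sketch)",
        "Resources": {
            "WebServer": {  # logical resource name, chosen for this example
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": image_id,
                    "InstanceType": instance_type,
                },
            }
        },
    }


if __name__ == "__main__":
    # Placeholder AMI ID; a real image ID is needed before deploying.
    print(json.dumps(single_instance_template("ami-12345678", "t2.micro"), indent=2))
```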
The core service is represented by Amazon Simple Storage Service (S3). This is a
distributed object store that allows users to store information in different formats.
Buckets represent virtual containers in which to store objects; objects represent the
content that is actually stored.
Objects can also be enriched with metadata that can be used to tag the stored content with
additional information.
1 S3 key concepts
3 Amazon ElastiCache
5 Amazon CloudFront
S3 has been designed to provide a simple storage service that’s accessible through
a Representational State Transfer (REST) interface.
The REST interface expresses all the operations that can be performed on the storage in the form
of HTTP requests (GET, PUT, DELETE, HEAD, and POST).
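The mapping between storage operations and HTTP verbs can be sketched as follows. The helper only composes the request line for a virtual-hosted-style URL (`https://<bucket>.s3.amazonaws.com/<key>`); it omits the authentication headers that real requests must carry, and the operation names in the dictionary are labels chosen for this example, not an SDK's API.

```python
from urllib.parse import quote

# Common S3 operations and the HTTP verbs that express them.
S3_OPERATIONS = {
    "get_object": "GET",       # retrieve an object's content and metadata
    "put_object": "PUT",       # create or overwrite an object
    "delete_object": "DELETE",
    "head_object": "HEAD",     # metadata only, no body
    "post_object": "POST",     # browser-based (HTML form) uploads
}


def s3_request_line(operation: str, bucket: str, key: str) -> str:
    """Return 'VERB URL' for an (unsigned) virtual-hosted-style S3 request."""
    verb = S3_OPERATIONS[operation]
    if operation == "post_object":
        # Form uploads are posted to the bucket itself; the key travels in the form body.
        return f"{verb} https://{bucket}.s3.amazonaws.com/"
    # quote() keeps '/' and percent-encodes spaces and other reserved characters.
    return f"{verb} https://{bucket}.s3.amazonaws.com/{quote(key)}"


if __name__ == "__main__":
    print(s3_request_line("get_object", "my-bucket", "photos/cat 1.jpg"))
```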
Buckets
Buckets are top-level elements of the S3 storage architecture and do not support nesting.
That is, it is not possible to create “subbuckets” or other kinds of physical divisions.
Users either store files or push to S3 a text stream representing the object’s content.
An object is identified by a name that needs to be unique within the bucket in which the
content is stored.
The name cannot be longer than 1,024 bytes when encoded in UTF-8, and it allows
almost any character.
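Two details above are easy to get wrong: the name limit is 1,024 bytes of UTF-8, not 1,024 characters, and a bucket is a flat namespace, so a "/" in a key is just another character. The in-memory model below (a hypothetical `Bucket` class, not an AWS SDK type) makes both explicit; for instance, a key made of 342 "€" characters is rejected because each "€" takes three bytes in UTF-8.

```python
MAX_KEY_BYTES = 1024  # limit stated in the text, measured in UTF-8 bytes


class Bucket:
    """Hypothetical in-memory bucket: a flat mapping from key to content."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._objects = {}

    def put(self, key: str, content: bytes) -> None:
        if len(key.encode("utf-8")) > MAX_KEY_BYTES:
            raise ValueError("key exceeds 1,024 bytes when encoded in UTF-8")
        # Keys are unique within the bucket: storing under an existing key replaces it.
        self._objects[key] = content

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list_prefix(self, prefix: str) -> list:
        # No real folders exist; "a/" is simply a common prefix of some keys.
        return sorted(k for k in self._objects if k.startswith(prefix))


if __name__ == "__main__":
    b = Bucket("demo")
    b.put("reports/2011/q1.txt", b"...")
    print(b.list_prefix("reports/"))
```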
Access control and security
Amazon S3 allows controlling the access to buckets and objects by means of Access Control Policies
(ACPs).
An ACP is a set of grant permissions attached to a resource and expressed by means of an XML
configuration file.
A policy allows defining up to 100 access rules, each of them granting one of the available permissions to
a grantee.
A. READ allows the grantee to retrieve an object and its metadata and to list the content of a bucket as well as
getting its metadata.
B. WRITE allows the grantee to add an object to a bucket as well as modify and remove it.
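A policy of this kind can be modeled as a bounded list of (grantee, permission) grants. The sketch below is a hypothetical in-memory model, not S3's XML format: it enforces the 100-grant limit and covers only the READ and WRITE permissions described above, and the grantee strings are made-up identifiers.

```python
from dataclasses import dataclass, field

MAX_GRANTS = 100                  # limit stated in the text
SUPPORTED = ("READ", "WRITE")     # only the two permissions described above


@dataclass
class AccessControlPolicy:
    """Hypothetical in-memory ACP: a list of (grantee, permission) grants."""
    grants: list = field(default_factory=list)

    def grant(self, grantee: str, permission: str) -> None:
        if permission not in SUPPORTED:
            raise ValueError(f"permission not modeled in this sketch: {permission}")
        if len(self.grants) >= MAX_GRANTS:
            raise ValueError("an ACP cannot hold more than 100 grants")
        self.grants.append((grantee, permission))

    def allows(self, grantee: str, permission: str) -> bool:
        return (grantee, permission) in self.grants


if __name__ == "__main__":
    acp = AccessControlPolicy()
    acp.grant("user-alice", "READ")
    print(acp.allows("user-alice", "READ"), acp.allows("user-alice", "WRITE"))  # True False
```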
The Amazon Elastic Block Store (EBS) allows AWS users to provide EC2 instances with
persistent storage in the form of volumes that can be mounted at instance startup.
EBS volumes normally reside within the same availability zone as the EC2 instances that
will use them, to maximize I/O performance.
Once mounted as volumes, their content is lazily loaded in the background and according
to the requests made by the operating system. This reduces the number of I/O
requests that go to the network.
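The lazy-loading behavior can be illustrated with a conceptual model in which a block is fetched over the network only the first time it is read and is served locally afterwards. This is not how EBS is implemented internally; `fetch_block` is a hypothetical stand-in for a network read, and background prefetching is omitted.

```python
class LazyVolume:
    """Conceptual lazy-loaded volume: blocks come from the remote store on
    first access and are then kept locally."""

    def __init__(self, fetch_block):
        self._fetch_block = fetch_block   # callable: block index -> bytes (remote read)
        self._local = {}                  # blocks already loaded
        self.network_fetches = 0

    def read(self, index: int) -> bytes:
        if index not in self._local:
            self._local[index] = self._fetch_block(index)
            self.network_fetches += 1
        return self._local[index]


if __name__ == "__main__":
    remote = {i: bytes([i]) for i in range(10)}
    vol = LazyVolume(remote.__getitem__)
    for i in [0, 1, 0, 1, 0]:
        vol.read(i)
    print("network fetches:", vol.network_fetches)  # 2
```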
4 Structured storage solutions
Amazon provides applications with structured storage services in three different
forms:
● Preconfigured EC2 AMIs.
● Amazon Relational Database Service (RDS).
● Amazon SimpleDB.
Preconfigured EC2 AMIs are predefined templates featuring an installation of a given
database management system.
EC2 instances created from these AMIs can be completed with an EBS volume for
storage persistence.
Available AMIs include installations of IBM DB2, Microsoft SQL Server, MySQL,
Oracle, PostgreSQL, Sybase, and Vertica.
Amazon RDS is a relational database service that relies on the EC2 infrastructure and is managed by
Amazon.
Developers do not have to worry about configuring the storage for high availability,
designing failover strategies, or keeping the servers up-to-date with patches.
Moreover, the service provides users with automatic backups, snapshots, point-in-time
recoveries, and facilities for implementing replications.
Amazon SimpleDB is a lightweight, highly scalable, and flexible data storage solution for
applications that do not require a fully relational model for their data.
SimpleDB provides support for semistructured data, the model for which is based on the concept
of domains, items, and attributes.
SimpleDB uses domains as top-level elements to organize a data store. These domains are roughly
comparable to tables in the relational model.
Unlike tables, domains allow items to have different column structures; each item is therefore
represented as a collection of attributes expressed in the form of key-value pairs.
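The domain/item/attribute model can be sketched in a few lines: a domain maps item names to their own attribute sets, so two items in the same domain need not share columns. The `Domain` class and its method names are hypothetical (loosely echoing SimpleDB's PutAttributes and Select operations), and attributes are kept single-valued for simplicity.

```python
from collections import defaultdict


class Domain:
    """Hypothetical SimpleDB-style domain: item name -> {attribute: value}."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.items = defaultdict(dict)

    def put_attributes(self, item_name: str, **attributes: str) -> None:
        # Each item keeps only the attributes it was given; there is no fixed schema.
        self.items[item_name].update(attributes)

    def select_where(self, attribute: str, value: str) -> list:
        # Items lacking the attribute simply do not match.
        return sorted(name for name, attrs in self.items.items()
                      if attrs.get(attribute) == value)


if __name__ == "__main__":
    d = Domain("products")
    d.put_attributes("item1", category="book", author="Anonymous")
    d.put_attributes("item2", category="dvd", length="120")
    print(d.select_where("category", "book"))  # ['item1']
```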
5 Amazon CloudFront
Amazon CloudFront is a content delivery network. It leverages a collection of edge servers strategically located around the globe to better serve
requests for static and streaming Web content, so that the transfer time is reduced.
9.1.3 Communication services
Amazon provides facilities to structure and facilitate the
communication among existing applications and services residing
within the AWS infrastructure.
These facilities can be organized into two major categories: virtual networking and messaging.
1. Virtual networking
Virtual networking comprises a collection of services that allow AWS users to control
the connectivity to and between compute and storage services.
Amazon Virtual Private Cloud (VPC) and Amazon Direct Connect provide
connectivity solutions in terms of infrastructure.
Amazon VPC provides flexibility in creating virtual private networks within the
Amazon infrastructure and beyond.
The service provider prepares network templates that users can adopt for advanced
configurations.
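A virtual private network in VPC is defined by a private IPv4 address range in CIDR notation, which users then split into subnets (for example, one facing the Internet and one kept private). The standard-library sketch below plans such a split; the 10.0.0.0/16 range and the /24 subnet size are illustrative choices, and the code only computes addresses, it does not call any AWS API.

```python
import ipaddress
import itertools


def plan_subnets(vpc_cidr: str, new_prefix: int, count: int) -> list:
    """Return the first `count` subnets of size /new_prefix inside vpc_cidr."""
    network = ipaddress.ip_network(vpc_cidr)
    subnets = list(itertools.islice(network.subnets(new_prefix=new_prefix), count))
    if len(subnets) < count:
        raise ValueError(f"{vpc_cidr} holds fewer than {count} /{new_prefix} subnets")
    return [str(s) for s in subnets]


if __name__ == "__main__":
    # e.g. one public and one private subnet in a /16 VPC range
    print(plan_subnets("10.0.0.0/16", 24, 2))
```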
Amazon SES provides AWS users with a scalable email service that leverages
the AWS infrastructure.
Once users are signed up for the service, they have to provide an email address that SES
will use to send emails on their behalf.
To activate the service, SES will send an email to verify the given address and
provide the users with the necessary information for the activation.
9.2 Google AppEngine
Google AppEngine is a PaaS implementation.
For each HTTP request, AppEngine locates the servers hosting the
application that processes the request, evaluates their load, and, if necessary,
allocates additional resources or redirects the request to an existing server.
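The per-request decision described above can be sketched as a small routing policy. The load metric, the per-server capacity, and the rule "allocate a new server only when every existing one is full" are assumptions made for illustration; the text does not specify AppEngine's actual policy, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    load: int       # requests currently being served
    capacity: int   # requests it can serve concurrently (assumed)


def route(servers: list, next_name: str) -> Server:
    """Send a request to the least-loaded server with spare capacity;
    if every server is full, allocate a new one."""
    candidates = [s for s in servers if s.load < s.capacity]
    if candidates:
        target = min(candidates, key=lambda s: s.load / s.capacity)
    else:
        capacity = servers[0].capacity if servers else 10
        target = Server(next_name, load=0, capacity=capacity)
        servers.append(target)   # "allocates additional resources"
    target.load += 1
    return target


if __name__ == "__main__":
    pool = [Server("a", 2, 2), Server("b", 1, 2)]
    print(route(pool, "c").name)   # b (a is full)
    print(route(pool, "c").name)   # c (both full, new server allocated)
```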
Static file servers - Web applications are composed of dynamic and static data.
Dynamic data are a result of the logic of the application and the interaction with the
user.
Static data are mostly constituted of the components that define the graphical
layout of the application or data files.
DataStore can be considered as a large object database in which to store objects that
can be retrieved by a specified key.
4 Application services
Applications hosted on AppEngine make the most of the services made available
through the runtime environment.
These services simplify most of the common operations that are performed in Web
applications.
UrlFetch - The sandbox environment does not allow applications to open arbitrary
connections through sockets, but it does provide developers with the capability of
retrieving a remote resource through HTTP/HTTPS by means of the UrlFetch service.
MemCache - This is a distributed in-memory cache that is optimized for fast access
and provides developers with a volatile store for the objects that are frequently
accessed.
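The usual way to use such a volatile store is the cache-aside pattern: read from the cache and, on a miss, load the object from persistent storage and put it back in the cache. The sketch below uses a hypothetical in-process `VolatileCache` with a time-to-live instead of the real AppEngine memcache API; `load_from_datastore` stands in for a DataStore query, and the 60-second TTL is an arbitrary choice.

```python
import time


class VolatileCache:
    """Hypothetical stand-in for a memcache-style store: entries can expire,
    so callers must always be able to recompute them."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}   # key -> (value, expiry time)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]   # expired: behave as a miss
            return None
        return value

    def set(self, key, value, ttl: float) -> None:
        self._data[key] = (value, self._clock() + ttl)


def get_profile(cache, user_id, load_from_datastore):
    """Cache-aside read: try the cache first, fall back to the datastore on a miss."""
    key = f"profile:{user_id}"
    profile = cache.get(key)
    if profile is None:
        profile = load_from_datastore(user_id)
        cache.set(key, profile, ttl=60)
    return profile
```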
Mail and instant messaging - AppEngine provides developers with
the ability to send and receive email messages through the Mail service.
Task queues - A task is defined by a Web request to a given URL, and the queue
invokes the request handler by passing the payload as part of the Web request to
the handler. It is the responsibility of the request handler to perform the “task
execution,” which is seen from the queue as a simple Web request.
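The model described above, where the queue "executes" a task simply by issuing a Web request to a handler, can be sketched with an in-memory queue. Here handlers are plain Python callables keyed by a hypothetical URL path, so no HTTP is involved; the class and path names are illustrative, and retries and scheduling are deliberately left out.

```python
from collections import deque


class TaskQueue:
    """Hypothetical push queue: a task is a URL plus a payload, and running it
    means calling the request handler registered for that URL."""

    def __init__(self, handlers):
        self._handlers = handlers   # URL path -> handler(payload)
        self._pending = deque()

    def add(self, url: str, payload: dict) -> None:
        self._pending.append((url, payload))

    def run_all(self) -> list:
        results = []
        while self._pending:
            url, payload = self._pending.popleft()
            # From the queue's point of view this is just a Web request;
            # the handler is responsible for the actual task execution.
            results.append(self._handlers[url](payload))
        return results


if __name__ == "__main__":
    def send_welcome(payload):
        return f"welcome mail queued for {payload['email']}"

    q = TaskQueue({"/tasks/welcome": send_welcome})
    q.add("/tasks/welcome", {"email": "user@example.com"})
    print(q.run_all())
```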
Python SDK - The Python SDK allows developing Web applications for
AppEngine with Python 2.5. It provides a standalone tool, called
GoogleAppEngineLauncher, for managing Web applications locally and
deploying them to AppEngine.
2 Application deployment and management
Once an application has been developed and tested, it can be deployed on AppEngine with
a simple click or a command-line tool.
Before performing such a task, it is necessary to create an application identifier, which will be
used to locate the application from a Web browser by typing the address http://<application-id>.appspot.com.
1. Compute services
Compute services are the core components of Microsoft Windows Azure, and
they are delivered by means of the abstraction of roles.
Currently, there are three different roles: Web role, Worker role, and Virtual
Machine (VM) role.
Web role - The Web role is designed to implement scalable Web
applications. Web roles represent the units of deployment of Web
applications within the Azure infrastructure. They are hosted on the IIS 7
Web Server.
Worker role - Worker roles are designed to host general compute services
on Azure. They can be used to quickly provide compute power or to host
services that do not communicate with the external world through HTTP.
Blobs - Azure allows storing large amounts of data in the form of binary
large objects (BLOBs) by means of the blobs service.
Block blobs - Block blobs are composed of blocks and are optimized for
sequential access; therefore they are appropriate for media streaming.
Page blobs - Page blobs are made of pages that are identified by their offset
from the beginning of the blob. A page blob can be split into multiple pages or
constituted of a single page.
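Because pages are addressed by offset, locating the pages touched by a write is integer arithmetic. Azure page blobs use 512-byte pages and require writes aligned to page boundaries; the helper below (a hypothetical name, not part of the Azure SDK) turns an offset and length into the first and last page index. For example, writing 2,048 bytes at offset 1,024 touches pages 2 through 5.

```python
PAGE_SIZE = 512  # page blobs are organized in 512-byte pages


def page_range(offset: int, length: int) -> tuple:
    """Return (first_page, last_page) touched by a write of `length` bytes
    starting at byte `offset` from the beginning of the blob."""
    if offset < 0 or length <= 0 or offset % PAGE_SIZE or length % PAGE_SIZE:
        raise ValueError("offset and length must be 512-byte aligned (length > 0)")
    first = offset // PAGE_SIZE
    last = (offset + length) // PAGE_SIZE - 1
    return first, last


if __name__ == "__main__":
    print(page_range(1024, 2048))  # (2, 5)
```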
Azure drive - Page blobs can be used to store an entire file system in the
form of a single Virtual Hard Drive (VHD) file.
The Web service constitutes the SaaS application that will store ECG data in the
Amazon S3 service and issue a processing request to the scalable cloud
platform.
1. Elasticity. The cloud infrastructure can grow and shrink according to the
requests served. As a result, doctors and hospitals do not have to invest in large
computing infrastructures designed after capacity planning, thus making more
effective use of budgets.
3. Cost savings. Cloud services are priced on a pay-per-use basis and with volume
prices for large numbers of service requests.
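Pay-per-use pricing with volume discounts is plain tiered arithmetic. The tariff below is entirely hypothetical (tier sizes and prices are made up for illustration); the point is only that each request is billed at the rate of the tier it falls in, so the average cost per request drops as volume grows. With these numbers, 2,000,000 requests cost 1,000 × $0.50 + 1,000 × $0.40 = $900.

```python
# Hypothetical volume-priced tariff: (requests in tier, price per 1,000 requests).
TIERS = [
    (1_000_000, 0.50),      # first million requests
    (9_000_000, 0.40),      # next nine million
    (float("inf"), 0.30),   # everything beyond ten million
]


def monthly_cost(requests: int) -> float:
    """Pay-per-use cost: each request is billed at its tier's rate."""
    cost, remaining = 0.0, requests
    for tier_size, price_per_thousand in TIERS:
        billed = min(remaining, tier_size)
        cost += billed / 1000 * price_per_thousand
        remaining -= billed
        if remaining <= 0:
            break
    return round(cost, 2)


if __name__ == "__main__":
    for n in (500_000, 2_000_000, 12_000_000):
        print(n, monthly_cost(n))
```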
10.1.2 Biology: protein structure prediction
Protein structure prediction is a computationally intensive task that is fundamental to
different types of research in the life sciences.
The geometric structure of a protein cannot be directly inferred from the sequence
of genes that compose its structure, but it is the result of complex computations
aimed at identifying the structure that minimizes the required energy.
This task requires the investigation of a space with a massive number of states,
consequently creating a large number of computations for each of these states.
One project that investigates the use of cloud technologies for protein structure
prediction is Jeeva, an integrated Web portal that enables scientists to offload
the prediction task to a computing cloud based on Aneka (Figure 10.2).
10.1.3 Biology: gene expression data analysis for cancer diagnosis
Gene expression profiling is the measurement of the expression levels of thousands
of genes at once.
The eXtended Classifier System (XCS) has been utilized for classifying large
datasets in bioinformatics and computer science domains.
The dimensionality of typical gene expression datasets ranges from several
thousands to over tens of thousands of genes.
10.1.4 Geoscience: satellite image processing
Geoscience applications collect, produce, and analyze massive amounts of
geospatial and nonspatial data.
At the PaaS level, Aneka controls the importing of data into the virtualized
infrastructure and the execution of image-processing tasks that produce the
desired outcome from raw satellite images.
10.2 Business and consumer applications
10.2.1 CRM and ERP
Cloud CRM applications constitute a great opportunity for small enterprises
and start-ups to have fully functional CRM software without large up-front
costs and by paying subscriptions.
The possibility of accessing business and customer data from everywhere and from any device has
fostered the spread of cloud CRM applications.
ERP solutions on the cloud are less mature and have to compete with well-
established in-house solutions.
Each CRM instance is deployed on a separate database, and the application provides users
with facilities for marketing, sales, and advanced customer relationship management.
Dynamics CRM Online features can be accessed either through a Web browser interface
or by means of SOAP and RESTful Web services.
This allows Dynamics CRM to be easily integrated with both other Microsoft products
and line-of-business applications.
Dynamics CRM can be extended by developing plug-ins that allow implementing specific
behaviors triggered on the occurrence of given events.
3 NetSuite
NetSuite provides a collection of applications that help customers manage every aspect
of the business enterprise.
Its offering is divided into three major products: NetSuite Global ERP, NetSuite Global
CRM+, and NetSuite Global Ecommerce.
Moreover, an all-in-one solution, NetSuite One World, integrates all three products
together.
On top of the SaaS infrastructure, the NetSuite Business Suite components offer
accounting, ERP, CRM, and ecommerce capabilities.
10.2.2 Productivity
Productivity applications replicate in the cloud some of the most common tasks that we are used to
performing on our desktop: from document storage to office automation and complete
desktop environments hosted in the cloud.
Online storage solutions have turned into SaaS applications and become more usable as
well as more advanced and accessible.
Dropbox provides users with free storage that is accessible through the abstraction of a
folder. Users can either access their Dropbox folder through a browser or by
downloading and installing a Dropbox client, which provides access to the online storage
by means of a special folder.
All modifications to this folder are silently synched, so that changes are propagated to
all the local instances of the Dropbox folder across all the devices.
iCloud is a cloud-based document-sharing application provided by Apple to
synchronize iOS-based devices in a completely transparent manner.
Documents, photos, and videos are automatically synched as changes are made,
without any explicit operation.
This allows the system to efficiently automate common operations without any
human intervention.
2 Google Docs
Google Docs is a SaaS application that delivers the basic office automation
capabilities with support for collaborative editing over the Web.
Google Docs allows users to create and edit text documents, spreadsheets,
presentations, forms, and drawings.
It aims to replace desktop products such as Microsoft Office and OpenOffice and
provide a similar interface and functionality as a cloud service.
EyeOS stores the data about users and applications on the server file system.
Once the user has logged in by providing credentials, the desktop environment
is rendered in the client’s browser by downloading all the JavaScript libraries
required to build the user interface and implement the core functionalities of
EyeOS.
EyeOS also provides APIs for developing new applications and integrating
new capabilities into the system.
10.2.3 Social networking
1 Facebook
With more than 800 million users, it has become one of the largest Websites in
the world.
Currently, the social network is backed by two data centers that have been
built and optimized to reduce costs and impact on the environment.
On top of this highly efficient infrastructure, built and designed out of inexpensive
hardware, a completely customized stack of appropriately modified and refined open-source
technologies constitutes the back-end of the largest social network.
The reference stack serving Facebook is based on LAMP (Linux, Apache, MySQL,
and PHP). This collection of technologies is accompanied by a collection of other
services developed in-house.
While serving page requests, the social graph of the user is composed.
Most of the user data are served by querying a distributed cluster of MySQL instances,
which mostly contain key-value pairs.
10.2.4 Media applications
1 Animoto
The Website provides users with a very straightforward interface for quickly creating
videos out of images, music, and video fragments submitted by users.
Users select a specific theme for a video, upload the photos and videos and order them
in the sequence they want to appear, select the song for the music, and render the video.
The process is executed in the background and the user is notified via email once the
video is rendered. A proprietary artificial intelligence (AI) engine, which selects the
animation and transition effects according to pictures and music, drives the rendering
operation.
Users only have to define the storyboard by organizing pictures and videos into the
desired sequence.
The infrastructure of Animoto is complex and is composed of different systems
that all need to scale.
It uses Amazon EC2 for the Web front-end and worker nodes; Amazon S3 for the
storage of pictures, music, and videos; and Amazon SQS for connecting all the
components.
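The role of the queue in this architecture, decoupling the Web front-end that accepts requests from the worker nodes that render videos, can be sketched with the standard library. In this sketch `queue.Queue` stands in for Amazon SQS and threads stand in for EC2 worker nodes; `render` is a placeholder for the real rendering step, and none of this is Animoto's actual code. Scaling out amounts to raising `n_workers`.

```python
import queue
import threading


def render(job: dict) -> str:
    # Placeholder for the expensive rendering performed on worker nodes.
    return f"video-{job['id']}.mp4"


def worker(jobs: queue.Queue, done: list, lock: threading.Lock) -> None:
    while True:
        job = jobs.get()
        if job is None:          # sentinel: no more work for this worker
            return
        result = render(job)
        with lock:
            done.append(result)


def run_pipeline(job_ids, n_workers: int = 3) -> list:
    jobs = queue.Queue()         # stand-in for the message queue (SQS)
    done, lock = [], threading.Lock()
    threads = [threading.Thread(target=worker, args=(jobs, done, lock))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for job_id in job_ids:       # the Web front-end enqueues render requests
        jobs.put({"id": job_id})
    for _ in threads:            # one sentinel per worker
        jobs.put(None)
    for t in threads:
        t.join()
    return sorted(done)


if __name__ == "__main__":
    print(run_pipeline([3, 1, 2]))
```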
The analysis of these images can help engineers identify problems and correct
their design.
Three-dimensional rendering tasks take considerable amounts of time,
especially in the case of huge numbers of frames, but it is critical for the
department to reduce the time spent in these iterations.
The service integrates with both Amazon Web Services technologies (EC2, S3,
and CloudFront) and Rackspace (Cloud Servers, Cloud Files, and Limelight
CDN access).
To use the service, users have to specify the location of the video to
transcode, the destination format, and the target location of the video.
Online games support hundreds of players in the same session, made possible by the
specific architecture used to forward interactions, which is based on game log
processing.
Players update the game server hosting the game session, and the server integrates all
the updates into a log that is made available to all the players through a TCP port.
The client software used for the game connects to the log port and, by reading the
log, updates the local user interface with the actions of other players.
Game log processing is also utilized to build statistics on players and rank them.
These features constitute the additional value of online gaming portals that
attract more and more gamers.
The use of cloud computing technologies can provide the required elasticity for
seamlessly processing these workloads and scale as required when the number
of users increases.
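The game log processing architecture described above can be sketched as an append-only list that the server writes and each client reads from its last known offset, with statistics computed from the same log. The in-memory `GameLog` replaces the TCP port of a real game server, and all names are illustrative.

```python
from collections import Counter


class GameLog:
    """Append-only session log: the server appends player updates, and each
    client reads only the entries after the offset it has already seen."""

    def __init__(self) -> None:
        self.entries = []   # list of (player, action)

    def append(self, player: str, action: str) -> None:
        self.entries.append((player, action))

    def read_from(self, offset: int) -> tuple:
        # Returns the new entries and the offset to use on the next read.
        return self.entries[offset:], len(self.entries)


def action_counts(log: GameLog) -> Counter:
    """Statistics built from the same log, e.g. number of actions per player."""
    return Counter(player for player, _ in log.entries)


if __name__ == "__main__":
    log = GameLog()
    log.append("p1", "move")
    log.append("p2", "shoot")
    updates, offset = log.read_from(0)   # a client catching up
    print(updates, offset)
    print(action_counts(log).most_common())
```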