
PRACTICAL FILE

CLOUD COMPUTING

Submitted to:                                  Submitted by:
Dr. Anupama Sangwan                            Raghav Aggarwal
Assistant Professor (Dept. of CSE)             16013035
                                               B-Tech CSE 8th sem

INDEX

SR. NO.   EXPERIMENT
1.        Implementation of spreadsheet using Google Drive to perform various operations
2.        Using Google Drive to perform various operations in Google Docs
3.        Installation of Google App Engine to develop applications
4.        Use of Google App Engine to develop applications in Python
5.        Implementation of Para-Virtualization using VMware Workstation / Oracle VirtualBox and a Guest OS
6.        Installation and Configuration of Hadoop
7.        Create an application (Ex: Word Count) using Hadoop Map/Reduce
8.        Case Study: PaaS (Facebook, Google App Engine)
9.        AWS Case Study: Amazon.com

EXPERIMENT NO. - 1

Experiment Title: Implementation of a spreadsheet using Google Drive to perform
various operations.

Aim: Performing operations in a spreadsheet using Google Drive.

Theory :

Google Sheets is a spreadsheet app on steroids. It looks and functions much like
any other spreadsheet tool, but because it's an online app, it offers much more than
most spreadsheet tools. Here are some of the things that make it so much better:

● It’s a web-based spreadsheet that you can use anywhere — no more forgetting
your spreadsheet files at home.
● It works from any device, with mobile apps for iOS and Android along with its
web-based core app.
● Google Sheets is free, and it's bundled with Google Drive, Docs, and Slides to
share files, documents, and presentations online.
● It includes almost all of the same spreadsheet functions—if you know how to
use Excel, you'll feel at home in Google Sheets.
● You can download add-ons, create your own, and write custom code.
● It's online, so you can gather data with your spreadsheet automatically and do
almost anything you want, even when your spreadsheet isn't open.

Whether you’re a spreadsheet novice or an Excel veteran looking for a better way
to collaborate, this book will help you get the most out of Google Sheets. We'll
start out with the basics in this chapter—then keep reading to learn Google Sheets'
advanced features, find its best add-ons, and learn how to build your own.

Getting Started with Google Sheets :


The best way to learn a tool like Sheets is to dive straight in. In this chapter, you'll
learn how to:

1. Create a spreadsheet and fill it with data.
2. Format data for easy viewing.
3. Add, average, and filter the data with formulas.
4. Share, protect, and move the data.

Common Spreadsheet Terms


To kick things off, let's cover some spreadsheet terminology to help you
understand the terms used in this book:
● Cell: A single data point or element in a spreadsheet.
● Column: A vertical set of cells.
● Row: A horizontal set of cells.
● Range: A selection of cells extending across a row, column, or both.
● Function: A built-in operation from the spreadsheet app, which can be used to
calculate
cell, row, column, or range values, manipulate data, and more.
● Formula: The combination of functions, cells, rows, columns, and ranges used
to obtain a specific result.
● Worksheet (Sheet): The named sets of rows and columns making up your
spreadsheet; one spreadsheet can have multiple sheets
● Spreadsheet: The entire document containing your worksheets

1. Create a spreadsheet and fill it with data :


There are 3 ways to create a new spreadsheet in Google Sheets:

1. Click the red "NEW" button on your Google Drive dashboard and
select "Google Sheets"
2. Open the menu from within a spreadsheet and select "File > New
Spreadsheet"
3. Click "Blank" or select a template on the Google Sheets homepage.

2. Format Data for Easy Viewing :


The basic formatting options in Google Sheets are available above your first cell.
They're
labeled in the image below, but for quick reference while you're working on a
sheet, just hover over an icon to see its description and shortcut key.

Print, Undo / Redo, and the Font Settings / Styling function similarly to what you'd
expect from your favorite word processor. The shortcut keys are the same as well,
so just treat it like you’re editing any other document.

3. Add, Average, and Filter Data with Formulas :


The most basic formulas in Sheets include:

● SUM: adds up a range of cells (e.g. 1+2+3+4+5 = sum of 15)


● AVERAGE: finds the average of a range of cells (e.g. 1,2,3,4,5 = average of
3)

● COUNT: counts the values in a range of cells (ex: 1,blank,3,4,5 = 4 total cells
with values)
● MAX: finds the highest value in a range of cells (ex: 1,2,3,4,5 = 5 is the
highest)
● MIN: finds the lowest value in a range of cells (ex: 1,2,3,4,5 = 1 is the lowest)
● Basic Arithmetic: You can also perform functions like addition, subtraction,
and multiplication directly in a cell without calling a formula.
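
As a quick illustration, here is how those functions might look when typed into a
cell, assuming the values 1 through 5 sit in cells A1:A5 (the cell range is just an
example):

=SUM(A1:A5)        returns 15
=AVERAGE(A1:A5)    returns 3
=COUNT(A1:A5)      returns 5
=MAX(A1:A5)        returns 5
=MIN(A1:A5)        returns 1
=A1+A2*A3          basic arithmetic typed directly into a cell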

Using the SUM Formula :


Let’s start with adding up the total number of ingredients required for each
recipe. I’ll use the SUM formula to add each value in the recipes and get a total
amount. There are three ways to use the basic formulas accessible via the top
navigation:
1. Select a range, then click the formula (this will put the result either below
or to the side of the range).
2. Select the result cell (i.e. the cell where you want the result to appear),
then click on the formula you want to use from the toolbar. Finally, select
the range of cells to perform your operation on.
3. Type the formula into the result cell (don't forget the = sign), then either
manually type a range or select the range.

Using the AVERAGE formula :

I’ve added some faux minimum and maximum prices per unit on my ingredients
list to the right of my breakfast options. We’ll want to get an average price for each
ingredient using low and high rates, then multiply the resulting average price of
the ingredient by respective unit count in each recipe.

I’ll start by highlighting the range of values (in this case it’s two side-by-side
rather than a vertical range) and selecting the AVERAGE formula from the
toolbar.

Using Simple Arithmetic Formulas :


We need to calculate the total cost of the breakfast by multiplying the average
price of each ingredient by its unit count in the recipe. To accomplish this,
manually type a formula into the "Avg Price" row. Our basic arithmetic formula
would look like this for the "Scrambled Eggs" column:

=$I2*B2+$I3*B3+$I4*B4+$I5*B5+$I6*B6+$I7*B7+$I8*B8
The $ symbol before column I (the average prices) tells Sheets that no matter
where we put the formula in our spreadsheet, we always want to reference the I
column.

4. Share, Protect, and Move Your Data :

What makes Sheets so powerful is how "in sync" you'll feel with your coworkers.
Jointly editing a spreadsheet is one of the critical functions of Sheets, and Google
has made it a seamless experience.
Here’s how it works:
1. Click either FILE > SHARE or use the blue "Share" button in the top right
2. Click "advanced", then enter emails of who can view or edit your
spreadsheet
3. Select any other privacy options and hit done.

Conclusion:
Thus we performed various operations like sum, count, and average in the Google
Drive spreadsheet.

EXPERIMENT NO. – 2

Experiment Title: Using Google Drive to perform various operations in Google
Docs.

Aim: Performing operations in Google Docs using Google Drive.

Theory :

Google Docs is a word processor included as part of a free, web-based software
office suite offered by Google within its Google Drive service. This service also
includes Google Sheets and Google Slides, a spreadsheet and a presentation
program respectively. Google Docs is available as a web application, as a mobile
app for Android, iOS, Windows, and BlackBerry, and as a desktop application on
Google's ChromeOS. The app is compatible
with Microsoft Office file formats.[1] The application allows users to create and
edit files online while collaborating with other users in real-time. Edits are tracked
by user with a revision history presenting changes. An editor's position is
highlighted with an editor-specific color and cursor. A permissions system
regulates what users can do. Updates have introduced features using machine
learning, including "Explore", offering search results based on the contents of a
document, and "Action items", allowing users to assign tasks to other users.

Features :
Editing :
Collaboration and revision history
Google Docs and the other apps in the Google Drive suite serve as
a collaborative tool for cooperative editing of documents in real-time. Documents
can be shared, opened, and edited by multiple users simultaneously and users are
able to see character-by-character changes as other collaborators make edits.
Changes are automatically saved to Google's servers, and a revision history is
automatically kept so past edits may be viewed and reverted to.[19] An editor's
current position is represented with an editor-specific color/cursor, so if another
editor happens to be viewing that part of the document they can see edits as they
occur. A sidebar chat functionality allows collaborators to discuss edits. The
revision history allows users to see the additions made to a document, with each
author distinguished by color. Only adjacent revisions can be compared, and users
cannot control how frequently revisions are saved. Files can be exported to a user's
local computer in a variety of formats (ODF, HTML, PDF, RTF, Text, Office Open
XML). Files can be tagged and archived for organizational purposes.

Explore
In March 2014, Google introduced add-ons, new tools from third-party developers that
add more features for Google Docs. In order to view and edit documents offline on a
computer, users need to be using the Google Chrome web browser. A Chrome extension,
"Google Docs Offline", allows users to enable offline support for Docs files on the Google
Drive website. The Android and iOS apps natively support offline editing.
In June 2014, Google introduced "Suggested edits" in Google Docs; as part of the
"commenting access" permission, participants can come up with suggestions for edits that
the author can accept or reject, in contrast to full editing ability. In October 2016, Google
announced "Action items" for Docs. If a user writes phrases such as "Ryan to follow up
on the keynote script", the service will intelligently assign
that action to "Ryan". Google states this will make it easier for other collaborators to see
which person
is responsible for what task. When a user visits Google Drive, Docs, Sheets or Slides, any
files with
tasks assigned to them will be highlighted with a badge.
A basic research tool was introduced in 2012, later expanded into "Explore", launched in
September 2016, enabling additional functionality through machine learning. In Google
Docs, Explore shows relevant Google search results based on information in the
document, simplifying information gathering. Users can also mark specific document
text, press Explore and see search results based on the marked text only.
In December 2016, Google introduced a quick citations feature to Google Docs. The
quick citation tool allows users to "insert citations as footnotes with the click of a button"
on the web through the Explore feature introduced in September. The citation feature also
marked the launch of the Explore functionalities in G Suite for Education accounts.
Files :
Supported file formats:
Files in the following formats can be viewed and converted to the Docs format:

● For documents: .doc (if newer than Microsoft Office 95), .docx, .docm, .dot,
.dotx, .dotm, .html, plain text (.txt), .rtf, .odt
File limits
Limits to insertable file sizes, overall document length and size are listed below:

● Up to 1.02 million characters, regardless of the number of pages or font size.
● Document files converted to .gdoc Docs format cannot be larger than 50 MB.
● Images inserted cannot be larger than 50 MB, and must be in either .jpg, .png,
or non-animated .gif formats.
G Suite
Google Docs and the Drive suite are free of charge for use by individuals, but are
also available as part of Google's business-centered G Suite, enabling additional
business-focused functionality on payment of a monthly subscription.
The various steps that are involved in google docs creation are :
Step 1: Create a document
To create a new document:

1. On your computer, open the Docs home screen at docs.google.com.


2. In the top left, under "Start a new document," click New .

You can also create new documents from the URL docs.google.com/create.

Step 2: Edit and format


To edit a document:

1. On your computer, open a document in Google Docs.


2. To select a word, double-click it or use your cursor to select the text you
want to change.
3. Start editing.
4. To undo or redo an action, at the top, click Undo or Redo .

Note: To edit a document on a touchscreen device, like a Pixel Book, double-tap


the document to start typing.
You can add and edit text, paragraphs, spacing, and more in a document.

● Format paragraphs or font


● Add a title, heading, or table of contents.

Step 3: Share & work with others


You can share files and folders with people and choose whether they can view,
edit, or comment on them.

Conclusion:
Thus we performed different operations like creation, deletion, and sharing of files
and docs in Google Drive.

EXPERIMENT NO. – 3

Experiment Title: Installation of Google App Engine to develop applications.

Aim: Performing installation of the Google App Engine.

Theory :

Fully managed serverless application platform


Build and deploy applications on a fully managed platform. Scale your
applications seamlessly from zero to planet scale without having to worry about
managing the underlying infrastructure. With zero server management and zero
configuration deployments, developers can focus only on building great
applications without the management overhead. App Engine enables developers to
stay more productive and agile by supporting popular development languages and
a wide range of developer tools.

Open and familiar languages and tools


Quickly build and deploy applications using many of the popular languages like
Java™, PHP, Node.js, Python, C#, .Net, Ruby, and Go or bring your own language
runtimes and frameworks if you choose. Get started quickly with zero
configuration deployments in App Engine. Manage resources from the command
line, debug source code in production, and run API backends easily, using
industry-leading tools such as Cloud SDK, Cloud Source Repositories, IntelliJ
IDEA, Visual Studio, and PowerShell.

Comparing the flexible environment to Compute Engine


The App Engine flexible environment has the following differences to Compute
Engine:
● Flexible environment VM instances are restarted on a weekly basis. During
restarts, Google's management services apply any necessary operating system and
security updates.
● You always have root access to Compute Engine VM instances. By default, SSH
access to the VM instances in the flexible environment is disabled. If you choose,
you can enable root access
to your app's VM instances.

● Code deployments can take longer as container images are built by using the Cloud
Build
service.
● The geographical region of a flexible environment VM instance is determined by
the location
that you specify for the App Engine application of your Cloud project. Google's
management services ensure that the VM instances are co-located for optimal
performance.

Installation of google app engine on windows:


The installation process has a few requirements that must be fulfilled in order to
successfully install Google App Engine. The first is that the computer must have a
language runtime such as Python, Java, or Node.js preinstalled, so that there will be
no problem during the installation process.

Open Chrome and search for Google Cloud App Engine, and move to the following link:
https://cloud.google.com/appengine/docs
Then choose the programming language suitable for your project.

The user needs to specify the environment suitable for project development. It
consists of two major environments:
1. Standard environment.
2. Flexible environment.

● Standard environment : The Python 3.7 runtime is capable of running any


framework, library, or binary.
● The Python 2.7 runtime does not allow user provided libraries with C code, and
has proprietary APIs.
● Optimized to scale nearly instantaneously to handle huge traffic spikes.

● Flexible environment: Open source runtimes capable of running any framework,


library, or binary.
● Greater CPU and memory instance types.
● Can access resources in the same Compute Engine network.
● Python 2.7 and 3.6
● No free tier. Application always has a minimum number of running instances.
Most cost-effective for applications that serve traffic continuously.

To install the Google Cloud SDK, initialize it, and run core gcloud commands
from the command-line.
1. Create a Google Cloud Platform project, if you don't have one already.
2. Download the Google Cloud SDK installer.
3. Launch the installer and follow the prompts.

4. After installation has completed, the installer presents several options:

After successful installation of the Cloud SDK, a Google Cloud App Engine command
shell will be available. Open the shell, use the keyword "y" to proceed, and create a
project name under which the required program will run.
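
As a minimal sketch of that step from the command line (the project ID
"my-sample-project" is only an example), the following gcloud commands initialize the
SDK, create a project, and create the App Engine application in it:

gcloud init
gcloud projects create my-sample-project
gcloud config set project my-sample-project
gcloud app create --region=us-central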

Run core gcloud commands -

Run these gcloud commands to view information about your SDK installation:

1. To list accounts whose credentials are stored on the local system:


gcloud auth list
gcloud displays a list of credentialed accounts:
Credentialed Accounts
ACTIVE ACCOUNT
* [email protected]
[email protected]
2. To list the properties in your active SDK configuration:
gcloud config list
gcloud displays the list of properties:
[core]
account = [email protected]

disable_usage_reporting = False
project = example-project

3. To view information about your Cloud SDK installation and the active SDK
configuration:
gcloud info
gcloud displays a summary of information about your Cloud SDK installation.
This includes information about your system, the installed SDK components, the
active user account and current project, and the properties in the active SDK
configuration.
4. To view information about gcloud commands and other topics from the command
line:
gcloud help
For example, to view the help for gcloud compute instances create:
gcloud help compute instances create
gcloud displays a help topic that contains a description of the command, a list of
command flags and arguments, and examples of how to use it.
Conclusion:
Thus we successfully installed the Google App Engine using the Google Cloud SDK,
which is used as a development platform for different programming languages.

EXPERIMENT NO. – 4

Experiment Title: Use of Google App Engine to develop applications in Python.

Aim: Working on the Google App Engine.

Theory :

Fully managed serverless application platform


Build and deploy applications on a fully managed platform. Scale your
applications seamlessly from zero to planet scale without having to worry about
managing the underlying infrastructure. With zero server management and zero
configuration deployments, developers can focus only on building great
applications without the management overhead. App Engine enables developers to
stay more productive and agile by supporting popular development languages and
a wide range of developer tools.

Open and familiar languages and tools


Quickly build and deploy applications using many of the popular languages like
Java™, PHP, Node.js, Python, C#, .Net, Ruby, and Go or bring your own language
runtimes and frameworks if you choose. Get started quickly with zero
configuration deployments in App Engine. Manage resources from the command
line, debug source code in production, and run API backends easily, using
industry-leading tools such as Cloud SDK, Cloud Source Repositories, IntelliJ
IDEA, Visual Studio, and PowerShell.

Comparing the flexible environment to Compute Engine


The App Engine flexible environment has the following differences to Compute
Engine:
● Flexible environment VM instances are restarted on a weekly basis. During
restarts, Google's management services apply any necessary operating system and
security updates.
● You always have root access to Compute Engine VM instances. By default, SSH
access to the VM instances in the flexible environment is disabled. If you choose,
you can enable root access
to your app's VM instances.
● Code deployments can take longer as container images are built by using the Cloud
Build
service.

● The geographical region of a flexible environment VM instance is determined by


the location
that you specify for the App Engine application of your Cloud project. Google's
management services ensure that the VM instances are co-located for optimal
performance.
To run a python file on the google app engine we need to configure and install
cloud SDK. To install the Google Cloud SDK, initialize it, and run
core gcloud commands from the command-line.

● Create a Google Cloud Platform project, if you don't have one already.
● Download the Google Cloud SDK installer.
● Launch the installer and follow the prompts.
● After installation has completed, the installer presents several options:

After successful installation of the Cloud SDK, a Google Cloud App Engine command
shell will be available. Open the shell, use the keyword "y" to proceed, and create a
project name under which the required program will run; this helps in setting up the
whole shell and environment of the Google App Engine.

Then the bin folder is used to execute the dev_appserver.py file, which is a Python
file used for the initial setup of the environment; it will also automatically update
and remove any old files in the Python library as well as in the environment.
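
As a minimal sketch of an application that dev_appserver.py can serve locally (a
first-generation Python 2.7 app; the file names and greeting text are only examples),
two files are enough:

app.yaml:
    runtime: python27
    api_version: 1
    threadsafe: true

    handlers:
    - url: /.*
      script: main.app

main.py:
    import webapp2

    # A single request handler that answers every GET with a plain-text greeting.
    class MainPage(webapp2.RequestHandler):
        def get(self):
            self.response.headers['Content-Type'] = 'text/plain'
            self.response.write('Hello from App Engine!')

    # The WSGI application object referenced by app.yaml as main.app.
    app = webapp2.WSGIApplication([('/', MainPage)], debug=True)

The app can then be tested locally with "dev_appserver.py app.yaml" (served at
http://localhost:8080) and published with "gcloud app deploy".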

The file which the user wants to execute is placed in the bin folder, or it can be
placed anywhere and executed using the cloud services of the Google App Engine.
Below we show the execution of a Python program that runs a GUI calculator.

Similarly, the Google App Engine can be used to create an enormous number of apps
and can be used for other heavy computation models as well as in machine learning
areas.
Conclusion :
Thus we have executed the Python program using the Google App Engine.

Experiment No. 5

Experiment Title: Implementation of Para-Virtualization


using VMware's Workstation / Oracle's VirtualBox and
Guest O.S.

Aim: Implementation of Virtual Box for Virtualization of any OS.

Theory:

Virtual Box is a cross-platform virtualization application. What


does that mean? For one thing, it installs on your existing Intel or
AMD-based computers, whether they are running Windows, Mac,
Linux or Solaris operating systems. Secondly, it extends the
capabilities of your existing computer so that it can run multiple
operating systems (inside multiple virtual machines) at the same
time. So, for example, you can run Windows and Linux on your
Mac, run Windows Server 2008 on your Linux server, run Linux on
your Windows PC, and so on, all alongside your existing
applications. You can install and run as many virtual machines as
you like; the only practical limits are disk space and memory. Virtual
Box is deceptively simple yet also very powerful. It can run
everywhere from small embedded systems or desktop class
machines all the way up to datacenter deployments and even Cloud
environments.

The techniques and features that Virtual Box provides are useful for several
scenarios:

● Running multiple operating systems simultaneously.


Virtual Box allows you to run more than one operating
system at a time. This way, you can run software written for
one operating system on another (for example, Windows
software on Linux or a Mac) without having to reboot to use

it. Since you can configure what kinds of "virtual" hardware


should be presented to each such operating system, you can
install an old operating system such as DOS or OS/2 even if
your real computer's hardware is no longer supported by that
operating system.

● Easier software installations. Software vendors can use


virtual machines to ship entire software configurations. For
example, installing a complete mail server solution on a real
machine can be a tedious task. With Virtual Box, such a
complex setup (then often called an "appliance") can be
packed into a virtual machine. Installing and running a mail
server becomes as easy as importing such an appliance into
Virtual Box.

● Testing and disaster recovery. Once installed, a virtual
machine and its virtual hard disks can be considered a
"container" that can be arbitrarily frozen, woken up, copied,
backed up, and transported between hosts.

● Infrastructure consolidation. Virtualization can significantly reduce
hardware and electricity costs. Most of the time, computers today only
use a fraction of their potential power and run with low average system
loads. A lot of hardware resources as well as electricity is thereby wasted.
So, instead of running many such physical computers that are only
partially used, one can pack many virtual machines onto a few powerful
hosts and balance the loads between them.

Some Terminologies used:

When dealing with virtualization (and also for understanding the following
chapters of this documentation), it helps to acquaint oneself with a bit of crucial
terminology, especially the following terms:

Host operating system (host OS). This is the operating system of the
physical computer on which Virtual Box was installed. There are versions of

Virtual Box for Windows, Mac OS X, Linux and Solaris hosts.

Guest operating system (guest OS). This is the operating system that is


running inside the virtual machine. Theoretically, Virtual Box can run any x86
operating system (DOS, Windows, OS/2, FreeBSD, Open BSD), but to achieve
near-native performance of the guest code on your machine, we had to go
through a lot of optimizations that are specific to certain operating systems. So
while your favorite operating system may run as a guest, we officially support
and optimize for a select few (which, however, include the most common ones).

Virtual machine (VM). This is the special environment that Virtual Box
creates for your guest operating system while it is running. In other words, you
run your guest operating system "in" a VM. Normally, a VM will be shown as a
window on your computer's desktop, but depending on which of the various
frontends of VirtualBox you use, it can be displayed in full screen mode or
remotely on another computer. In a more abstract way, internally, VirtualBox
thinks of a VM as a set of parameters that determine its behavior. They include

hardware settings (how much memory the VM should have, what hard disks
VirtualBox should virtualize through which container files, what CDs are
mounted etc.) as well as state information (whether the VM
is currently running, saved, its snapshots etc.). These settings are mirrored in the
VirtualBox Manager window as well as the VBoxManage command line
program.
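
As a small illustration of that command line program, the following VBoxManage
commands sketch how a VM could be created and started without the GUI (the VM
name "Ubuntu-Test" and the memory and disk sizes are only example values):

VBoxManage createvm --name "Ubuntu-Test" --ostype Ubuntu_64 --register
VBoxManage modifyvm "Ubuntu-Test" --memory 2048 --cpus 2
VBoxManage createhd --filename Ubuntu-Test.vdi --size 20480
VBoxManage storagectl "Ubuntu-Test" --name "SATA" --add sata
VBoxManage storageattach "Ubuntu-Test" --storagectl "SATA" --port 0 --device 0 --type hdd --medium Ubuntu-Test.vdi
VBoxManage startvm "Ubuntu-Test"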

Guest Additions. This refers to special software packages which are shipped with
VirtualBox but designed to be installed inside a VM to improve performance of the guest OS
and to add extra features.

Starting Virtual Box:

After installation, you can start VirtualBox as follows:

On a Windows host, in the standard "Programs" menu, click on the item


in the "VirtualBox" group. On Vista or Windows 7, you can also type
"VirtualBox" in the search box of the "Start" menu.

On a Mac OS X host, in the Finder, double-click on the "VirtualBox" item


in the "Applications" folder. (You may want to drag this item onto your
Dock.)

On a Linux or Solaris host, depending on your desktop environment, a


"VirtualBox" item may have been placed in either the "System" or
"System Tools" group of your "Applications" menu. Alternatively, you
can type VirtualBox in a terminal.

When you start VirtualBox for the first time, a window like the following should come up:

This window is called the "VirtualBox Manager". On the left, you can see a
pane that will later list all your virtual machines. Since you have not created any,
the list is empty. A row of buttons above it allows you to create new VMs and
work on existing VMs, once you have some. The pane on the right displays the
properties of the virtual machine currently selected, if any. Again, since you don't
have any machines yet, the pane displays a welcome message.

To give you an idea what VirtualBox might look like later, after you have created
many machines, here's another example:

Creating your first virtual machine:

Click on the "New" button at the top of the VirtualBox Manager window. A
wizard will pop up to guide you through setting up a new virtual machine (VM)

On the following pages, the wizard will ask you for the bare minimum of
information that is needed to

create a VM, in particular:

The VM name will later be shown in the VM list of the VirtualBox


Manager window, and it will be used for the VM's files on disk. Even
though any name could be used, keep in mind that once you have created
a few VMs, you will appreciate it if you have given your VMs rather
informative names; "My VM" would thus be less useful than "Windows
XP SP2 with OpenOffice".

For "Operating System Type", select the operating system that you
want to install later. The supported operating systems are grouped; if you
want to install something very unusual that is not listed, select "Other".
Depending on your selection, Virtual Box will enable or disable certain
VM settings that your guest operating system may require. This is
particularly important for 64-bit guests (see Section 3.1.2, 64-bit guests).
It is therefore recommended to always set it to the correct value.



On the next page, select the memory (RAM) that Virtual Box should
allocate every time the virtual machine is started. The amount of memory
given here will be taken away from your host machine and presented to
the guest operating system, which will report this size as the (virtual)
computer's installed RAM.

A Windows XP guest will require at least a few hundred MB RAM to run


properly, and Windows Vista will even refuse to install with less than 512
MB. Of course, if you want to run graphics-intensive applications in your
VM, you may require even more RAM.

So, as a rule of thumb, if you have 1 GB of RAM or more in your host


computer, it is usually safe to allocate 512 MB to each VM. But, in any
case, make sure you always have at least 256 to 512 MB of RAM left on
your host operating system. Otherwise you may cause your host OS to
excessively swap out memory to your hard disk, effectively bringing your
host system to a standstill. As with the other settings, you can change this
setting later, after you have created the VM.

Next, you must specify a virtual hard disk for your VM. There are many and
potentially complicated ways in which VirtualBox can provide hard disk space
to a VM (see Chapter 5, Virtual storage for details), but the most common way
is to use a large image file on your "real" hard disk, whose contents VirtualBox
presents to your VM as if it were a complete hard disk. This file represents an
entire hard disk then, so you can even copy it to another host and use it with
another VirtualBox installation.

The wizard shows you the following window:

Here you have the following options:



● To create a new, empty virtual hard disk, press the "New" button.

● You can pick an existing disk image file. The drop-down list presented
in the window contains all disk images which are currently remembered
by VirtualBox, probably because they are currently attached to a virtual
machine (or have been in the past). Alternatively, you can click on the
small folder button next to the drop-down list to bring up a standard file
dialog, which allows you to pick any disk image file on your host disk.

Most probably, if you are using VirtualBox for the first time, you will want
to create a new disk image. Hence, press the "New" button. This brings up
another window, the "Create New Virtual Disk Wizard", which helps you
create a new disk image file in the new virtual machine's folder.

VirtualBox supports two types of image files:

● A dynamically allocated file will only grow in size when the guest
actually stores data on its virtual hard disk. It will therefore initially
be small on the host hard drive and only later grow to the size
specified as it is filled with data.

● A fixed-size file will immediately occupy the file specified, even if


only a fraction of the virtual hard disk space is actually in use. While
occupying much more space, a fixed-size file incurs less overhead and
is therefore slightly faster than a dynamically allocated file.

For details about the differences, please refer to Section 5.2, Disk image
files (VDI, VMDK, VHD, HDD).

After having selected or created your image file, again press "Next" to go to the next
page.

After clicking on "Finish", your new virtual machine will be created. You
will then see it in the list on the left side of the Manager window, with the
name you entered initially.

Running your virtual machine: To start a virtual machine, you have several options:

Double-click on its entry in the list within the Manager window or

select its entry in the list in the Manager window it and press the "Start"
button at the top or

for virtual machines created with VirtualBox 4.0 or later, navigate to the
"VirtualBox VMs" folder in your system user's home directory, find the
subdirectory of the machine you want to start and double-click on the
machine settings file (with a .vbox file extension). This opens up a new
window, and the virtual machine which you selected will boot up.
Everything which would normally be seen on the virtual system's monitor
is shown in the window. In general, you can use the virtual machine much
like you would use a real computer. There are couple of points worth
mentioning however.

Saving the state of the machine: When you click on the "Close" button of
your virtual machine window (at the top right of the window, just like you
would close any other window on your
system), VirtualBox asks you whether you want to "save the machine state", "send
the shutdown signal", or "power off" the
VM. (As a shortcut, you can also press the Host key together with "Q".)

The difference between these three options is crucial. They mean:

Save the machine state: With this option, VirtualBox "freezes" the
virtual machine by completely saving its state to your local disk. When
you start the VM again later, you will find that the VM continues exactly
where it was left off. All your programs will still be open, and your
computer resumes operation. Saving the state of a virtual machine is thus
in some ways similar to suspending a laptop computer (e.g. by closing its
lid).

Send the shutdown signal. This will send an ACPI shutdown signal to
the virtual machine, which has the same effect as if you had pressed the
power button on a real computer. So long as the VM is running a fairly
modern operating system, this should trigger a proper shutdown
mechanism from within the VM.

Power off the machine: With this option, VirtualBox also stops running
the virtual machine, but without saving its state. As an exception, if your
virtual machine has any snapshots (see the next chapter), you can use this
option to quickly restore the current snapshot of the virtual
machine. In that case, powering off the machine will not disrupt its
state, but any changes made since that snapshot was taken will be lost.
The "Discard" button in the VirtualBox
Manager window discards a virtual machine's saved state. This has the

same effect as powering it off, and the same warnings apply.

Importing and exporting virtual machines

VirtualBox can import and export virtual machines in the industry-standard Open
Virtualization Format (OVF). OVF is a cross-platform standard supported by
many virtualization products which allows for creating ready-made virtual
machines that can then be imported into a virtualizer such as VirtualBox.
VirtualBox makes OVF import and export easy to access and supports it from the
Manager window as well as its command-line interface. This allows for
packaging so-called virtual appliances: disk images together with configuration
settings that can be distributed easily. This way one can offer complete ready-to-
use software packages (operating systems with applications) that need no
configuration or installation except for importing into VirtualBox.

Appliances in OVF format can appear in two variants:

They can come in several files, as one or several disk images, typically in
the widely-used VMDK format (see Section 5.2, Disk image files (VDI,
VMDK, VHD, HDD)) and a textual description file in an XML dialect
with an .ovf extension. These files must then reside in the same directory
for Virtual Box to be able to import them.

Alternatively, the above files can be packed together into a single archive
file, typically with an .ova extension. (Such archive files use a variant of
the TAR archive format and can therefore be unpacked outside of Virtual
Box with any utility that can unpack standard TAR files.)

Select "File" -> "Export appliance". A different dialog window shows up that
allows you to combine several virtual machines into an OVF appliance. Then,
select the target location where the target files should be stored, and the
conversion process begins. This can again take a while.
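
The same export and import can also be done from the command line; a minimal
sketch (the VM and file names are examples only):

VBoxManage export "Ubuntu-Test" -o ubuntu-test.ova
VBoxManage import ubuntu-test.ova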

Conclusion:
Thus we have studied the use of multiple operating systems using VirtualBox by virtualization.

Experiment No. - 6

Aim: Installation and Configuration of Hadoop.

Theory:

Hadoop-1.2.1 Installation Steps for Single-Node Cluster (On Ubuntu 12.04)

Download and install VMware Player depending on


your Host OS (32-bit or 64-bit):
https://my.vmware.com/web/vmware/free#desktop_end_user_computing/vmware_player/6_0

Download the .iso image file of Ubuntu 12.04 LTS (32-bit or


64-bit depending on your requirements)
http://www.ubuntu.com/download/desktop

Install Ubuntu from image in VMware. (For efficient use,


configure the Virtual Machine to have at least 2 GB (4 GB
preferred) of RAM and at least 2 cores of processor.)

JAVA INSTALLATION

sudo mkdir -p /usr/local/java

cd ~/Downloads

sudo cp -r jdk-8-linux-i586.tar.gz /usr/local/java



sudo cp -r jre-8-linux-i586.tar.gz /usr/local/java

cd /usr/local/java

sudo tar xvzf jdk-8-linux-i586.tar.gz

sudo tar xvzf jre-8-linux-i586.tar.gz

ls -a
jdk1.8.0 jre1.8.0 jdk-8-linux-i586.tar.gz jre-8-linux-i586.tar.gz



sudo gedit /etc/profile

JAVA_HOME=/usr/local/java/jdk1.8.0
PATH=$PATH:$HOME/bin:$JAVA_HOME/bin
JRE_HOME=/usr/local/java/jdk1.8.0/jre
PATH=$PATH:$HOME/bin:$JRE_HOME/bin
HADOOP_HOME=/home/hadoop/hadoop-1.2.1
PATH=$PATH:$HADOOP_HOME/bin
export JAVA_HOME
export JRE_HOME
export PATH

sudo update-alternatives --install "/usr/bin/java" "java"


"/usr/local/java/jdk1.8.0/jre/bin/java" 1

sudo update-alternatives --install "/usr/bin/javac" "javac"


"/usr/local/java/jdk1.8.0/bin/javac" 1 13.sudo update-alternatives --install
"/usr/bin/javaws" "javaws" "/usr/local/java/jdk1.8.0/bin/javaws" 1

sudo update-alternatives --set java /usr/local/java/jdk1.8.0/jre/bin/java

sudo update-alternatives --set javac /usr/local/java/jdk1.8.0/bin/javac

sudo update-alternatives --set javaws /usr/local/java/jdk1.8.0/bin/javaws

. /etc/profile

java -version

java version "1.8.0"

Java(TM) SE Runtime Environment (build 1.8.0-b132)

Java HotSpot(TM) Client VM (build 25.0-b70, mixed mode)

HADOOP INSTALLATION

open Home

create a folder hadoop

copy from downloads hadoop-1.2.1.tar.gz to hadoop

right click on hadoop-1.2.1.tar.gz and Extract Here

cd hadoop/
ls -a

. .. hadoop-1.2.1 hadoop-1.2.1.tar.gz

edit the file conf/hadoop-env.sh

# The java implementation to use. Required.
export JAVA_HOME=/usr/local/java/jdk1.8.0

cd hadoop-1.2.1

------------------STANDALONE OPERATION----------------

mkdir input

cp conf/*.xml input

bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'

cat output/*

----------------PSEUDO DISTRIBUTED OPERATION //WORDCOUNT

conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>

ssh localhost

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

bin/hadoop namenode -format

bin/start-all.sh

Run the following command to verify that the hadoop services are running:

$ jps

If everything was successful, you should see the following services running:

2583 DataNode
2970 JobTracker
3461 Jps
3177 TaskTracker
2361 NameNode
2840 SecondaryNameNode
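
Once you are done, the daemons can be stopped again; a quick sketch, run from the
same hadoop-1.2.1 directory:

bin/stop-all.sh
jps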

Conclusion:

Thus we have studied how to install and configure Hadoop on the Ubuntu
operating system.

Experiment No. - 7

Aim: Create an application (Ex: Word Count) using Hadoop Map/Reduce.

Theory:

THE MAPREDUCE MODEL

Traditional parallel computing algorithms were developed for systems with a


small number of processors, dozens rather than thousands. So it was safe to
assume that processors would not fail during a computation. At significantly
larger scales this assumption breaks down, as was experienced at Google in the
course of having to carry out many large-scale computations similar to the one
in our word counting example. The MapReduce parallel programming
abstraction was developed in response to these needs, so that it could be used
by many different parallel applications while leveraging a common underlying
fault-tolerant implementation that was transparent to application developers.
Figure 11.1 illustrates MapReduce using the word counting example where we
needed to count the occurrences of each word in a collection of documents.

MapReduce proceeds in two phases, a distributed 'map' operation
followed by a distributed 'reduce' operation; at each phase a configurable
number of M 'mapper' processors and R 'reducer' processors are assigned to
work on the problem (we have used M = 3 and R = 2 in the illustration). The
computation is coordinated by a single master process (not shown in the figure).

A MapReduce implementation of the word counting task proceeds as


follows: In the map phase each mapper reads approximately 1/M th of the input
(in this case documents), from the global file system, using locations given to
it by the master. Each mapper then performs a
'map' operation to compute word frequencies for its subset of documents.
These frequencies are sorted by the words they represent and written to the
local file system of the mapper. At the next phase reducers are each assigned a
subset of words; in our illustration

the first reducer is assigned w1 and w2 while the second one handles w3 and
w4. In fact during the map

phase itself each mapper writes one file per reducer, based on the words assigned
to each reducer, and keeps the master informed of these file locations. The master
in turn informs the reducers where the partial counts for their words have been
stored on the local files of respective mappers; the reducers then make remote
procedure call requests to the mappers to fetch these. Each reducer performs a
'reduce' operation that sums up the frequencies for each word, which are finally
written back to the GFS file system

The MapReduce programming model generalizes the computational structure of


the above example. Each map operation consists of transforming one set of key-
value pairs to another:

Map: (k1, v1) → [(k2, v2)]        (11.4)

In our example each map operation takes a document indexed by its id and
emits a list of word-count pairs indexed by word-id: (dk, [w1 ... wn]) → [(wi, ci)].
The reduce operation groups the
results of the map step using the same key k2 and performs a function f on the
list of values that correspond to each.

Reduce: (k2, [v2]) → (k2, f([v2]))        (11.5)
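
For the word-counting example these two steps can be made concrete (the document id
d1 and its contents are made up for illustration):

Map:    (d1, "the cat sat on the mat") → [(the,1), (cat,1), (sat,1), (on,1), (the,1), (mat,1)]
Reduce: (the, [1, 1]) → (the, 2)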

The implementation also generalizes. Each mapper is assigned an input-key range


(set of values for k1) on which map operations need to be performed. The mapper
writes results of its map operations to its local disk in R partitions, each
corresponding to the output-key range (values of k2) assigned to a particular
reducer, and informs the master of these locations. Next each reducer fetches
these pairs from the respective mappers and performs reduce operations for each
key k2 assigned to it. If a processor fails during the execution, the master detects
this through regular heartbeat communications it maintains with each worker,
wherein updates are also exchanged regarding the status of tasks assigned to
workers.

If a mapper fails, then the master reassigns the key-range designated to it


to another working node for re-execution. Note that re-execution is required even
if the mapper had completed some of its map operations, because the results were
written to local disk rather than the GFS. On the other hand if a reducer fails only
its remaining tasks (values k2) are reassigned to another node, since the
completed tasks would already have been written to the GFS.

Finally, heartbeat failure detection can be fooled by a wounded task that


has a heartbeat but is making no progress. Therefore, the master also tracks the
overall progress of the computation and if results from the last few processors in
either phase are excessively delayed, these tasks are duplicated and assigned to
processors who have already completed their work. The master declares the task
completed when any one of the duplicate workers complete.

Such a fault-tolerant implementation of the MapReduce model has been


implemented and is widely used within Google; more importantly from an
enterprise perspective, it is also available as an open source implementation
through the Hadoop project along with the HDFS distributed file system.

The MapReduce model is widely applicable to a number of parallel


computations, including database-oriented tasks which we cover later. Finally we

describe one more example, that of indexing a large collection of documents, or, for
that matter any data including database records: The map task consists of emitting a
word-document/record id pair for each word: (dk, [w1 . . .wn]) → [(wi, dk)]. The
reduce step groups the pairs by word and creates an index entry for each word: [(wi,
dk)] → (wi, [di1 . . . dim]).

Indexing large collections is not only important in web search, but also a critical
aspect of handling structured data; so it is important to know that it can be
executed efficiently in parallel using MapReduce. Traditional parallel databases focus on
rapid query execution against data warehouses that are updated infrequently; as
a result these systems often do not parallelize index creation sufficiently well.

Open in any Browser:

NameNode - http://localhost:50070/

JobTracker - http://localhost:50030/

Open hadoop/hadoop-1.2.1, create a document, type something in that
document, and save it as test.txt

bin/hadoop fs -ls /

Found 1 items

drwxr-xr-x - vishal supergroup 0 2014-04-15 01:13 /tmp

bin/hadoop fs -mkdir example

bin/hadoop fs -ls /user/vishal/

Found 1 items

drwxr-xr-x - vishal supergroup /user/vishal/example



bin/hadoop fs -copyFromLocal test.txt /user/vishal/example

bin/hadoop jar hadoop-examples-1.2.1.jar wordcount /user/vishal/example/test.txt /hello


package com.WordCount.Example;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WCE {

    // Mapper: emits a (word, 1) pair for every token in the input line.
    public static class Map extends MapReduceBase implements
            Mapper<LongWritable, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Reducer: sums up the counts emitted for each word.
    public static class Reduce extends MapReduceBase implements
            Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WCE.class);
        conf.setJobName("wordcount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);
        FileInputFormat.addInputPath(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}

● Right Click on project Name → New File sample → type something in the
sample file.

● Right click on project →Export →Click on java → Provide a JAR file name →
Select the Location where to save the jar file.
● Right Click on project Name →Run as → Run Configuration→ Java Application
→new→ In main→
WordCount→click on Search and click on the JAR File which you have created→
Click on Arguments → Provide under Program arguments → sample output
→Click on Run.
● Right click on the project → Refresh→ An output file is created in your project
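
If you prefer the command line to Eclipse, a minimal sketch of compiling and running
the class against Hadoop 1.2.1 from the hadoop-1.2.1 directory (the output directory
/wcout is only an example and must not already exist):

mkdir wce_classes
javac -classpath hadoop-core-1.2.1.jar -d wce_classes WCE.java
jar -cvf wce.jar -C wce_classes/ .
bin/hadoop jar wce.jar com.WordCount.Example.WCE /user/vishal/example/test.txt /wcout
bin/hadoop fs -cat /wcout/part-00000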

Conclusion:
Hence we have implemented a MapReduce example, the Word
Count program, on a file; it counts the number of times each word
repeats in the given file.

Experiment No. - 8

Aim: Case Study: PAAS (Face book, Google App Engine).

Theory:
Platform-as-a-Service (PaaS):

Cloud computing has evolved to include platforms for building and running
custom web-based applications, a concept known as Platform-as-a-Service.
PaaS is an outgrowth of the SaaS application delivery model. The PaaS model
makes all of the facilities required to support the complete life cycle of building
and delivering web applications and services entirely available from the
Internet, all with no software downloads or installation for developers, IT
managers, or end users. Unlike the IaaS model, where developers may create a
specific operating system instance with homegrown applications running, PaaS
developers are concerned only with webbased development and generally do
not care what operating system is used. PaaS services allow users to focus on
innovation rather than complex infrastructure. Organizations can redirect a
significant portion of their budgets to creating applications that provide real
business value instead of worrying about all the infrastructure issues in a roll-
your-own delivery model. The PaaS model is thus driving a new era of mass
innovation. Now, developers around the world can access unlimited computing
power. Anyone with an Internet connection can build powerful applications
and easily deploy them to users globally.

Google App Engine:

Architecture :

The Google App Engine (GAE) is Google's answer to the ongoing trend of
Cloud Computing offerings within the industry. In the traditional sense, GAE is
a web application hosting service, allowing for development and deployment of
web-based applications within a pre- defined runtime environment. Unlike other
cloud-based hosting offerings such as Amazon Web Services that operate on an
IaaS level, the GAE already provides an application infrastructure on the PaaS
level. This means that the GAE

abstracts from the underlying hardware and operating system layers by providing
the hosted application with a set of application-oriented services. While this
approach is very convenient for
developers of such applications, the rationale behind the GAE is its focus on
scalability and usage-based infrastructure as well as payment.

Costs :

Developing and deploying applications for the GAE is generally free of charge
but restricted to a
certain amount of traffic generated by the deployed application. Once this limit
is reached within a certain time period, the application stops working. However,
this limit can be waived when switching to a billable quota where the developer
can enter a maximum budget that can be spent on an application per day.
Depending on the traffic, once the free quota is reached the application will
continue to work until the maximum budget for this day is reached. Table 1
summarizes some of the, in our opinion, most important quotas and the corresponding
amount per unit that is charged when free resources are depleted and additional,
billable quota is desired.

Features :

With a Runtime Environment, the Datastore and the App Engine services, the
GAE can be divided into three parts.

Runtime Environment

The GAE runtime environment presents itself as the place where the actual
application is executed. However, the application is only invoked once an HTTP
request is processed to the GAE via a web browser or some other interface,
meaning that the application is not constantly running if no invocation or
processing has been done. In case of such an HTTP request, the request handler
forwards the request and the GAE selects one out of many possible Google
servers where the application is then instantly deployed and executed for a certain
amount of time (8). The application may then do some computing and return the
result back to the GAE request handler which forwards an HTTP response to the
client. It is important to understand that the application runs completely
embedded in this described sandbox environment but only as long as requests are
still coming in or some processing is done within the application. The reason for
this is simple: Applications should only run when they are actually computing,

otherwise they would allocate precious computing power and memory without
need. This paradigm already shows the GAE's potential in terms of scalability.
Being able to run multiple instances of one application independently on
different servers guarantees for a decent level of scalability. However, this highly
flexible and stateless application execution paradigm has its limitations.

Requests are processed no longer than 30 seconds after which the response has
to be returned to
the client and the application is removed from the runtime environment again.
Obviously this
method complicates optimizing for several subsequent requests to the same application.
The type of runtime environment on the Google servers is dependent on the
programming language used.

For Java or other languages that have support for Java-based compilers (such as
JRuby, Rhino and Groovy) a Java-based Java Virtual Machine (JVM) is
provided. Also, GAE fully supports the Google Web Toolkit (GWT), a
framework for rich web applications. For Python and related frameworks a
Python-based environment is used.

Persistence and the datastore

As previously discussed, the stateless execution of applications creates the need


for a datastore that provides a proper way for persistence. Traditionally, the most
popular way of persisting data in web applications has been the use of relational
databases. However, setting the focus on high flexibility and scalability, the GAE
uses a different approach for data persistence, called Bigtable (14). Instead of
rows found in a relational database, in Google‘s Bigtable data is stored in entities.
Entities are always associated with a certain kind. These entities have properties,
resembling columns in relational database schemes. But in contrast to relational
databases, entities are actually schemaless, as two entities of the same kind not
necessarily have to have the same properties or even the same type of value for a
certain property.

The most important difference to relational databases is however the


querying of entities within a Bigtable datastore. In relational databases
queries are processed and executed against a database at application
runtime. GAE uses a different approach here. Instead of processing a
query at application runtime, queries are pre-processed during
compilation time when a corresponding index is created. This index is
later used at application runtime when the actual query is executed.


Thanks to the index, each query is only a simple table scan where only
the exact filter value is searched. This method makes queries very fast
compared to relational databases while updating entities is a lot more
expensive.
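As an illustration of the schemaless entity model described above, the following sketch (assuming the legacy google.appengine.ext.db API; the Product kind and its properties are hypothetical) stores two entities of the same kind with different property sets:

from google.appengine.ext import db

class Product(db.Expando):            # Expando entities may differ in properties
    name = db.StringProperty(required=True)
    price = db.FloatProperty()

book = Product(name='Some book', price=9.99)
book.author = 'A. Writer'             # an extra property only this entity has
book.put()                            # persisted to the datastore

dvd = Product(name='Some movie', price=14.99)
dvd.runtime_minutes = 127             # a different extra property
dvd.put()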

Transactions are similar to those in relational databases. Each transaction


is atomic, meaning that it either fully succeeds or fails. As described above, one
of the advantages of the GAE is its scalability through concurrent instances of the
same application. But what happens when two instances try to start transactions
trying to alter the same entity? The answer to this is quite simple: Only the first
instance gets access to the entity and keeps it until the transaction is completed
or eventually failed. In this case the second instance will receive a concurrency
failure exception. The GAE uses a method of handling such parallel transactions
called optimistic concurrency control. It simply denies more than one altering
transaction on an entity and implicates that an application running within the
GAE should have a mechanism trying to get write access to an entity multiple
times before finally giving up.
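A minimal sketch of such a retry mechanism, assuming the legacy db API (the counter entity and function names are hypothetical):

from google.appengine.ext import db

def increment_counter(key, retries=3):
    def txn():
        counter = db.get(key)
        counter.value += 1
        counter.put()
    for attempt in range(retries):
        try:
            db.run_in_transaction(txn)
            return True
        except db.TransactionFailedError:
            continue                  # another instance altered the entity first
    return False                      # give up after several attempts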

Heavily relying on indexes and optimistic concurrency control, the GAE


allows performing queries very fast even at higher scales while assuring data
consistency.

Services

As mentioned earlier, the GAE serves as an abstraction of the underlying


hardware and operating system layers. These abstractions are implemented as
services that can be directly called from the actual application. In fact, the
datastore itself is as well a service that is controlled by the runtime environment
of the application.

MEM CACHE

The platform's innate memory cache service serves as a short-term storage. As its
name suggests, it stores data in a server‘s memory allowing for faster access
compared to the datastore. Memcache is a non-persistent data store that should
only be used to store temporary data within a series of computations. Probably
the most common use case for Memcache is to store session specific data (15).
Persisting session information in the datastore and executing queries on every
page interaction is highly inefficient over the application lifetime, since session-
owner instances are unique per session (16). Moreover, Memcache is well suited
to speed up common datastore queries (8). To interact with the Memcache

GAE supports JCache, a proposed interface standard for memory caches (17).
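A read-through caching pattern as described above might look like the following sketch (assuming the google.appengine.api.memcache module; the Session kind and key names are hypothetical):

from google.appengine.api import memcache
from google.appengine.ext import db

def get_session_data(session_id):
    data = memcache.get(session_id)
    if data is not None:
        return data                   # fast path: served from server memory
    data = db.get(db.Key.from_path('Session', session_id))
    memcache.set(session_id, data, time=3600)   # cache for one hour
    return data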

URL FETCH

Because the GAE restrictions do not allow opening sockets (18), a URL Fetch
service can be used to send HTTP or HTTPS requests to other servers on the
Internet. This service works asynchronously, giving the remote server some time
to respond while the request handler can do
other things in the meantime. After the server has answered, the URL Fetch
service returns response code as well as header and body. Using the Google
Secure Data Connector an application can even access servers behind a
company‘s firewall (8).
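A sketch of an asynchronous URL Fetch call (assuming the google.appengine.api.urlfetch module; the target URL is hypothetical):

from google.appengine.api import urlfetch

rpc = urlfetch.create_rpc(deadline=10)
urlfetch.make_fetch_call(rpc, 'https://example.com/api/data')

# ... the request handler can do other work here ...

result = rpc.get_result()             # blocks until the remote server answers
if result.status_code == 200:
    body = result.content             # response body returned by the remote service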

MAIL

The GAE also offers a mail service that allows sending and receiving email
messages. Mails can be sent out directly from the application either on behalf of
the application's administrator or on behalf of users with Google Accounts.
Moreover, an application can receive emails in the form of HTTP requests
initiated by the App Engine and posted to the app at multiple addresses. In
contrast to incoming emails, outgoing messages may also have an attachment up
to 1 MB (8).
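Sending an outgoing message through the mail service could look like this sketch (assuming the google.appengine.api.mail module; the addresses are hypothetical):

from google.appengine.api import mail

mail.send_mail(
    sender='admin@example-app.appspotmail.com',
    to='user@example.com',
    subject='Welcome',
    body='Thanks for signing up.')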

XMPP

In analogy to the mail service a similar service exists for instant messaging,
allowing an application to send and receive instant messages when deployed to
the GAE. The service allows communication to and from any instant messaging
service compatible to XMPP (8), a set of open technologies for instant messaging
and related tasks (19).

IMAGES

Google also integrated a dedicated image manipulation service into the App
Engine. Using this service images can be resized, rotated, flipped or cropped (18).
Additionally it is able to combine several images into a single one, convert
between several image formats and enhance photographs. Of course the API also
provides information about format, dimensions and a histogram of color values
(8).

USERS

User authentication with GAE comes in two flavors. Developers can roll their
own authentication service using custom classes, tables and Memcache or simply
plug into Google‘s Accounts service.

Since for most applications the time and effort of creating a sign-up
page and store user passwords is

not worth the trouble (18), the User service is a very convenient functionality
which gives an easy method for authenticating users within applications. As a
byproduct, thousands of existing Google Accounts are leveraged. The User service detects
if a user has signed in and otherwise redirects

the user to a sign-in page. Furthermore, it can detect whether the current user is
an administrator, which facilitates implementing admin-only areas within the
application (8).
OAUTH
The general idea behind OAuth is to allow a user to grant a third party limited
permission to access protected data without sharing username and password with
the third party. The OAuth specification separates between a consumer, which is
the application that seeks permission on accessing protected data, and the service
provider who is storing protected data on his users' behalf (20). Using Google
Accounts and the GAE API, applications can be an OAuth service provider (8).

SCHEDULED TASKS AND TASK QUEUES

Because background processing is restricted on the GAE platform, Google


introduced task queues as another built-in functionality (18). When a client
requests an application to do certain steps, the application might not be able to
process them right away. This is where the task queues come into play. Requests
that cannot be executed right away are saved in a task queue that controls the
correct sequence of execution. This way, the client gets a response to its request
right away, possibly with the indication that the request will be executed later
(13). Similar to the concept of task queues are cron jobs. Borrowed from the
UNIX world, a GAE cron job is a scheduled job that can invoke a request handler
at a pre-specified time (8).
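Deferring work to a task queue might look like the following sketch (assuming the google.appengine.api.taskqueue module; the worker URL and parameters are hypothetical):

from google.appengine.api import taskqueue

# The client receives its response right away; the /worker handler is
# invoked later by the App Engine in the correct sequence.
taskqueue.add(url='/worker', params={'report_id': '42'})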

BLOBSTORE

The general idea behind the blobstore is to allow applications to handle objects
that are much larger than the size allowed for objects in the datastore service.
Blob is short for binary large object and is designed to serve large files, such as
video or high quality images. Although blobs can have up to 2 GB they have to
be processed in portions, one MB at a time. This restriction was introduced to
smooth the curve of datastore traffic. To enable queries for blobs, each has a
corresponding blob info record which is persisted in the datastore (8), e. g. for
creating an image database.

ADMINISTRATION CONSOLE

The administration console acts as a management cockpit for GAE applications.


It gives the developer real-time data and information about the current
performance of the deployed application and is used to upload new versions of
the source code. At this juncture it is possible to test new versions of the
application and switch the versions presented to the user. Furthermore, access
data and logfiles can be viewed. It also enables analysis of traffic so that quota
can be adapted when needed. Also, the status of scheduled tasks can be checked
and the administrator
is able to browse the applications datastore and manage indices (8).

App Engine for Business

While the GAE is more targeted towards independent developers in need of a
hosting platform for their medium-sized applications, Google's recently launched
App Engine for Business tries to target the corporate market. Although
technically mostly relying on the described GAE, Google added some enterprise
features and a new pricing scheme to make their cloud computing platform more
attractive for enterprise customers (21). Regarding the features, App Engine for
Business includes a central development manager that allows a central
administration of all applications deployed within one company including access
control lists. In addition to that Google now offers a 99.9% service level
agreement as well as premium developer support. Google also adjusted the
pricing scheme for their corporate customers by offering a fixed price of $8 per
user per application, up to a maximum of $1000, per month. Interestingly, unlike
the pricing scheme for the GAE, this offer includes unlimited processing power
for a fixed price of
$8 per user, application and month. From a technical point of view, Google tries
to accommodate for established industry standards, by now offering SQL
database support in addition to the existing Bigtable datastore described above
(8).

APPLICATION DEVELOPMENT USING GOOGLE APP ENGINE

General Idea

In order to evaluate the flexibility and scalability of the GAE we tried to come up
with an application that relies heavily on scalability, i.e. collects large amounts
of data from external sources. That way we hoped to be able to test both
persistency and the gathering of data from external sources at large scale.
Therefore our idea has been to develop an application that connects people`s
delicious bookmarks with their respective Facebook accounts. People using our
application should be able to see what their

Facebook friends‘ delicious bookmarks are, provided their Facebook friends have
such a delicious account. This way a user can get a visualization of his friends‘
latest topics by looking at a generated tag cloud giving him a clue about the most
common and shared interests.

PLATFORM AS A SERVICE: GOOGLE APP ENGINE:--

The Google cloud, called Google App Engine, is a ‗platform as a service‘ (PaaS)
offering. In contrast with the Amazon infrastructure as a service cloud, where
users explicitly provision virtual machines and control them fully, including
installing, compiling and running software on them, in the PaaS model a development
platform is provided along with an SDK, using which users develop applications
and deploy them on the cloud. The PaaS platform is responsible for executing the
applications, including servicing external service requests, as well as running
scheduled jobs included in the application. By making the actual execution
servers transparent to the user, a PaaS platform is able to share application servers
across users who need lower capacities, as well as automatically scale resources
allocated to applications that experience heavy loads. Figure 5.2 depicts a user
view of Google App Engine. Users upload code, in either Java or Python, along
with related files, which are stored on the Google File System, a very large scale
fault tolerant and redundant storage system. It is important to note that an
application is immediately available on the internet as soon as it is successfully
uploaded (no virtual servers need to be explicitly provisioned as in IaaS).

Resource usage for an application is metered in terms of web requests


served and CPU- hours actually spent executing requests or batch jobs. Note that
this is very different from the IaaS model: A PaaS application can be deployed
and made globally available 24×7, but charged only when accessed

(or if batch jobs run); in contrast, in an IaaS model merely making an application
continuously available incurs the full cost of keeping at least some of the servers
running all the time. Further, deploying applications in Google App Engine is
free, within usage limits; thus applications can be developed and tried out free
and begin to incur cost only when actually accessed by a sufficient volume of
requests. The PaaS model enables Google to provide such a free service because
applications do not run in dedicated virtual machines; a deployed application that


is not accessed merely consumes storage for its code and data and expends no
CPU cycles.

GAE applications are served by a large number of web servers in


Google‘s data centers that execute requests from end-users across the globe. The
web servers load code from the GFS into memory and serve these requests. Each
request to a particular application is served by any one of GAE‘s web servers;
there is no guarantee that the same server will serve requests to any two requests,
even from the same HTTP session. Applications can also specify some functions
to be executed as batch jobs which are run by a scheduler.

Google Datastore:--

Applications persist data in the Google Datastore, which is also (like Amazon
SimpleDB) a non- relational database. The Datastore allows applications to
define structured types (called ‗kinds‘) and store their instances (called
‗entities‘) in a distributed manner on the GFS file system. While one can view
Datastore ‗kinds‘ as table structures and entities as records, there are important
differences between a relational model and the Datastore, some of which are also
illustrated in Figure 5.3.

Unlike a relational schema, where all rows in a table have the same set of columns,
all entities of a 'kind' need not have the same properties. Instead, additional
properties can be added to any entity. This feature is particularly useful in
situations where one cannot foresee all the potential properties in a model,
especially those that occur occasionally for only a small subset of records.
For example, a model storing

‗products‘ of different types (shows, books, etc.) would need to allow each
product to have a different set of features. In a relational model, this would
probably be implemented using a separate FEATURES table, as shown on the
bottom left of Figure 5.3. Using the Datastore, this table (‗kind‘) is not required;
instead, each product entity can be assigned a different set of properties at
runtime. The Datastore allows simple queries with conditions, such as the first
query shown in Figure 5.3 to retrieve all customers having names in some
lexicographic range. The query syntax (called GQL) is essentially the same as
SQL, but with some restrictions. For example, all inequality conditions in a
query must be on a single property; so a query that also filtered customers on, say,
their ‗type‘, would be illegal in GQL but allowed in SQL.
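The GQL query for this example could be issued from Python as in the following sketch (assuming the legacy db API; the Custs kind and its name property follow the figure and are otherwise hypothetical):

from google.appengine.ext import db

q = db.GqlQuery("SELECT * FROM Custs WHERE name >= :1 AND name < :2",
                'A', 'D')
for cust in q.run(limit=20):
    print cust.name                   # Python 2 syntax, as used by the GAE runtime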

Relationships between tables in a relational model are modeled using


foreign keys. Thus, each account in the ACCTS table has a pointer ckey to the
customer in the CUSTS table that it belongs to. Relationships are traversed via
queries using foreign keys, such as retrieving all accounts for a particular
customer, as shown. The Datastore provides a more object-oriented approach to
relationships in persistent data. Model definitions can include references to other
models; thus each entity of the Accts

‗kind‘ includes a reference to its customer, which is an entity of the Custs ‗kind.‘
Further, relationships defined by such references can be traversed in both
directions, so not only can one directly access the customer of an account, but
also all accounts of a given customer, without executing any query operation, as
shown in the figure.
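A sketch of such a reference-based relationship (assuming the legacy db API; the Custs and Accts model names follow the figure, the property names are hypothetical):

from google.appengine.ext import db

class Custs(db.Model):
    name = db.StringProperty()

class Accts(db.Model):
    acct_no = db.StringProperty()
    customer = db.ReferenceProperty(Custs, collection_name='accounts')

# Traversal in both directions without an explicit join:
acct = Accts.all().get()
owner = acct.customer                 # account -> customer
all_accts = list(owner.accounts)      # customer -> all accounts (reverse traversal)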

GQL queries cannot execute joins between models. Joins are critical
when using SQL to efficiently retrieve data from multiple tables. For example,
the query shown in the figure retrieves details of all products bought by a
particular customer, for which it needs to join data from the transactions (TXNS),
products (PRODS) and product features (FEATURES) tables. Even though GQL
does not allow joins, its ability to traverse associations between entities often
enables joins to be avoided, as shown in the figure for the above example: By
storing references to customers and products in the Txns model, it is possible to
retrieve all transactions for a given customer through a reverse traversal of the
customer reference. The product references in each transaction then yield all
products and their features (as discussed earlier, a separate Features model is not
required because of schema flexibility). It is important to note that while object
relationship traversal can be used as an alternative to joins, this is not always
possible, and when required, joins may need to be explicitly executed by application code.

The Google Datastore is a distributed object store where objects (entities)


of all GAE applications are maintained using a large number of servers and the
GFS distributed file system. From a user perspective, it is important to ensure
that in spite of sharing a distributed storage scheme with many other users,
application data is (a) retrieved efficiently and (b) atomically updated. The
Datastore provides a mechanism to group entities from different ‗kinds‘ in a
hierarchy that is used for both these purposes. Notice that in Figure 5.3, entities of
the Accts and Txns ‗kinds‘ are instantiated with a parameter ‗parent‘ that
specifies a particular customer entity, thereby linking these three entities in an
‗entity group‘. The Datastore ensures that all entities belonging to a particular
group are stored close together in the distributed file system (we shall see how in
Chapter 10). The Datastore allows processing steps to be grouped into
transactions wherein updates to data are guaranteed to be

atomic; however this also requires that each transaction only manipulates entities
belonging to the same entity group. While this transaction model suffices for most
online applications, complex batch updates that update many unrelated entities
cannot execute atomically, unlike in a relational database where there are no such
restrictions.

Amazon SimpleDB:--

Amazon SimpleDB is also a nonrelational database, in many ways similar


to the Google Datastore.

SimpleDB 'domains' correspond to 'kinds', and 'items' to entities; each item


can have a number of attribute-value pairs, and different items in a domain can
have different sets of attributes, similar to Datastore entities. Queries on
SimpleDB domains can include conditions, including inequality conditions, on
any number of attributes. Further, just as in the Google Datastore, joins are not
permitted. However, SimpleDB does not support object relationships as in
Google Datastore, nor does it support transactions. It is important to note that all
data in SimpleDB is replicated for redundancy, just as in

GFS. Because of replication, SimpleDB features an ‗eventual consistency‘


model, wherein data is guaranteed to be propagated to at least one replica and
will eventually reach all replicas, albeit with some delay. This can result in
perceived inconsistency, since an immediate read following a write may not
always yield the result written. In the case of Google Datastore on the other hand,
writes succeed only when all replicas are updated; this avoids inconsistency but
also makes writes slower.

PAAS CASE STUDY: FACEBOOK



Facebook provides some PaaS capabilities to application developers:--



Web services: remote APIs that allow access to social network properties, data, the Like button, etc. (PaaS)

Many third parties run their apps off Amazon EC2 and interface to Facebook via its APIs (IaaS)

Facebook itself makes heavy use of PaaS services for their own private cloud

Key problems: how to analyze logs, make suggestions, determine which ads to place.

Facebook API: Overview:--

What you can do:



Read data from profiles and pages

Navigate the graph (e.g., via friends lists)

Issue queries (for posts, people, pages, ...)

Facebook API: The Graph API :


{
  "id": "1074724712",
  "age_range": {
    "min": 21
  },
  "locale": "en_US",
  "location": {
    "id": "101881036520836",
    "name": "Philadelphia, Pennsylvania"
  }
}

Requests are mapped directly to HTTP (see the example sketch after this list):

https://graph.facebook.com/(identifier)?fields=(fieldList)

Response is in JSON

Uses several HTTP methods:


GET for reading

POST for adding or modifying

DELETE for removing

IDs can be numeric or names

/1074724712 or /andreas.haeberlen

Pages also have IDs

Authorization is via 'access tokens'

Opaque string; encodes specific permissions (access user location, but not interests,
etc.)

Has an expiration date, so may need to be refreshed
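Putting these pieces together, a read of the profile shown above could be issued over HTTP as in the following sketch (assuming the Python requests library; the identifier, field list and access token are hypothetical):

import requests

ACCESS_TOKEN = '<opaque-access-token>'   # obtained via the usual OAuth flow

resp = requests.get(
    'https://graph.facebook.com/1074724712',
    params={'fields': 'id,age_range,locale,location',
            'access_token': ACCESS_TOKEN})

profile = resp.json()                    # the response is JSON, as shown above
print(profile['locale'])                 # e.g. 'en_US'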

Facebook Data Management / Warehousing Tasks

Main tasks for “cloud” infrastructure:



Summarization (daily, hourly)


to help guide development on different components


to report on ad performance


recommendations

Ad hoc analysis:
Answer questions on historical data – to help with managerial decisions

Archival of logs

Spam detection

Ad optimization

Initially used Oracle DBMS for this

But eventually hit scalability, cost, performance bottlenecks just


like Salesforce does now

Data Warehousing at Facebook:

PAAS AT FACEBOOK:
Scribe – open source logging, actually records the data that will be analyzed by
Hadoop

Hadoop (MapReduce – discussed next time) as batch processing engine for data
analysis

As of 2009: 2nd largest Hadoop cluster in the world, 2400 cores, > 2PB data
with
> 10TB added every day

Hive – SQL over Hadoop, used to write the data analysis queries

Federated MySQL, Oracle – multi-machine DBMSs to store query results

Example Use Case 1: Ad Details


Advertisers need to see how their ads are performing

Cost-per-click (CPC), cost-per-1000-impressions (CPM)

Social ads – include info from friends

Engagement ads – interactive with video

Performance numbers given:

Number unique users, clicks, video views, …



Main axes:

Account, campaign, ad

Time period

Type of interaction

Users

Summaries are computed using Hadoop via Hive

Use Case 2: Ad Hoc analysis, feedback


Engineers, product managers may need to understand what is going on

e.g., impact of a new change on some sub-population

Again, Hive-based, i.e., queries are in SQL with database joins

Combine data from several tables, e.g., click-through rate =


views combined with clicks

Sometimes requires custom analysis code with sampling

Conclusion :

Cloud Computing remains the number one hype topic within the IT industry at
present. Our evaluation of the Google App Engine and facebook has shown both
functionality and limitations of the platform. Developing and deploying an
application within the GAE is in fact quite easy and in a way shows the progress
that software development and deployment has made. Within our application we
were able to use the abstractions provided by the GAE without problems,
although the concept of Bigtable requires a big change in mindset when
developing. Our scalability testing showed the limitations of the GAE at this point
in time. Although being an extremely helpful feature and a great USP for the
GAE, the built-in scalability of the GAE suffers from both purposely-set as well
as technical restrictions at the moment. Coming back to our motivation of
evaluating the GAE in terms of its sufficiency for serious large-scale applications
in a professional environment, we have to conclude that the GAE does not (yet) fulfill
business needs for enterprise applications at present.

Experiment No. - 9

Aim: AWS Case Study: Amazon.com.

Theory: About AWS.


🡪
Launched in 2006, Amazon Web Services (AWS) began exposing key infrastructure
services to businesses in the form of web services -- now widely known as cloud
computing.

🡪
The ultimate benefit of cloud computing, and AWS, is the ability to leverage a new
business
model and turn capital infrastructure expenses into variable costs.

Businesses no longer need to plan and procure servers and other IT resources weeks or
months in advance.

Using AWS, businesses can take advantage of Amazon's expertise and economies of
scale
to access resources when their business needs them, delivering results
faster and at a lower cost.

Today, Amazon Web Services provides a highly reliable, scalable, low-cost


infrastructure
platform in the cloud that powers hundreds of thousands of businesses in 190
countries around the world.

Amazon.com is the world‘s largest online retailer. In 2011, Amazon.com switched


from tape backup to using Amazon Simple Storage Service (Amazon
S3) for backing up the majority of its

Oracle databases. This strategy reduces complexity and capital


expenditures, provides faster backup and restore performance,
eliminates tape capacity planning for backup and archive, and frees up
administrative staff for higher value operations. The company was able
to replace their backup tape infrastructure with cloud-based Amazon S3
storage, eliminate backup software, and experienced a 12X
performance improvement, reducing
restore time from around 15 hours to 2.5 hours in select scenarios.

With data center locations in the U.S., Europe, Singapore, and Japan, customers across
all
industries
are taking advantage of the following benefits:

Low Cost

Agility and Instant Elasticity

Open and Flexible

Secure

The Challenge

As Amazon.com grows larger, the sizes of their Oracle databases continue to


grow, and so does the sheer number of databases they maintain. This has caused
growing pains related to backing up legacy Oracle databases to tape and led to
the consideration of alternate strategies including the use of Cloud services of
Amazon Web Services (AWS), a subsidiary of Amazon.com. Some of the
business challenges Amazon.com faced included:

Utilization and capacity planning is complex, and time and capital expense
budget are at a premium. Significant capital expenditures were required over
the years for tape hardware, data center space for this hardware, and enterprise
licensing fees for tape software. During that time, managing tape
infrastructure required highly skilled staff to spend time with setup,
certification and engineering archive planning instead of on higher value
projects. And at the end of every fiscal year, projecting future capacity
requirements required time consuming audits, forecasting, and budgeting.

The cost of backup software required to support multiple tape devices sneaks
up on you. Tape robots provide basic read/write capability, but in order to
fully utilize them, you must invest in proprietary tape backup software. For
Amazon.com, the cost of the software had been high, and added significantly
to overall backup costs. The cost of this software was an ongoing budgeting
pain point, but one that was difficult to address as long as backups needed to
be written to tape devices.

Maintaining reliable backups and being fast and efficient when retrieving data
requires a lot of time and effort with tape. When data needs to be durably
stored on tape, multiple copies are required. When everything is working
correctly, and there is minimal contention for tape resources, the tape robots
and backup software can easily find the required data. However, if there is a
hardware failure, human intervention is necessary to restore from tape.
Contention for tape drives resulting from multiple users‘ tape requests slows
down restore processes even more. This adds to the recovery time objective
(RTO) and makes achieving it more challenging compared to backing up to
Cloud storage.

Why Amazon Web Services?

Amazon.com initiated the evaluation of Amazon S3 for economic and


performance improvements related to data backup. As part of that evaluation,
they considered security, availability, and performance aspects of Amazon S3
backups. Amazon.com also executed a cost-benefit analysis to ensure that a
migration to Amazon S3 would be financially worthwhile. That cost benefit
analysis included the following elements:

Performance advantage and cost competitiveness. It was important that the


overall costs of the backups did not increase. At the same time, Amazon.com
required faster backup and recovery performance. The time and effort
required for backup and for recovery operations proved to be a significant
improvement over tape, with restoring from Amazon S3 running from two to
twelve times faster than a similar restore from tape. Amazon.com required
any new backup medium to provide improved performance while maintaining
or reducing overall costs. Backing up to on-premises disk based storage
would have improved performance, but missed on cost competitiveness.
Amazon S3 Cloud based storage met both criteria.

Greater durability and availability. Amazon S3 is designed to provide


99.999999999% durability and 99.99% availability of objects over a given
year. Amazon.com compared these figures with those observed from their
tape infrastructure, and determined that Amazon S3 offered significant
improvement.

Less operational friction. Amazon.com DBAs had to evaluate whether



Amazon S3 backups would be viable for their database backups. They


determined that using Amazon S3 for backups was easy to implement because
it worked seamlessly with Oracle RMAN.

Strong data security. Amazon.com found that AWS met all of their
requirements for physical security, security accreditations, and security
processes, protecting data in flight, data at rest, and utilizing suitable
encryption standards.

The Benefits

With the migration to Amazon S3 well along the way to completion,


Amazon.com has realized several

benefits, including:

Elimination of complex and time-consuming tape capacity planning.


Amazon.com is growing larger

and more dynamic each year, both organically and as a result of acquisitions.
AWS has enabled Amazon.com to keep pace with this rapid expansion, and
to do so seamlessly. Historically, Amazon.com business groups have had to
write annual backup plans, quantifying the amount of tape storage that they
plan to use for the year and the frequency with which they will use the tape
resources. These plans are then used to charge each organization for their tape
usage, spreading the cost among many teams. With Amazon S3, teams simply
pay for what they use, and are billed for their usage as they go. There are
virtually no upper limits as to how much data can be stored in Amazon S3,
and so there are no worries about running out of resources. For teams adopting
Amazon S3 backups, the need for formal planning has been all but eliminated.

Reduced capital expenditures. Amazon.com no longer needs to acquire tape


robots, tape drives, tape inventory, data center space, networking gear,
enterprise backup software, or predict future tape consumption. This
eliminates the burden of budgeting for capital equipment well in advance as
well as the capital expense.

Immediate availability of data for restoring – no need to locate or retrieve


physical tapes. Whenever a DBA needs to restore data from tape, they face
delays. The tape backup software needs to read the tape catalog to find the
correct files to restore, locate the correct tape, mount the tape, and read the
data from it. In almost all cases the data is spread across multiple tapes,
resulting in further delays. This, combined with contention for tape drives
resulting from multiple users‘ tape requests, slows the process down even
more. This is especially severe during critical events such as a data center
outage, when many databases must be restored simultaneously and as soon as


possible. None of these problems occur with Amazon S3. Data restores can
begin immediately, with no waiting or tape queuing – and that means the
database can be recovered much faster.

Backing up a database to Amazon S3 can be two to twelve times faster than


with tape drives. As one example, in a benchmark test a DBA was able to
restore 3.8 terabytes in 2.5 hours over gigabit Ethernet. This amounts to 25
gigabytes per minute, or 422MB per second. In addition, since Amazon.com
uses RMAN data compression, the effective restore rate was
3.37 gigabytes per second. This 2.5 hours compares to, conservatively, 10-15
hours that would be required to restore from tape.

Easy implementation of Oracle RMAN backups to Amazon S3. The DBAs


found it easy to start backing up their databases to Amazon S3. Directing
Oracle RMAN backups to Amazon S3 requires

only a configuration of the Oracle Secure Backup Cloud (SBC) module. The
effort required to configure the Oracle SBC module amounted to an hour or
less per database. After this one- time setup, the database backups were
transparently redirected to Amazon S3.
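For illustration only, uploading a backup piece to Amazon S3 from a script could look like the following sketch (assuming the boto3 library; the bucket, key and file names are hypothetical, and Amazon.com itself relied on the Oracle Secure Backup Cloud module rather than hand-written scripts):

import boto3

s3 = boto3.client('s3')
s3.upload_file(
    Filename='/backups/orders_db_level0.bkp',   # local backup piece
    Bucket='example-oracle-backups',            # target S3 bucket
    Key='orders_db/2011-06-01/level0.bkp')      # object key in the bucket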

Durable data storage provided by Amazon S3, which is designed for 11 nines
durability. On occasion, Amazon.com has experienced hardware failures with
tape infrastructure – tapes that break, tape drives that fail, and robotic
components that fail. Sometimes this happens when a DBA is trying to restore
a database, and dramatically increases the mean time to recover (MTTR).
With the durability and availability of Amazon S3, these issues are no longer
a concern.

Freeing up valuable human resources. With tape infrastructure, Amazon.com


had to seek out engineers who were experienced with very large tape backup
installations – a specialized, vendor-specific skill set that is difficult to find.
They also needed to hire data center technicians and dedicate them to
problem-solving and troubleshooting hardware issues – replacing drives,
shuffling tapes around, shipping and tracking tapes, and so on. Amazon S3
allowed them to free up these specialists from day-to-day operations so that
they can work on more valuable, business-critical engineering tasks.

Elimination of physical tape transport to off-site location. Any company that


has been storing Oracle backup data offsite should take a hard look at the
costs involved in transporting, securing and storing

their tapes offsite – these costs can be reduced or possibly eliminated by


storing the data in Amazon S3.

As the world‘s largest online retailer, Amazon.com continuously innovates in


order to provide improved customer experience and offer products at the lowest
possible prices. One such innovation has been to replace tape with Amazon S3
storage for database backups. This
innovation is one that can be easily replicated by other organizations that back
up their Oracle databases to tape.

Products & Services:-

● Compute

● Content Delivery

● Database

● Deployment & Management

● E-Commerce

● Messaging

● Monitoring

● Networking

● Payments & Billing

● Storage

● Support

● Web Traffic

● Workforce

Products & Services

Compute
● Amazon Elastic Compute Cloud (EC2)
Amazon Elastic Compute Cloud delivers scalable, pay-as-you-go compute capacity in
the cloud.
● Amazon Elastic MapReduce
Amazon Elastic MapReduce is a web service that enables businesses, researchers,
data
analysts, and developers to easily and cost-effectively process vast amounts of data.
● Auto Scaling
Auto Scaling allows us to automatically scale our Amazon EC2 capacity up or down
according to conditions we define.
Content Delivery
● Amazon CloudFront
Amazon CloudFront is a web service that makes it easy to distribute content with
low latency via a global network of edge locations.

Database

● Amazon SimpleDB
Amazon SimpleDB works in conjunction with Amazon S3 and Amazon EC2 to run
queries
queries
on structured data in real time.

● Amazon Relational Database Service (RDS)


Amazon Relational Database Service is a web service that makes it easy to set up,
operate,
and scale a relational database in the cloud.

● Amazon ElastiCache
Amazon ElastiCache is a web service that makes it easy to deploy, operate, and
scale an in-
memory cache in the cloud.

E-Commerce

● Amazon Fulfillment Web Service (FWS)


Amazon Fulfillment Web Service allows merchants to deliver products using
Amazon.com's fulfillment capabilities.

Deployment & Management


● AWS Elastic Beanstalk
AWS Elastic Beanstalk is an even easier way to quickly deploy and manage
applications in
the AWS cloud. We simply upload our application, and Elastic Beanstalk
automatically handles the deployment details of capacity provisioning,
load balancing, auto-scaling, and application health monitoring.
● AWS CloudFormation
AWS CloudFormation is a service that gives developers and businesses an easy way
to create
a collection of related AWS resources and provision them in an orderly
and predictable fashion.
Monitoring
● Amazon CloudWatch
Amazon CloudWatch is a web service that provides monitoring for AWS
cloud resources,
starting with Amazon EC2.
Messaging
● Amazon Simple Queue Service (SQS)
Amazon Simple Queue Service provides a hosted queue for storing messages as
they travel
between computers, making it easy to build automated workflow between Web
services.

● Amazon Simple Notification Service (SNS)


Amazon Simple Notification Service is a web service that makes it easy to set
up, operate, and send notifications from the cloud.

● Amazon Simple Email Service (SES)


Amazon Simple Email Service is a highly scalable and cost-effective bulk and
transactional
email-sending service for the cloud.

Workforce
● Amazon Mechanical Turk
Amazon Mechanical Turk enables companies to access thousands of global
workers on
demand and programmatically integrate their work into various business
processes.
Networking
● Amazon Route 53
Amazon Route 53 is a highly available and scalable Domain Name System
(DNS) web service.
● Amazon Virtual Private Cloud (VPC)
Amazon Virtual Private Cloud (Amazon VPC) lets you provision a
private, isolated section
of the Amazon Web Services (AWS) Cloud where we can launch AWS
resources in a virtual

network that you define. With Amazon VPC, we can define a virtual
network topology that closely resembles a traditional network that you
might operate in your own datacenter.

● AWS Direct Connect


AWS Direct Connect makes it easy to establish a dedicated network
connection from your
premise to AWS, which in many cases can reduce our network costs, increase
bandwidth
throughput, and provide a more consistent network experience than Internet-


based connections.
● Elastic Load Balancing
Elastic Load Balancing automatically distributes incoming application
traffic across multiple
Amazon EC2 instances.
Payments & Billing
● Amazon Flexible Payments Service (FPS)
Amazon Flexible Payments Service facilitates the digital transfer of
money between any two
entities, humans or computers.
● Amazon DevPay
Amazon DevPay is a billing and account management service which
enables developers to
collect payment for their AWS applications.
Support
● AWS Premium Support
AWS Premium Support is a one-on-one, fast-response support channel to
help you build and run applications on AWS Infrastructure Services.

Web Traffic
● Alexa Web Information Service
Alexa Web Information Service makes Alexa‘s huge repository of data
about structure and
traffic patterns on the Web available to developers.

● Alexa Top Sites


Alexa Top Sites exposes global website traffic data as it is continuously
collected and
updated by Alexa Traffic Rank.

Amazon Simple Queue Service (Amazon SQS)


● Amazon Simple Queue Service (Amazon SQS) offers a reliable, highly scalable,
hosted
queue for storing messages as they travel between computers.

● By using Amazon SQS, developers can simply move data between distributed
components of their applications that perform different tasks, without losing
messages or requiring each component to be always available.

● Amazon SQS makes it easy to build an automated workflow, working in close


conjunction with the Amazon Elastic Compute Cloud (Amazon EC2) and the
other AWS infrastructure web.
Amazon Simple Queue Service (Amazon SQS)
● Amazon SQS works by exposing Amazon's web-scale messaging infrastructure as a web
web
service.
● Any computer on the Internet can add or read messages without any installed software
or
special firewall configurations.
● Components of applications using Amazon SQS can run independently, and do not
need
to be on the same network, developed with the same technologies, or
running at the same time.
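Putting the points above together, moving a message between two independent components might look like the following sketch (assuming the boto3 library; the queue name and message body are hypothetical):

import boto3

sqs = boto3.resource('sqs')
queue = sqs.create_queue(QueueName='example-work-queue')

# Producer: any component with Internet access can enqueue work.
queue.send_message(MessageBody='resize-image:photo-123')

# Consumer: a different component reads and deletes the message later,
# without having to run on the same network or at the same time.
for msg in queue.receive_messages(MaxNumberOfMessages=1, WaitTimeSeconds=5):
    work_item = msg.body
    msg.delete()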
The Google File System (GFS)
● The Google File System (GFS) is designed to meet the rapidly
growing demands of Google's data processing needs.
● It provides fault tolerance while running on inexpensive commodity
hardware, and it delivers high aggregate performance to a large
number of clients.

● While sharing many of the same goals as previous distributed file


systems, the file system has successfully met our storage needs.
● It is widely deployed within Google as the storage platform for the
generation and processing of data used by our service as well as
research and development efforts that require large data sets.
● The largest cluster to date provides hundreds of terabytes of storage
across thousands
of disks on over a thousand machines, and it is concurrently accessed
by hundreds of clients.

Conclusion:
Thus, we have studied a case study on Amazon Web Services.
