Cloud Raghav 1
PRACTICAL FILE
CLOUD COMPUTING
INDEX
EXPERIMENT NO. - 1
Theory :
Google Sheets is a spreadsheet app on steroids. It looks and functions much like
any other spreadsheet tool, but because it's an online app, it offers much more than
most spreadsheet tools. Here are some of the things that make it so much better:
● It’s a web-based spreadsheet that you can use anywhere — no more forgetting
your spreadsheet files at home.
● It works from any device, with mobile apps for iOS and Android along with its
web-based core app.
● Google Sheets is free, and it's bundled with Google Drive, Docs, and Slides to
share files, documents, and presentations online.
● It includes almost all of the same spreadsheet functions—if you know how to
use Excel, you'll feel at home in Google Sheets.
● You can download add-ons, create your own, and write custom code.
● It's online, so you can gather data with your spreadsheet automatically and do
almost anything you want, even when your spreadsheet isn't open.
Whether you’re a spreadsheet novice or an Excel veteran looking for a better way
to collaborate, this book will help you get the most out of Google Sheets. We'll
start out with the basics in this chapter—then keep reading to learn Google Sheets'
advanced features, find its best add-ons, and learn how to build your own.
1. Click the red "NEW" button on your Google Drive dashboard and
select "Google Sheets"
2. Open the menu from within a spreadsheet and select "File > New
Spreadsheet"
3. Click "Blank" or select a template on the Google Sheets homepage.
Print, Undo / Redo, and the Font Settings / Styling function similarly to what you'd
expect from your favorite word processor. The shortcut keys are the same as well,
so just treat it like you’re editing any other document.
● COUNT: counts the values in a range of cells (ex: 1,blank,3,4,5 = 4 total cells
with values)
● MAX: finds the highest value in a range of cells (ex: 1,2,3,4,5 = 5 is the
highest)
● MIN: finds the lowest value in a range of cells (ex: 1,2,3,4,5 = 1 is the lowest)
● Basic Arithmetic: You can also perform functions like addition, subtraction,
and multiplication directly in a cell without calling a formula.
I’ve added some faux minimum and maximum prices per unit on my ingredients
list to the right of my breakfast options. We’ll want to get an average price for each
ingredient using low and high rates, then multiply the resulting average price of
the ingredient by respective unit count in each recipe.
I’ll start by highlighting the range of values (in this case it’s two side-by-side
rather than a vertical range) and selecting the AVERAGE formula from the
toolbar.
You can also manually type a formula into the "Avg Price" row. Our basic
arithmetic formula would look like this for the "Scrambled Eggs" column:
=$I2*B2+$I3*B3+$I4*B4+$I5*B5+$I6*B6+$I7*B7+$I8*B8
The $ symbol before column I (the average prices) tells Sheets that no matter
where we put the formula in our spreadsheet, we always want to reference the I
column.
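The effect of this formula can be sketched in plain Python. The ingredient prices and unit counts below are hypothetical stand-ins for the sheet's I column (average prices) and B column (unit counts):

```python
# Sketch of the =$I2*B2+...+$I8*B8 pattern: multiply each ingredient's
# average price (the fixed $I column) by its unit count in one recipe
# (the B column) and sum the products. All values are made up.
avg_price = [0.20, 0.05, 0.10, 0.30, 0.15, 0.25, 0.40]   # column I, rows 2-8
scrambled_eggs = [3, 1, 0, 0, 2, 0, 1]                    # column B, rows 2-8

recipe_cost = sum(p * n for p, n in zip(avg_price, scrambled_eggs))
print(round(recipe_cost, 2))
```

Because the `$I` references stay fixed while the `B` references shift, dragging the formula to the next column reuses the same price list against a different recipe's counts.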
4. Share, Protect, and Move Your Data: What makes Sheets so powerful is how
"in sync" you'll feel with your coworkers. Jointly editing a spreadsheet is one of
the critical functions of Sheets, and Google has made it a seamless experience.
Here’s how it works:
1. Click either FILE > SHARE or use the blue "Share" button in the top right
2. Click "advanced", then enter emails of who can view or edit your
spreadsheet
3. Select any other privacy options and hit done.
Conclusion:
Thus we performed various operations such as SUM, COUNT, and AVERAGE in the
Google Drive spreadsheet.
EXPERIMENT NO. – 2
Theory :
Features :
Editing :
Collaboration and revision history
Google Docs and the other apps in the Google Drive suite serve as
a collaborative tool for cooperative editing of documents in real-time. Documents
can be shared, opened, and edited by multiple users simultaneously and users are
able to see character-by-character changes as other collaborators make edits.
Changes are automatically saved to Google's servers, and a revision history is
automatically kept so past edits may be viewed and reverted to.[19] An editor's
current position is represented with an editor-specific color/cursor, so if another
editor happens to be viewing that part of the document they can see edits as they
are made.
In March 2014, Google introduced add-ons: new tools from third-party developers that
add more features for Google Docs. In order to view and edit documents offline on a
computer, users need to be using the Google Chrome web browser. A Chrome extension,
"Google Docs Offline", allows users to enable offline support for Docs files on the Google
Drive website. The Android and iOS apps natively support offline editing.
In June 2014, Google introduced "Suggested edits" in Google Docs; as part of the
"commenting access" permission, participants can come up with suggestions for edits that
the author can accept or reject, in contrast to full editing ability. In October 2016, Google
announced "Action items" for Docs. If a user writes phrases such as "Ryan to follow up
on the keynote script", the service will intelligently assign
that action to "Ryan". Google states this will make it easier for other collaborators to see
which person
is responsible for what task. When a user visits Google Drive, Docs, Sheets or Slides, any
files with
tasks assigned to them will be highlighted with a badge.
A basic research tool was introduced in 2012, later expanded into "Explore", launched in
September 2016, enabling additional functionality through machine learning. In Google
Docs, Explore shows relevant Google search results based on information in the
document, simplifying information gathering. Users can also mark specific document
text, press Explore and see search results based on the marked text only.
In December 2016, Google introduced a quick citations feature to Google Docs. The
quick citation tool allows users to "insert citations as footnotes with the click of a button"
on the web through the Explore feature introduced in September. The citation feature also
marked the launch of the Explore functionalities in G Suite for Education accounts.
Files :
Supported file formats:
Files in the following formats can be viewed and converted to the Docs format:
● For documents: .doc (if newer than Microsoft Office 95), .docx, .docm, .dot,
.dotx, .dotm, .html, plain text (.txt), .rtf, .odt
File limits
Limits to insertable file sizes, overall document length and size are listed below:
You can also create new documents from the URL docs.google.com/create.
Conclusion:
Thus we performed different operation like creation, deletion, sharing of file and
docs in the google drive.
EXPERIMENT NO. – 3
Theory :
● Code deployments can take longer, as container images are built by using the
Cloud Build service.
● The geographical region of a flexible environment VM instance is determined by
the location that you specify for the App Engine application of your Cloud project.
Google's management services ensure that the VM instances are co-located for
optimal performance.
Open Chrome, search for Google Cloud App Engine, and move to the following
page:
https://cloud.google.com/appengine/docs.
Then choose the programming language suitable for your project.
The user needs to specify the environment suitable for project development. App
Engine consists of two major environments:
1. Standard environment.
2. Flexible environment.
To install the Google Cloud SDK, initialize it, and run core gcloud commands
from the command-line.
1. Create a Google Cloud Platform project, if you don't have one already.
2. Download the Google Cloud SDK installer.
3. Launch the installer and follow the prompts.
After successful installation of the Cloud SDK, a Google Cloud App Engine
command shell becomes available. Open the shell, enter "y" to proceed, and
create a project name under which the required program will run.
Run these gcloud commands to view information about your SDK installation:
gcloud config list
disable_usage_reporting = False
project = example-project
3. To view information about your Cloud SDK installation and the active SDK
configuration:
gcloud info
gcloud displays a summary of information about your Cloud SDK installation.
This includes information about your system, the installed SDK components, the
active user account and current project, and the properties in the active SDK
configuration.
4. To view information about gcloud commands and other topics from the command
line:
gcloud help
For example, to view the help for gcloud compute instances create:
gcloud help compute instances create
gcloud displays a help topic that contains a description of the command, a list of
command flags and arguments, and examples of how to use it.
Conclusion:
Thus we successfully installed the google app engine using the google cloud SDK
which is used as development platform for different programming language.
EXPERIMENT NO. – 4
Theory :
● Create a Google Cloud Platform project, if you don't have one already.
● Download the Google Cloud SDK installer.
● Launch the installer and follow the prompts.
● After installation has completed, the installer presents several options:
After successful installation of the Cloud SDK, a Google Cloud App Engine
command shell becomes available. Open the shell, enter "y" to proceed, and
create a project name under which the required program will run; this sets up
the whole shell and environment of Google App Engine.
Then the bin folder is used to execute the dev_appserver.py file, a Python file
used for the initial setup of the environment; it will also automatically update
and remove any old files in the Python library as well as the environment.
The file which the user wants to execute is placed in the bin folder, or it can be
placed anywhere and executed using the cloud services of Google App
Engine. Below we show the execution of a Python program: a GUI calculator.
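As a sketch, the kind of program dev_appserver.py serves locally is just a Python web handler. Below is a minimal WSGI version (the handler name and response text are illustrative, not the practical's actual calculator):

```python
# Minimal WSGI application of the sort App Engine's dev_appserver.py
# runs locally. Handler name and response text are illustrative only.
def application(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello from the App Engine dev server"]

# Quick local check without any server: invoke the app with a fake request.
def demo():
    captured = {}
    def start_response(status, headers):
        captured["status"] = status
    body = b"".join(application({"REQUEST_METHOD": "GET"}, start_response))
    return captured["status"], body
```

When deployed, the App Engine request handler plays the role of `demo()` here: it hands the incoming HTTP request to the application and forwards the returned response to the client.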
Similarly, Google App Engine can be used to create an enormous number of apps,
and can be used for other heavy computation models as well as in machine
learning areas.
Conclusion :
Thus we have executed the python program using the google app engine.
Experiment No. 5
Theory:
The techniques and features that Virtual Box provides are useful for several
scenarios:
When dealing with virtualization (and also for understanding the following
chapters of this documentation), it helps to acquaint oneself with a bit of crucial
terminology, especially the following terms:
Host operating system (host OS). This is the operating system of the
physical computer on which Virtual Box was installed. There are versions of
Virtual Box for Windows, Mac OS X, Linux and Solaris hosts.
Virtual machine (VM). This is the special environment that Virtual Box
creates for your guest operating system while it is running. In other words, you
run your guest operating system "in" a VM. Normally, a VM will be shown as a
window on your computer's desktop, but depending on which of the various
frontends of VirtualBox you use, it can be displayed in full screen mode or
remotely on another computer. In a more abstract way, internally, VirtualBox
thinks of a VM as a set of parameters that determine its behavior. They include
hardware settings (how much memory the VM should have, what hard disks
VirtualBox should virtualize through which container files, what CDs are
mounted etc.) as well as state information (whether the VM
is currently running, saved, its snapshots etc.). These settings are mirrored in the
VirtualBox Manager window as well as the VBoxManage command line
program.
Guest Additions. This refers to special software packages which are shipped with
VirtualBox but designed to be installed inside a VM to improve performance of the guest OS
and to add extra features.
When you start VirtualBox for the first time, a window like the following should come up:
This window is called the "VirtualBox Manager". On the left, you can see a
pane that will later list all your virtual machines. Since you have not created any,
the list is empty. A row of buttons above it allows you to create new VMs and
work on existing VMs, once you have some. The pane on the right displays the
properties of the virtual machine currently selected, if any. Again, since you don't
have any machines yet, the pane displays a welcome message.
To give you an idea what VirtualBox might look like later, after you have created
many machines, here's another example:
Click on the "New" button at the top of the VirtualBox Manager window. A
wizard will pop up to guide you through setting up a new virtual machine (VM).
On the following pages, the wizard will ask you for the bare minimum of
information that is needed to create a new virtual machine.
For "Operating System Type", select the operating system that you
want to install later. The supported operating systems are grouped; if you
want to install something very unusual that is not listed, select "Other".
Depending on your selection, Virtual Box will enable or disable certain
VM settings that your guest operating system may require. This is
particularly important for 64-bit guests (see Section 3.1.2, "64-bit guests").
On the next page, select the memory (RAM) that Virtual Box should
allocate every time the virtual machine is started. The amount of memory
given here will be taken away from your host machine and presented to
the guest operating system, which will report this size as the (virtual)
computer's installed RAM.
Next, you must specify a virtual hard disk for your VM. There are many and
potentially complicated ways in which VirtualBox can provide hard disk space
to a VM (see Chapter 5, Virtual storage for details), but the most common way
is to use a large image file on your "real" hard disk, whose contents VirtualBox
presents to your VM as if it were a complete hard disk. This file represents an
entire hard disk then, so you can even copy it to another host and use it with
another VirtualBox installation.
● To create a new, empty virtual hard disk, press the "New" button.
● You can pick an existing disk image file. The drop-down list presented
in the window contains all disk images which are currently remembered
by VirtualBox, probably because they are currently attached to a virtual
machine (or have been in the past). Alternatively, you can click on the
small folder button next to the drop-down list to bring up a standard file
dialog, which allows you to pick any disk image file on your host disk.
Most probably, if you are using VirtualBox for the first time, you will want
to create a new disk image. Hence, press the "New" button. This brings up
another window, the "Create New Virtual Disk Wizard", which helps you
create a new disk image file in the new virtual machine's folder.
● A dynamically allocated file will only grow in size when the guest
actually stores data on its virtual hard disk. It will therefore initially
be small on the host hard drive and only later grow to the size
specified as it is filled with data.
For details about the differences, please refer to Section 5.2, "Disk image
files (VDI, VMDK, VHD, HDD)".
After having selected or created your image file, again press "Next" to go to the next
page.
After clicking on "Finish", your new virtual machine will be created. You
will then see it in the list on the left side of the Manager window, with the
name you entered initially.
Running your virtual machine: To start a virtual machine, you have several options:
select its entry in the list in the Manager window and press the "Start"
button at the top, or
for virtual machines created with VirtualBox 4.0 or later, navigate to the
"VirtualBox VMs" folder in your system user's home directory, find the
subdirectory of the machine you want to start and double-click on the
machine settings file (with a .vbox file extension). This opens up a new
window, and the virtual machine which you selected will boot up.
Everything which would normally be seen on the virtual system's monitor
is shown in the window. In general, you can use the virtual machine much
like you would use a real computer. There are a couple of points worth
mentioning, however.
Saving the state of the machine: When you click on the "Close" button of
your virtual machine window (at the top right of the window, just like you
would close any other window on your system), VirtualBox asks you whether
you want to "save" or "power off" the VM. (As a shortcut, you can also press
the Host key together with "Q".)
Save the machine state: With this option, VirtualBox "freezes" the
virtual machine by completely saving its state to your local disk. When
you start the VM again later, you will find that the VM continues exactly
where it was left off. All your programs will still be open, and your
computer resumes operation. Saving the state of a virtual machine is thus
in some ways similar to suspending a laptop computer (e.g. by closing its
lid).
Send the shutdown signal. This will send an ACPI shutdown signal to
the virtual machine, which has the same effect as if you had pressed the
power button on a real computer. So long as the VM is running a fairly
modern operating system, this should trigger a proper shutdown
mechanism from within the VM.
Power off the machine: With this option, VirtualBox also stops running
the virtual machine, but without saving its state. As an exception, if your
virtual machine has any snapshots (see the next chapter), you can use this
option to quickly restore the current snapshot of the virtual machine. In
that case, powering off the machine will not disrupt its state, but any
changes made since that snapshot was taken will be lost.
The "Discard" button in the VirtualBox Manager window discards a virtual
machine's saved state. This has the same effect as powering off the machine,
and the same warnings apply.
VirtualBox can import and export virtual machines in the industry-standard Open
Virtualization Format (OVF). OVF is a cross-platform standard supported by
many virtualization products which allows for creating ready-made virtual
machines that can then be imported into a virtualizer such as VirtualBox.
VirtualBox makes OVF import and export easy to access and supports it from the
Manager window as well as its command-line interface. This allows for
packaging so-called virtual appliances: disk images together with configuration
settings that can be distributed easily. This way one can offer complete
ready-to-use software packages (operating systems with applications) that need no
configuration or installation except for importing into VirtualBox.
They can come in several files, as one or several disk images, typically in
the widely-used VMDK format (see Section 5.2, "Disk image files (VDI,
VMDK, VHD, HDD)") and a textual description file in an XML dialect
with an .ovf extension. These files must then reside in the same directory
for Virtual Box to be able to import them.
Alternatively, the above files can be packed together into a single archive
file, typically with an .ova extension. (Such archive files use a variant of
the TAR archive format and can therefore be unpacked outside of Virtual
Box with any utility that can unpack standard TAR files.)
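Since an .ova is a TAR variant, its contents can be inspected with any TAR-capable tool. A short Python sketch (the appliance and member file names are made up for illustration):

```python
import io
import os
import tarfile
import tempfile

# Build a tiny stand-in .ova (a plain tar holding an .ovf descriptor and a
# disk image) and list its members, the same way a real appliance archive
# can be examined before import.
with tempfile.TemporaryDirectory() as d:
    ova_path = os.path.join(d, "appliance.ova")
    with tarfile.open(ova_path, "w") as tar:
        for name, data in [("machine.ovf", b"<Envelope/>"),
                           ("disk1.vmdk", b"\x00" * 16)]:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    with tarfile.open(ova_path) as tar:
        members = tar.getnames()

print(members)
```

A real .ova would list the same kinds of members: one .ovf descriptor plus the disk images it references.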
Select "File" -> "Export appliance". A different dialog window shows up that
allows you to combine several virtual machines into an OVF appliance. Then,
select the target location where the target files should be stored, and the
conversion process begins. This can again take a while.
Conclusion:
Thus we have studied the use of multiple operating systems through Virtual Box virtualization.
Experiment No. - 6
Theory:
JAVA INSTALLATION
cd ~/Downloads
cd /usr/local/java
JAVA_HOME=/usr/local/java/jdk1.7.0_45
PATH=$PATH:$HOME/bin:$JAVA_HOME/bin
JRE_HOME=/usr/local/java/jdk1.7.0_45/jre
PATH=$PATH:$HOME/bin:$JRE_HOME/bin
HADOOP_HOME=/home/hadoop/hadoop-1.2.1
PATH=$PATH:$HADOOP_HOME/bin
export JAVA_HOME
export JRE_HOME
export PATH
. /etc/profile
java -version
HADOOP INSTALLATION
open Home
cd hadoop/
ls -a
.  ..  hadoop-1.2.1  hadoop-1.2.1.tar.gz
edit the file conf/hadoop-env.sh
cd hadoop-1.2.1
------------------STANDALONE OPERATION----------------
mkdir input
cp conf/*.xml input
bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
cat output/*
conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
ssh localhost
bin/start-all.sh
jps
2583 DataNode
2970 ResourceManager
3461 Jps
3177 NodeManager
2361 NameNode
2840 SecondaryNameNode
Conclusion:
Thus we have installed Java and Hadoop and started a single-node Hadoop cluster.
Experiment No. - 7
Theory:
the first reducer is assigned w1 and w2 while the second one handles w3 and
w4. In fact, during the map phase itself each mapper writes one file per reducer,
based on the words assigned to each reducer, and keeps the master informed of
these file locations. The master in turn informs the reducers where the partial
counts for their words have been stored on the local files of respective mappers;
the reducers then make remote procedure call requests to the mappers to fetch
these. Each reducer performs a 'reduce' operation that sums up the frequencies
for each word, which are finally written back to the GFS file system.
In our example each map operation takes a document indexed by its id and
emits a list of word-count pairs indexed by word-id: (dk, [w1 . . . wn]) → [(wi, ci)].
The reduce operation groups the results of the map step using the same key k2
and performs a function f on the list of values that correspond to each key.
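The two phases described above can be simulated in plain Python; the grouping step below stands in for the shuffle that the framework performs between map and reduce (document ids and words are illustrative):

```python
from collections import defaultdict

# Word count as described: map emits (word, count) pairs per document,
# the shuffle groups the pairs by word, and reduce sums each group.
def map_phase(doc_id, text):
    return [(word, 1) for word in text.split()]

def reduce_phase(word, counts):
    return (word, sum(counts))

docs = {"d1": "w1 w2 w1", "d2": "w2 w3"}

# Shuffle: group intermediate pairs by key (the word).
groups = defaultdict(list)
for doc_id, text in docs.items():
    for word, count in map_phase(doc_id, text):
        groups[word].append(count)

result = dict(reduce_phase(w, c) for w, c in groups.items())
print(result)
```

In the real system the groups live in per-reducer files on the mappers' local disks, and each reducer fetches only the keys assigned to it.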
Let us describe one more example, that of indexing a large collection of documents,
or, for that matter, any data including database records: the map task consists of
emitting a word-document/record id pair for each word: (dk, [w1 . . . wn]) →
[(wi, dk)]. The reduce step groups the pairs by word and creates an index entry
for each word: [(wi, dk)] → (wi, [di1 . . . dim]).
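The same two phases build an inverted index. A minimal sketch, with hypothetical document ids and words:

```python
from collections import defaultdict

# Indexing: map emits (word, doc_id) for every word of a document;
# the grouping step then collects, per word, the list of documents
# containing it - the index entry (wi, [di1 ... dim]).
def map_index(doc_id, words):
    return [(w, doc_id) for w in words]

docs = {"d1": ["w1", "w2"], "d2": ["w2", "w3"]}

index = defaultdict(list)
for doc_id, words in docs.items():
    for word, d in map_index(doc_id, words):
        index[word].append(d)

print(dict(index))
```

Here "w2" ends up mapped to both "d1" and "d2", which is exactly the index entry a search engine would consult for that word.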
Indexing large collections is not only important in web search, but also a critical
aspect of handling structured data; so it is important to know that it can be
executed efficiently in parallel using MapReduce. Traditional parallel databases focus on
rapid query execution against data warehouses that are updated infrequently; as
a result these systems often do not parallelize index creation sufficiently well.
bin/hadoop fs -ls /
Found 1 items
import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WCE {

    // Mapper: emit (word, 1) for every token in the input line.
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Reducer: sum the counts collected for each word.
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WCE.class);
        conf.setJobName("wordcount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);
        FileInputFormat.addInputPath(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
● Right Click on project Name → New File sample → type something in the
sample file.
● Right click on project → Export → Click on Java → Provide a JAR file name →
Select the location where to save the JAR file.
● Right Click on project Name → Run as → Run Configuration → Java Application
→ new → In main → WordCount → click on Search and click on the JAR file which
you have created → Click on Arguments → Provide under Program arguments:
sample output → Click on Run.
● Right click on the project → Refresh → An output file is created in your project
Conclusion:
Hence we have implemented a MapReduce example, a Word Count
program, on a file; it counts the number of times each word
repeats in the given file.
Experiment No. - 8
Theory:
Platform-as-a-Service (PaaS):
Cloud computing has evolved to include platforms for building and running
custom web-based applications, a concept known as Platform-as-a- Service.
PaaS is an outgrowth of the SaaS application delivery model. The PaaS model
makes all of the facilities required to support the complete life cycle of building
and delivering web applications and services entirely available from the
Internet, all with no software downloads or installation for developers, IT
managers, or end users. Unlike the IaaS model, where developers may create a
specific operating system instance with homegrown applications running, PaaS
developers are concerned only with web-based development and generally do
not care what operating system is used. PaaS services allow users to focus on
innovation rather than complex infrastructure. Organizations can redirect a
significant portion of their budgets to creating applications that provide real
business value instead of worrying about all the infrastructure issues in a roll-
your-own delivery model. The PaaS model is thus driving a new era of mass
innovation. Now, developers around the world can access unlimited computing
power. Anyone with an Internet connection can build powerful applications
and easily deploy them to users globally.
Architecture :
The Google App Engine (GAE) is Google's answer to the ongoing trend of
Cloud Computing offerings within the industry. In the traditional sense, GAE is
a web application hosting service, allowing for development and deployment of
web-based applications within a pre-defined runtime environment. Unlike other
cloud-based hosting offerings such as Amazon Web Services that operate on an
IaaS level, the GAE already provides an application infrastructure on the PaaS
level. This means that the GAE
abstracts from the underlying hardware and operating system layers by providing
the hosted application with a set of application-oriented services. While this
approach is very convenient for
developers of such applications, the rationale behind the GAE is its focus on
scalability and usage-based infrastructure as well as payment.
Costs :
Developing and deploying applications for the GAE is generally free of charge
but restricted to a
certain amount of traffic generated by the deployed application. Once this limit
is reached within a certain time period, the application stops working. However,
this limit can be waived when switching to a billable quota where the developer
can enter a maximum budget that can be spent on an application per day.
Depending on the traffic, once the free quota is reached the application will
continue to work until the maximum budget for this day is reached. Table 1
summarizes some of the, in our opinion, most important quotas and corresponding
amount per unit that is charged when free resources are depleted and additional,
billable quota is desired.
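The billing model described above amounts to simple arithmetic. A sketch with hypothetical quota, rate, and budget values (these are invented for illustration, not Google's actual prices):

```python
# Hypothetical daily billing: free quota first, then a per-unit rate,
# capped by the developer's daily budget. All numbers are made up.
FREE_QUOTA_GB = 1.0    # free outgoing bandwidth per day
RATE_PER_GB = 0.12     # charged once the free quota is used up
MAX_BUDGET = 2.00      # developer-set maximum spend per day

def daily_charge(used_gb):
    billable = max(0.0, used_gb - FREE_QUOTA_GB)
    return min(billable * RATE_PER_GB, MAX_BUDGET)

print(daily_charge(0.5))   # within the free quota: nothing is charged
print(daily_charge(6.0))   # 5 GB billable at the assumed rate
```

Once `daily_charge` would exceed `MAX_BUDGET`, the application stops serving for the rest of the day, mirroring the quota behavior described above.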
Features :
With a Runtime Environment, the Datastore and the App Engine services, the
GAE can be divided into three parts.
Runtime Environment
The GAE runtime environment presents itself as the place where the actual
application is executed. However, the application is only invoked once an HTTP
request is processed to the GAE via a web browser or some other interface,
meaning that the application is not constantly running if no invocation or
processing has been done. In case of such an HTTP request, the request handler
forwards the request and the GAE selects one out of many possible Google
servers where the application is then instantly deployed and executed for a certain
amount of time (8). The application may then do some computing and return the
result back to the GAE request handler which forwards an HTTP response to the
client. It is important to understand that the application runs completely
embedded in this described sandbox environment but only as long as requests are
still coming in or some processing is done within the application. The reason for
this is simple: Applications should only run when they are actually computing,
otherwise they would allocate precious computing power and memory without
need. This paradigm already shows the GAE's potential in terms of scalability.
Being able to run multiple instances of one application independently on
different servers guarantees for a decent level of scalability. However, this highly
flexible and stateless application execution paradigm has its limitations.
Requests are processed for no longer than 30 seconds, after which the response
has to be returned to the client and the application is removed from the runtime
environment again. Obviously this method complicates optimizing for several
subsequent requests to the same application.
The type of runtime environment on the Google servers is dependent on the
programming language used.
For Java or other languages that have support for Java-based compilers (such as
JRuby, Rhino and Groovy) a Java-based Java Virtual Machine (JVM) is
provided. Also, GAE fully supports the Google Web Toolkit (GWT), a
framework for rich web applications. For Python and related frameworks a
Python-based environment is used.
Services
MEM CACHE
The platform's innate memory cache service serves as a short-term storage. As its
name suggests, it stores data in a server's memory, allowing for faster access
compared to the datastore. Memcache is a non-persistent data store that should
only be used to store temporary data within a series of computations. Probably
the most common use case for Memcache is to store session-specific data (15).
Persisting session information in the datastore and executing queries on every
page interaction is highly inefficient over the application lifetime, since
session-owner instances are unique per session (16). Moreover, Memcache is well
suited to speed up common datastore queries (8). To interact with the Memcache,
GAE supports JCache, a proposed interface standard for memory caches (17).
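The semantics just described (fast, non-persistent, short-term) can be sketched with a toy in-process cache. This illustrates the idea only; it is not the App Engine Memcache or JCache API:

```python
import time

# Toy non-persistent cache with per-entry expiry, illustrating memcache
# semantics: fast lookups, entries that may disappear at any time.
class ToyCache:
    def __init__(self):
        self._data = {}

    def set(self, key, value, ttl_seconds=60):
        self._data[key] = (value, time.time() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.time() >= expires:
            del self._data[key]   # expired: behave as a cache miss
            return None
        return value

cache = ToyCache()
cache.set("session:42", {"user": "alice"}, ttl_seconds=60)
print(cache.get("session:42"))
```

As with the real service, callers must treat a `None` result as a miss and fall back to the datastore, since cached entries are never guaranteed to survive.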
URL FETCH
Because the GAE restrictions do not allow opening sockets (18), a URL Fetch
service can be used to send HTTP or HTTPS requests to other servers on the
Internet. This service works asynchronously, giving the remote server some time
to respond while the request handler can do
other things in the meantime. After the server has answered, the URL Fetch
service returns the response code as well as header and body. Using the Google
Secure Data Connector an application can even access servers behind a
company's firewall (8).
MAIL
The GAE also offers a mail service that allows sending and receiving email
messages. Mails can be sent out directly from the application, either on behalf of
the application's administrator or on behalf of users with Google Accounts.
Moreover, an application can receive emails in the form of HTTP requests
initiated by the App Engine and posted to the app at multiple addresses. In
contrast to incoming emails, outgoing messages may also have an attachment up
to 1 MB (8).
XMPP
In analogy to the mail service a similar service exists for instant messaging,
allowing an application to send and receive instant messages when deployed to
the GAE. The service allows communication to and from any instant messaging
service compatible to XMPP (8), a set of open technologies for instant messaging
and related tasks (19).
IMAGES
Google also integrated a dedicated image manipulation service into the App
Engine. Using this service images can be resized, rotated, flipped or cropped (18).
Additionally it is able to combine several images into a single one, convert
between several image formats and enhance photographs. Of course the API also
provides information about format, dimensions and a histogram of color values
(8).
USERS
User authentication with GAE comes in two flavors. Developers can roll their
own authentication service using custom classes, tables and Memcache, or
simply plug into Google's Accounts service.
Since for most applications the time and effort of creating a custom sign-up
mechanism is not worth the trouble (18), the User service is a very convenient
functionality which provides an easy method for authenticating users within
applications. As a byproduct, the thousands of existing Google Accounts are
leveraged. The User service detects whether a user has signed in and otherwise
redirects the user to a sign-in page. Furthermore, it can detect whether the current
user is an administrator, which facilitates implementing admin-only areas within
the application (8).
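The sign-in and admin checks just described amount to a small piece of handler logic. This is a plain-Python sketch with hypothetical names, not the real Users API:

```python
def handle_request(current_user, is_admin=False):
    """Sketch of the checks the User service enables in a request handler."""
    if current_user is None:
        # Not signed in: redirect the user to a sign-in page.
        return {"status": 302, "location": "/sign-in"}
    if is_admin:
        # Admin-only area, gated on the administrator flag.
        return {"status": 200, "body": "admin area for %s" % current_user}
    return {"status": 200, "body": "welcome %s" % current_user}

anonymous = handle_request(None)            # redirected to sign-in
member = handle_request("alice")            # normal page
admin = handle_request("alice", is_admin=True)
```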
OAUTH
The general idea behind OAuth is to allow a user to grant a third party limited
permission to access protected data without sharing a username and password
with the third party. The OAuth specification distinguishes between a consumer,
which is the application that seeks permission to access protected data, and the
service provider, which stores protected data on its users' behalf (20). Using
Google Accounts and the GAE API, applications can act as an OAuth service
provider (8).
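The consumer/provider separation can be sketched with a toy provider that issues tokens limited to certain fields. Token format, data and class names are all made up; real OAuth involves signed requests and a multi-step handshake:

```python
class ServiceProvider:
    """Stores protected data on its users' behalf and issues limited tokens."""
    def __init__(self):
        self._data = {"alice": {"location": "Berlin", "interests": ["cloud"]}}
        self._tokens = {}

    def grant(self, user, scopes):
        # The user authorizes a consumer for specific scopes only;
        # the consumer never sees the user's password.
        token = "token-%s-%d" % (user, len(self._tokens))
        self._tokens[token] = (user, set(scopes))
        return token

    def read(self, token, field):
        user, scopes = self._tokens[token]
        if field not in scopes:
            raise PermissionError("token does not cover %r" % field)
        return self._data[user][field]


provider = ServiceProvider()
# The consumer is granted access to 'location' only.
token = provider.grant("alice", ["location"])
location = provider.read(token, "location")   # allowed
```

Reading `"interests"` with the same token raises `PermissionError`, which is the "limited permission" idea in miniature.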
BLOBSTORE
The general idea behind the blobstore is to allow applications to handle objects
that are much larger than the size allowed for objects in the datastore service.
Blob is short for binary large object, and the blobstore is designed to serve large
files, such as video or high-quality images. Although blobs can be up to 2 GB in
size, they have to be processed in portions, one MB at a time. This restriction
was introduced to smooth the curve of datastore traffic. To enable queries for
blobs, each has a corresponding blob info record which is persisted in the
datastore (8).
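Processing a blob one MB at a time can be sketched with a simple chunking iterator; the `blob_info` dict below stands in for the blob info record kept in the datastore:

```python
CHUNK = 1024 * 1024  # blobs must be processed in 1 MB portions

def iter_chunks(blob):
    """Yield successive 1 MB portions of a blob."""
    for offset in range(0, len(blob), CHUNK):
        yield blob[offset:offset + CHUNK]

blob = b"x" * (2 * CHUNK + 100)          # a blob slightly over 2 MB
portions = list(iter_chunks(blob))       # processed piecewise, never whole

# Queryable metadata kept alongside, like a blob info record.
blob_info = {"size": len(blob), "chunks": len(portions)}
```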
ADMINISTRATION CONSOLE
While the GAE is more targeted towards independent developers in need of a
hosting platform for their medium-sized applications, Google's recently launched
App Engine for Business tries to target the corporate market. Although
technically relying mostly on the GAE described above, Google added some
enterprise features and a new pricing scheme to make its cloud computing
platform more attractive to enterprise customers (21). Regarding the features,
App Engine for Business includes a central development manager that allows
central administration of all applications deployed within one company,
including access control lists. In addition, Google now offers a 99.9% service
level agreement as well as premium developer support. Google also adjusted the
pricing scheme for its corporate customers by offering a fixed price of $8 per
user per application, up to a maximum of $1,000 per month. Interestingly, unlike
the pricing scheme for the GAE, this offer includes unlimited processing power
at that fixed price. From a technical point of view, Google tries to accommodate
established industry standards by now offering SQL database support in addition
to the existing Bigtable datastore described above (8).
General Idea
In order to evaluate the flexibility and scalability of the GAE, we tried to come
up with an application that relies heavily on scalability, i.e. one that collects
large amounts of data from external sources. That way we hoped to be able to
test both persistency and the gathering of data from external sources at large
scale. Our idea, therefore, was to develop an application that connects people's
Delicious bookmarks with their respective Facebook accounts. People using our
application should be able to see what their Facebook friends' Delicious
bookmarks are, provided those friends have a Delicious account. This way a user
gets a visualization of his friends' latest topics by looking at a generated tag
cloud, giving him a clue about the most common and shared interests.
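The tag-cloud aggregation at the heart of this idea is a frequency count over friends' tags. A minimal sketch with made-up sample data:

```python
from collections import Counter

def tag_cloud(friends_bookmarks):
    """Aggregate friends' bookmark tags into weights for a tag cloud.

    friends_bookmarks maps each friend to a list of tags on their bookmarks.
    """
    counts = Counter()
    for tags in friends_bookmarks.values():
        counts.update(tags)
    return counts

# Hypothetical sample data for two Facebook friends.
friends = {
    "anna": ["cloud", "gae", "python"],
    "ben":  ["cloud", "scaling"],
}
cloud = tag_cloud(friends)
# The heaviest tag in the cloud is the most shared interest.
top_tag, weight = cloud.most_common(1)[0]
```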
The Google cloud, called Google App Engine, is a 'platform as a service' (PaaS)
offering. In contrast with the Amazon infrastructure as a service cloud, where
users explicitly provision virtual machines and control them fully, including
installing, compiling and running software on them, in the PaaS model a
platform is provided along with an SDK, using which users develop applications
and deploy them on the cloud. The PaaS platform is responsible for executing the
applications, including servicing external service requests, as well as running
scheduled jobs included in the application. By making the actual execution
servers transparent to the user, a PaaS platform is able to share application servers
across users who need lower capacities, as well as automatically scale resources
allocated to applications that experience heavy loads. Figure 5.2 depicts a user
view of Google App Engine. Users upload code, in either Java or Python, along
with related files, which are stored on the Google File System, a very large scale
fault tolerant and redundant storage system. It is important to note that an
application is immediately available on the internet as soon as it is successfully
uploaded (no virtual servers need to be explicitly provisioned as in IaaS).
Charges accrue only when an application actually serves requests (or if batch
jobs run); in contrast, in an IaaS model merely making an application
continuously available incurs the full cost of keeping at least some of the servers
running all the time. Further, deploying applications in Google App Engine is
free, within usage limits; thus applications can be developed and tried out free
and begin to incur cost only when actually accessed by a sufficient volume of
requests. The PaaS model enables Google to provide such a free service because
application servers are shared across many users.
Google Datastore
Applications persist data in the Google Datastore, which is (like Amazon
SimpleDB) a non-relational database. The Datastore allows applications to
define structured types (called 'kinds') and store their instances (called
'entities') in a distributed manner on the GFS file system. While one can view
Datastore 'kinds' as table structures and entities as records, there are important
differences between a relational model and the Datastore, some of which are also
illustrated in Figure 5.3.
Unlike a relational schema, where all rows in a table have the same set of
columns, all entities of a 'kind' need not have the same properties. Instead,
additional properties can be added to any entity. This feature is particularly
useful in situations where one cannot foresee all the potential properties in a
model, especially those that occur occasionally for only a small subset of
records.
For example, a model storing 'products' of different types (shoes, books, etc.)
would need to allow each product to have a different set of features. In a
relational model, this would probably be implemented using a separate
FEATURES table, as shown on the bottom left of Figure 5.3. Using the
Datastore, this table ('kind') is not required; instead, each product entity can be
assigned a different set of properties at runtime. The Datastore allows simple
queries with conditions, such as the first query shown in Figure 5.3 to retrieve
all customers having names in some lexicographic range. The query syntax
(called GQL) is essentially the same as SQL, but with some restrictions. For
example, all inequality conditions in a query must be on a single property; so a
query that also filtered customers on, say, their 'type' would be illegal in GQL
but allowed in SQL.
The Datastore also supports references between entities: each entity of the Accts
'kind' includes a reference to its customer, which is an entity of the Custs 'kind.'
Further, relationships defined by such references can be traversed in both
directions, so not only can one directly access the customer of an account, but
also all accounts of a given customer, without executing any query operation, as
shown in the figure.
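Entities with varying properties and reference traversal can be sketched with plain dicts. The Custs/Accts names follow the Figure 5.3 example; the data itself is made up:

```python
# Entities of a 'kind' need not share the same properties.
custs = {"c1": {"name": "Smith"}}
accts = {
    "a1": {"balance": 100, "customer": "c1"},             # reference to a Custs entity
    "a2": {"balance": 250, "customer": "c1", "vip": True}, # extra property, no schema change
}

def customer_of(acct_id):
    """Forward traversal: account -> its customer, no query needed."""
    return custs[accts[acct_id]["customer"]]

def accounts_of(cust_id):
    """Reverse traversal: customer -> all of its accounts."""
    return [a for a, e in accts.items() if e["customer"] == cust_id]

owner = customer_of("a1")          # the Custs entity behind account a1
their_accounts = accounts_of("c1") # both accounts of customer c1
```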
GQL queries cannot execute joins between models. Joins are critical
when using SQL to efficiently retrieve data from multiple tables. For example,
the query shown in the figure retrieves details of all products bought by a
particular customer, for which it needs to join data from the transactions (TXNS),
products (PRODS) and product features (FEATURES) tables. Even though GQL
does not allow joins, its ability to traverse associations between entities often
enables joins to be avoided, as shown in the figure for the above example: By
storing references to customers and products in the Txns model, it is possible to
retrieve all transactions for a given customer through a reverse traversal of the
customer reference. The product references in each transaction then yield all
products and their features (as discussed earlier, a separate Features model is not
required).
Finally, the Datastore supports transactions, in which a set of updates is
atomic; however, this also requires that each transaction only manipulates
entities belonging to the same entity group. While this transaction model
suffices for most online applications, complex batch updates that update many
unrelated entities cannot execute atomically, unlike in a relational database
where there are no such restrictions.
Amazon SimpleDB:--
● Many third parties run their apps off Amazon EC2 (IaaS) and interface to
Facebook via its APIs (PaaS).
● Facebook itself makes heavy use of PaaS services for its own private cloud.
● Key problems: how to analyze logs, make suggestions, and determine which
ads to place.
"
i
d
"
:
"
1
0
77
7
4
7
2
4
7
1
2
"
,
"age_range": {
"min": 21
},
"locale":
"en_US",
"location": {
"id": "101881036520836",
"name": "Philadelphia,
Pennsylvania"
}
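A record like this can be loaded with Python's standard json module; the values below mirror the sample data:

```python
import json

raw = """{
  "id": "1074724712",
  "age_range": {"min": 21},
  "locale": "en_US",
  "location": {"id": "101881036520836",
               "name": "Philadelphia, Pennsylvania"}
}"""

profile = json.loads(raw)          # JSON text -> nested Python dicts
min_age = profile["age_range"]["min"]
city = profile["location"]["name"]
```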
Requests have the form https://graph.facebook.com/(identifier)?fields=(fieldList),
and the response is in JSON. The identifier can be a numeric ID or a user name,
e.g. /1074724712 or /andreas.haeberlen. Access tokens are opaque strings; each
token encodes specific permissions (e.g. access to the user's location, but not to
interests, etc.).
Facebook analyzes this data for several purposes:
● to help guide development on different components
● to report on ad performance
● recommendations
● ad hoc analysis: answering questions on historical data to help with
managerial decisions
● archival of logs
● spam detection
● ad optimization
PAAS AT FACEBOOK:
● Scribe: open-source logging; records the data that will later be analyzed by
Hadoop.
● Hadoop (MapReduce): the batch processing engine for data analysis. As of
2009, Facebook ran the 2nd largest Hadoop cluster in the world, with 2,400
cores and more than 2 PB of data, growing by more than 10 TB every day.
● Hive: SQL over Hadoop, used to write the data analysis queries.
Main axes:
● account, campaign, ad
● time period
● type of interaction
● users
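Aggregating log records along these axes is the kind of query Hive expresses in SQL; the same idea can be sketched in Python over hypothetical ad-interaction records:

```python
from collections import Counter

# Hypothetical ad-interaction log records, one dict per event.
log = [
    {"ad": "ad-1", "period": "2010-01", "interaction": "click", "user": "u1"},
    {"ad": "ad-1", "period": "2010-01", "interaction": "view",  "user": "u2"},
    {"ad": "ad-2", "period": "2010-01", "interaction": "click", "user": "u1"},
]

def aggregate(records, axis):
    """Count events along one axis (ad, period, interaction type, user)."""
    return Counter(r[axis] for r in records)

per_ad = aggregate(log, "ad")             # events per ad
per_type = aggregate(log, "interaction")  # events per interaction type
```

In Hive this would be a `GROUP BY` over the corresponding column, run as a MapReduce job across the cluster.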
Conclusion :
Cloud computing remains the number one hype topic within the IT industry at
present. Our evaluation of the Google App Engine and Facebook has shown
both the functionality and the limitations of the platform. Developing and
deploying an application within the GAE is in fact quite easy and in a way
shows the progress that software development and deployment have made.
Within our application we were able to use the abstractions provided by the
GAE without problems, although the concept of Bigtable requires a big change
in mindset when developing. Our scalability testing showed the limitations of
the GAE at this point in time. Although built-in scalability is an extremely
helpful feature and a great USP for the GAE, it currently suffers from both
purposely-set and technical restrictions. Coming back to our motivation of
evaluating the GAE in terms of its sufficiency for serious large-scale
applications in a professional environment, we have to conclude that the GAE
does not (yet) fulfill business needs for enterprise applications.
Experiment No. - 9
The ultimate benefit of cloud computing, and AWS, is the ability to leverage a
new business model and turn capital infrastructure expenses into variable costs.
Businesses no longer need to plan and procure servers and other IT resources
weeks or months in advance. Using AWS, businesses can take advantage of
Amazon's expertise and economies of scale to access resources when their
business needs them, delivering results faster and at a lower cost.
Backing up to the cloud eliminates tape capacity planning for backup and
archive, and frees up administrative staff for higher-value operations.
Amazon.com was able to replace its backup tape infrastructure with cloud-based
Amazon S3 storage, eliminate backup software, and experienced a 12X
performance improvement, reducing restore time from around 15 hours to 2.5
hours in select scenarios.
With data center locations in the U.S., Europe, Singapore, and Japan, customers
across all industries are taking advantage of the following benefits:
● Low cost
● Secure
The Challenge
Utilization and capacity planning is complex, and time and capital expense
budget are at a premium. Significant capital expenditures were required over
the years for tape hardware, data center space for this hardware, and enterprise
licensing fees for tape software. During that time, managing tape
infrastructure required highly skilled staff to spend time with setup,
certification and engineering archive planning instead of on higher value
projects. And at the end of every fiscal year, projecting future capacity
requirements required time consuming audits, forecasting, and budgeting.
The cost of backup software required to support multiple tape devices sneaks
up on you. Tape robots provide basic read/write capability, but in order to
fully utilize them, you must invest in proprietary tape backup software. For
Amazon.com, the cost of the software had been high, and added significantly
to overall backup costs. The cost of this software was an ongoing budgeting
pain point, but one that was difficult to address as long as backups needed to
be written to tape devices.
85
Maintaining reliable backups and being fast and efficient when retrieving data
requires a lot of time and effort with tape. When data needs to be durably
stored on tape, multiple copies are required. When everything is working
correctly, and there is minimal contention for tape resources, the tape robots
and backup software can easily find the required data. However, if there is a
hardware failure, human intervention is necessary to restore from tape.
Contention for tape drives resulting from multiple users' tape requests slows
down restore processes even more. This adds to the recovery time objective
(RTO) and makes achieving it more challenging compared to backing up to
Cloud storage.
Strong data security. Amazon.com found that AWS met all of their
requirements for physical security, security accreditations, and security
processes, protecting data in flight, data at rest, and utilizing suitable
encryption standards.
The Benefits
The move to Amazon S3 brought a number of benefits, including:
Amazon.com's data storage has been growing larger
and more dynamic each year, both organically and as a result of acquisitions.
AWS has enabled Amazon.com to keep pace with this rapid expansion, and
to do so seamlessly. Historically, Amazon.com business groups have had to
write annual backup plans, quantifying the amount of tape storage that they
plan to use for the year and the frequency with which they will use the tape
resources. These plans are then used to charge each organization for their tape
usage, spreading the cost among many teams. With Amazon S3, teams simply
pay for what they use, and are billed for their usage as they go. There are
virtually no upper limits as to how much data can be stored in Amazon S3,
and so there are no worries about running out of resources. For teams adopting
Amazon S3 backups, the need for formal planning has been all but eliminated.
Backing up the company's Oracle databases to Amazon S3 required no new
backup software, only a configuration of the Oracle Secure Backup Cloud (SBC)
module. The effort required to configure the Oracle SBC module amounted to an
hour or less per database. After this one-time setup, the database backups were
transparently redirected to Amazon S3.
Durable data storage provided by Amazon S3, which is designed for
99.999999999% (eleven nines) durability. On occasion, Amazon.com has
experienced hardware failures with
tape infrastructure – tapes that break, tape drives that fail, and robotic
components that fail. Sometimes this happens when a DBA is trying to restore
a database, and dramatically increases the mean time to recover (MTTR).
With the durability and availability of Amazon S3, these issues are no longer
a concern.
This innovation is one that can be easily replicated by other organizations that
back up their Oracle databases to tape.
● Compute
● Content Delivery
● Database
● E-Commerce
● Messaging
● Monitoring
● Networking
● Storage
● Support
● Web Traffic
● Workforce
Compute
● Amazon Elastic Compute Cloud (EC2)
Amazon Elastic Compute Cloud delivers scalable, pay-as-you-go compute
capacity in the cloud.
● Amazon Elastic MapReduce
Amazon Elastic MapReduce is a web service that enables businesses,
researchers, data analysts, and developers to easily and cost-effectively process
vast amounts of data.
● Auto Scaling
Auto Scaling allows us to automatically scale our Amazon EC2 capacity up or
down according to conditions we define.
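The kind of condition-driven scaling decision Auto Scaling evaluates can be sketched as a small function. The CPU thresholds and instance limits here are made-up example conditions, not AWS defaults:

```python
def scaling_decision(cpu_percent, instances, low=30, high=70,
                     min_instances=1, max_instances=10):
    """Return the new instance count for one scaling step.

    Scale out when average CPU exceeds `high`, scale in below `low`,
    always staying within [min_instances, max_instances].
    """
    if cpu_percent > high and instances < max_instances:
        return instances + 1   # scale out under load
    if cpu_percent < low and instances > min_instances:
        return instances - 1   # scale in when idle
    return instances           # within bounds: no change

busy = scaling_decision(90, 2)    # 3: add an instance
idle = scaling_decision(10, 2)    # 1: remove an instance
steady = scaling_decision(50, 2)  # 2: no change
```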
Content Delivery
● Amazon CloudFront
Amazon CloudFront is a web service that makes it easy to distribute content with
low latency via a global network of edge locations.
Database
● Amazon SimpleDB
Amazon SimpleDB works in conjunction with Amazon S3 and Amazon EC2 to
run queries on structured data in real time.
● Amazon ElastiCache
Amazon ElastiCache is a web service that makes it easy to deploy, operate, and
scale an in-memory cache in the cloud.
E-Commerce
Workforce
● Amazon Mechanical Turk
Amazon Mechanical Turk enables companies to access thousands of global
workers on demand and programmatically integrate their work into various
business processes.
Networking
● Amazon Route 53
Amazon Route 53 is a highly available and scalable Domain Name System
(DNS) web service.
● Amazon Virtual Private Cloud (VPC)
Amazon Virtual Private Cloud (Amazon VPC) lets us provision a private,
isolated section of the Amazon Web Services (AWS) Cloud where we can
launch AWS resources in a virtual network that we define. With Amazon VPC,
we can define a virtual network topology that closely resembles a traditional
network that we might operate in our own datacenter.
Web Traffic
● Alexa Web Information Service
Alexa Web Information Service makes Alexa's huge repository of data about
structure and traffic patterns on the Web available to developers.
Messaging
● Amazon Simple Queue Service (SQS)
By using Amazon SQS, developers can simply move data between distributed
components of their applications that perform different tasks, without losing
messages or requiring each component to be always available.
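The decoupling that SQS provides can be sketched with Python's standard `queue` module: the producer and consumer below never call each other directly, and messages wait safely in the queue until the consumer takes them (a local stand-in for the hosted service, not the SQS API):

```python
import queue
import threading

q = queue.Queue()  # stands in for an SQS queue

def producer():
    for i in range(3):
        q.put({"task": i})   # messages are kept even if the consumer is busy

def consumer(results):
    for _ in range(3):
        msg = q.get()        # blocks until a message is available
        results.append(msg["task"])
        q.task_done()

results = []
t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer, args=(results,))
t1.start(); t2.start()
t1.join(); t2.join()
```

Neither component needs to be aware of the other's availability; the queue absorbs the timing differences, which is the core SQS idea.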
Conclusion:
Thus, we have studied a case study on Amazon Web Services.