
The Case for Cloud Computing

Robert L. Grossman
University of Illinois at Chicago and Open Data Group

To understand clouds and cloud computing, we must first understand the two different types of clouds. The author distinguishes between clouds that provide on-demand computing instances and those that provide on-demand computing capacity.

Cloud computing doesn't yet have a standard definition, but a good working description of it is to say that clouds, or clusters of distributed computers, provide on-demand resources and services over a network, usually the Internet, with the scale and reliability of a data center. This article gives a quick introduction to cloud computing. It covers several different types of clouds, describes what's new about cloud computing, and discusses some of the advantages and disadvantages that clouds offer.

Two different but related types of clouds are those that provide computing instances on demand and those that provide computing capacity on demand. Both use similar machines, but the first is designed to scale out by providing additional computing instances, whereas the second is designed to support data- or compute-intensive applications via scaling capacity.

Amazon's EC2 services (www.amazon.com/ec2) are an example of the first category. A small EC2 computing instance costs US$0.10 per hour and offers the approximate computing power of a 1.0- to 1.2-GHz 2007 Opteron or 2007 Xeon processor, with 1.7 Gbytes of memory, 160 Gbytes of available disk space, and moderate I/O performance. The Eucalyptus system (http://eucalyptus.cs.ucsb.edu) is an open source option that provides on-demand computing instances and shares the same APIs as Amazon's EC2 cloud.

Google's MapReduce is an example of the second category. Recent work gives a glimpse of it in action: in this case, researchers ran a benchmark on a cluster containing approximately 1,800 machines [1], each of which had two 2-GHz Intel Xeon processors, 4 Gbytes of memory, and two 160-Gbyte IDE disks. The researchers used MapReduce on the cluster to run the TeraSort benchmark (http://research.microsoft.com/barc/SortBenchmark), the goal of which is to sort 10^10 100-byte records (roughly 1 Tbyte of data). The application required approximately 850 seconds to complete on this cluster. The Hadoop system (http://hadoop.apache.org/core) is an open source system that implements MapReduce.

Clouds that provide on-demand computing instances can use these instances to supply software as a service (SaaS), as Salesforce.com does with its product, or to provide a platform as a service (PaaS), as Amazon does with its EC2 product.




What's New?

Now that we've covered the basics of cloud computing, it's important to understand what's new about it. On-demand services and resources have been available over the Internet for some time, but today's increased focus on cloud computing is due to three important differences:

• Scale. Some companies that rely on cloud computing have infrastructures that scale over several (or more) data centers.
• Simplicity. Prior to cloud-based computing services, writing code for high-performance distributed computing was relatively complicated and usually required working with grid-based services, developing code that explicitly passed messages between nodes, and employing other specialized methods. Although simplicity is in the eye of the beholder, most people feel that the cloud-based storage service APIs and MapReduce-style computing APIs are relatively simple compared to previous methods.
• Pricing. Cloud computing is often offered with a pricing model that lets you pay as you go and for just the services that you need. For example, if you need an additional 1,000 computing instances for an hour, you pay just for these 1,000 computing instances and just for the hour that you use them. No capital expenditure is required.

The impact has been revolutionary: by using the Google File System (GFS) and MapReduce, or the Hadoop Distributed File System with its implementation of MapReduce, it's relatively easy for a project to perform a computation over 10 Tbytes of data using 1,000 nodes. Until recently, this would have been out of reach for most projects. Several other changes and improvements have raised cloud computing's profile as well.

Private vs. Hosted Clouds

The management, cost, and security of clouds depend on whether an organization chooses to buy and operate its own cloud or to obtain cloud services from a third party. A private cloud is devoted to a single organization's internal use; it might be run by the organization itself or outsourced to a third party to operate. Similarly, a private cloud might be owned by the organization itself or leased by the organization. In contrast, a public or hosted cloud is managed by another organization that provides cloud services to a variety of third-party clients using the same cloud resources. Google, for example, uses GFS [2], MapReduce [3], and BigTable [4] internally as part of its private cloud services; at the time of this writing, these services weren't available to third parties. In contrast, hosted cloud services such as Amazon's EC2, S3, and SimpleDB are open to anyone with a credit card, even at 3 a.m.

It's important to note that Google uses its private cloud to provide hosted-cloud-based applications, such as its email and office-based services, to regular outside users.

Elastic, Usage-Based Pricing

Cloud computing is usually offered with a usage-based model in which you pay for just the cloud resources that a particular computation requires. Computations that require additional resources simply request them from the cloud (up to the cloud's overall capacity). Sometimes, the terms elastic or utility computing are used to describe this ability of a cloud to provide additional resources when required. Amazon's S3 and EC2 use this pricing model.

Organizations, therefore, have several options for obtaining cloud services, including running their own private clouds or buying cloud services from a third party using the elastic, usage-based pricing model. This type of pricing offers two other important advantages as well:

• It doesn't require up-front investments; instead, as an on-demand service, users pay for capacity as they need it.
• It lets users access capacity exactly when they need it. For Web 2.0 applications, this means that the model can support 100 users one day and 10,000 the next.

To get a better understanding of utility computing, let's assume that you have a requirement to operate 100 servers over the course of three years. One option is to lease them at $0.40 per instance-hour, which would cost approximately

100 servers * $0.40/instance-hour * 3 years * 8,760 hours/year = $1,051,200.



Another option is to buy them. Let's assume the cost to buy each server is $1,500, that you need two staff members at $100,000 per year to administer the servers, and that the servers require 150 watts each, with the cost of electricity at $0.10 per kilowatt-hour, bringing the yearly cost to operate the 100 servers to $13,140. This option would cost approximately

100 servers * $1,500 + 3 years * $13,140 electricity/year + 3 years * 2 staff * $100,000 salary/year = $789,420.

So, if you were to run the servers at 100 percent utilization, buying the 100 servers is less expensive. However, if you were to run them at 75 percent utilization or less, using an on-demand style of cloud would be less expensive.

Of course, these numbers are only estimates, and I haven't considered all costs, but even from this simple example, it's clear that using a pay-as-you-go utility computing model is preferable for many use cases.
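To make it easy to redo this arithmetic with different assumptions, here is a small Python sketch of the lease-versus-buy comparison. The constants are simply the figures assumed above, and the break-even utilization it prints (roughly 75 percent) matches the comparison in the text.

# Back-of-the-envelope comparison of the two options described above.
# All figures are the article's assumptions; adjust them for your own case.

SERVERS = 100
YEARS = 3
HOURS_PER_YEAR = 8_760

# Option 1: lease on-demand instances.
LEASE_RATE = 0.40  # dollars per instance-hour

def lease_cost(utilization: float = 1.0) -> float:
    """Cost of leasing; you pay only for the hours you actually use."""
    return SERVERS * LEASE_RATE * YEARS * HOURS_PER_YEAR * utilization

# Option 2: buy and operate the servers yourself.
SERVER_PRICE = 1_500          # dollars per server
STAFF = 2
SALARY = 100_000              # dollars per staff member per year
WATTS_PER_SERVER = 150
ELECTRICITY_RATE = 0.10       # dollars per kilowatt-hour

def buy_cost() -> float:
    """Fixed cost of buying: hardware, admin staff, and electricity."""
    electricity_per_year = (SERVERS * WATTS_PER_SERVER / 1_000) * HOURS_PER_YEAR * ELECTRICITY_RATE
    return SERVERS * SERVER_PRICE + YEARS * (electricity_per_year + STAFF * SALARY)

print(f"lease at 100% utilization: ${lease_cost():,.0f}")            # $1,051,200
print(f"buy and operate:           ${buy_cost():,.0f}")              # $789,420
print(f"break-even utilization:    {buy_cost() / lease_cost():.0%}")  # ~75%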
Some Advantages and Disadvantages

Cloud computing provides several important benefits over today's dominant model in which an enterprise purchases computers a rack at a time and operates them itself. First, cloud computing's usage-based pricing model offers several advantages, including reduced capital expense, a low barrier to entry, and the ability to scale up as demand requires, as well as to support brief surges in capacity. Second, cloud services enjoy the same economies of scale that data centers provide. By providing services at the scale of a data center, it's possible to provide operations, business continuity, and security more efficiently than can be done when providing these services a rack at a time. For this reason, the unit cost for cloud-based services is often lower than the cost if the services were provided directly by the organization itself. Finally, cloud-computing architectures have proven to be very scalable; for example, cloud-based storage services can easily manage a petabyte of data, whereas managing this much data with a traditional database is problematic.

Of course, cloud computing has some disadvantages as well. First, because cloud services are often remote (at least for hosted cloud services), they can suffer the latency- and bandwidth-related issues associated with any remote application. Second, because hosted cloud services serve multiple customers, various issues related to multiple customers sharing the same piece of hardware can arise. For example, if one user's application compromises the system, it can also compromise applications of other users that share the same system. Also, having data accessible to third parties (such as a cloud service provider) can present security, compliance, and regulatory issues.

Layered Services

A storage cloud provides storage services (block- or file-based); a data cloud provides data management services (record-, column-, or object-based); and a compute cloud provides computational services. Often, they're layered (compute services over data services over storage services) to create a stack of cloud services that acts as a computing platform for developing cloud-based applications; see Figure 1.

[Figure 1. Layered model. Applications sit on top of a compute cloud, which sits on a data cloud, which sits on a storage cloud. Some clouds that provide on-demand computing capacity use layers of services, forming a stack of cloud services.]
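As a rough illustration of this layering, the following Python sketch stacks a toy compute service on a toy data service on a toy storage service. The class and method names are invented for illustration and don't correspond to any particular cloud's API.

# Minimal sketch of the layered model in Figure 1: a compute service built on a
# data service, built on a storage service. Names are illustrative only.
import json
from typing import Callable, Iterable

class StorageCloud:
    """Storage layer: stores opaque blocks of bytes by key."""
    def __init__(self):
        self._blocks: dict[str, bytes] = {}
    def put(self, key: str, block: bytes) -> None:
        self._blocks[key] = block
    def get(self, key: str) -> bytes:
        return self._blocks[key]
    def keys(self) -> Iterable[str]:
        return self._blocks.keys()

class DataCloud:
    """Data layer: manages records (here, JSON objects) on top of storage."""
    def __init__(self, storage: StorageCloud):
        self._storage = storage
    def put_record(self, key: str, record: dict) -> None:
        self._storage.put(key, json.dumps(record).encode())
    def records(self) -> Iterable[dict]:
        for key in self._storage.keys():
            yield json.loads(self._storage.get(key))

class ComputeCloud:
    """Compute layer: runs computations over the records in the data layer."""
    def __init__(self, data: DataCloud):
        self._data = data
    def run(self, fn: Callable[[dict], dict]) -> list[dict]:
        return [fn(record) for record in self._data.records()]

# An application sits on top of the whole stack.
storage = StorageCloud()
data = DataCloud(storage)
compute = ComputeCloud(data)
data.put_record("r1", {"word_count": 120})
data.put_record("r2", {"word_count": 87})
print(compute.run(lambda rec: {"doubled": 2 * rec["word_count"]}))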
Parallel Computing over Clouds

At its core, MapReduce is a style of parallel programming supported by capacity-on-demand clouds. A good illustrative example of how something like MapReduce works is to compute an inverted index in parallel for a large collection of Web pages stored in a cloud.

Let's assume that each node i in the cloud stores Web pages pi,1, pi,2, pi,3, …, and that a Web page pj contains words (terms) wj,1, wj,2, wj,3, … . A basic but important structure in information retrieval is an inverted index, that is, a list

(w1; p1,1, p1,2, p1,3, …)
(w2; p2,1, p2,2, p2,3, …)
(w3; p3,1, p3,2, p3,3, …),

where the list is sorted by the word wj, and associated with each word wj is a list of all Web pages pi containing that word.


MapReduce uses a programming model that processes a list of <key, value> pairs to produce another list of <key', value'> pairs. The initial list of <key, value> pairs is distributed over the nodes in the cloud. In the map phase, each Web page pi is processed independently on its local node to produce an output list of multiple key-value pairs <wj, pi>, one for each word wj on the page. A partition function h(wj) then assigns each key (a word wj in this example) to a machine in the cloud for further processing. This is called the shuffle phase and, in general, nodes involved in the computation send data to other nodes involved in the computation as determined by the partition function h(wj). In the next phase, called the sort phase, each node in the cloud sorts the key-value pairs <wj, pi> according to the key wj. In the final phase, called the reduce phase, the key-value pairs with the same key wj are merged to create the inverted index. So with MapReduce, the programmer defines a Map and a Reduce function, whereas the system supplies the Shuffle and Sort functions [3].
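The following single-process Python sketch walks through the map, shuffle, sort, and reduce phases just described for the inverted-index example. It's only an illustration of the control flow: in a real capacity-on-demand cloud, the pages and the map and reduce work would be spread across many nodes, and the sample pages here are placeholders.

# Single-process sketch of the MapReduce phases described above, applied to the
# inverted-index example. In a real cloud, each phase would run on many nodes.
from collections import defaultdict

NUM_REDUCERS = 4  # stand-in for the number of machines handling reduce work

def map_phase(page_id: str, text: str):
    """Map: emit a <word, page> pair for each word on the page."""
    for word in set(text.lower().split()):
        yield word, page_id

def partition(word: str) -> int:
    """Partition function h(w): assigns each key (word) to a reducer."""
    return hash(word) % NUM_REDUCERS

def mapreduce_inverted_index(pages: dict[str, str]) -> dict[str, list[str]]:
    # Shuffle: route each <word, page> pair to the reducer chosen by h(word).
    shuffled = defaultdict(list)
    for page_id, text in pages.items():
        for word, pid in map_phase(page_id, text):
            shuffled[partition(word)].append((word, pid))

    # Sort and reduce: each reducer sorts its pairs by key and merges the pages
    # that share a word into a single posting list.
    index: dict[str, list[str]] = {}
    for reducer_pairs in shuffled.values():
        for word, pid in sorted(reducer_pairs):
            index.setdefault(word, []).append(pid)
    return index

# Toy usage with placeholder pages.
pages = {"p1": "cloud computing on demand", "p2": "on demand computing capacity"}
print(mapreduce_inverted_index(pages))
# e.g. {'cloud': ['p1'], 'computing': ['p1', 'p2'], 'demand': ['p1', 'p2'], ...}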
Let's consider another example: log files that describe an entity's usage of resources. It's important to analyze log files to identify anomalies that indicate whether a particular resource has been compromised. For small log files, this is easy to do with a database, but, as the size of the log files grows, it's difficult to manage them with just a database. However, clouds can easily manage even very large collections of log files, and MapReduce-style computations can easily identify anomalous patterns indicative of compromises.
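A similar sketch, under the assumption of a simple "timestamp resource event" log format and a naive threshold rule (both invented for illustration), shows how the same map-and-reduce pattern can count events per resource and flag the unusually busy ones.

# Sketch of a MapReduce-style scan over log records that flags resources with
# unusually many events. The record format and threshold are illustrative only.
from collections import Counter
from statistics import mean

def map_log_line(line: str):
    """Map: emit <resource, 1> for a line of the form 'timestamp resource event'."""
    _, resource, _ = line.split(maxsplit=2)
    yield resource, 1

def reduce_counts(pairs) -> Counter:
    """Reduce: sum the event counts for each resource."""
    counts = Counter()
    for resource, n in pairs:
        counts[resource] += n
    return counts

def flag_anomalies(log_lines, factor: float = 10.0) -> list[str]:
    """Flag resources whose event count is far above the average count."""
    counts = reduce_counts(pair for line in log_lines for pair in map_log_line(line))
    threshold = factor * mean(counts.values())
    return [resource for resource, n in counts.items() if n > threshold]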
Security

Security is an area of cloud computing that presents some special challenges. For hosted clouds, the first challenge is simply that a third party is responsible both for storing the data and securing it. On the positive side, third parties can take advantage of economies of scale to provide a level of security that might not be cost-effective for smaller companies, but a downside is that two or more organizations might share the same physical resource and not be aware of it.

For some cloud applications, security is still somewhat immature. Hadoop, for example, doesn't currently have user-level authentication or access controls, although both are expected in a later version. Fortunately, there's no technical difficulty per se in providing these tools for clouds. Sector [5], which also provides on-demand computing capacity, offers authentication, authorization, and access controls and, as measured by the TeraSort benchmark, is faster than Hadoop (http://sector.sourceforge.net).

Standards, Interoperability, and Benchmarks

Organizations that develop cloud-based applications have an interest in frameworks that enable applications to be ported easily from one cloud to another and to interoperate with different cloud-based services. For example, with an appropriate interoperability framework, a cloud application could switch from one provider to another offering lower cost or a greater range of cloud services.

Amazon's APIs (www.aws.amazon.com) have become the de facto standard for clouds that provide on-demand instances. Cloud-based applications that use these APIs enjoy portability and interoperability; for example, Eucalyptus uses these APIs, and applications that run on Amazon's EC2 service can in turn run on a Eucalyptus cloud. Unfortunately, for clouds that provide on-demand capacity, portability and interoperability are much more problematic. Hadoop is by far the most prevalent system that provides on-demand capacity, but, for instance, it isn't straightforward for a Hadoop MapReduce application to run on another on-demand capacity cloud written in C++ [5].

Although it might be too early yet for standards to fully emerge, several organizations are attempting them, including an effort by the Cloud Computing Interoperability Forum (www.cloudforum.org) and by the Open Cloud Consortium (www.opencloudconsortium.org). Service-based frameworks for clouds have also recently debuted; for example, Thrift is a software framework for scalable cross-language services development that relies on a code-generation engine (http://incubator.apache.org/thrift).



Thrift makes it easier for cloud-based applications to access different storage clouds, such as Hadoop and Sector. A common language could also help by providing an interoperable way for applications to access compute services across several different clouds; so far, several people have attempted to provide a language for MapReduce-style parallel programming, including some that extend SQL in a way that supports this style of programming, but no single language has emerged as the clear winner yet.

A closely related challenge is creating a standard that would enable different clouds to interoperate. Perhaps the Internet's infancy could guide this type of effort: at that time, any organization that wanted a network set up its own, so sending data between networks was quite difficult. The introduction of TCP and related Internet protocols and standards remedied this situation, but many companies with network products resisted them for some time. Today, we're in a somewhat analogous position: although cloud service providers have pushed back a bit on standardizing, the ability of different clouds to interoperate easily would enable an interesting new class of applications.

As with standards and a common language, cloud computing doesn't yet have well-established benchmarks. The most common method for measuring cloud performance to date is the TeraSort benchmark. For clouds that provide on-demand instances, a recent benchmark called Cloudstone has emerged [6]. Cloudstone is a toolkit consisting of an open source Web 2.0 social application, a set of tools for generating workloads, a set of tools for performance monitoring, and a recommended methodology for computing a metric that quantifies the dollars per user per month that a given cloud requires. For clouds that provide on-demand capacity, a recent benchmark called MalStone (code.google.com/p/malgen/) has emerged as well. MalStone is based on the log file example of a MapReduce computation I described earlier. It includes code to generate synthetic events and a recommended MapReduce computation.

With cloud computing, the "unit of computing" has moved from a single computer or rack of computers to a data center of computers. To say it simply, the unit of computing is now the data center. Not only has cloud computing scaled computing to the data center, but it has also introduced software, systems, and programming models that significantly reduce the complexity of accessing and using these resources. Just as significant, with elastic, usage-based pricing models, an individual or organization pays for just those computing instances or computing capacity that it requires and only when it requires them. Truly, this is revolutionary.

References

1. J. Dean and S. Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," Comm. ACM, vol. 51, no. 1, 2008, pp. 107–113.
2. S. Ghemawat, H. Gobioff, and S.-T. Leung, "The Google File System," Proc. 19th ACM Symp. Operating Systems Principles, ACM Press, 2003, pp. 29–43.
3. J. Dean and S. Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," Proc. 6th Symp. Operating System Design and Implementation, Usenix Assoc., 2004, pp. 137–150.
4. F. Chang et al., "Bigtable: A Distributed Storage System for Structured Data," Proc. 7th Symp. Operating System Design and Implementation, Usenix Assoc., 2006, pp. 205–218.
5. R.L. Grossman and Y. Gu, "Data Mining Using High-Performance Clouds: Experimental Studies Using Sector and Sphere," Proc. 14th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, ACM Press, 2008, pp. 920–927.
6. W. Sobel et al., "Cloudstone: Multi-Platform Multi-Language Benchmark and Measurement Tools for Web 2.0," Proc. Cloud Computing and Its Applications, 2008; www.cca08.org/papers.php.

Robert L. Grossman is a professor of mathematics, statistics, and computer science at the University of Illinois at Chicago and managing partner at Open Data Group. His research interests include data mining and analytics, distributed computing, and high-performance networking. Grossman has a PhD in applied mathematics from Princeton University. Contact him at grossman at uic.edu.

