The Case For Cloud Computing
Cloud computing doesn’t yet have a standard definition, but a good working description is that clouds, or clusters of distributed computers, provide on-demand resources and services over a network, usually the Internet, with the scale and reliability of a data center. This article gives a quick introduction to cloud computing. It covers several different types of clouds, describes what’s new about cloud computing, and discusses some of the advantages and disadvantages that clouds offer.

Two different but related types of clouds are those that provide computing instances on demand and those that provide computing capacity on demand. Both use similar machines, but the first is designed to scale out by providing additional computing instances, whereas the second is designed to support data- or compute-intensive applications via scaling capacity.

Amazon’s EC2 services (www.amazon.com/ec2) are an example of the first category. A small EC2 computing instance costs US$0.10 per hour and offers the approximate computing power of a 1.0- to 1.2-GHz 2007 Opteron or 2007 Xeon processor, with 1.7 Gbytes of memory, 160 Gbytes of available disk space, and moderate I/O performance. The Eucalyptus system (http://eucalyptus.cs.ucsb.edu) is an open source option that provides on-demand computing instances and shares the same APIs as Amazon’s EC2 cloud.

Google’s MapReduce is an example of the second category. Recent work gives a glimpse of it in action: researchers ran a benchmark on a cluster containing approximately 1,800 machines [1], each of which had two 2-GHz Intel Xeon processors, 4 Gbytes of memory, and two 160-Gbyte IDE disks. The researchers used MapReduce on the cluster to run the TeraSort benchmark (http://research.microsoft.com/barc/SortBenchmark), the goal of which is to sort 10^10 100-byte records (roughly 1 Tbyte of data). The application required approximately 850 seconds to complete on this cluster. The Hadoop system (http://hadoop.apache.org/core) is an open source system that implements MapReduce.

Clouds that provide on-demand computing instances can use these instances to supply software as a service (SaaS), as Salesforce.com does with its product, or to provide a platform as a service (PaaS), as Amazon does with its EC2 product.
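As a rough check on the TeraSort figures quoted above, the short sketch below works through the arithmetic: total data volume, aggregate sort rate, and the share of that rate per machine. Only the four input numbers come from the article; the function name `terasort_summary` and the dictionary keys are illustrative assumptions, and the rates describe sorted output per second of wall-clock time, not raw disk or network throughput.

```python
# Inputs quoted in the article (all approximate).
RECORDS = 10**10          # records to sort
RECORD_BYTES = 100        # size of each record in bytes
MACHINES = 1_800          # machines in the cluster
ELAPSED_SECONDS = 850     # reported run time


def terasort_summary(records=RECORDS, record_bytes=RECORD_BYTES,
                     machines=MACHINES, seconds=ELAPSED_SECONDS):
    """Return data volume and sort rates (hypothetical helper, not from the article)."""
    total_bytes = records * record_bytes       # 10^10 * 100 = 10^12 bytes
    aggregate_rate = total_bytes / seconds     # bytes sorted per second, whole cluster
    per_machine_rate = aggregate_rate / machines
    return {
        "total_bytes": total_bytes,
        "aggregate_bytes_per_s": aggregate_rate,
        "per_machine_bytes_per_s": per_machine_rate,
    }


summary = terasort_summary()
print(f"total data:       {summary['total_bytes'] / 1e12:.2f} Tbytes")
print(f"aggregate rate:   {summary['aggregate_bytes_per_s'] / 1e9:.2f} Gbytes/s")
print(f"per-machine rate: {summary['per_machine_bytes_per_s'] / 1e6:.2f} Mbytes/s")
```

Under these inputs the cluster sorts about 1.18 Gbytes per second in aggregate, or roughly 0.65 Mbytes per second per machine, which illustrates that the benchmark's speed comes from the number of machines rather than from any single node.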
So, if you were to run the servers at 100 percent utilization, buying the 100 servers is less expensive. However, if you were to run them at 75 percent utilization or less, using an on-demand style of cloud would be less expensive.

Of course, these numbers are only estimates, and I haven’t considered all costs, but even from this simple example, it’s clear that using a pay-as-you-go utility computing model is preferable for many use cases.

Some Advantages and Disadvantages

Cloud computing provides several important benefits over today’s dominant model, in which an enterprise purchases computers a rack at a time and operates them itself. First, cloud computing’s usage-based pricing model offers several advantages, including reduced capital expense, a low barrier to entry, and the ability to scale up as demand requires, as well as to support brief surges in capacity. Second, cloud services enjoy the same economies of scale that data centers provide. By providing services at the scale of a data center, it’s possible to provide operations, business continuity, and security more efficiently than can be done when providing these services a rack at a time. For this reason, the unit cost for cloud-based services is often lower than the cost if the services were provided directly by the organization itself. Finally, cloud-computing architectures have proven to be very scalable; for example, cloud-based storage services can easily manage a petabyte of data, whereas managing this much data with a traditional database is problematic.

Of course, cloud computing has some disadvantages as well. First, because cloud services are often remote (at least for hosted cloud services), they can suffer the latency- and bandwidth-related issues associated with any remote application. Second, because hosted cloud services serve multiple customers, various issues related to multiple customers sharing the same piece of hardware can arise. For example, if one user’s application compromises the system, it can also compromise applications of other users that share the same system. Also, having data accessible to third parties (such as a cloud service provider) can present security, compliance, and regulatory issues.

Layered Services

A storage cloud provides storage services (block- or file-based); a data cloud provides data management services (record-, column-, or object-based); and a compute cloud provides computational services. Often, they’re layered (compute services over data services over storage services) to create a stack of cloud services that acts as a computing platform for developing cloud-based applications; see Figure 1.

Parallel Computing over Clouds

At its core, MapReduce is a style of parallel programming supported by capacity-on-demand clouds. A good illustrative example of how something like MapReduce works is to compute an inverted index in parallel for a large collection of Web pages stored in a cloud.

Let’s assume that each node i in the cloud stores Web pages p_{i,1}, p_{i,2}, p_{i,3}, …, and that a Web page p_j contains words (terms) w_{j,1}, w_{j,2}, w_{j,3}, … . A basic but important structure in information retrieval is an inverted index, that is, a list

(w_1; p_{1,1}, p_{1,2}, p_{1,3}, …)
(w_2; p_{2,1}, p_{2,2}, p_{2,3}, …)
(w_3; p_{3,1}, p_{3,2}, p_{3,3}, …),

where the list is sorted by the word w_j, and associated with each word w_j is a list of all Web pages p_i containing that word.
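To make the inverted index concrete, here is a minimal single-process sketch that builds one in the MapReduce style of map, shuffle, sort, and reduce phases. It is an illustration only, not Google’s or Hadoop’s API: the function names (`map_page`, `partition`, `shuffle`, `reduce_sorted`, `inverted_index`), the whitespace tokenizer, and the use of a CRC32 hash as the partition function are all assumptions made for this example, and the "nodes" are simply Python lists.

```python
from zlib import crc32


def map_page(page_id, text):
    """Map phase: emit one (word, page_id) pair per distinct word on the page."""
    return [(word, page_id) for word in set(text.lower().split())]


def partition(word, num_nodes):
    """Partition function: deterministically assign a key (word) to a node."""
    return crc32(word.encode("utf-8")) % num_nodes


def shuffle(pairs, num_nodes):
    """Shuffle phase: route each pair to the node chosen by the partition function."""
    buckets = [[] for _ in range(num_nodes)]
    for word, page_id in pairs:
        buckets[partition(word, num_nodes)].append((word, page_id))
    return buckets


def reduce_sorted(sorted_pairs):
    """Reduce phase: merge adjacent pairs with the same word into (word, [pages])."""
    merged = []
    for word, page_id in sorted_pairs:
        if merged and merged[-1][0] == word:
            merged[-1][1].append(page_id)
        else:
            merged.append((word, [page_id]))
    return merged


def inverted_index(pages_by_node, num_nodes=3):
    """Simulate the four phases over pages that start out spread across nodes."""
    pairs = [pair
             for node_pages in pages_by_node          # each dict is one node's pages
             for page_id, text in node_pages.items()
             for pair in map_page(page_id, text)]
    index = {}
    for bucket in shuffle(pairs, num_nodes):
        # Sort phase: each node sorts its own pairs by key before reducing.
        for word, page_ids in reduce_sorted(sorted(bucket)):
            index[word] = page_ids
    return dict(sorted(index.items()))               # final list sorted by word


# Hypothetical input: two nodes holding three small "Web pages".
pages_by_node = [
    {"p1": "cloud computing scales", "p2": "cloud storage"},
    {"p3": "storage scales"},
]
print(inverted_index(pages_by_node))
# {'cloud': ['p1', 'p2'], 'computing': ['p1'], 'scales': ['p1', 'p3'], 'storage': ['p2', 'p3']}
```

Because every occurrence of a given word is routed to the same node, each node can sort and merge its own pairs independently, which is what lets the real computation run in parallel.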
computer.org/ITPro 25
MapReduce uses a programming model that processes a list of <key, value> pairs to produce another list of <key’, value’> pairs. The initial list of <key, value> pairs is distributed over the nodes in the cloud. In the map phase, each Web page p_i is processed independently on its local node to produce an output list of multiple key-value pairs <w_j, p_i>, one for each word w_j on the page. A partition function h(w_j) then assigns each key (a word w_j in this example) to a machine in the cloud for further processing. This is called the shuffle phase; in general, nodes involved in the computation send data to other nodes involved in the computation as determined by the partition function h(w_j). In the next phase, called the sort phase, each node in the cloud sorts the key-value pairs <w_j, p_i> according to the key w_j. In the final phase, called the reduce phase, the key-value pairs with the same key w_j are merged to create the inverted index. So with MapReduce, the programmer defines a Map and a Reduce function, whereas the system supplies the Shuffle and Sort functions [3].

Let’s consider another example: log files that describe an entity’s usage of resources. It’s important to analyze log files to identify anomalies that indicate whether a particular resource has been compromised. For small log files, this is easy to do with a database, but as the log files grow, it becomes difficult to manage them with just a database. However, clouds can easily manage even very large collections of log files, and MapReduce-style computations can easily identify anomalous patterns indicative of compromises.

Security

Security is an area of cloud computing that presents some special challenges. For hosted clouds, the first challenge is simply that a third party is responsible for both storing the data and securing it. On the positive side, third parties can take advantage of economies of scale to provide a level of security that might not be cost-effective for smaller companies, but a downside is that two or more organizations might share the same physical resource and not be aware of it.

For some cloud applications, security is still somewhat immature. Hadoop, for example, doesn’t currently have user-level authentication or access controls, although both are expected in a later version. Fortunately, there’s no technical difficulty per se in providing these tools for clouds. Sector [5], which also provides on-demand computing capacity, offers authentication, authorization, and access controls and, as measured by the TeraSort benchmark, is faster than Hadoop (http://sector.sourceforge.net).

Standards, Interoperability, and Benchmarks

Organizations that develop cloud-based applications have an interest in frameworks that enable applications to be ported easily from one cloud to another and to interoperate with different cloud-based services. For example, with an appropriate interoperability framework, a cloud application could switch from one provider to another offering lower cost or a greater range of cloud services.

Amazon’s APIs (www.aws.amazon.com) have become the de facto standard for clouds that provide on-demand instances. Cloud-based applications that use these APIs enjoy portability and interoperability; for example, Eucalyptus uses these APIs, and applications that run on Amazon’s EC2 service can in turn run on a Eucalyptus cloud. Unfortunately, for clouds that provide on-demand capacity, portability and interoperability are much more problematic. Hadoop is by far the most prevalent system that provides on-demand capacity, but, for instance, it isn’t straightforward for a Hadoop MapReduce application to run on another on-demand capacity cloud written in C++ [5].

Although it might be too early yet for standards to fully emerge, several organizations are attempting them, including efforts by the Cloud Computing Interoperability Forum (www.cloudforum.org/) and by the Open Cloud Consortium (www.opencloudconsortium.org). Service-based frameworks for clouds have also recently debuted; for example, Thrift is a software framework for scalable cross-language services development that relies on a code-generation engine (http://incubator.apache.org/thrift). Thrift makes
With cloud computing, the “unit of computing” has moved from a single computer or rack of computers to a data center of computers. To say it simply, the unit of com-