A Study of Cloud Computing
Abstract
High performance applications require high processing power to compute highly intensive and complex applications for research, engineering, medical and academic projects. In the traditional way, an organization has to pay very high costs to run an HPC (High Performance Computing) application: it must purchase highly expensive hardware to run the HPC application and maintain it afterwards. The HPC resources on the company premises may not satisfy all the demands of a scientific application, because the available resources may not match the corresponding requirements. For SMEs (small and medium enterprises) in particular, meeting an ever-increasing demand is always challenging. Cloud computing is an on-demand, pay-as-you-go model that offers scalable computing resources and unlimited storage in an instantly available way. In this paper we cover the requirements of HPC applications in the cloud, cluster-based HPC applications, types of clusters, Google's HPC cloud architecture, a performance analysis of various HPC cloud vendors, and four case studies of HPC applications in the cloud.

Keywords: Cloud, HPC, SMEs, EC2, POD Cloud, R-HPC

I. INTRODUCTION

High Performance Computing (HPC) plays an important role in both the scientific advancement and the economic competitiveness of a nation, making the production of scientific and industrial solutions faster, less expensive, and of higher quality. HPC is a key component in many applications: designing vehicles and airplanes; designing high-rise buildings and bridges; discovery of drugs; discovery and extraction of new energy sources such as oil and natural gas; weather forecasting; and many more. HPC requires very high processing power to compute large, complex scientific applications. The advantage of pay-as-you-go computing has been an industry goal for many years, starting from mainframe, cluster, and grid computing. Cloud computing takes grid computing to a whole new level by using virtualization to encapsulate an operating system instance and run it in a cloud whenever users need computational resources. In addition, cloud storage can also be used independently of operating system instances. Cloud computing also offers unlimited storage and instantly available, scalable computing resources, all at a reasonable metered cost. Clouds also have large data centres, which makes them suitable for data-intensive applications.

II. HPC APPLICATIONS IN CLOUD

With the increasing demand for high performance, efficiency, productivity, agility, and lower cost, information and communication technology has, over the past several years, been changing dramatically from static silos with manually managed resources and applications towards dynamic virtual environments with automated and shared services, i.e., from silo-oriented to service-oriented architectures [15]. A “traditional” cloud offers features that are attractive to the general public. These services comprise single, loosely coupled instances (an instance of an OS running in a virtual environment) and storage systems backed by service level agreements (SLAs) that provide the end user guaranteed levels of service. These clouds offer the following features:

• Instant availability – The cloud offers almost instant availability of resources.
• Large capacity – Users can instantly scale the number of applications within the cloud.
• Software choice – Users can design instances to suit their needs from the OS up.
• Virtualized – Instances can be easily moved to and from similar clouds.
• Service-level performance – Users are guaranteed a certain minimal level of performance.

Although these features serve much of the market, HPC users generally have a different set of requirements:

• Close to the “metal” – Many man-years have been invested in optimizing HPC libraries and applications to work closely with the hardware, thus requiring specific OS drivers and hardware support.
• User space communication – HPC user applications often need to bypass the OS kernel and communicate directly with remote user processes.
• Tuned hardware – HPC hardware is often selected on the basis of communication, memory, and processor speed for a given application set.
• Tuned storage – HPC storage is often designed for a specific application set and user base.
• Batch scheduling – All HPC systems use a batch scheduler to share limited resources.

Depending on the user’s application domain, these two feature sets can make a big difference in performance. For example, applications that require a single node (or threaded applications) can work in a cloud. In this case, the user might have a single program that must be run with a wide range of input parameters (often called parametric processing), or they might have dataflow jobs, such as the Galaxy suite used in biomedical research. These types of applications can benefit from most cloud computing resources.
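To make the parametric-processing case concrete, the sketch below runs one single-node program over a range of input parameters as fully independent jobs; this independence is what lets such workloads spread across loosely coupled cloud instances. The program name and its flags are hypothetical placeholders, not part of any particular cloud service.

# Minimal parametric-sweep sketch: many independent single-node runs.
# "./simulate" and its command-line flags are hypothetical placeholders.
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_case(pressure):
    """Run one independent job for a single input parameter."""
    cmd = ["./simulate", "--pressure", str(pressure), "--out", f"result_{pressure}.dat"]
    return subprocess.run(cmd, check=True).returncode

# Each parameter value becomes its own job; on a cloud, these calls could be
# farmed out to separate instances or queued jobs instead of local threads.
parameters = [0.5, 1.0, 1.5, 2.0, 2.5]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_case, parameters))
print("completed", len(results), "independent runs")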
Some applications can utilize highly parallel systems but do not require a high-performance interconnect or fast storage. One often-cited example is digital rendering, in which many non-interacting jobs can be spawned across a large number of nodes with almost perfect scalability. These applications often work well with standard Ethernet and do not require a specialized interconnect for high performance.

Moving up the HPC tree, you’ll find interconnect-sensitive applications that require low-latency, high-throughput interconnects not found in the traditional cloud. Indeed, most of these interconnects (e.g., InfiniBand and High-Performance Ethernet) require “userspace” communication pathways that do not involve the OS kernel. This method makes the use of cloud virtualization very difficult, because most virtualization schemes cannot manage “kernel bypass” applications (i.e., these are “on the wire” data transfers that are hard to virtualize). If high-performance networks are not available, many HPC applications run slowly and suffer from poor scalability (i.e., they see no performance gain when adding nodes).

Also in the tree are many I/O-sensitive applications that, without a very fast I/O subsystem, will run slowly because of storage bottlenecks. To open up these bottlenecks, most HPC systems employ parallel file systems that drastically increase the I/O bandwidth of computing nodes.

Another growing branch includes performance accelerators, or SIMD (Single-Instruction, Multiple-Data) units, from NVidia and AMD/ATI. This type of hardware is very specific to HPC systems and therefore is not found on typical cloud hardware.

At the top of the tree are applications that push on all levels of performance (compute, interconnect, and storage). These applications require fast computation (possibly a SIMD unit), fast interconnects, and high-performance storage.

Clearly, this computing environment is not found in a typical cloud and is unique to the HPC market. Attempting to run this level of application on a typical cloud will provide sub-par performance.

Finally, any remote computation scheme needs to address the “moving big data” problem. Many HPC applications require large amounts of data. Many clouds, even those that offer HPC features, cannot solve this problem easily. In particular, if the time to move large datasets to the cloud outweighs the computation time, then the cloud solution is now the slow solution. Interestingly, the fastest way to move data in these cases is with a hard disk and an overnight courier. (It seems the station wagon full of tapes is still the fastest way to transport data.)
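A rough back-of-the-envelope check makes the point. Under assumed numbers (a 10 TB dataset, a 100 Mbps effective uplink, and a one-day courier shipment), network transfer easily dominates:

# Rough comparison of network transfer vs. physical shipment for a large dataset.
# All figures below are illustrative assumptions, not measurements.
dataset_bytes = 10e12           # assumed 10 TB dataset
uplink_bps    = 100e6           # assumed 100 Mbit/s effective upload bandwidth
courier_hours = 24              # assumed overnight courier turnaround

transfer_hours = dataset_bytes * 8 / uplink_bps / 3600
print(f"network upload: {transfer_hours:,.0f} hours (~{transfer_hours/24:.1f} days)")
print(f"courier:        {courier_hours} hours")
# With these assumptions the upload takes roughly 9 days, so if the computation
# itself finishes in hours, moving the data dominates the end-to-end time.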
With all the differences between the traditional cloud and HPC applications, users will be interested to know that HPC clouds and cloud-like resources are available. A number of companies, including Penguin, R-HPC, Amazon, Univa, SGI, Sabalcore, and Gompute, offer specialized HPC clouds. Notably absent is IBM, which, at this time, does not offer a public HPC cloud. The company does, however, provide many options for constructing internal or private HPC clouds.

III. HPC IN CLUSTERS

A high performance computer appropriate for most small and medium-sized businesses today is built from what are basically many ordinary computers connected together with a network and centrally coordinated by some special software. Because the computers are usually physically very close together, the common term for a high performance computer today is a cluster [5]. The basic structure of cluster-based computing is shown in the figure below.

[Figure: The basic cluster]

High-Performance Computing (HPC) clusters are characterized by many cores and processors, lots of memory, high-speed networking, and large data stores – all shared across many rack-mounted servers. User programs that run on a cluster are called jobs, and they are typically managed through a queuing system for optimal utilization of all available resources. An HPC cluster is made up of many separate servers, called nodes, possibly filling an entire data center with dozens of power-hungry racks. HPC typically involves simulation of numerical models or analysis of data from scientific instrumentation. At the core of HPC is manageable hardware and systems software wrangled by systems programmers, which allows researchers to devote their energies to their code.

A successful HPC cluster is a powerful asset for an organization. At the same time, these powerful racks present a multifaceted resource to manage. If not properly managed, software complexity, cluster growth, scalability, and system heterogeneity can introduce project delays and reduce the overall productivity of an organization. A successful HPC cluster therefore requires administrators to provision, manage, and monitor an array of hardware and software components.

Clusters are the predominant type of HPC hardware these days. A cluster is a set of MPPs (massively parallel processors), in which the processors do not share the same memory. A processor in a cluster is commonly referred to as a node; it has its own CPU, memory, operating system, and I/O subsystem, and it is capable of communicating with other nodes. Clusters fall into the following categories:

Fail-over clusters

Load-balancing clusters
• Load-balancing clusters are commonly used for busy Web sites where several nodes host the same site, and each new request for a Web page is dynamically routed to a node with a lower load.

High-performance clusters
• These clusters are used to run parallel programs for time-intensive computations and are of special interest to the scientific community. They commonly run simulations and other CPU-intensive programs that would take an inordinate amount of time to run on regular hardware.

Some features of clusters are as follows:

• Clusters are built using commodity hardware and cost a fraction of the price of vector processors. In many cases, the price is lower by more than an order of magnitude.
• Clusters use a message-passing paradigm for communication, and programs have to be explicitly coded to make use of the distributed hardware (a minimal message-passing sketch follows this list).
• With clusters, you can add more nodes to the cluster based on need.
• Open source software components and Linux lead to lower software costs.
• Clusters have a much lower maintenance cost (they take up less space, take less power, and need less cooling).
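To illustrate the message-passing style referred to above, the sketch below uses mpi4py, one common Python binding for MPI; the choice of library is ours for illustration, not something prescribed by this paper. Each process computes a partial sum of its own slice of the problem, and the pieces are combined with an explicit collective reduction.

# Minimal MPI example of the message-passing paradigm using mpi4py.
# Run with something like: mpiexec -n 4 python partial_sums.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()      # this process's id within the job
size = comm.Get_size()      # total number of processes

# Each rank explicitly works on its own slice of the problem...
local_sum = sum(range(rank, 1_000_000, size))

# ...and the partial results are combined with an explicit collective call.
total = comm.reduce(local_sum, op=MPI.SUM, root=0)

if rank == 0:
    print(f"sum over {size} ranks:", total)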
IV. GOOGLE'S HPC CLOUD ARCHITECTURE

The compute portion of the HPC cluster consists of a head node running scheduling and management software on a Google Compute Engine VM. The compute/worker nodes also run on Google Compute Engine VMs. Instance sizes can be selected to match the workload; choices include Standard, High Memory, or High CPU instances in 1, 2, 4, 8, or 16 core sizes. Instances can also be added or deleted depending on the resources needed. The user has a choice of various commercial packages or open source software components to create the cluster.

Compute Engine VMs can also be used to create a file system for the cluster. Two popular options are NFS and Gluster. This is an optional component, as the compute nodes can also access Google Cloud Storage directly.

Google Cloud Storage provides the backend storage for the cluster. It is a durable, highly available storage option, making it an excellent choice for HPC work. Google Cloud SQL is also available for structured input or output data. The input data can be uploaded by the client directly into Cloud Storage or uploaded with the job. The resulting data can be downloaded to the client or left in the cloud for storage or further processing.
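As an illustration of staging job data through Cloud Storage, the sketch below uses the google-cloud-storage Python client; the bucket and object names are made up for the example, and the choice of client library is ours rather than something prescribed by the architecture described above.

# Sketch: staging input data into Google Cloud Storage and fetching results.
# Bucket and object names are hypothetical; credentials are assumed to be
# configured in the environment (e.g., a service account).
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("example-hpc-bucket")   # hypothetical bucket name

# Upload an input file for the job to read from Cloud Storage.
bucket.blob("inputs/case042.dat").upload_from_filename("case042.dat")

# ... the cluster runs the job and writes results back to the bucket ...

# Download the result for local post-processing.
bucket.blob("results/case042.out").download_to_filename("case042.out")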
V. PERFORMANCE ANALYSIS OF VARIOUS HPC CLOUD VENDORS

V.1 PENGUIN COMPUTING

One of the first vendors to introduce a true HPC cloud was Penguin Computing [8]. The Penguin On Demand, or POD, cloud was one of the first remote HPC services. From the beginning, POD has been a bare-metal compute model similar to an in-house cluster. Each user is given a virtualized login node that does not play a role in code execution. The standard compute node has a range of options, including dual four-core Xeon, dual six-core Xeon, or quad 12-core AMD processors ranging in speed from 2.2 to 2.9GHz, with 24 to 128GB of RAM per server and up to 1TB of scratch local storage per node.

Getting applications running on POD HPC clouds can be quite simple, because Penguin has more than 150 commercial and open source applications installed and ready to run on the system. Installing other applications is straightforward and available to users. Nodes with two NVidia Tesla C2075 computing processors are also available.

In terms of network, POD nodes are connected via nonvirtualized, low-latency 10Gb Ethernet (10GigE) or QDR InfiniBand networks. The network topology is local to ensure maximum bandwidth and minimum latency between nodes. Storage systems are made available via 10GigE to the local compute cluster. Additionally, POD has redundant high-speed Internet with remote connectivity ranging from 50Mbps to 1Gbps.

Several storage options are also available, starting with high-speed NFS using 10GigE-attached storage. Beyond NFS, there are parallel filesystem options attached via multiple 10GigE links and InfiniBand; Lustre and Panasas high-performance storage systems can also be provided. Finally, dedicated storage servers are available. These systems can isolate data and facilitate encryption/decryption of high volumes of data by using physical shipping rather than Internet transfer.

POD offers a suite of tools to help manage your computation. Aptly called PODTools, Penguin offers a collection of command-line utilities for interacting with their HPC cloud. Beyond the standard SSH login, PODTools provide the ability to submit jobs, transfer data, and generate reports. Additionally, Penguin POD can be seamlessly integrated into existing on-site clusters to outsource excess workloads – often known as “cloud bursting.” All these capabilities are encrypted and offer a high level of security.

Perhaps Penguin’s biggest asset is a long history of delivering on-site HPC solutions. This experience has allowed them to develop a staff of industry domain experts. They also have a long list of additional services that supplement their POD offering. These include on-premises provision of cloud bursting to the POD cloud, remote management of on-premises HPC services, cloud migration services, private remote HPC-as-a-service environments, and private internal clouds.

V.2 R-HPC

R-HPC [9] offers R-Cloud, wherein clients can “rent” HPC resources. R-Cloud offers two distinct computing environments. The first is a Shared Cluster, which offers a login to shared nodes and a work queue. This environment offers a classic cluster environment and is essentially a “shared cluster in the sky.” Users are billed by the job, creating a pay-as-you-go HPC service. No support or administration services are provided. The second environment comprises virtual private clusters that are carved out of a shared configuration. Use can be on-demand with VLAN access. These systems are billed on a 24/7 basis.

R-HPC can provide new 3.4GHz quad-core Sandy Bridge-based systems with 16GB of RAM per node (4GB/core), DDR 2:1 blocking InfiniBand, and 1TB of local disk. Additionally, they have dual-socket 2.6GHz eight-core Sandy Bridge systems with 128GB of RAM per node (8GB/core), QDR non-blocking InfiniBand, 1TB of local storage, and 1TB of global storage. These offerings are rounded out by Magny-Cours, Nehalem, and Harpertown systems. GPU-based systems in beta test are provided for dedicated users.
Most applications can be set up and running within one day (although R-HPC notes that licensing issues can delay the process for some users). Similar to Penguin’s products, all the interconnects, which include DDR, QDR, FDR, and GigE, are run on the wire with no OS virtualization layer. Storage options include 10GigE-attached NFS/SMB, with Lustre over InfiniBand as a possible upgrade. If ultimate storage performance is needed, R-HPC also offers the Kove RAM disk storage array. All dedicated systems have block storage for security, whereas the shared clusters use shared NFS (no private mounts).

R-HPC will make service level agreements on a case-by-case basis depending on the customer’s needs. In terms of workflow management, Torque/OpenPBS is the most common scheduler; however, Maui and Grid Engine (and derivatives) can be provided as needed. Interestingly, cloud bursting, although possible with R-HPC systems, is almost never requested by customers. Another interesting aspect of the R-HPC offerings is Windows HPC environments.
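For readers unfamiliar with this style of workflow management, the sketch below generates a minimal Torque/OpenPBS job script and submits it with qsub; the resource requests, script contents, and application name are illustrative assumptions, not vendor defaults.

# Sketch: creating and submitting a Torque/OpenPBS batch job from Python.
# Resource requests and the application command are illustrative only.
import subprocess
from pathlib import Path

job_script = """#!/bin/bash
#PBS -N example_job
#PBS -l nodes=2:ppn=8
#PBS -l walltime=01:00:00
cd $PBS_O_WORKDIR
mpirun ./my_mpi_app input.dat
"""

Path("example_job.pbs").write_text(job_script)

# qsub prints the new job identifier on success.
job_id = subprocess.run(["qsub", "example_job.pbs"],
                        capture_output=True, text=True, check=True).stdout.strip()
print("submitted job", job_id)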
R-HPC offers performance tuning and remote administration services as well. They have extensive experience in HPC and can provide “tuned” application-specific private clusters for clients.

V.3 Amazon EC2 HPC
Perhaps the most well-known cloud provider is Amazon [1]. Inquiries to Amazon were not returned, so information was gleaned from their web page. Originally, the EC2 service was found not suitable for many HPC applications. Amazon has since created dedicated “cluster instances” that offer better performance to HPC users. Several virtualized HPC instances are available on the basis of users’ needs. The first offering consists of two Cluster Compute instance types that provide a very large amount of CPU coupled with increased network performance (10GigE). Instances come in two sizes: a Nehalem-based “Quadruple Extra Large Instance” (eight cores/node, 23GB of RAM, 1.7TB of local storage) and a Sandy Bridge-based “Eight Extra Large Instance” (16 cores/node, 60.5GB of RAM, 3.4TB of local storage).

Additionally, Amazon offers two other specialized instances. The first is a Cluster GPU instance that provides two NVidia Tesla Fermi M2050 GPUs with proportionally high CPU and 10GigE network performance. The second is a high-I/O instance that provides two SSD-based volumes, each with 1024GB of storage.

As an example of pricing, a small usage case (80 cores, 4GB of RAM per core, and basic storage of 500GB) would cost US$24.00/hour (10 Eight Extra Large Instances). A larger usage case (256 cores, 4GB of RAM per core, and 1TB of fast global storage) would cost US$38.40/hour (16 Eight Extra Large Instances).

Amazon does not charge for data transferred into EC2 but has a varying rate schedule for transfer out of the cloud; additionally, there are EC2 storage costs. Therefore, the total cost depends on compute time, total data storage, and transfer. Once created, the instances must be provisioned and configured to work as a cluster by the user.
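As a quick sanity check on the larger usage case quoted above, the arithmetic below assumes an hourly rate of US$2.40 per Eight Extra Large Instance, which is what the quoted totals imply rather than a figure taken from a price list:

import math

# Back-of-the-envelope cost check for the 256-core usage case above.
# The per-instance hourly rate is inferred from the totals in the text
# (US$38.40/hour across 16 instances), not taken from a price list.
cores_needed        = 256
cores_per_instance  = 16        # "Eight Extra Large Instance" (16 cores/node)
hourly_rate_usd     = 2.40      # assumed rate per instance-hour

instances = math.ceil(cores_needed / cores_per_instance)   # -> 16 instances
cost_per_hour = instances * hourly_rate_usd                # -> 38.40
print(f"{instances} instances at ${hourly_rate_usd}/hr = ${cost_per_hour:.2f}/hour")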
VI. HPC CLOUD CASE STUDIES

VI.1 HPC APPLICATIONS ON AMAZON

Amazon Web Services (AWS) is Amazon’s cloud computing platform, with Amazon Elastic Compute Cloud (EC2) [1] as its central part, first announced as a beta in August 2006. Users can rent Virtual Machines (VMs) on which they run their applications. EC2 allows scalable deployment of applications by providing a web service through which a user can boot an Amazon Machine Image (AMI) to create a virtual machine, which Amazon calls an “instance”, containing any software desired. A user can create, launch, and terminate server instances as needed, paying by the hour for active servers. EC2 provides users with control over the geographical location of instances, which allows for latency optimization and high levels of redundancy.
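The sketch below launches and later terminates an instance with boto3, the current AWS SDK for Python; the AMI ID, instance type, and region are placeholders, and the choice of SDK is ours for illustration (the workflow described above does not depend on it).

# Sketch: booting an AMI as an EC2 instance and terminating it with boto3.
# The AMI ID, instance type, and region below are hypothetical placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Launch a single instance from a chosen Amazon Machine Image.
reservation = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",
    InstanceType="c5.xlarge",
    MinCount=1,
    MaxCount=1,
)
instance_id = reservation["Instances"][0]["InstanceId"]
print("launched", instance_id)

# ... run the workload, then stop paying for the instance ...
ec2.terminate_instances(InstanceIds=[instance_id])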
VI.2 NAS Parallel Benchmark on Amazon EC2

In order to find out if and how clouds are suitable for HPC applications, Ed Walker (Walker 2008) ran an HPC benchmark on Amazon EC2 [1]. He used several macro- and micro-benchmarks to examine the “delta” between clusters composed of state-of-the-art CPUs from Amazon EC2 and an HPC cluster at the National Center for Supercomputing Applications (NCSA). He used the NAS Parallel Benchmarks (NPB 2010) to measure the performance of these clusters for frequently occurring scientific calculations. Also, since the Message Passing Interface (MPI) library is an important programming tool used widely in scientific computing, his results demonstrate the MPI performance of these clusters by using the mpptest micro-benchmark. For his benchmark study on EC2, he used the high-CPU extra large instances provided by the EC2 service. The NAS Parallel Benchmarks comprise a widely used set of programs designed to evaluate the performance of HPC systems. The core benchmark consists of eight programs: five parallel kernels and three simulated applications. In aggregate, the benchmark suite mimics the critical computation and data movement involved in computational fluid dynamics and other “typical” scientific computation. Research from Ed Walker (2008) on the runtimes of each of the NPB programs in the benchmark shows a performance degradation of approximately 7%–21% for the programs running on the EC2 nodes compared to running them on the NCSA cluster compute nodes. Further results and an in-depth analysis showed that message-passing latencies and bandwidth are an order of magnitude inferior between EC2 compute nodes compared to between compute nodes on the NCSA cluster. Walker (2008) concluded that substantial improvements could be provided to the HPC scientific community if a high-performance network provisioning solution could be devised for this problem.
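To make the latency comparison concrete, an MPI “ping-pong” between two ranks, similar in spirit to what mpptest measures, can be sketched with mpi4py as follows; this is an illustrative rewrite of the idea, not Walker’s actual benchmark code.

# Illustrative MPI ping-pong latency probe between rank 0 and rank 1.
# Run with: mpiexec -n 2 python pingpong.py
from mpi4py import MPI
import time

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
payload = bytearray(8)          # tiny message to expose latency, not bandwidth
iterations = 1000

comm.Barrier()
start = time.perf_counter()
for _ in range(iterations):
    if rank == 0:
        comm.send(payload, dest=1)
        comm.recv(source=1)
    elif rank == 1:
        comm.recv(source=0)
        comm.send(payload, dest=0)
elapsed = time.perf_counter() - start

if rank == 0:
    # Each iteration is one round trip, so half of it approximates one-way latency.
    print(f"approx one-way latency: {elapsed / iterations / 2 * 1e6:.1f} microseconds")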
VI.3 LINPACK Benchmark on Amazon Cluster Compute Instances

In July 2010, Amazon announced its Cluster Compute Instances (CCI 2010) [1], specifically designed to combine high compute performance with high-performance network capability to meet the needs of HPC applications. Unique to Cluster Compute instances is the ability to group them into clusters of instances for use with HPC applications. This is particularly valuable for those applications that rely on protocols like the Message Passing Interface (MPI) for tightly coupled inter-node communication. Cluster Compute instances function just like other Amazon EC2 instances but also offer the following features for optimal performance with HPC applications:

• When run as a cluster of instances, they provide low-latency, full-bisection 10Gbps bandwidth between instances. Cluster sizes up through and above 128 instances are supported.
• Cluster Compute instances include the specific processor architecture in their definition, allowing developers to tune their applications by compiling for that specific processor architecture in order to achieve optimal performance. The Cluster Compute instance family currently contains a single instance type, the Cluster Compute Quadruple Extra Large, with the following specifications: 23 GB of memory, 33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core “Nehalem” architecture), 1690 GB of instance storage, 64-bit platform, and very high I/O performance (10 Gigabit Ethernet).

As benchmarked by the Lawrence Berkeley Laboratory team (2010), some applications can expect 10x better performance than on standard EC2. For the Linpack benchmark, they saw 8.5x compared to similar clusters on standard EC2 instances. On an 880-instance CC1 cluster, Linpack achieved a performance of 41.82 Tflops, placing EC2 at #146 in the June 2010 Top 500 rankings.
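A quick derived figure from the result above (our own arithmetic, not a number reported by the benchmark team):

# Average sustained Linpack performance per instance on the 880-instance run.
total_tflops = 41.82
instances = 880
per_instance_gflops = total_tflops * 1000 / instances
print(f"~{per_instance_gflops:.1f} Gflops sustained per instance")  # roughly 47.5 Gflops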
VI.4 MATLAB on Amazon Cluster Compute Instances

In this case study, MATLAB users needed more computational capability than is available in a single high-end desktop machine, typically a quad-core processor with 4-8 GB of RAM, supplying approximately 20 Gigaflops. Therefore, they spread the calculation across machines. In order to solve linear systems of equations, they needed to be able to access all of the elements of the array even when the array is spread across multiple machines. This problem requires significant amounts of network communication, memory access, and CPU power. They scaled up to a cluster in EC2, giving them the ability to work with larger arrays and to perform calculations at up to 1.3 Teraflops, a 60X improvement. They were able to do this without making any changes to the application code. Each Cluster Compute instance runs 8 workers (one per processor core, on 8 cores per instance). Each doubling of the worker count corresponds to a doubling of the number of Cluster Compute instances used (scaling from 1 up to 32 instances). They saw near-linear overall throughput (measured in Gigaflops) while increasing the matrix size as they successively doubled the number of instances.

VII. CONCLUSION

In this paper, we have presented the advantage of running HPC applications in the cloud environment: using on-demand cloud resources can reduce the cost of maintenance and save on the purchase of software and equipment. However, existing HPC applications cannot always be expected to move entirely into a cloud environment, due to the currently dominant position of traditional HPC clusters. Therefore, we also presented a combined HPC mode in which HPC applications can use cloud and on-premises resources together. We also conducted a study of the performance of HPC applications with various HPC cloud vendors, and we included case studies with Amazon EC2 for various benchmarks. Given the varied requirements for HPC clouds, it is understandable that the range of options can vary greatly. Solutions range from shared remote clusters to fully virtualized systems in the cloud. Each method brings its own feature set that must be matched to the users’ needs. Finally, the above items are not intended to be an exhaustive list of HPC cloud providers. Others exist, and given that the market is new and growing, more vendors will be coming online in the near future. Many other factors should also be considered besides the brief analysis offered here. Your best results will come from doing due diligence and testing your assumptions. Perhaps the most important aspect of cloud HPC is the ability to work with your vendor, because having a good working safety net under your cloud might be the best strategy of all.
REFERENCES

[3] Google Apps. http://www.google.com/apps/intl/en/business/cloud.html
[4] Windows Azure HPC Scheduler. http://msdn.microsoft.com/en-us/library/hh560247(v=vs.85).aspx; Windows Azure. http://www.microsoft.com/windowsazure/
[5] http://www.wzl.rwth-aachen.de/en/index.htm
[6] Google Enters IaaS Cloud Race. HPC in the Cloud. http://www.hpcinthecloud.com/hpccloud/2012-07-03/google_enters_iaas_cloud_race.html
[7] Microsoft Windows Azure. http://www.windowsazure.com/en-us/
[8] Penguin Computing on Demand (POD). http://www.penguincomputing.com/services/hpc-cloud/POD
[12] UniCloud. http://www.univa.com/products/unicloud.php
[17] Marozzo, F., Lordan, F., Rafanell, R., Lezzi, D., Talia, D., Badia, R.M.: Enabling cloud interoperability with COMPSs. In: Kaklamanis, C., Papatheodorou, T., Spirakis, P.G. (eds.) Euro-Par 2012. LNCS, vol. 7484, pp. 16–27. Springer, Heidelberg (2012)
[18] Brecher, C., Gorgels, C., Kauffmann, P., Röthlingshöfer, T., Flodin, A., Henser, J.: ZaKo3D – simulation possibilities for PM gears. In: World Congress on Powder Metal, Florence (2010)
[19] Fenn, M., Holmes, J., Nucciarone, J.: A Performance and Cost Analysis of the Amazon Elastic Compute Cloud (EC2) Cluster Compute Instance. Research Computing and Cyberinfrastructure Group, Penn State University