
AzureBlast: A Case Study of Developing Science Applications on the Cloud

Wei Lu, Jared Jackson, Roger Barga
Cloud Computing Futures, Microsoft Research
[email protected] [email protected] [email protected]

ABSTRACT

Cloud computing has emerged as a new approach to large scale computing and is attracting a lot of attention from the scientific and research computing communities. Despite its growing popularity, it is still unclear just how well the cloud model of computation will serve scientific applications. In this paper we analyze the applicability of the cloud to the sciences by investigating an implementation of a well known and computationally intensive algorithm called BLAST. BLAST is a very popular life sciences algorithm used commonly in bioinformatics research. The BLAST algorithm makes an excellent case study because it is both crucial to many life science applications and representative of many applications important to data intensive scientific research. In our paper we introduce a methodology that we use to study the applicability of cloud platforms to scientific computing and analyze the results from our study. In particular we examine the best practices for handling large scale parallelism and large volumes of data. While we carry out our performance evaluation on Microsoft's Windows Azure, the results readily generalize to other cloud platforms.

Categories and Subject Descriptors
D.1.3 [PROGRAMMING TECHNIQUES]: Concurrent Programming—Distributed programming; J.3 [Life and Medical Sciences]: Biology and genetics

General Terms
Performance, Design

Keywords
BLAST, Cloud Computing, Windows Azure

1. INTRODUCTION

Increasingly, scientific breakthroughs will be powered by advanced computing capabilities that help researchers manipulate and explore massive datasets. Today, while the largest and best funded research projects are able to afford expensive computing infrastructure, most other projects are forced to opt for cheaper resources such as commodity clusters or simply to limit the scope of their research. Cloud computing [2] proposes an alternative in which resources are no longer hosted locally but leased from big data centers only when needed. This offers the promise of "democratizing" research, as a single researcher or small team can have access to the same large-scale compute resources as large, well-funded research organizations without the need to invest in purchasing or hosting their own physical infrastructure. Despite the existence of several cloud computing vendors, such as Amazon AWS, GoGrid, and more recently Microsoft Windows Azure, the potential of cloud platforms for research computing remains largely unexplored.

While cloud computing holds promise for the seemingly insatiable computational demands of the scientific community, there are unanswered questions about the applicability of cloud platforms, including performance, which is the focus of this work. In this paper we present an experimental prototype, AzureBlast, which is designed to assess the applicability of cloud platforms for science applications. BLAST [1] is one of the most widely used bioinformatics algorithms in life science applications; it discovers the similarities between two bio-sequences (e.g., proteins). BLAST makes an excellent case study not only because of its popularity but also because its characteristics are representative of many applications important to data intensive scientific research. AzureBlast is a parallel BLAST engine running on the Windows Azure cloud fabric. Instead of using a high-level programming model or runtime such as MapReduce [8], AzureBlast is built directly on the fundamental services of Windows Azure so that we are able to examine each individual building block. Our ultimate goal is to provide a characterization of science applications appropriate for cloud computing platforms and best practices for deploying science applications on cloud platforms.
The structure of this paper is as follows. In Section 2 we briefly discuss the Windows Azure cloud platform and the capabilities it offers to a computational scientist. In this section we also highlight aspects of modern data centers and their implications for high performance computing. In Section 3 we introduce the background of BLAST, detail our implementation of AzureBLAST, and identify how we matched the requirements of the algorithm to the capabilities and limitations of the cloud platform. Throughout Section 3 we identify general patterns to follow in implementing similar science applications on a cloud platform. In Section 4 we carry out a detailed performance evaluation of AzureBLAST and discuss the implications for which science applications are appropriate for cloud computing platforms. Finally, we list the related work and conclude with a summary of best practices and application patterns for science applications on cloud platforms.

2. CLOUD SERVICES WITH AZURE


Windows Azure is a cloud computing platform offered by Microsoft. In contrast to Amazon's suite of "Infrastructure as a Service" offerings (cf. EC2, S3), Azure is a "Platform as a Service" that provides developers with on-demand compute and storage to host, scale, and manage web applications on the internet through Microsoft datacenters. A primary goal of Windows Azure is to be a platform on which ISVs can implement Software as a Service (SaaS) applications. Amazon's EC2, in contrast, provides a host for virtual machines, but the user is responsible for outfitting the virtual machine with the software needed for their task.

Windows Azure has three parts: a Compute service that runs applications, a Storage service, and a Fabric that supports the Compute and Storage services. To use the Compute service, a developer creates a Windows application consisting of Web Role instances and Worker Role instances using, say, C# and .NET or C++ and the Win32 APIs. A Web Role instance responds to user requests and may include an ASP.NET web application. A Worker Role instance, perhaps initiated by a Web Role, runs in the Azure Application Fabric to implement parallel computations and can execute native code, including launching hosted executable applications. Unlike high-level parallel programming frameworks such as MapReduce [8] or Dryad [10], Worker Roles are not constrained in how they communicate with other workers. Each Azure instance, representing a virtual server, is managed by the Fabric for failover and recovery.

For persistent storage, Windows Azure provides three storage options: Tables, Blobs, and Queues, all accessed via a RESTful HTTP interface.

• An Azure Table is akin to a scalable key-value store. A table can contain billions of entities and terabytes of data; the service scales out efficiently and automatically to thousands of servers as traffic grows.

• A Blob [6] is a file-like object that can be retrieved, in its entirety, by name. Azure enables applications to store large blobs, up to 50 GB each, in the cloud. It supports a massively scalable blob system in which hot blobs are served from many servers to scale out and meet the traffic needs of an application. Further, each blob is highly available and durable, as the data is replicated within the data center at least three times.

• A Queue [14] provides a reliable message delivery mechanism between the compute roles, which can communicate asynchronously via messages placed in the queue. The queue effectively decouples the roles of an application: the failure of an instance in one role is isolated from all others, resources (e.g., the number of instances) can be scaled out or in for each role independently, and traffic bursts between roles can be absorbed by the queue. Equally important, when retrieving a message from the queue a user can specify the visibilitytimeout argument. The message then remains invisible during this timeout period and reappears in the queue if it has not been deleted by the end of the period. This feature ensures that no message is lost even if the instance processing it crashes, providing fault tolerance for the application; a minimal code sketch of this pattern follows the list.
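The retrieve/process/delete cycle implied by visibilitytimeout can be expressed in a few lines. The sketch below is ours, not the authors'; it assumes the C# StorageClient library that shipped with the early Azure SDK (CloudQueue, GetMessage with an explicit timeout, DeleteMessage), and the Process helper is a hypothetical stand-in for application work.

    using System;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    class QueueSketch
    {
        static void Main()
        {
            // Local development fabric; a real deployment would use account credentials.
            CloudStorageAccount account =
                CloudStorageAccount.Parse("UseDevelopmentStorage=true");
            CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference("dispatch");
            queue.CreateIfNotExist();

            // Producer side: enqueue a work request.
            queue.AddMessage(new CloudQueueMessage("blast-task-42"));

            // Consumer side: the message is invisible for 30 minutes; if this
            // process crashes before DeleteMessage, the message reappears and
            // another worker picks it up.
            CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromMinutes(30));
            if (msg != null)
            {
                Process(msg.AsString);     // hypothetical application-specific work
                queue.DeleteMessage(msg);  // acknowledge only after success
            }
        }

        static void Process(string payload) { /* do the work */ }
    }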
[Figure 1: Illustration of the suggested Azure application model]

Figure 1 illustrates the suggested application model, in which one Web Role instance interacts with the web users and communicates work requests to the background Worker Role instances through durable Queues.

2.1 Windows Azure for Research Applications
There are striking differences between scientific application workloads and the workloads for which Azure and other cloud platforms were originally designed, namely long-lived web services with modest intra-cluster communication. Scientific application workloads, on the other hand, define a wide spectrum of system requirements. There is a very large and important class of "parameter sweep" or "ensemble" computations that only require a large number of fast processors and have little inter-process communication. At the other extreme there are parallel computations, such as fluid flow simulations, where the messaging requirements are so extensive that execution time is dominated by communication. These differences in workloads are reflected in the underlying hardware architectures. For example, today's Azure network and storage services are optimized for scalability and cost efficiency, whereas the primary determinants of performance in an HPC system network are communication latency and bisection bandwidth.

Most datacenter networks that support cloud platforms such as Windows Azure are built without communication-heavy computations in mind. They are optimized for scalable access by external clients. As illustrated in Figure 2, each rack of servers in a typical data center is connected to a switch with 48 1 Gbps ports and two to four 10 Gbps up-link ports. Those are connected to layer 2 (L2) switches, which in turn connect to slower layer 3 (L3) routers. When viewed as an interconnection switch for the servers, this network has an oversubscription factor of 5:1 or more. Furthermore, any routing that has to go through L3 incurs greatly increased latency.
[Figure 2: Structure of Data Center networks]

Overall, current data center network architectures are optimized for high scalability at low cost, and are not optimized for low-latency communication between arbitrary compute nodes. In the request/response workloads found in many internet services, these oversubscription levels and the resulting network latencies can be tolerable and work reasonably well; they are never ideal, but they can be sufficient to support the workload. For workloads that move massive amounts of data between nodes, however, oversubscription can impair performance. Adding compute nodes to an application that performs extensive communication relative to computation can actually reduce throughput, as the network is often the limiting factor in job performance and scalability.

3. AZUREBLAST
3.1 BLAST
BLAST (Basic Local Alignment Search Tool) [1] is one of the most widely used bioinformatics algorithms in life science applications. Given one nucleotide or peptide sequence, the BLAST algorithm searches against a database of subject sequences and discovers all the local similarities between the query sequence and the subject sequences. The result of a BLAST search can identify the function of the query sequence. NCBI (the National Center for Biotechnology Information) provides a reference implementation of the BLAST algorithm, blastall, which is publicly downloadable from its website.

A BLAST run can be very computationally intensive due to the large number of pairwise genome alignment operations. It can also be data-intensive because of the large size of the reference databases and of the query output, which is usually determined by the size of the database rather than by the length of the query sequence. For example, GenBank, a well-known DNA sequence repository, doubled in size in about 15 months and contains 108,431,692 sequences as of August 2009. Moreover, the NCBI blastall implementation, when invoked, maps the entire subject sequence database into the invoker's virtual memory space for the subsequent sequence-searching operations. This leads to a very large memory footprint.

Fortunately, the BLAST implementation is also relatively easy to parallelize, since every pairwise alignment can be conducted independently. Two parallel schemes [7] have been widely adopted and applied to BLAST: query segmentation and database segmentation. For instance, the NCBI blastall program can parallelize the search on an SMP or multi-core machine by partitioning the database into segments and spawning multiple threads to search against each of them in parallel [11]. To further leverage large-scale computing resources, several solutions have been proposed to run the algorithm on a cluster [7, 4]. However, the large-scale resources required to perform this parallelization are usually unavailable to the majority of researchers. The emergence of cloud computing provides a potential opportunity to expand the availability of large-scale alignment search to a much larger set of researchers.
3.2 Basic Design
AzureBlast is a parallel BLAST engine running on Windows Azure that can marshal the compute power of thousands of Azure instances. To run BLAST on multiple instances, we adopt the query-segmentation data-parallel pattern. Given an input file containing a number of query sequences, AzureBlast first splits the input sequences into multiple partitions, each of which is delivered to one worker instance for execution. Once all partitions have been processed, the results are merged together. Compared with the alternative database-segmentation scheme, which needs inter-node communication for each query, query segmentation is more suitable for the cloud platform: it needs little communication between instances, presenting a pleasingly parallel pattern that is easy to scale on the cloud.

AzureBlast follows the application model suggested for general Azure development. One or more web role instances receive requests from the user through either the web portal or the web service interface, and a number of worker role instances running behind them do the heavy lifting, which in this case is executing the NCBI blastall program. However, unlike most business applications, BLAST imposes several challenges in managing long-running jobs, massive parallelism, and large data volumes.
3.3 Batch Job
A BLAST query over a normal-size genomics database can take several hours or even days. Therefore a batch system is required that allows the user to submit jobs and to periodically query the status of the submitted jobs. When the user submits a long-running job request, he receives a job ID. This ID is used to track the job, manage its execution, and group the resulting job output.

In AzureBlast, batch job management consists of two separate components (Fig. 3): i) the job submission portal and ii) the job scheduler. The job submission portal, via which the user submits his job request, is hosted by a web role instance. Before returning the job ID to the user, the portal registers the job in a dedicated Azure table called the job table. This immediate persistence mitigates against loss in the case of a worker role crash. The job scheduler, which is an independent process, fetches the job from the job table, schedules its execution, and maintains the job state. Likewise, all job states are persisted in the Azure table to prevent data loss caused by instance failure. The scheduling policy is customizable: it can be a simple first-in-first-out mechanism or include task priority assignment. Since the job scheduler and the job submission portal are decoupled by the job table, we are able to run them on two different instances. This isolation provides better fault tolerance, as the loss of one will not affect the other, and the failover mechanism of the cloud platform can keep the entire system working.

[Figure 3: Architecture of AzureBlast]

The Azure job scheduler takes responsibility for job dispatch by enqueuing the job's task into a global queue called the dispatch queue. All worker role instances poll this queue for available work. Unlike traditional batch job systems, such as PBS [9], the job scheduler on the cloud platform can remain unaware of most resource management issues, such as failure recovery and health monitoring, which are taken care of by the cloud fabric.
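To make the registration step concrete, the following is a minimal sketch of how a job record might be persisted before the job ID is returned. It is our illustration rather than the authors' code; it assumes the Azure SDK 1.x table API (TableServiceEntity, TableServiceContext), and the table and property names are hypothetical.

    using System;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    // Hypothetical record for the job table; every Azure table entity carries
    // a PartitionKey and a RowKey, here a fixed partition and the job ID.
    public class JobEntity : TableServiceEntity
    {
        public string Status { get; set; }        // e.g. Submitted / Running / Done
        public string InputBlobName { get; set; }

        public JobEntity(string jobId) : base("jobs", jobId) { }
        public JobEntity() { }                    // required for deserialization
    }

    public static class JobPortal
    {
        // Register the job durably before returning the ID, so a portal
        // crash cannot lose the request.
        public static string SubmitJob(CloudStorageAccount account, string inputBlob)
        {
            string jobId = Guid.NewGuid().ToString();
            CloudTableClient tables = account.CreateCloudTableClient();
            tables.CreateTableIfNotExist("jobs");

            TableServiceContext ctx = tables.GetDataServiceContext();
            JobEntity job = new JobEntity(jobId)
            {
                Status = "Submitted",
                InputBlobName = inputBlob
            };
            ctx.AddObject("jobs", job);
            ctx.SaveChanges();                    // immediate persistence
            return jobId;
        }
    }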
3.4 Task Parallelism
Both Windows Azure and Amazon EC2 recommend using a reliable message queue (i.e., Azure Queue or Amazon SQS) as the communication method between instances. The advantages of using reliable messaging for building large-scale applications in the cloud are well recognized [14, 17]. Queues provide the buffer necessary to manage workload bursts. They decouple the components, making the system more resilient to instance failure, and they allow the application to operate without having to know the exact number of worker instances. This last feature allows for transparent scaling of the system, which is highly desirable for scientific applications where data inputs can vary greatly between runs.

The API of reliable messaging, however, is still inconvenient and unintuitive for developing parallel science applications. For example, care needs to be taken when handling errors and exceptions, when coordinating multiple queues, and so on. The task parallel programming model, initially developed to improve parallel programming productivity on SMP processors (e.g., Cilk [3] and Microsoft TPL [16]), has proven more suitable for building applications on large-scale distributed systems such as data centers (e.g., MapReduce, Dryad) than traditional parallel programming models such as MPI.

For our implementation of AzureBlast we developed our own task parallel library for Azure. This library is a thin abstraction layer over Azure messages with the concurrency and coordination support required for the task parallelism patterns (e.g., Fork/Join) widely used in science applications. A task can be serialized into an Azure message, and all task messages are enqueued into the global dispatch queue. All worker instances compete for tasks from this dispatch queue. Maintaining one global dispatch queue enables the system to dynamically scale the number of worker instances based on the length of the queue. Once a worker instance gets a message containing a task, it deserializes the message and then executes the associated task. Being built upon Azure messages, tasks automatically gain the fault tolerance brought by the durable nature of messages in Azure Queues. If one instance fails, the task being processed by that instance will reappear in the dispatch queue after the expiration of the visibilitytimeout period and will then be picked up by another instance. It is important to note that the execution of each task has to be idempotent, as we cannot tell whether an instance has failed or not.
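A worker's side of this protocol is a small polling loop. The sketch below is illustrative rather than the authors' library; the ITask and TaskSerializer stand-ins are hypothetical, while the queue calls are the SDK 1.x StorageClient API.

    using System;
    using System.Threading;
    using Microsoft.WindowsAzure.StorageClient;

    // Hypothetical stand-ins for the (unpublished) task serialization format.
    public interface ITask { void Execute(); }
    public static class TaskSerializer
    {
        public static ITask Deserialize(string payload)
        {
            throw new NotImplementedException("format-specific");
        }
    }

    public static class Worker
    {
        public static void DispatchLoop(CloudQueue dispatchQueue)
        {
            while (true)
            {
                // The task stays invisible for the timeout; a crash before
                // DeleteMessage makes it reappear for another worker.
                CloudQueueMessage msg = dispatchQueue.GetMessage(TimeSpan.FromMinutes(60));
                if (msg == null) { Thread.Sleep(1000); continue; } // empty queue: back off

                ITask task = TaskSerializer.Deserialize(msg.AsString);
                task.Execute();                   // must be idempotent (may run twice)

                dispatchQueue.DeleteMessage(msg); // acknowledge completion
            }
        }
    }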
In order to ease exception handling and coordination among multiple tasks, we adopt the activity semantics of WS-BPEL workflows [12]. In our model, each task owns two queues (Fig. 4): 1) the result queue and 2) the cancellation token queue. Whenever the task execution completes, either the result or the exception is put into its result queue. Meanwhile, during execution the task can detect a cancellation request by probing for messages in its cancellation queue. The result queue or the cancellation queue can also be redirected to a queue shared among multiple tasks.

[Figure 4: The queues of a task]

To implement the Fork/Join pattern, a task spawns multiple child-tasks. Spawning is simply serializing the child-task into a message and putting the message into the dispatch queue. After spawning child tasks, the parent task can wait for their completion by checking each output queue. Waiting can be either synchronous or asynchronous; in fact the asynchronous wait turns out to be very appealing, as it saves a valuable instance resource from busy waiting. However, when a child task is long-running, the traditional asynchronous/callback pattern is not robust in the cloud: if the instance that is asynchronously waiting for results crashes, all asynchronous state is lost. This would force the re-execution of the parent task and all child tasks, which can be very expensive for a typical science application.

An alternative pattern that has proven very useful in our experience is the continuation task. In this pattern, the parent task specifies a continuation task before spawning child tasks, and this continuation task, which inherits the result queue from the parent task, is first stored in an Azure table. Once all child-tasks have completed, the continuation task is fetched out and put into the dispatch queue for execution. Hence, any instance failure does not affect the overall job progress. The best practice is a hybrid of these two patterns: for quick child tasks the asynchronous wait is preferred for ease of programmability and smaller overhead, while for a long-running child-task the continuation task is preferred for its better fault tolerance.

If an exception message is detected in any of the child task output queues, the parent task should have multiple options: canceling all child-tasks (i.e., job abortion), ignoring the exception and keeping the other tasks running, or retrying the failed child-task later. In practice, we have found the ability to promptly abort a long-running job to be the most desirable, especially when running experiments that involve a massive number of parallel instances: in the cloud, time is money.
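A condensed sketch of the fork-with-continuation pattern described above is shown below. It is our illustration under stated assumptions: the payload strings and naming scheme are hypothetical, and where the paper stores the continuation in an Azure table, this sketch uses a blob purely for brevity.

    using System.Collections.Generic;
    using Microsoft.WindowsAzure.StorageClient;

    public static class ForkJoin
    {
        public static void Fork(CloudBlobContainer container, CloudQueue dispatchQueue,
                                string jobId, string continuationPayload,
                                IEnumerable<string> childPayloads)
        {
            // 1. Persist the continuation durably BEFORE forking, so no in-memory
            //    state is lost if this instance crashes afterwards.
            container.GetBlobReference("continuation-" + jobId)
                     .UploadText(continuationPayload);

            // 2. Fork: every serialized child task goes through the dispatch queue.
            foreach (string payload in childPayloads)
                dispatchQueue.AddMessage(new CloudQueueMessage(payload));
        }

        // 3. Join: whichever worker records the last child completion re-enqueues
        //    the continuation, so progress survives the failure of any one instance.
        public static void OnLastChildCompleted(CloudBlobContainer container,
                                                CloudQueue dispatchQueue, string jobId)
        {
            string payload = container.GetBlobReference("continuation-" + jobId)
                                      .DownloadText();
            dispatchQueue.AddMessage(new CloudQueueMessage(payload));
        }
    }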
3.5 AzureBlast Tasks
With the aforementioned task library, it becomes quite straightforward to implement the data-parallel BLAST on Azure. The main task of a BLAST job is the data-partitioning task, which splits the input sequences into multiple partitions, each of which is stored as one Azure blob. The data-partitioning task then spawns one child task, called a BLAST task, for each partition, and sets up a continuation task to merge the results from all child tasks. Each BLAST task downloads its partition from blob storage and simply executes the NCBI blastall binary over it. After the execution is done, the BLAST task puts the output back into blob storage and puts a completion message into its result queue. Once completion messages have been received from all child-tasks, the merging task downloads all results from blob storage and merges them into the final result, which is again pushed back to an Azure blob. Finally, the job scheduler is notified that the job has completed.

[Figure 5: Workflow of AzureBlast tasks]

Although the workflow is straightforward, some subtle issues arise when considering failure situations in the cloud. One consideration is partitioning the work across a pool of Azure worker nodes. The number of partitions has a subtle impact on system performance. In general, the number of partitions should be large enough that all worker instances can work in parallel. A simple scheme is to set the number of partitions equal to the number of worker instances available. This scheme, however, may cause load imbalance, as the processing time of each partition can vary significantly. Moreover, a failure in any one instance requires the entire ensemble to wait for the visibilitytimeout period to expire before the task once again becomes visible in the Azure queue. To improve load balancing, one can create a large number of small partitions. However, the NCBI blastall program must reload the entire database into virtual memory for every execution, so the overall performance then suffers from the cold-cache overhead. Through practical experience, we have found the ideal number of partitions to be 2x or 3x the number of instances; the resulting size of each partition is large enough to mitigate the overhead of database loading.

Another consideration is setting the value of visibilitytimeout for each BLAST task, which is essentially an estimate of the task's running time. If the value is too small, a task that is being processed by one instance will reappear in the dispatch queue, leading to repeated computation; if the value is too large, the entire ensemble has to wait an unnecessarily long period of time in the case of an instance failure. For the BLAST task, one reasonable way to estimate its running time, and thus the visibilitytimeout value, is from the total number of letters (i.e., base pairs) in the partition.
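One way to express such an estimate is sketched below; the throughput constant and safety factor are hypothetical calibration values, not numbers from the paper, and in practice they would be measured per instance size.

    using System;

    public static class TimeoutEstimator
    {
        // Estimate a BLAST task's visibilitytimeout from the number of letters
        // in its partition. lettersPerSecond is a made-up calibrated rate.
        public static TimeSpan Estimate(long lettersInPartition)
        {
            const double lettersPerSecond = 50000.0; // hypothetical measured rate
            const double safetyFactor = 2.0;         // headroom so slow tasks do not reappear early
            double seconds = (lettersInPartition / lettersPerSecond) * safetyFactor;
            return TimeSpan.FromSeconds(Math.Max(60.0, seconds)); // never below one minute
        }
    }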

3.6 Managing read-only large databases
One key component of a BLAST application is the subject sequence database against which the input sequence is compared. NCBI provides a set of reference BLAST databases of nucleotides and proteins via FTP. Most reference databases are large; for instance, the NR database, a non-redundant protein sequence database, is about 10 GB. Moreover, these databases are periodically updated by NCBI to offer up-to-date reference data to biologists.

The NCBI blastall program treats the sequence database as a regular local file, so each worker role instance must have local file system access to the database files. The simplest solution is to embed the required database in the deployment package (or image, in Amazon EC2 terms), as [13] does. However, considering how frequently the NCBI BLAST databases are updated, this solution is far from optimal. Another naive approach is to have each worker role instance download databases when needed directly from the NCBI site. Since Azure and other cloud platforms charge for data transfers to/from the data center, this approach would be cost prohibitive. Moreover, when a number of worker role instances download large databases from NCBI simultaneously, the NCBI FTP server can easily be overwhelmed.

In AzureBlast, we instead take an indirect scheme that leverages the highly scalable Azure blob storage. A background database-updating process, which runs on its own role instance, periodically refreshes the NCBI databases into Azure blob storage. As specified by the user, the database can be staged during the initialization phase of each instance, or it can be staged in a lazy manner when the instance is about to execute a BLAST task. In either case, if the timestamp of the local replica has expired, the database is updated from blob storage. As Azure blobs are designed to provide highly scalable throughput, this indirect solution actually provides the best performance.
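The lazy staging check can be expressed roughly as follows; this is a hedged sketch using the SDK 1.x blob calls (FetchAttributes, DownloadToFile), and the container layout and paths are illustrative.

    using System;
    using System.IO;
    using Microsoft.WindowsAzure.StorageClient;

    public static class DatabaseStaging
    {
        // Refresh the local database replica only if the blob copy is newer.
        public static void EnsureLocalDatabase(CloudBlobContainer container,
                                               string dbName, string localDir)
        {
            CloudBlob blob = container.GetBlobReference(dbName);
            blob.FetchAttributes(); // populates blob.Properties.LastModifiedUtc

            string localPath = Path.Combine(localDir, dbName);
            bool stale = !File.Exists(localPath) ||
                         File.GetLastWriteTimeUtc(localPath) < blob.Properties.LastModifiedUtc;
            if (stale)
                blob.DownloadToFile(localPath); // re-stage from blob storage
        }
    }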

Another simple but effective optimization is data compression. Most BLAST databases compress very well: after running ZIP, the NR database is only about 2.8 GB, 28% of its original size. Compression decreases the blob storage footprint and is thus more economical. More importantly, the smaller files save significant bandwidth between blob storage and the many concurrent instances.

Another option one could consider for managing a large database is Azure Drive [5], which is akin to Amazon EBS. Azure Drive allows a role instance to mount a blob (specifically, a page blob) as a local NTFS disk drive. When multiple instances want to mount the same read-only blob, the recommended solution is to first create a snapshot of the shared blob and then have each instance mount the snapshot as a local Azure Drive. Using Azure Drive certainly brings interesting advantages. For example, system development is greatly simplified, as there is no need for explicit data staging. Moreover, the size of the database is not limited by the size of the local instance disk, and the drive automatically takes care of caching, paging, and other non-trivial issues. However, as the drive hides most low-level parameters (e.g., the data transfer and caching implementations) from the user, its I/O performance can hardly compete with a fine-tuned blob client implementation built directly on the Azure Blob API. In addition, the data compression optimization can no longer be applied.
4. EVALUATION AND DISCUSSION
In this section we present an evaluation of AzureBLAST and discuss the implications for which science applications are appropriate for cloud computing platforms. To evaluate the performance and scalability of AzureBlast, we deploy our implementation on the Windows Azure platform. Windows Azure compute instances come in four sizes to enable complex applications and workloads. Each Azure instance represents a virtual server, and the hardware configurations of each size are listed in Fig. 6. In terms of software, every Azure instance, no matter the size, runs a specially tailored Microsoft Windows Server 2008 Enterprise operating system as the guest OS, referred to as the Azure Guest OS. Azure provides several geo-location choices, and all experiments reported in this paper were conducted in the South Central US regional data center. The NCBI blastall used in the experiments presented below is the Windows 64-bit binary, version 2.2.2, and the test subject database is the most recent NR database, a non-redundant protein sequence database that contains 10,427,007 sequences (3,558,078,962 total letters and about 10 GB in total), downloaded from NCBI. Our AzureBlast implementation is written for the Azure SDK (February 2010).

[Figure 6: Azure Instance size and configuration]

For our evaluation, we first measure the performance of BLAST on an individual Azure instance, as it is important to get the best local optimization before we scale the system out with massive numbers of instances. As mentioned earlier, the NCBI blastall program can parallelize a single query on a multi-core processor. AzureBlast takes advantage of this local parallelization by automatically adjusting the command-line argument -a of blastall, which tells the BLAST implementation how many processors it can use, according to the size of the running instance. That means that for the small, medium, large, and extra-large instances, the value of the argument -a is 1, 2, 4, and 8 respectively.
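The adjustment amounts to launching the native binary with a core count derived from the instance size, roughly as sketched below; the paths and the remaining blastall arguments are illustrative, while -p, -d, -i, -o, and -a are standard blastall flags.

    using System;
    using System.Diagnostics;

    public static class BlastRunner
    {
        public static void RunBlast(string inputPath, string outputPath)
        {
            ProcessStartInfo psi = new ProcessStartInfo
            {
                FileName = @"approot\blast\bin\blastall.exe", // illustrative path
                // -a: number of processors; ProcessorCount reflects the instance size.
                Arguments = string.Format("-p blastp -d nr -i {0} -o {1} -a {2}",
                                          inputPath, outputPath,
                                          Environment.ProcessorCount),
                UseShellExecute = false
            };
            using (Process p = Process.Start(psi))
            {
                p.WaitForExit(); // a BLAST task is just a hosted native process
            }
        }
    }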
To quantify the performance impact of task granularity, we deploy AzureBlast with a single worker role instance and measure the elapsed time of one submitted job, which places all input sequences into a single partition. We vary the size of the input query from one sequence to 300 sequences, each around 110 base pairs in length. The database staging is completed during the instance initialization phase, so each instance is guaranteed to have a local replica, and the staging time is excluded from our measurements. We also vary the size of the worker instance to identify the performance differences caused by instance size. The measurements are summarized in Figure 7. The smallest task, which contains only one sequence, is an order of magnitude slower than a large task containing 100 sequences. Once the task granularity exceeds 100 sequences per partition, the instance is saturated and delivers constant throughput. The result clearly demonstrates the benefit of the warm-cache effect.

[Figure 7: Performance of Blast on one instance]

Another interesting observation is the performance enhancement achieved by increasing the instance size. We calculate the throughput speedup of each instance size against the base case, the throughput of a one-core small instance. Although it is predictable that larger instances perform better due to the additional cores, the obtained speedups are actually all larger than the number of cores. This super-linear speedup is primarily due to memory capacity: the test database, NR, is around 10 GB, while the small instance has only 1.75 GB and is thus unable to hold the database in memory. Conversely, the large and extra-large instances have ample memory to contain the database. Given that Azure prices the different instance sizes proportionally to their core counts, we can immediately derive an interesting point: the larger instance is actually more economical to use. In fact, the cost chart in Figure 8, which is directly converted from Figure 7, shows that the extra-large instance provides not only the highest throughput but also the most economical throughput.

[Figure 8: Cost of Running Blast on one instance]
In the next experiment, we measure the scalability of AzureBlast, using up to 64 large instances. The instances are allocated statically, and again the database staging takes place during the instance initialization phase. We first deploy AzureBlast with one worker instance to measure the throughput of one job; we then re-deploy the project with double the number of instances and repeat the measurement. The input query contains 4096 sequences and is partitioned into 64 partitions. The measurements are summarized in Figure 9. We see that the throughput of AzureBlast increases almost linearly when given more instances. This is not surprising, as AzureBlast is essentially a pleasingly parallel solution, which is one of the most scalable patterns for cloud computing platforms.

[Figure 9: Scalability of AzureBlast]
Finally, to understand and characterize the data staging performance of AzureBlast, we measure the throughput of Azure blobs. In this experiment a number of worker instances are instantiated, and each instance keeps reading or writing large volumes of data from/to Azure blob storage. The aggregated throughput is averaged and reported in Figure 10. The blobs and worker instances are all located in the South Central US region, and HTTP is used as the transport protocol. Azure blobs provide remarkable read throughput and scale with an increasing number of instances. For example, to stage the 2.8 GB compressed NR database, it takes about 3 minutes for one instance and 13 minutes for all 64 instances to complete the staging. This level of latency, compared with the execution time of a single BLAST task, is tolerable, so lazy data staging is feasible. The relatively low throughput of blob writing is, we believe, caused by the data consistency mechanism, which atomically maintains three independent copies of each blob.

[Figure 10: Read and Write Throughput of Blob Storage]

5. RELATED WORK
Running BLAST on a local cluster has been well studied. mpiBLAST [7], built upon MPI, takes the database segmentation approach. In contrast, Braun et al. [4] present a coarse-grained query segmentation approach, which uses PBS as the batch-job scheduler for the local cluster. Likewise, CloudBLAST [13] also adopts the query-segmentation data-parallel pattern. However, CloudBLAST uses the MapReduce approach to model this simple parallelism and relies on the Hadoop runtime for the management of node failures, data, and jobs. The experiments on CloudBLAST were conducted in two virtual clusters connected by virtual networks, and the test database is statically bound to the deployment. With more emphasis on cost, Wilkening et al. [18] report a feasibility study of running a BLAST workflow on Amazon EC2. They compared the BLAST execution time on Amazon EC2 extra-large nodes with that on local cluster nodes, and then estimated the corresponding running cost on these two resources.
Their results suggested that the cloud cost is currently slightly higher. However, their estimate does not account for some cloud-unique features, such as capacity elasticity, the failover mechanism, and the durable data storage service. Schatz presented a MapReduce-based parallel sequence alignment algorithm, CloudBurst [15], which serves the same goal as the BLAST algorithm. Instead of using the NCBI BLAST program, CloudBurst implements the alignment algorithm directly in the Hadoop MapReduce programming model. Its performance evaluation on Amazon EC2 shows good scalability. While this reimplementation of the algorithm leads to finer-grained parallelism, it is unclear how it performs and scales compared with the simple data-parallel NCBI-BLAST-based approach, especially considering that NCBI-BLAST itself has been carefully optimized for multi-core machines.

Notice that these cloud-enabled BLAST implementations are all based on the Hadoop MapReduce runtime. Conversely, AzureBlast is built directly on the basic services of Windows Azure. Our intention is to obtain a better understanding of these cloud building blocks from the experiments with AzureBLAST. Meanwhile, this approach also provides more flexibility to help us identify useful practices and patterns, such as exception handling, for developing science applications on cloud platforms.

6. CONCLUSION
In this paper we have described the implementation of AzureBlast, a parallel BLAST engine on Windows Azure. BLAST is not only relevant to a large number of research communities; it is also representative of a large number of science applications. These applications are usually computation intensive and data intensive, and can be parallelized with a simple coarse-grained data-parallel computational pattern. While high performance is often considered desirable, scalability and reliability are usually more important for this class of applications. Our experience demonstrates that Windows Azure can support BLAST and the associated class of applications very well, thanks to its scalable and fault-tolerant computation and storage services. Moreover, the pay-as-you-go model, together with the elastic scalability of cloud computing, greatly facilitates the democratization of research: research services in the cloud such as AzureBlast can make any research group competitive with the best funded research organizations in the world. We have identified several general best practices from AzureBlast throughout our paper. For example: the task parallel programming model naturally fits the characteristics of the cloud platform; decoupling components via reliable queues or other durable storage lets the system achieve better fault tolerance and resource utilization; positioning large data close to the geo-location of the computation yields better throughput and lower cost; and, last but not least, allocating resources such as instance sizes based on the profiled characteristics of the application achieves the most cost-effective performance.

7. REFERENCES
[1] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. Basic local alignment search tool. Journal of Molecular Biology, 215(3):403–410, 1990.
[2] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, G. Lee, D. A. Patterson, A. Rabkin, I. Stoica, and M. Zaharia. Above the clouds: A Berkeley view of cloud computing, Feb 2009.
[3] R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, and Y. Zhou. Cilk: an efficient multithreaded runtime system. In PPOPP '95: Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 207–216, New York, NY, USA, 1995. ACM.
[4] R. C. Braun, K. T. Pedretti, T. L. Casavant, T. E. Scheetz, C. L. Birkett, and C. A. Roberts. Parallelization of local BLAST service on workstation clusters. Future Generation Computer Systems, 17(6):745–754, 2001.
[5] B. Calder and A. Edwards. Windows Azure Drive. Technical report, Microsoft, 2010.
[6] B. Calder, T. Wang, S. Mainali, and J. Wu. Windows Azure Blob. Technical report, Microsoft, 2009.
[7] A. E. Darling, L. Carey, and W. chun Feng. The design, implementation, and evaluation of mpiBLAST. In Proceedings of ClusterWorld, 2003.
[8] J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. Commun. ACM, 51(1):107–113, 2008.
[9] R. L. Henderson. Job scheduling under the Portable Batch System. In IPPS '95: Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing, pages 279–294, London, UK, 1995. Springer-Verlag.
[10] M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: distributed data-parallel programs from sequential building blocks. In EuroSys '07: Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems, pages 59–72, New York, NY, USA, 2007. ACM.
[11] H.-S. Kim, H.-J. Kim, and D.-S. Han. Performance evaluation of BLAST on SMP machines. Pages 668–676, 2006.
[12] R. Lucchi and M. Mazzara. A pi-calculus based semantics for WS-BPEL. Journal of Logic and Algebraic Programming, 70(1):96–118, January 2007.
[13] A. Matsunaga, M. Tsugawa, and J. Fortes. CloudBLAST: Combining MapReduce and virtualization on distributed resources for bioinformatics applications. In IEEE International Conference on eScience, 2008.
[14] Microsoft. Windows Azure Queue. Technical report, Microsoft, 2008.
[15] M. C. Schatz. CloudBurst: highly sensitive read mapping with MapReduce. Bioinformatics, (11):1363–1369, June 2009.
[16] S. Toub. Patterns for parallel programming: Understanding and applying parallel patterns with the .NET Framework 4. Technical report, Microsoft, 2010.
[17] J. Varia. Architecting for the cloud: Best practices. Technical report, Amazon, 2010.
[18] J. Wilkening, A. Wilke, N. Desai, and F. Meyer. Using clouds for metagenomics: A case study. In Proceedings of IEEE Cluster, 2009.