2014 SWF14 Cloud-Workflow
TABLE IX. PERFORMANCE IMPROVEMENT (EUCALYPTUS)
Cluster Size    Time (s)    Baseline (s)    Time Saved (%)
32              39.598      40.376          -
16              9.548       34.173          72%
8               7.61        28.331          73%
4               5.844       25.053          76%
2               4.571       23.891          80%
1               4.482       21.503          79%
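The Time Saved column in Table IX appears consistent with (baseline - time) / baseline, truncated to a whole percent. That formula is inferred from the tabulated values rather than stated in the text, so the following is only a minimal Python sketch under that assumption; the function name `time_saved_percent` is hypothetical.

```python
import math

# (cluster size, time in s, baseline in s) copied from Table IX.
# The 32-node row is omitted because the table reports "-" for it.
ROWS = [
    (16, 9.548, 34.173),
    (8, 7.61, 28.331),
    (4, 5.844, 25.053),
    (2, 4.571, 23.891),
    (1, 4.482, 21.503),
]


def time_saved_percent(time_s, baseline_s):
    """Assumed formula: share of the baseline time saved, truncated to a whole percent."""
    return math.floor((baseline_s - time_s) / baseline_s * 100)


saved = {size: time_saved_percent(t, b) for size, t, b in ROWS}
for size, pct in saved.items():
    print(f"cluster size {size:>2}: {pct}% saved")
```

Under this assumption the computed values match the percentages reported in the table for cluster sizes 16 down to 1.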
C. Montage Image Mosaic Workflow
In the OpenNebula environment, we demonstrate and
analyze the integration implementation using a Montage Image
Mosaic Workflow. Montage is a suite of software tools
developed to generate large astronomical image mosaics by
composing multiple small images, as shown in Fig. 4. The
typical workflow involves the following key steps:
Image projection:
  o Re-project each image into a common coordinate space
    (mProjectPP)
Background rectification:
  o Calculate a list of overlapping images (mOverlaps)
  o Compute the image difference between each pair of
    overlapping images (mDiffFit)
  o Fit the difference images to a plane (mConcatFit)
  o Apply background correction (mBackground)
Image co-addition (mAdd):
  o Optionally divide a region into a grid of sub-regions,
    and co-add the images in each region into a mosaic
  o Co-add the processed images (or the mosaics of the
    sub-regions) into a final mosaic
Finally, the mosaic is shrunk (mShrink) and converted
into a JPEG image (mJPEG) for display.
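The steps above can be sketched as a dependency graph. The Montage tool names come from the text, but the exact edges (for example, that mBackground needs both the re-projected images and the fitted planes) are simplifying assumptions for illustration, and a real run has one task per image or image pair rather than one node per tool.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key maps to the set of stages it depends on.
# Assumed edges, with one node per Montage tool named in the text.
MONTAGE_STAGES = {
    "mProjectPP": set(),
    "mOverlaps": {"mProjectPP"},
    "mDiffFit": {"mOverlaps"},
    "mConcatFit": {"mDiffFit"},
    "mBackground": {"mProjectPP", "mConcatFit"},
    "mAdd": {"mBackground"},
    "mShrink": {"mAdd"},
    "mJPEG": {"mShrink"},
}


def execution_order(stages):
    """Return one valid order in which the stages can run."""
    return list(TopologicalSorter(stages).static_order())


order = execution_order(MONTAGE_STAGES)
print(" -> ".join(order))
```

A workflow engine would derive such an order (and the available parallelism within each stage) from the data dependencies in the workflow script; the explicit graph here only makes the stage ordering visible.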
To visualize the workflow execution, we developed a
Nebula Image Mosaic demo service. In the demo, a user can
pick a nebula (e.g., the Swan Nebula) to create a mosaic
for it, and the demo service submits a workflow request
to the Cloud workflow service, which in turn instantiates the
Cloud resources on the fly to execute the workflow. The demo
also visualizes workflow progress as a DAG (directed acyclic
graph) at the top, and displays the execution log on the lower
left and intermediate results on the lower right, as illustrated in
Fig. 5. The deployment provides scientists with an easy-to-use
platform to manage and execute scientific workflows on a
Cloud platform without knowing the details of workflow
scheduling and Cloud resource provisioning.
Fig. 4. The Montage Workflow
Fig. 5. Nebula Image Mosaic Demo
VI. CONCLUSIONS AND FUTURE WORK
We discuss the challenges that traditional scientific
workflow applications face in the big data era and the available
solutions to these challenges. We propose a service framework
that meets all the essential requirements for a scientific
computing Cloud platform, normalizes the integration of
SWFMSs with Cloud computing, and addresses the big data
processing problem in traditional infrastructures. Based on this
framework, we present the implementation details of
integrating the Swift workflow management system with both
OpenNebula and Eucalyptus, and report a series of experiments
that demonstrate the capability of our implementation. We also
demonstrate the functionality and efficiency of our approach
using a Montage Image Mosaic Workflow.
For future work, we will investigate porting other
SWFMSs, such as Taverna and VIEW, to Clouds according to
the proposed framework. We will also investigate autonomous
application deployment during resource provisioning, so that
workflow applications can be deployed automatically in a
virtual cluster.
ACKNOWLEDGMENT
This work is supported by the key project of the National
Science Foundation of China, No. 61034005 and No.
61272528.