Conference Paper · November 2017
DOI: 10.1109/SmartCloud.2017.29
Distributed High Performance Computing in
OpenStack Cloud over SDN Infrastructure
Sadhu Ram Basnet∗ , Ram Sharan Chaulagain †, Santosh Pandey ‡ , Subarna Shakya §
Department of Electronics and Computer Engineering
Institute of Engineering(IOE), Tribhuvan University
Lalitpur, Nepal
Email: (∗sadhupaper, †tangent.rams, ‡santosh.pandey2222)@gmail.com, §[email protected]

Abstract—Cloud computing enables end users to run high performance computing applications by allocating resources on demand. This avoids large capital expenditure for small and medium sized enterprises that lack the resources to obtain High Performance Computing (HPC). In this paper, we propose a new distributed HPC model in a self-built OpenStack public cloud under SDN infrastructure. A domain decomposition strategy is used for partitioning tasks in HPC. We analyse different parallel computation topologies and benchmark their performance. The public cloud constructed on the OpenStack platform is integrated with the OpenDaylight SDN controller for network control and monitoring. We also analyse and compare the performance of the OpenStack cloud with and without the proposed distributed HPC system. The results show that the speed of the OpenStack cloud under SDN infrastructure is enhanced by the implementation of our HPC system based on the Hypercube algorithm and the Mesh algorithm.

Index Terms—HPC, OpenStack, Public cloud, SDN, OpenDaylight controller, Hypercube Algorithm, Mesh Algorithm

I. INTRODUCTION

With the advancement of cloud technology, computation throughput is increasing in performance and speed day by day. High Performance Computing (HPC) utilizes resources optimally: it uses the computation power of thousands of processors and a high-throughput network to solve compute-intensive scientific and large-scale distributed problems. Traditionally, the use of HPC systems has been limited to scientists and researchers with access to super-computing labs and centers. However, with the extension of HPC to the cloud, there is an increasing demand for renting high-performance cloud compute resources [1]. Many business enterprises requiring high performance benefit from cloud technology [2]. When a business client runs an application in an HPC cloud, computation is fast because HPC combines high processing power with low-latency networks and non-blocking communication. Various approaches have been proposed for employing HPC in the cloud for application processing. The concept of distributed HPC includes multi-processor, multi-core and multi-computer systems. Many business industries, scientists and researchers prefer multi-computer systems over multi-processor systems for their low cost, flexibility and scalability.

Cloud computing provides on-demand compute, storage and network resources and charges according to usage. Cloud service models are classified into Software as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service (IaaS), and the deployment of these service models may be private, public or hybrid. Compute and storage resources are provided through open-source middleware tools such as OpenStack, CloudStack and OpenNebula [3]. Much research has evaluated the performance of HPC in the cloud. The earliest evaluated platform was Amazon EC2, whose performance was analyzed using micro-benchmarks, kernels and e-science workloads. The evaluation showed that virtualized environments introduce significant network overhead, which hampers the adoption of communication-intensive applications [4]. Similar evaluations carried out on Azure and Nimbus reached the same conclusion: the major factor preventing HPC applications from scaling is the virtualization layer. The major benefit of virtualization, however, is the ready re-allocation of resources on demand, which attracts the attention of the HPC community for executing large parallel applications. Several models exist for using the cloud efficiently when launching HPC applications. There are basically two ways to perform computation on a large amount of data [5]. One is to use a single machine to perform the complex task, which obviously takes a long time to accomplish. The other is to employ a distributed system that works in parallel, so the work completes quickly and on time. In the Energy-Aware Heterogeneous Cloud Management model [6], a Heterogeneous Task Assignment Algorithm is used to reduce the energy cost of mobile heterogeneous systems.

In this paper, we employ the Hypercube Algorithm and the Mesh Algorithm to perform an extensive evaluation of an HPC-enabled OpenStack cloud server under SDN infrastructure. The results show that the computational speed of the HPC cloud is faster than that of the cloud server without HPC. The rest of this paper is organized as follows: Section II describes the background, Section III discusses related work, Section IV covers design and implementation, Section V describes the experimental settings, and Section VI explains the results and analysis of the experiments. Finally, Section VII concludes the paper with recommendations and future work.
II. BACKGROUND

In this paper, HPC in the cloud is obtained by using the Mesh algorithm and the Cube (hypercube) algorithm for matrix multiplication. We constructed our own public cloud using OpenStack and integrated an SDN controller with it. In this section, we briefly describe the basics of HPC, OpenStack and SDN.

A. HPC

For testing HPC applications, HPC users require high-end compute, storage and network execution environments provisioned dynamically. An HPC OpenStack cloud gives the scientific, research and business communities the high performance computing resources they need for testing and computing their experiments and tasks. HPC has mainly focused on accelerating sequential applications from various domains on parallel architectures, to achieve results in a time that would not have been possible on a traditional single processor [7]. While these parallel architectures began as clusters of single-processor computers, today they exploit processor architectures that offer a wide spectrum of computing elements.

B. OpenStack

OpenStack is a cloud computing platform that manages compute, storage and networking resources in a cloud [8]. It provides centralized security policy management for a multi-cloud environment and can create a private or a public cloud. Cloud users can create Virtual Machines (VMs) to run cloud computing applications. A VM's virtual interfaces communicate with the host's physical interfaces via layer 2 switching, performed with a switch device and the Linux virtual bridge. The Linux bridge operates like a standard Ethernet switch, forwarding packets with MAC learning. OpenStack implements all basic middleware services [9]: identity, image, compute, networking, block storage, object storage, orchestration and telemetry. OpenStack uses Neutron, a pluggable, API-driven network controller, as its default network controller.

C. SDN

The SDN architecture decouples the network control and forwarding functions. A flow characterizes a set of packets transferred from one network endpoint (or set of endpoints) to another endpoint (or set of endpoints). Endpoints in SDN are characterized by IP addresses, TCP/UDP port pairs, VLAN endpoints, layer-3 tunnel endpoints and input ports. The SDN controller defines the flows that carry the data itself. The OpenFlow protocol in SDN allows the periodic collection of status information from network devices, along with commands specifying how to handle traffic. The gathered information is passed in an abstract format to the network controller OS. An administrator can program the OpenFlow controller to define how application traffic is handled or managed. Several vendors offer their own OpenFlow controllers, such as Beacon, OpenDaylight, Ryu, NOX and Floodlight. The advantages of integrating SDN with an OpenStack cloud are advanced management, seamless convergence and optimal balancing [10].

III. RELATED WORK

Much work has been done integrating SDN technology with OpenStack, as well as integrating HPC with OpenStack independently. But there is no research on the performance evaluation of distributed HPC in an OpenStack cloud over SDN infrastructure.

Our main contributions are:
(i) We implement the Hypercube Algorithm and the Mesh Algorithm to obtain High Performance Computing in a self-built OpenStack public cloud.
(ii) We integrate HPC and the OpenDaylight SDN Controller in the OpenStack cloud and obtain good performance in terms of computation speed.

IV. DESIGN AND IMPLEMENTATION

A. Architecture

Figure 1 shows the overall architecture of Distributed High Performance Computing in OpenStack Cloud over SDN Infrastructure. The architecture is divided into two major parts: a cloud user access network and a scalable OpenStack cloud server. Users use the portal to submit new jobs and obtain results. The ODL SDN Controller is integrated with OpenStack as the network controller. The compute node comprises 3 tenants in our architecture. Tenant network 1 deploys no HPC, whereas Tenant networks 2 and 3 deploy HPC.

Fig. 1. Architecture of Distributed HPC in OpenStack Cloud over SDN
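Integrating ODL with OpenStack as the network controller replaces Neutron's default ML2 mechanism driver. A plausible fragment of such a configuration, based on the networking-odl driver, is shown below; the path, URL, and credential values are illustrative assumptions, not settings taken from the paper:

```ini
; /etc/neutron/plugins/ml2/ml2_conf.ini (illustrative values only)
[ml2]
; Hand L2 networking over to OpenDaylight instead of the default drivers
mechanism_drivers = opendaylight
tenant_network_types = vxlan

[ml2_odl]
; Northbound Neutron API endpoint exposed by the ODL controller node
url = http://controller:8080/controller/nb/v2/neutron
username = admin
password = admin
```

With this in place, Neutron delegates network, subnet and port operations to ODL, which in turn programs the Open vSwitch instances on each node.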
B. Algorithms

1) Hypercube Algorithm for HPC: Each processor P_r of the SIMD computer has three registers X_r, Y_r and Z_r, also denoted X(i, j, k), Y(i, j, k) and Z(i, j, k) respectively. Initially, processor P_s in position (0, j, k), 0 ≤ j < n, 0 ≤ k < n, contains x_jk and y_jk in its registers X_s and Y_s respectively. The registers of all other processors are initialized to zero. At the end of the computation, Z(0, j, k) should contain z_jk, where z_jk = Σ_i x_ji × y_ik. The algorithm is designed to perform the n³ multiplications involved in computing the n² entries of Z simultaneously [11].

Stage 1: The elements of matrices X and Y are distributed to the n³ processors, so that X(i, j, k) = x_ji and Y(i, j, k) = y_ik.
Stage 2: The products Z(i, j, k) = X(i, j, k) × Y(i, j, k) are computed.
Stage 3: The sums Z(0, j, k) = Σ_i Z(i, j, k) are computed.

Fig. 2. Cube-connected computer with sixteen processors

With N = n³ processors, q = log n, r_m denoting the m-th bit of the processor index r, and r(m) denoting r with its m-th bit complemented, the algorithm is given as follows:

Step 1: for m = 3q−1 downto 2q do
          for all r with r_m = 0 do in parallel
            (1.1) X_r(m) ← X_r
            (1.2) Y_r(m) ← Y_r
          end for
        end for.
Step 2: for m = q−1 downto 0 do
          for all r with r_m = r_{2q+m} do in parallel
            X_r(m) ← X_r
          end for
        end for.
Step 3: for m = 2q−1 downto q do
          for all r with r_m = r_{q+m} do in parallel
            Y_r(m) ← Y_r
          end for
        end for.
Step 4: for r = 1 to N do in parallel
          Z_r ← X_r × Y_r
        end for.
Step 5: for m = 2q to 3q−1 do
          for r = 1 to N do in parallel
            Z_r ← Z_r + Z_r(m)
          end for
        end for.

2) Mesh Algorithm for HPC: The Mesh algorithm uses n × n processors arranged in a mesh configuration to multiply an n × n matrix X by an n × n matrix Y. Mesh rows are numbered 1, ..., n and mesh columns 1, ..., n. The running time of the mesh matrix multiplication algorithm is fast assuming that only boundary processors are capable of handling input and output operations. Matrices X and Y are fed into the boundary processors in column 1 and row 1 respectively, as shown in Figure 3. Row i of matrix X lags one time unit behind row i−1 for 2 ≤ i ≤ n; similarly, column j of matrix Y lags one time unit behind column j−1 for 2 ≤ j ≤ n. This ensures that x_is meets y_sj in processor P(i, j) at the right time [11]. At the end of the algorithm, element z_ij of the product matrix Z is located in processor P(i, j). Initially z_ij is zero. Subsequently, when P(i, j) receives inputs x and y, it
a) multiplies them,
b) adds the result to z_ij,
c) sends x to P(i, j+1) unless j = n, and
d) sends y to P(i+1, j) unless i = n.

Fig. 3. Two-matrix multiplication using the Mesh Algorithm

The matrix multiplication algorithm using the Mesh Algorithm is given as follows:

for i = 1 to n do in parallel
  for j = 1 to n do in parallel
    (1) z_ij ← 0
    (2) while P(i, j) receives 2 inputs x and y do
          (i) z_ij ← z_ij + (x × y)
          (ii) if i < n then send y to P(i+1, j) end if
          (iii) if j < n then send x to P(i, j+1) end if
        end while
  end for
end for.
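Both schemes can be checked with a small serial simulation in Python (our illustration, not the paper's VM-distributed implementation): the hypercube version performs the three stages directly on the n × n × n register array, and the mesh version steps the systolic array, shifting x values right and y values down with the one-unit lag described above.

```python
def hypercube_multiply(X, Y):
    """Three-stage cube algorithm, simulated serially over n^3 'processors'."""
    n = len(X)
    # Stage 1: processor (i, j, k) receives x_ji and y_ik.
    # Stage 2: each processor forms the product Z(i, j, k) = x_ji * y_ik.
    Zreg = [[[X[j][i] * Y[i][k] for k in range(n)]
             for j in range(n)] for i in range(n)]
    # Stage 3: sum over i, leaving z_jk = sum_i x_ji * y_ik in processor (0, j, k).
    return [[sum(Zreg[i][j][k] for i in range(n)) for k in range(n)]
            for j in range(n)]

def mesh_multiply(X, Y):
    """Systolic mesh: x values flow right, y values flow down."""
    n = len(X)
    Z = [[0] * n for _ in range(n)]
    x_reg = [[None] * n for _ in range(n)]  # value held by P(i, j), or None
    y_reg = [[None] * n for _ in range(n)]
    for t in range(3 * n - 2):  # the last product forms at step 3n - 3
        # Shift phase (right-to-left / bottom-up so each value moves one hop).
        for i in range(n):
            for j in range(n - 1, 0, -1):
                x_reg[i][j] = x_reg[i][j - 1]
        for i in range(n - 1, 0, -1):
            for j in range(n):
                y_reg[i][j] = y_reg[i - 1][j]
        # Boundary input: row i of X lags row i-1 by one time unit;
        # likewise column j of Y lags column j-1.
        for i in range(n):
            x_reg[i][0] = X[i][t - i] if 0 <= t - i < n else None
        for j in range(n):
            y_reg[0][j] = Y[t - j][j] if 0 <= t - j < n else None
        # Multiply-accumulate wherever a processor holds both inputs.
        for i in range(n):
            for j in range(n):
                if x_reg[i][j] is not None and y_reg[i][j] is not None:
                    Z[i][j] += x_reg[i][j] * y_reg[i][j]
    return Z

X = [[1, 2], [3, 4]]
Y = [[5, 6], [7, 8]]
print(hypercube_multiply(X, Y))  # [[19, 22], [43, 50]]
print(mesh_multiply(X, Y))       # [[19, 22], [43, 50]]
```

Both simulations reproduce the ordinary matrix product; the point of the parallel versions is that the n³ multiplications (hypercube) or the per-step multiply-accumulates (mesh) happen simultaneously on real hardware.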
V. EXPERIMENTAL SETTINGS

For the execution of the algorithms we use Intel Xeon X5687 server systems (12 MB cache, 3.60 GHz, 6.40 GT/s Intel QPI). The operating system used is Red Hat Enterprise Linux Server 64-bit, version 7.1. To deploy the Red Hat OpenStack cloud server, two such machines are configured: one as a dedicated cloud controller node and the other as a Nova compute node. The controller node uses a 64-bit x86 processor with the Intel 64 CPU extensions and Intel VT hardware virtualization support, 4 GB RAM, 1 TB disk space and 4 x 1 Gbps network interface cards (NICs). The compute node uses a 64-bit x86 processor with the Intel 64 extensions and Intel VT hardware virtualization enabled, 2 GB RAM, 50 GB disk space and 2 x 1 Gbps NICs. We create a 310 GB Cinder volume, and the Packstack utility is used to deploy Red Hat OpenStack.

Before instantiating any instances in the OpenStack cloud, we create and upload an image "img1_ubuntu14" containing the operating system, configure a security group, create an SSH key pair (to be able to connect to the instances) and allocate floating IP addresses to access the instances from the outside world. We then create one private network named "subpriv" with the 192.168.0.0/24 address pool, one public network named "subpub" with the 202.70.95.64/26 address pool, and a router named "router1". We upload the "img1_ubuntu14.img" image in QCOW2 (QEMU emulator) format and use "kvm" virtualization to instantiate 13 VMs. We create 3 tenant networks, namely "Tenant1", "Tenant2" and "Tenant3": Tenant1 consists of 1 VM, Tenant2 comprises 4 VMs and Tenant3 consists of 8 VMs. We employ the Mesh Algorithm and the Hypercube Algorithm in Tenant2 and Tenant3 to make them capable High Performance Computing systems. We allocate the "m1.small" resource quota to each of the 13 instantiated VMs; "m1.small" refers to 1 VCPU, 4096 MB RAM, a 20 GB root disk, a 1 GB ephemeral disk and a 512 MB swap disk. Table II shows the 13 VMs instantiated for our experiments.

TABLE II. INSTANCES CREATION IN OPENSTACK
Name    Image Name      IP Address                     Size
VM1     img1_ubuntu14   192.168.0.5,  202.70.95.68     m1.small
VM2     img1_ubuntu14   192.168.0.6,  202.70.95.69     m1.small
VM3     img1_ubuntu14   192.168.0.7,  202.70.95.70     m1.small
VM4     img1_ubuntu14   192.168.0.8,  202.70.95.71     m1.small
VM5     img1_ubuntu14   192.168.0.9,  202.70.95.72     m1.small
VM6     img1_ubuntu14   192.168.0.10, 202.70.95.73     m1.small
VM7     img1_ubuntu14   192.168.0.11, 202.70.95.74     m1.small
VM8     img1_ubuntu14   192.168.0.12, 202.70.95.75     m1.small
VM9     img1_ubuntu14   192.168.0.13, 202.70.95.76     m1.small
VM10    img1_ubuntu14   192.168.0.14, 202.70.95.77     m1.small
VM11    img1_ubuntu14   192.168.0.15, 202.70.95.78     m1.small
VM12    img1_ubuntu14   192.168.0.16, 202.70.95.79     m1.small
VM13    img1_ubuntu14   192.168.0.17, 202.70.95.80     m1.small

In our system, we chose the OpenDaylight SDN controller to manage the network of the OpenStack cloud, using OpenDaylight Lithium. All the OpenFlow switches in the data network establish their TCP connections with OpenDaylight on the controller node. Before integrating OpenDaylight with OpenStack, we erase all VMs, networks, routers and ports in the OpenStack controller node, and the Neutron plugin is removed from every node. Then the Open vSwitches are connected to the OpenDaylight controller, which manages them [12]. Table I shows the networking that we created for our experiments in the OpenStack public cloud.

TABLE I. NETWORKING IN OPENSTACK
Name      Subnets Associated   DHCP Agents   Shared   Status
subpub    202.70.95.64/26      0             No       ACTIVE
subpriv   192.168.0.0/24       1             No       ACTIVE

VI. RESULT AND ANALYSIS

We evaluated the performance of distributed HPC in an OpenStack cloud over SDN infrastructure. We ran a series of matrix multiplication experiments in the HPC-deployed OpenStack cloud in order to measure the cloud's performance under SDN infrastructure. Figures 4 and 5 depict the speedup performance of the HPC system using the Hypercube algorithm and the Mesh algorithm.

Fig. 4. Speedup Performance Using Hypercube Algorithm

Fig. 5. Speedup Performance Using Mesh Algorithm
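The block-oriented decomposition discussed in Section VI (large matrices are broken into 4 × 4 and 8 × 8 base blocks, with block tasks spread across the tenant VMs) can be sketched serially as follows. The helper names are ours, and the real system dispatches each block product to a VM rather than looping:

```python
def split_blocks(M, b):
    """Partition an n x n matrix (n divisible by b) into (n/b)^2 blocks of size b x b."""
    n = len(M)
    nb = n // b
    return [[[[M[I * b + i][J * b + j] for j in range(b)] for i in range(b)]
             for J in range(nb)] for I in range(nb)]

def block_multiply(X, Y, b):
    """Multiply via b x b base blocks: Z_IJ = sum_K X_IK * Y_KJ.
    In the distributed setting, each Z_IJ task would be handed to a tenant VM."""
    n = len(X)
    nb = n // b
    XB, YB = split_blocks(X, b), split_blocks(Y, b)
    Z = [[0] * n for _ in range(n)]
    for I in range(nb):
        for J in range(nb):
            for K in range(nb):
                A, B = XB[I][K], YB[K][J]  # one block product = one unit of work
                for i in range(b):
                    for j in range(b):
                        Z[I * b + i][J * b + j] += sum(A[i][s] * B[s][j]
                                                       for s in range(b))
    return Z
```

Each of the (n/b)³ block products is independent of the others in the accumulation order, which is what allows the Mesh or Hypercube algorithm to schedule them across the 4 or 8 VMs of a tenant.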
Tenant2 and Tenant3 are the HPC-enabled tenants, whereas Tenant1 is the tenant without HPC. The speedup achieved with the implementation of the Hypercube algorithm and the Mesh algorithm is evaluated with reference to the computation time of matrix multiplication in Tenant1 (the system without HPC). The results show that the speedup obtained with the Hypercube algorithm is greater than that obtained with the Mesh algorithm.

For the multiplication of two matrices, sizes of 100 x 100, 150 x 150, 200 x 200, 250 x 250, 350 x 350, 500 x 500, 700 x 700, 1000 x 1000, 2300 x 2300, 4500 x 4500, 6000 x 6000, 9000 x 9000 and 10000 x 10000 are considered in order to measure the speedup of the two HPC tenants, Tenant2 and Tenant3. As the matrix size increases, the speedup of the HPC system over the system without HPC increases, as clearly shown in Figures 4 and 5. This is because, as the matrix size increases, the number of processes the processor must compute for the multiplication increases. For the single processor without HPC, this number of processes overloads the processor and slows down the result, whereas in the HPC system the processes involved in the computation are decomposed and distributed equally across all processors (computers), so a speedup is achieved relative to the system without HPC. Matrices of any size are decomposed into two base sizes, namely 4 x 4 and 8 x 8, and the distribution of processes to the VMs of Tenant2 and Tenant3 is managed by the Mesh or Hypercube algorithm.

Figure 4 shows the performance of the Hypercube algorithm implemented in the proposed architecture; Figure 5 compares the Mesh algorithm on the same resources using both Tenant2 and Tenant3. From Figure 4, we can see that Tenant3, having the most resources of the two tenants, provides the maximum computation performance. Large matrices are decomposed and distributed over the virtual machines available in each tenant using block-oriented division. Communication time over a distributed system has a great impact on total computation time. Communication between virtual machines in our proposed system is implemented using SDN, providing low-latency communication that significantly reduces communication time and increases system performance. The benefit of SDN is clearly visible in the Hypercube matrix multiplication algorithm, which requires more communication time. SDN decreases the total computation time for multiplication through better networking services, consequently increasing the efficiency of parallel algorithms, especially those requiring more communication. From the experiments and result analysis, we see that the implementation of our own HPC in the OpenStack public cloud increases the computational efficiency of applications requiring frequent communication between processes.

VII. CONCLUSION

This paper demonstrates that the combination of distributed HPC, OpenStack and an SDN controller is powerful. It solves time-consuming workflows in new and unique ways, achieving distributed high-level computing performance with time savings in the process. We expect that the continued integration of HPC, OpenStack cloud and SDN controllers will cause significant transformations across many small and medium sized business enterprises, bringing about new business computation in the cloud environment. Our future work will be oriented towards the implementation of machine learning in distributed HPC in an OpenStack cloud over SDN infrastructure. We hope our research will help the OpenStack cloud become a standard platform for running HPC-intensive, time-consuming applications.

ACKNOWLEDGMENT

We would like to thank Prof. Dr. Subarna Shakya for his encouragement and guidance throughout this research, as well as Surendra Karmacharya of Nepal Telecom for his support. We would also like to remember our friends and family, who have supported us in reaching this goal.

REFERENCES
[1] S. Salaria, K. Brown, H. Jitsumoto and S. Matsuoka, "Evaluation of HPC-Big Data Applications Using Cloud Platforms," in 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2017.
[2] G. Mariani, A. Anghel, R. Jongerius and G. Dittmann, "Predicting Cloud Performance for HPC Applications: a User-oriented Approach," in 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2017.
[3] G. Kannan, S. Vivekanandan and S. Thamarai, "A Distributed Cloud Resource Management Framework for High-Performance Computing (HPC) Applications," in 8th IEEE International Conference on Advanced Computing (ICoAC), 2016.
[4] D. Tomic, Z. Car and D. Ogrizovic, "Running HPC applications on many million cores Cloud," in MIPRO, 2017.
[5] F. Benchara, M. Youssfi, O. Bouattane and H. Ouajji, "A New Efficient Distributed Computing Middleware based on Cloud Micro-Services for HPC," IEEE, 2016.
[6] K. Gai, M. Qiu and H. Zhao, "Energy-Aware Task Assignment for Mobile Cyber-Enabled Applications in Heterogeneous Cloud Computing," J. Parallel Distrib. Comput., 2017, http://dx.doi.org/10.1016/j.jpdc.2017.08.001.
[7] W. Ahmed, "Evaluating High Performance Computing (HPC) Requirements of Devices on the Smart Grid for Increased Cybersecurity," IEEE, 2017.
[8] E. Roloff, M. Diener, L. Gaspary and P. Navaux, "HPC Application Performance and Cost Efficiency in the Cloud," in 25th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, 2017.
[9] H. Castro, M. Villamizar, O. Garcés, J. Pérez, R. Caliz and P. Arteaga, "Facilitating the Execution of HPC Workloads in Colombia through the Integration of a Private IaaS and a Scientific PaaS/SaaS Marketplace," in 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing.
[10] M. Jamal Salim, "An Analysis of SDN-OpenStack Integration," in PIC S&T, Ukraine, 2015.
[11] S. G. Akl, The Design and Analysis of Parallel Algorithms. Englewood Cliffs, NJ: Prentice Hall.
[12] Source: http://superuser.openstack.org/articles/open-daylight-integration-with-openstack-a-tutorial/. Accessed on: 2017-01-07.