
2018 IEEE International Student’s Conference on Electrical, Electronics and Computer Science

Performance Analysis of Parallel Programs in HPC in Cloud
Mayrin George, Neenu Mary Margret and Neenu Nelson
Department of Computer Science and Engineering, Rajagiri School of Engineering and Technology
Rajagiri Valley, Kakkanad, Kerala, India - 682039
Affiliated to MG University
(Email: [email protected])

Abstract - This paper develops an understanding of how applications can take advantage of modern parallel architectures to reduce computational time using the wide array of models available today. The performance exhibited by a single device is analyzed against parallel architectures based on modular division of work. A private cloud has been used to obtain the results. A minimum of two computers is required for cluster formation. The execution speed of devices running in parallel is compared against that of a single device running the algorithm. One of the major points in parallel programming is the reconfiguration of existing applications to work on parallel systems, bringing faster results and increased efficiency. MPICH2, a standardized and portable implementation of the Message Passing Interface (MPI), is used. MPI supports a wide variety of parallel computing architectures; the standard defines the syntax and semantics of a core of library routines useful to a wide range of users writing portable message passing programs in C, C++, and Fortran. For the analysis, Score-P has been used, which gives the necessary information on the trace buffer, the number of visits to each function, the time taken by each function, the time per visit (us), and so on for the parallel program run. A graphical analysis is done for the work performed in the physical cluster, the cloud cluster and the HPC cluster.

Keywords - HPC, MPI, Cluster, Score-P.

I. INTRODUCTION

High Performance Computing (HPC) refers to the practice of combining computational resources to deliver greater performance than a standard desktop computer or workstation can, in order to solve larger problems. Such large problems arise in science, engineering, and business.

In high performance computing, several interconnected individual nodes work jointly to solve a specific problem larger than any single computer can easily solve. The individual nodes communicate with one another to work meaningfully together. Clustering can be done using a variety of methods; this paper implements an MPI cluster.

High performance computers contain clusters of computers. The individual computers used in small commodity-based clusters typically contain between one and four processors, and current-day processors typically have two to four cores, so four nodes, or sixteen cores, could be present in a single cluster. In a typical cluster environment, cluster sizes range between 16 and 64 nodes, or from 64 to 256 cores.

The Linux operating system currently dominates HPC installations. This is mainly due to HPC's history in Unix, high-end computing and large-scale machines. The operating system decision is driven by the kind of applications that need to be run on the high performance computer.

Cloud technology presents significant opportunities for implementing high performance computing. HPC managers can use the assets of the cloud environment to remodel the traditional HPC environment. Integrating an HPC cloud with the classical HPC system helps to achieve better system utilization, allows the tasks at hand to be better organized, and makes HPC deliverable to a larger community.

Opportunities with HPC Cloud include:

● A wider range of support functions and deployment needs, with automated, workload-optimized node OS provisioning
● Uncomplicated, automated access for a wider community of people, decreasing management and training costs
● Eliminating the need to own HPC systems by extending HPC resources to wider communities through the cloud, with pay-as-you-use billing and reporting based on actual resource usage by user, group, project, or account
● Elevating the efficiency of the cloud without the cost and interruption of ripping and replacing the existing structure

In the 21st century, the cloud provides 10-Gigabit Ethernet, while supercomputers commonly use InfiniBand switches and proprietary interconnects. Thin VMs, containers, and similar optimized virtualization models help to decrease virtualization costs; their downside is the lack of a noise-free environment.

Other hurdles include:
● the pricing model: there is a gap between the classical supercomputing model of consulting, grants and quotas and the pay-as-you-use model of cloud services
● the growth of HPC into the cloud has been hampered in some ways: it was said to be much cheaper to run big computing jobs on dedicated equipment, and customers found that, for regular and predictable workloads, the cloud is still at least twice as expensive as in-house solutions
● the submission model, which is evolving from job queuing and reservations toward VM deployment
● security, regulatory compliance, and concerns such as performance, availability, business continuity, service level agreements, and so on

II. PROPOSED IDEA

The performance of the physical cluster is evaluated against that of a scaled-up system, namely a cloud cluster, with the emphasis on MPI programs. The paper mainly looks into the performance of a sequential program and its equivalent parallel code in the cloud cluster and the physical cluster. One of the major points in parallel programming is the reconfiguration of existing applications to work on parallel systems, bringing faster results and increased efficiency.

A cluster implementation is used for communication between the nodes doing the work. A single master node is combined with other nodes, with a minimum of two systems in total. For this paper, a two-node cluster has been implemented using MPI.
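As background, a minimal MPI program in C illustrates the master/slave structure used in such a cluster: each process learns its rank, and rank 0 acts as the master. The listing is illustrative and not code from the paper.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id */
        MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */

        if (rank == 0)
            printf("master: %d processes in the cluster\n", size);
        else
            printf("slave %d reporting\n", rank);

        MPI_Finalize();
        return 0;
    }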
Analysis of a particular program with a given set of inputs is done using the tool Score-P. The time taken, the time per visit, the size requirements, and the trace-buffer size needed to avoid a dump are among the information that can be obtained.
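As an illustrative workflow (these are not commands taken from the paper, and the program name matrix_add is hypothetical), Score-P instruments an MPI program at compile time and scorep-score summarizes the recorded profile:

    # instrument the MPI program with the Score-P compiler wrapper
    scorep mpicc matrix_add.c -o matrix_add
    # run as usual; Score-P writes an experiment directory (scorep-*)
    mpiexec -n 4 -f machinefile ./matrix_add
    # report time, visits and time/visit per region type (ALL, MPI, COM, USR)
    scorep-score -r scorep-*/profile.cubex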
III. EXPERIMENTAL SETUP

A minimal three-node architecture has been used for the cloud, built with OpenStack and using Neutron networking; it contains a controller node, a network node and compute nodes. The controller node runs the Identity and Image services, the management portions of Compute and Networking, the Networking plug-in, and the dashboard. Supporting services such as an SQL database, a message queue, and the Network Time Protocol (NTP) are also included. The network node runs the Networking plug-in and provides switching, routing, NAT, and DHCP services; it also handles external Internet connectivity for the resident virtual machine instances. The compute node runs the resident virtual machines or instances. By default, Compute uses the Kernel-based Virtual Machine (KVM) as the virtual machine monitor. The compute node also runs the Networking plug-in and an agent that links resident networks to instances and provides firewalling services. More than one compute node is possible; here three compute nodes have been used.

The private cloud currently supports a maximum of 20 virtual machines. Compute is a six-core processor running at 3200 MHz with 8 GB RAM, Compute1 is an octa-core processor running at 2128 MHz with 16 GB RAM, and Compute2 is a quad-core processor running at 1600 MHz with 8 GB RAM. The controller node has 8 GB RAM.

For creating the cluster, the /etc/hostname file contains the unique name of each instance. The master node can be named 'master' and the slave instances can be named 'slave1', 'slave2', and so on. The /etc/hosts file contains the IP addresses of the master node and the slave nodes in the setup. NFS synchronization allows a folder created on the master node to be accessed from the slave nodes. The '/mirror' folder created for this purpose contains all the data common to the master and slave nodes. To export the contents of this folder from the master node, the /etc/exports file of the master node is appended with the line '/mirror *(rw,sync)'. By mounting it, the client nodes can access the '/mirror' folder. In order to mount it on every boot, the fstab is edited to contain 'compute:/mirror /mirror nfs'.

A common user called 'mpi user' is created, and the SSH protocol is used for communication. The user is defined with the same name and user ID on all the nodes, with '/mirror' as the home directory. A file called 'machinefile' is created in mpi user's home directory containing node names, each followed by a colon and the number of processes to spawn on that node.
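For illustration only (the IP addresses below are invented; the node names follow the naming scheme described above), the /etc/hosts file and the machinefile for a master with two slaves might look like:

    # /etc/hosts (identical on every node; addresses are examples)
    10.0.0.10   master
    10.0.0.11   slave1
    10.0.0.12   slave2

    # machinefile in mpi user's home directory:
    # <node name>:<number of processes to spawn on that node>
    master:2
    slave1:2
    slave2:2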
IV. RESULTS

The sequential and parallel programs were run on Intel Pentium (two logical threads) and Intel i7 (four logical threads) systems, and the corresponding execution times were measured.

TABLE I
EXECUTION TIME FOR MATRIX ADDITION IN PHYSICAL CLUSTER

Data set (N x N)   Sequential code's execution time   Parallel code's execution time
10 x 10            0.000375                           0.001837
25 x 25            0.003097                           0.002491
50 x 50            0.011868                           0.003976
100 x 100          0.026009                           0.015318
250 x 250          0.171731                           0.123751
350 x 350          0.284236                           0.258182
500 x 500          0.576776                           0.146868
590 x 590          0.788343                           0.211460

Fig. 1. Sequential (blue) vs. parallel (green) program run in the physical cluster.

TABLE II
EXECUTION TIME FOR MATRIX ADDITION IN CLOUD CLUSTER

Data set (N x N)   Sequential code's execution time   Parallel code's execution time
10 x 10            0.415095                           0.008953
50 x 50            5.551466                           0.016487
100 x 100          22.598632                          0.018657
200 x 200          86.647588                          0.045899
350 x 350          290.279719                         0.120107

Fig. 2. Sequential (red) vs. parallel (black) program run in the cloud cluster.

Fig. 3. Parallel program run in the physical cluster (green) vs. the cloud cluster (black).

The data presented in Table I gives the execution times taken by the sequential and parallel programs for the matrix addition algorithm in the physical cluster, and Table II gives the corresponding execution times in the cloud cluster. The corresponding graphs are shown in Figs. 1-3.
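As an illustration (this ratio is not computed in the paper), the speedup of the parallel code over the sequential code in the physical cluster for the 500 x 500 case in Table I is 0.576776 / 0.146868, or roughly 3.9, while for the 10 x 10 case the parallel version is actually slower than the sequential one, consistent with the communication overhead discussed below.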

TABLE III
SCORE-P ANALYSIS FOR PHYSICAL CLUSTER

DATA        TIME [S]    TIME/VISITS [US]    REGION
50 x 50     0.02        414.77              ALL
            0.02        382.82              MPI
            0.00        862.06              COM
100 x 100   0.06        1056.47             ALL
            0.05        913.60              MPI
            0.01        3056.61             COM
250 x 250   6.73        112144.59           ALL
            5.05        90190.77            MPI
            1.68        419498.08           COM
500 x 500   33.17       552884.26           ALL
            24.89       444375.08           MPI
            8.29        2072012.87          COM
550 x 550   40.66       677658.04           ALL
            30.50       544625.69           MPI
            10.16       2540110.97          COM

TABLE IV
SCORE-P ANALYSIS FOR CLOUD CLUSTER

DATA        TIME [S]    TIME/VISITS [US]    REGION
50 x 50     6.07        101093.20           ALL
            5.49        97958.65            MPI
            0.58        144976.88           COM
100 x 100   23.12       385335.18           ALL
            18.41       328751.43           MPI
            4.71        1177507.66          COM
200 x 200   1004.89     16748098.56         ALL
            754.70      13476826.73         MPI
            250.18      62545904.13         COM
300 x 300   2551.29     42521466.29         ALL
            1914.84     34193518.53         MPI
            636.45      159112734.90        COM

V. DISCUSSION

When the sequential program is run in the MPI cluster, the execution time is initially small for small input data. As the size of the input increases, the execution time increases, as is evident in the tables given above. When the parallelized program is run in the cluster, the master passes the necessary data to the slaves for execution; here, when the input becomes large, the execution time reduces.

The MPI (parallel) code for matrix addition contains a parallelized section in which the rows of each matrix are passed to different processes. The division of the work between the processes is defined in the machinefile. The different processes add their rows of the two matrices and pass the result back to the master node, so the execution time taken is comparatively less.
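A minimal sketch of this row-wise scheme is given below. It is not the paper's code: it assumes N is divisible by the number of processes, uses MPI_Scatter/MPI_Gather in place of explicit send and receive calls, and times the parallel section with MPI_Wtime.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N 500   /* matrix dimension; assumed divisible by the number of processes */

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int rows = N / size;                 /* rows handled by each process */
        double *a = NULL, *b = NULL, *c = NULL;
        if (rank == 0) {                     /* only the master holds the full matrices */
            a = malloc(N * N * sizeof(double));
            b = malloc(N * N * sizeof(double));
            c = malloc(N * N * sizeof(double));
            for (int i = 0; i < N * N; i++) { a[i] = i; b[i] = 2.0 * i; }
        }

        double *la = malloc(rows * N * sizeof(double));
        double *lb = malloc(rows * N * sizeof(double));
        double *lc = malloc(rows * N * sizeof(double));

        double t0 = MPI_Wtime();

        /* master distributes a block of rows of each matrix to every process */
        MPI_Scatter(a, rows * N, MPI_DOUBLE, la, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        MPI_Scatter(b, rows * N, MPI_DOUBLE, lb, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        /* each process adds its share of rows */
        for (int i = 0; i < rows * N; i++)
            lc[i] = la[i] + lb[i];

        /* the partial results are collected back on the master */
        MPI_Gather(lc, rows * N, MPI_DOUBLE, c, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("parallel execution time: %f s\n", MPI_Wtime() - t0);

        free(la); free(lb); free(lc);
        if (rank == 0) { free(a); free(b); free(c); }
        MPI_Finalize();
        return 0;
    }

Such a program could be compiled with mpicc and launched with, for example, mpiexec -n 4 -f machinefile ./matrix_add, so that process placement follows the machinefile described in Section III.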

When the matrix addition program is run in the physical cluster, the parallel code takes less time to execute than its sequential counterpart once the input size is 25 x 25 or more. When the same matrix addition program is run in the cloud cluster, the parallel code is always faster than its sequential counterpart. From the Score-P analysis, an increase in time and time per visit with input size is observed for both cluster setups, but when the physical cluster values are compared with the cloud cluster values, a difference of about 200 times is observed in time per visit and about 6 times in time, with the cloud cluster values being the larger. From this large difference it can be inferred that execution in the cloud cluster is comparatively slower than in the physical cluster. This is because time is spent sending the data from the current working node to the network and, after computation, transmitting the data back to the nodes, so there is a lag compared with the physical cluster. Also, the cloud used for this purpose has a limited total memory of 32 GB, and the switch used is an ordinary switch, which results in slower operation; for a faster implementation, an InfiniBand switch could be used. Ubuntu 14.04 is used, which is a user-friendly OS but not well suited to this kind of sophisticated workload.

Thus, when the data to be processed is large, the parallel approach can greatly reduce the time taken. A similar setup created using the Amazon cloud could result in smaller values in the cloud environment.

VI. FUTURE OF HPC

Nowadays, HPC capabilities have grown, and HPC serves a crucial role in a variety of sectors.

● Hybrid HPC solutions are more within the reach of the community than in the past. HPC resources can be rented through a managed HPC environment, since HPC technologies are becoming more widespread within commercial enterprise data centers. Organizations can now scale HPC resources as needed, and technology teams can balance cloud-based and on-premise solutions.

● Major sectors are adopting HPC. Industries' need for higher processing speeds can make the difference between total success and total failure in the ever-evolving customer-service paradigm. Increasing demand from organizations, evolving businesses and emerging markets will require infrastructure upgrades, which will benefit providers and consultants in the HPC space.

● Market diversification will be the new standard. Interoperability will become a major challenge and consideration for companies. With market diversification, adapting applications to run on architectures that are evolving very quickly from many manufacturers is quite challenging. Constraints include the resources and money needed to develop HPC capabilities within IT organizations, but the trend of HPC moving to the cloud can lift these constraints.

HPC management tools should be able to provide a simplified HPC management solution for various complex environments, give IT administrators more control and visibility, and allow end users to manage their own workloads.

VII. CONCLUSION
From the above analysis it is understood that parallel algorithms can reduce the time taken compared to a sequential approach; this is an expected result that can be predicted beforehand. The comparison between the physical cluster and the cloud cluster shows that more time is taken in the cloud environment. This is due to the time taken to pass the data from the user's side onto the network of the cloud environment, process the result and send it back. Although the cloud takes more time, it has several features that make it more advantageous than a physical cluster, such as anywhere-and-anytime availability, reliability, and the pay-as-you-use model.

The paper deals with a personal cloud setup, which has its limitations; higher efficiency can be achieved in commercial cloud environments. Based on the above analysis, an extrapolation can be made of how HPC performs in a cloud cluster compared with a physical cluster. Variations in the results are also to be expected depending on system efficiency.

