What is HPC?
Executive summary
This ebook is an introductory guide to high-performance computing (HPC). It
summarises the different use cases, workload types and processing types found
in HPC. It gives an overview of HPC clusters and their architecture, examines
where they can be deployed - whether on-premise (on your own hardware) or in
the public cloud - and highlights the many different components involved in
HPC clusters.
Overall, this guide will help you understand the inner workings of HPC
clusters, their architecture, typical use cases and the tooling associated with
HPC implementations. After reading this ebook, you should have a sufficient
understanding of the world of HPC and be equipped to evaluate what you need
to get started.
HPC vs supercomputing
These days, supercomputing has become a synonym for high-performance
computing. However, the terms are not exactly interchangeable: supercomputers
and supercomputing generally refer to the larger cluster deployments and
the computation that takes place there, while HPC refers to computation
performed using extremely fast computers on clusters ranging from small-
scale HPC clusters to large supercomputers. Most often, HPC clusters and
supercomputers share the same architecture and are built out of commodity
servers.
For a sense of scale, some mobile phones today can reach a few gigaflops,
whereas the CDC 6600, a supercomputer designed by Seymour Cray in the 1960s,
was estimated to deliver about three megaflops. At the time, supercomputers
were more powerful than anything else on the market and very expensive to build
and develop, with architectures far ahead of the general-purpose computers then
available. That is why they were called supercomputers.
What are the main use cases for HPC?
HPC is used to solve some of the most advanced and toughest computational
problems we have today. These problems exist in all sectors, including science,
engineering and business. Some of the most popular use cases for HPC are
solved with numerical methods, such as those in computational fluid dynamics
(CFD); others analyse or process large data sets, as in high-performance data
analytics (HPDA), artificial intelligence and machine learning.
Workloads for these different use cases can be classified into one or more
types, depending on how they are executed or processed. Batch processing,
for instance, involves running a large number of similar jobs in sequence.
Real-time processing involves processing data in real time, as it arrives.
Interactive processing involves running interactive applications, such as
simulations or data visualisations.
Let’s explore some of these use cases in more detail, as they are closely related
to HPC.
AI (artificial intelligence) and machine learning
AI (artificial intelligence) and machine learning are related fields of computer
science focusing on the development of computer systems that can learn, reason,
and make decisions. AI and machine learning (ML) involve the use of algorithms to
identify patterns and trends in data sets, and to make predictions and decisions
from the data. AI and ML are used in a variety of applications, including data
mining, natural language processing, autonomous vehicles, and more.
So how does HPC work? Let’s explore the components behind HPC cluster
architecture and common tools used to run HPC workloads. First, let’s define
what we mean by an HPC cluster.
HPC cluster architecture
Cluster management solutions are available for VDI (virtual desktop
infrastructure) with GPU or vGPU support. Organisations can also provision
workstations remotely for HPC. Solutions that enable a remote workstation
experience include:
• Remote access software such as VNC, which provides access to the desktop
environment.
• Desktop environments such as Ubuntu Desktop running on a VM.
• Desktop workstations running in the cloud, such as Ubuntu in Amazon WorkSpaces.
Servers
A server is a computer or system that provides resources, data, services, or
programs to other computers, known as clients, over a network. Servers can
provide various functionalities, often called services, such as sharing data or
resources among multiple clients, or performing computations for a client.
Common examples of server types include web servers, application servers,
database servers, and file servers. In high-performance computing, servers are
used for two primary purposes: to compute mathematical models or process
data, and to serve data through file servers. Servers used for computation and
data processing are generally called compute nodes. Servers that serve data are
generally referred to as storage nodes.
Compute nodes
Compute nodes are the processing component of an HPC cluster. They execute
the workload using local resources, like CPU, GPU, FPGA and other processing
units. These workloads also use other resources on the compute node, such as
memory, storage and the network adapter, and they consume the available
bandwidth of these underlying components. Depending on how a workload uses
those components, it can be limited by one or more of them during execution.
For example, workloads that use a lot of memory might be limited by memory
bandwidth or capacity. Workloads that read or generate large amounts of data
during computation might be limited by network bandwidth or storage
performance, if that data is written to storage as part of the computation.
Other workloads simply need plenty of computational resources and are limited
by the processing ability of the cluster.
When creating and designing these clusters, it's important to understand the
resource utilisation of the workload and design the cluster with that in mind.
The best way to understand workload resource usage is to monitor the resources
used, which reveals where the limitations lie.
Head nodes
Head nodes, or access nodes, act as the entry point into an HPC cluster. They
are where users interact with the input and output of their workloads, get
access to the local storage systems available to the cluster, and schedule
their workloads. The scheduler, in turn, executes those workloads as processes
on the compute nodes.
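As a minimal sketch, assuming a SLURM-based cluster with a hypothetical head node at hpc-login.example.com, a typical head-node session might look like this:

    # Log in to the head node (hypothetical hostname and user)
    ssh alice@hpc-login.example.com

    # Submit a batch job script to the scheduler
    sbatch my_job.sh

    # Check the state of queued and running jobs
    squeue -u alice

The workload itself never runs on the head node: the scheduler places it on compute nodes once resources are free.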
Storage nodes
A storage node is a computer or server responsible for storing and providing
access to data over a network. Storage nodes are typically connected to other
storage nodes in a storage cluster and provide access to data stored on the
cluster. They are often connected to other storage or compute nodes via a high-
speed network, such as InfiniBand or Ethernet, providing access to data directly
or via a file system. Multiple protocols exist to provide storage access, from
traditional NFS shares to shared storage implementations such as Lustre or
BeeGFS.
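As a minimal sketch, assuming a hypothetical Lustre file system named scratch whose management server is reachable at mgs01, a compute node mounts the parallel file system like this:

    # Mount the Lustre file system 'scratch' served by the management server mgs01
    mount -t lustre mgs01@tcp0:/scratch /mnt/scratch

Every compute node mounts the same namespace, while the data itself is striped across many storage nodes behind the scenes.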
Operating system
To operate the nodes, you need an operating system (OS). The OS is responsible for managing
the computer's memory, processor, storage, and other components. It also provides an interface
between the user and the servers, allowing users to interact with the computer and execute
programs. While several operating systems can run HPC workloads, Linux dominates the field
and runs on virtually all of the world's largest clusters.
Linux in HPC
The Linux operating system, probably one of the most recognised open-source projects, has
both driven open-source software in HPC and been driven by HPC use cases. NASA was an early
user of Linux, and Linux, in turn, was fundamental to the first Beowulf cluster. Beowulf
clusters were clusters built from commodity servers and high-speed interconnects, instead of
more traditional mainframes or supercomputers. The first Beowulf cluster was deployed at
NASA and went on to shape HPC as we know it today. It drove Linux adoption in government
from then onwards and expanded well beyond that sector. Today, this type of cluster is used
by enterprises as well.
HPC has driven a lot of development efforts in Linux, all focused heavily on driving down latency
and increasing performance across the stack - from networking to storage.
Ubuntu is the Linux OS preferred by 66% of developers and it is ideal for HPC. It can be used for
workstations, to access HPC clusters, or installed on servers, giving the user a uniform experience
across both.
Cluster provisioning
Node homogeneity is important in HPC to ensure workload consistency. That's
why it's common to see HPC clusters provisioned with metal-as-a-service
solutions.
MAAS
Metal as a Service, or MAAS, is an open source project developed and maintained
by Canonical. MAAS was created with one purpose: API-centric bare-metal
provisioning. It automates all aspects of hardware provisioning, from detecting
a racked machine to deploying a running, custom-configured operating system,
and it makes the management of large server clusters, such as those in HPC,
easy through abstraction and automation. MAAS was designed to be easy to use,
has a comprehensive UI - unlike many other tools in this space - and is highly
scalable thanks to its disaggregated design. It is split into a region
controller, which manages the overall state of the cluster - from keeping
information on the overall hardware specification to tracking which servers
have been provisioned and which are available - and makes all of this available
to the user. MAAS also comes with a stateless rack controller that handles PXE
booting and power control. Multiple rack controllers can be deployed, allowing
for easy scale-out regardless of the environment's size. Notably, MAAS can be
deployed in a highly available configuration, giving it fault tolerance that
comparable projects in the industry don't have.
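As a minimal sketch of driving provisioning through the API, assuming a hypothetical region controller at maas.example.com and an admin profile, the MAAS CLI workflow might look like this:

    # Authenticate the CLI against the region controller (hypothetical URL)
    maas login admin http://maas.example.com:5240/MAAS/ $API_KEY

    # List the machines MAAS knows about, including their system IDs
    maas admin machines read

    # Deploy Ubuntu 22.04 onto a specific machine by its system ID
    maas admin machine deploy $SYSTEM_ID distro_series=jammy

The same operations are available through the web UI and the REST API, which is what makes MAAS straightforward to drive from higher-level automation.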
xCAT
xCAT (Extreme Cloud Administration Toolkit) is an open-source provisioning and
management tool, originally developed by IBM, that has long been used to deploy
and manage large HPC clusters.
Warewulf
Warewulf is an open-source provisioning system designed for stateless HPC
clusters: compute nodes boot a shared node image over the network. It is a core
component of the OpenHPC project.
Networks
Parallel HPC workloads depend heavily on inter-process communication. When
that communication takes place within a compute node, messages are simply
passed from one process to another through the memory of that node. But when
a process communicates with a process on another compute node, that
communication needs to go through the network.
This inter-process communication might be quite frequent. If that’s the case, it’s
important that the network has low latency to prevent communication delays
between processes. After all, you don’t want to spend valuable computation
time on processes awaiting message deliveries. In cases where data sizes are
large, it’s important to deliver that data as fast as possible. That’s enabled by high
throughput networks. The faster the network can deliver data, the sooner any
processes can start working on the workload. Frequent communication and large
message and data sizes are regular features of HPC workloads. This has led to the
creation of specialised networking solutions that often deliver low latency and
high throughput to meet HPC-specific demands.
Networking solutions
Ethernet
Ethernet is the most widely deployed networking technology, commonly described
in terms of the seven layers of the OSI model: physical, data link, network,
transport, session, presentation and application.
Nvidia InfiniBand
InfiniBand is a high-bandwidth, low-latency interconnect with support for
remote direct memory access (RDMA), which lets data move between the memory of
different nodes without involving the CPU. Nvidia acquired the technology
through its purchase of Mellanox, and it remains one of the most common
interconnects in large HPC deployments.
Cornelis Omni-Path
Omni-Path is a high-performance interconnect originally developed by Intel and
now developed and supported by Cornelis Networks, an Intel spin-off.
Storage
Storage solutions in the HPC space are most commonly file-based, with POSIX
support. These file-based solutions can generally be split into two categories:
general-purpose and parallel storage solutions. Other solutions, such as object
storage - or Blob (Binary Large Object) storage, as it's sometimes referred to
in HPC - can be used by some workloads directly, but not all workloads have
that capability.
General-purpose storage
There are two main uses for general-purpose storage in an HPC cluster. One
would be for the storage of available application binaries and their libraries.
That’s because it’s important that all binaries and libraries are consistent across
the cluster when running an application, making central storage convenient.
The other would be for the user’s home directories and other user data, as it’s
important for the user to have consistent access to their data throughout the
HPC cluster. It’s common to use an NFS server for this purpose, but other storage
protocols do exist that enable POSIX-based file access.
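As a minimal sketch, assuming a hypothetical storage node named nfs01 exporting home directories to the cluster subnet, an NFS setup might look like this:

    # /etc/exports on the storage node: export /home to the cluster subnet
    /home 10.0.0.0/24(rw,sync,no_subtree_check)

    # /etc/fstab entry on each head and compute node: mount the shared /home
    nfs01:/home  /home  nfs  defaults  0  0

With this in place, a user's files and centrally installed application binaries look identical from every node in the cluster.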
Object Storage
Object storage solutions are often used in HPC clusters to archive past
computational results and other related data. Alternatively, they may be used
directly by workloads that support native object storage APIs.
Storage solutions
There are various storage solutions available, both proprietary and open source.
The ones that are most commonly used in HPC are detailed below.
Ceph
Ceph is an open-source, software-defined storage platform that provides
object, block and file storage from a single distributed cluster. It
originated from Sage Weil's research at the University of California, Santa
Cruz, and is widely used both as general-purpose cluster storage and, through
the CephFS file system, for POSIX file access.
Lustre
Lustre is a parallel distributed file system used for large-scale cluster
computing. The name is a blend of the words Linux and cluster. It has
consistently ranked high on the IO500, a twice-yearly benchmark that compares
storage solution performance for high-performance computing use cases, and has
seen significant use throughout the TOP500, a twice-yearly benchmark
publication focused on overall cluster performance. Lustre was originally
created as a research project by Peter J. Braam, who worked at Carnegie Mellon
University and went on to found his own company, Cluster File Systems, to work
on Lustre. Like Ceph, Lustre was developed under the Advanced Simulation and
Computing Program (ASC) and its PathForward project, which received funding
from the US Department of Energy (DoE), Hewlett-Packard and Intel. Sun
Microsystems eventually acquired Cluster File Systems, and Sun itself was
acquired shortly afterwards by Oracle.
Soon after the Sun acquisition, Oracle announced that it would cease the
development of Lustre. Many of the original Lustre developers had left Oracle
by that point and were interested in continuing to maintain and build Lustre,
this time under an open community model. A variety of organisations were
formed to do just that, including Open Scalable File Systems (OpenSFS),
European Open File Systems (EOFS) and others. To join this effort by OpenSFS
and EOFS, several of the original developers founded a startup called
Whamcloud. OpenSFS funded much of the work done by Whamcloud, which
significantly furthered the development of Lustre; development continued after
Whamcloud was acquired by Intel. Following restructuring at Intel, the
department focused on Lustre was eventually spun out to a company called DDN.
BeeGFS
A parallel file system developed for HPC, BeeGFS was originally developed at
the Fraunhofer Centre for High-Performance Computing by a team around
Sven Breuner. He became the CEO of ThinkParQ, a spin-off company created to
maintain and commercialise professional offerings around BeeGFS. It’s used by
quite a few European institutions whose clusters reside in the TOP500.
DAOS
DAOS (Distributed Asynchronous Object Storage) is an open-source,
software-defined object store designed from the ground up for NVMe SSDs and
persistent memory. Originally developed by Intel, it has posted leading
results on the IO500 benchmark.
GPFS
IBM General Parallel File System (GPFS, also known as IBM Spectrum Scale) is a
high-performance clustered file system used by many commercial HPC cluster
deployments as an accelerated storage solution. It can also be found in
multiple supercomputing clusters on the TOP500 list. GPFS started as the Tiger
Shark file system, a research project at IBM's Almaden Research Center in
1993, initially designed for high-throughput multimedia applications. This
throughput-focused design proved to be an excellent fit for scientific
computing.
VAST Data
VAST Data is a relatively new player in the storage market, offering storage
appliances that leverage some of the latest technologies. For example, they use
Intel Optane / 3D XPoint NVMe SSDs and 3D XPoint-based non-volatile memory as
part of their data architecture. These act as an accelerated data tier in
front of more cost-effective, higher-density NAND flash-based SSDs. VAST Data
can be connected through NVMe-oF over either Ethernet or InfiniBand, and
supports RDMA for NFS version 3.
Weka
WEKA is a proprietary parallel file system designed for NVMe flash and hybrid
cloud deployments, and it has ranked highly on the IO500 benchmark.
PanFS
PanFS is the parallel file system developed by Panasas, delivered as an
appliance-style solution with a long history in HPC environments.
Schedulers
In HPC, a scheduler queues up workloads against the resources of the cluster
in order to orchestrate their use. Schedulers act as the brain of the cluster.
They receive requests for workloads that need to be scheduled from users of
the cluster, keep track of them, and run those workloads when resources become
available. Schedulers are aware of resource availability and utilisation, and
do their best to account for any locality effects that might influence
performance. Their main purpose is to schedule compute jobs for optimal
workload distribution; the scheduling policy is often based on organisational
needs.
The scheduler keeps track of the workloads and hands them over to another
integral component: an agent process that runs on the compute nodes to execute
each workload.
Scheduling solutions
SLURM workload manager
SLURM (originally the Simple Linux Utility for Resource Management) is an
open-source workload manager and job scheduler used on many of the world's
largest supercomputers. It handles resource allocation, job queueing and
accounting, and scales from single-node clusters to systems with thousands of
nodes.
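As a minimal sketch of a SLURM batch job, assuming a hypothetical partition named compute and an MPI application binary called my_app, a job script might look like this:

    #!/bin/bash
    # Request two nodes with eight MPI ranks each, for at most one hour,
    # on a hypothetical partition named "compute".
    #SBATCH --job-name=example
    #SBATCH --partition=compute
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=8
    #SBATCH --time=01:00:00

    # srun launches the tasks on the nodes that SLURM allocated
    srun ./my_app

Submitted with sbatch, the script waits in the queue until the scheduler finds two free nodes, then runs sixteen MPI ranks across them.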
Open OnDemand
Not a scheduler per se, but it deserves an honourable mention alongside SLURM.
Open OnDemand is a user interface for SLURM that eases the deployment of
workloads via a simple web interface. It was created by the Ohio Supercomputer
Center with a grant from the National Science Foundation.
Grid Engine
A batch scheduler with a complicated history, Grid Engine has existed in both
open-source and closed-source forms. It started as a closed-source application
released by Gridware, but after Gridware's acquisition by Sun it became Sun
Grid Engine (SGE). It was then open-sourced and maintained until an acquisition
by Oracle took place, at which point the source releases stopped and it was
renamed Oracle Grid Engine. Forks of the last open-source version soon
appeared. One, Son of Grid Engine, was maintained by the University of
Liverpool but is, for the most part, no longer maintained. Another, the Grid
Community Toolkit, is also available but not under active maintenance. A
company called Univa started another closed-source fork after hiring many of
the main engineers from the Sun Grid Engine team. Univa Grid Engine is
currently the only actively maintained version of Grid Engine; it is closed
source and was recently acquired by Altair. The Grid Community Toolkit edition
of Grid Engine is available on Ubuntu in the Universe repositories.
OpenPBS
Portable Batch System (PBS) was originally developed for NASA under a contract
with MRJ. It was made open source in 1998 and is actively developed. Altair
now owns PBS and releases an open-source version called OpenPBS. Another fork,
the Terascale Open-source Resource and QUEue Manager (TORQUE), was maintained
as open source by Adaptive Computing but has since gone closed source. PBS is
currently not available as a package on Ubuntu.
HTCondor
HTCondor is an open-source high-throughput computing framework developed at
the University of Wisconsin-Madison. Rather than optimising for a single large
parallel job, it excels at scheduling very large numbers of independent tasks,
and it can scavenge cycles from otherwise idle machines.
Kubernetes
Kubernetes is the de facto standard orchestrator for containerised workloads.
While not a traditional HPC scheduler, it is increasingly used for
HPC-adjacent workloads such as machine learning, and projects exist to bring
batch scheduling concepts to it.
What is MPI?
MPI (Message Passing Interface) is a communication protocol and standard for
portable message passing between the memory of one system and another on
parallel computers. Message passing allows computational workloads to run
across compute nodes connected via a high-speed networking link. This was
vital to the development of HPC, as it allowed an ever greater number of
organisations to solve their computational problems at a lower cost and at a
greater scale than ever before; suddenly, they were no longer limited to the
computational ability of a single system.
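As a minimal sketch of what message passing looks like in practice, the classic MPI "hello world" below has each process (rank) report where it is running; any standard MPI implementation, such as Open MPI or MPICH, can compile and run it:

    // hello_mpi.c - each MPI rank prints its rank, the world size and its host
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);               // set up the MPI runtime

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank); // this process's ID
        MPI_Comm_size(MPI_COMM_WORLD, &size); // total number of processes

        char host[MPI_MAX_PROCESSOR_NAME];
        int len;
        MPI_Get_processor_name(host, &len);   // which node this rank landed on

        printf("Hello from rank %d of %d on %s\n", rank, size, host);

        MPI_Finalize();                       // tear down the MPI runtime
        return 0;
    }

Compiled with mpicc hello_mpi.c -o hello_mpi and launched with mpirun -np 16 ./hello_mpi (or via srun under SLURM), the ranks may land on different compute nodes, with MPI carrying their communication over the cluster's interconnect.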
MPI solutions
OpenMP
Despite the similar name, OpenMP is not an MPI implementation: it is an API
for shared-memory parallelism, using compiler directives to spread work across
the threads of a single node. It is frequently combined with MPI in hybrid
workloads, with OpenMP handling parallelism within a node and MPI handling
communication between nodes.
OpenMPI
Open MPI is a widely used open-source MPI implementation, developed and
maintained by a consortium of academic, research and industry partners. It
merged several earlier MPI implementations and is the default MPI on many
clusters.
MPICH
MPICH is one of the oldest and most influential open-source MPI
implementations, originally developed at Argonne National Laboratory. It
serves as the base for many derivative implementations, including vendor MPI
stacks and MVAPICH.
MVAPICH
Originally based on MPICH, MVAPICH is freely available and open source. The
implementation is led by Ohio State University. Its goal is to "deliver the
best performance, scalability and fault tolerance for high-end computing
systems and servers" that use high-performance interconnects. Its development
is very active, and multiple variants are available that provide optimal
hardware compatibility and the best possible performance for the underlying
fabric. Notable developments include its support for DPU offloading, where
MVAPICH takes advantage of underlying SmartNICs to offload MPI processes.
SmartNICs and Data Processing Units (DPUs) are an advanced form of network
card that carry the traditional components of a computer, such as a CPU. This
allows them to act as a computer in their own right, run their own operating
system, and process the data or networking traffic that flows through them,
taking over some of the host's workload functions. With MVAPICH, this could
mean handling the MPI communication, allowing the host's processors to focus
entirely on the workload.
Workloads
Many HPC workloads come from in-house or open-source development, driven by a
strong community effort. These workloads often have a strong research
background, initiated through university work or national interests, and often
serve multiple institutes or countries. When it comes to open source, there
are plenty of workloads covering all sorts of scenarios - anything from
weather research to physics.
Workload solutions
BLAST
BLAST (Basic Local Alignment Search Tool) is a bioinformatics workload that
searches for regions of similarity between biological sequences, such as DNA
or proteins, against large sequence databases.
OpenFOAM
OpenFOAM is a free, open-source computational fluid dynamics (CFD) toolbox
with a broad user base across engineering and science, used for everything
from aerodynamics to chemical reaction modelling.
ParaView
ParaView is an open-source data analysis and visualisation application,
developed by Kitware, designed to visualise extremely large scientific data
sets with parallel rendering across a cluster.
WRF
The Weather Research and Forecasting (WRF) model is a numerical weather
prediction system used for both atmospheric research and operational
forecasting, and one of the most widely run open-source HPC workloads.
Fire Dynamics Simulator and Smokeview
The Fire Dynamics Simulator (FDS), developed at NIST, is a computational fluid
dynamics model of fire-driven fluid flow, and Smokeview is its companion
visualisation tool. Together they are widely used in fire safety engineering.
Containers
HPC environments often depend on complex software dependencies to run
workloads. A lot of effort has gone into the development of module-based
systems such as Lmod, which allow users to load applications or dependencies,
like libraries, outside of normal system paths. This is often due to the need
to compile applications against a certain set of libraries, pinned to specific
versions of numerical or vendor software. To avoid managing this complex set
of dependencies directly, organisations can invest in containers, which
effectively allow the user to bundle up an application with all its
dependencies into a single executable application container.
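As a minimal sketch, assuming a hypothetical solver installed under /opt/solver inside the image, a Singularity/Apptainer definition file bundling an application with its dependencies might look like this:

    # solver.def - build an application container from an Ubuntu base image
    Bootstrap: docker
    From: ubuntu:22.04

    %post
        # install the application's runtime dependencies inside the image
        apt-get update && apt-get install -y openmpi-bin libopenmpi-dev

    %runscript
        # command executed when the container is run (hypothetical path)
        exec /opt/solver/bin/my_solver "$@"

Built once with singularity build solver.sif solver.def, the resulting single .sif file can be copied to any cluster and run without installing the solver's dependencies on the hosts.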
Container solutions
LXD
LXD is a system container and virtual machine manager developed by Canonical.
Its containers behave like lightweight machines with a full OS user space,
which makes it well suited to building cluster-like environments and running
services alongside HPC infrastructure.
Docker
The predominant container runtime for cloud-native applications, Docker has
seen some usage in HPC environments. Its adoption has been limited in true
multi-user systems, such as large cluster environments, as Docker
fundamentally requires privileged access. Another downside often mentioned is
the overall size of Docker images, which is attributed to application
dependencies, including MPI libraries. This often creates large application
containers that may duplicate components of other application containers.
However, when done right, Docker can be quite effective for dependency
management when developing for and enabling a specific hardware stack. It
allows applications to be packaged against a unified stack, which has some
strengths: for example, layered container images avoid storing multiple copies
of shared dependencies. You can see this to great effect in Nvidia NGC
containers.
Singularity
Singularity (now continued as Apptainer under the Linux Foundation) is a
container runtime designed specifically for HPC. It runs containers without
requiring privileged daemons, integrates naturally with schedulers and MPI,
and packages applications as single image files that are easy to move between
clusters.
Charliecloud
Charliecloud is a lightweight, open-source container runtime developed at Los
Alamos National Laboratory. It uses unprivileged user namespaces to run
containers, making it a good fit for multi-user HPC systems.
Auxiliary services
Many software components can be used to improve the usage of HPC
clusters. These include anything from identity management to monitoring and
observability software.
Identity management
Identity and access managers are quite common in HPC clusters. They serve as
the single source of truth for identity and access management, and unified
access makes it easy for users to access any node in the cluster. This is
often a prerequisite for resource scheduling: if you want to run a parallel
job across multiple nodes via a batch scheduler, you need consistent access to
compute nodes and storage resources. An identity management solution helps
ensure that consistency. Without it, an administrator would need to line up
user creation, identities and storage configurations across the cluster
through individually configured nodes.
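As a minimal sketch, assuming a hypothetical directory server at ldap.example.com with a base DN of dc=example,dc=com, checking that a user is visible from any node is a one-line query:

    # Look up the user 'alice' in the central directory (hypothetical host and base DN)
    ldapsearch -x -H ldap://ldap.example.com -b "dc=example,dc=com" "(uid=alice)"

When every node resolves users through the same directory, UIDs, group memberships and home-directory paths stay consistent across the whole cluster.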
LDAP
LDAP (Lightweight Directory Access Protocol) is the standard protocol for
querying and modifying directory services. Open-source servers such as
OpenLDAP are a long-standing choice for centralised user databases in HPC
clusters.
Active Directory
Active Directory is Microsoft's directory service, combining LDAP, Kerberos
and DNS. Linux-based HPC clusters can join Active Directory domains, which is
common in organisations with existing Windows-based identity infrastructure.
FreeIPA
FreeIPA is an open-source identity management solution that combines an LDAP
directory, Kerberos authentication, DNS and a certificate authority behind a
single management interface, making it a popular choice for Linux-centric
environments such as HPC clusters.
Monitoring and observability
Monitoring and observability tools provide deeper insight into workload
resource utilisation and are thus key to solving performance issues and
detecting problems with overall cluster health. Metrics often observed in HPC
clusters include CPU and memory utilisation, network and memory bandwidth, and
scheduler metrics such as workload throughput, which measures the number of
jobs completed in a given period. Job wait times, job completion times and
scheduler queue utilisation are also key metrics.
The modern monitoring stack is increasingly relevant for HPC cluster
monitoring, and solutions like Prometheus and Grafana are becoming more
visible in these clusters.
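As a minimal sketch, assuming two hypothetical compute nodes running the standard node_exporter agent on its default port, a Prometheus scrape configuration might look like this:

    # prometheus.yml - pull hardware and OS metrics from the compute nodes
    scrape_configs:
      - job_name: "compute_nodes"
        scrape_interval: 15s
        static_configs:
          - targets: ["node01:9100", "node02:9100"]  # hypothetical node names

Prometheus polls these targets on each interval and stores the results as time series, which Grafana can then chart alongside scheduler and storage metrics.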
Observability solutions
Prometheus
Prometheus is an open-source monitoring system built around a time-series
database. It pulls metrics from instrumented services and exporters over HTTP,
and provides a powerful query language (PromQL) and alerting.
Grafana
Grafana is an open-source visualisation and dashboarding tool. It connects to
data sources such as Prometheus and Loki and turns their metrics and logs into
shareable dashboards, making it the de facto front end for the modern
monitoring stack.
Grafana Loki
Grafana Loki is a log aggregation system designed to be cost-effective by
indexing only metadata labels rather than full log contents. The COS stack
(Canonical Observability Stack) combines Prometheus, Grafana and Loki into a
single deployable solution, providing an all-round monitoring solution with a
comprehensive overview of metrics and logs.
Now that we have covered the different components of HPC clusters, you might
be wondering where to run them: on public or private clouds?
HPC on public clouds
Many public cloud providers offer specialised resources with deep foundations
in the HPC space, available for consumption by organisations of all sizes.
Cloud computing has made HPC possible for organisations that might require
bursting or scaling beyond what is reasonable with dedicated clusters. It's
now also possible to run small experimental clusters aimed at those getting
started with HPC, who may not have the capacity to maintain the infrastructure
required for a private cluster. Resources for experimentation or testing, such
as GPUs, FPGAs or other architectures that might be in the early phase of
adoption, are also available. Let's explore what the different public cloud
vendors offer in the area of HPC.
AWS
AWS has been one of the key players in driving innovation in public cloud
services for HPC. Their implementation of the AWS Nitro System was key to
eliminating virtualisation overhead and enabling direct access to underlying
host hardware, driving down latency and increasing performance - both vital to
running HPC clusters and workloads in a public cloud. To deliver on the
demands of HPC workloads for inter-node communication, they developed the
Elastic Fabric Adapter, which was key to reducing latency and increasing
performance for workloads that communicate across nodes and require a
high-performance interconnect.
To cover the storage needs of HPC users, Amazon added a specialised storage
offering based on Lustre, called Amazon FSx for Lustre. Alongside that, they have
scheduling solutions such as AWS ParallelCluster and AWS Batch.
Azure
Azure is a key player in driving HPC in the public cloud, providing strong
instance types that use traditional HPC technologies such as InfiniBand, which
provides RDMA functionality for optimal latency and performance. They also
have instance types with a reduced number of cores exposed to the workload,
catering to workloads primarily limited by memory bandwidth rather than
available cores. They even have an offering that delivers supercomputers as a
service, their Cray solution, along with HPC-focused storage in Cray
ClusterStor.
Google Cloud Platform
Google Cloud Platform offers pre-configured HPC VMs. Their offerings also
include automation and scripting, making it easy to generate Terraform-based
scripts that handle the provisioning of a Google Cloud-based HPC environment,
so users can spin up an environment that fits their needs. They also provide
documentation that walks users through steps similar to what the
Terraform-based automation offers, with clear guides covering everything from
MPI workloads to HPC images, giving users clear and practical information on
how to get the most out of the cloud for HPC workloads.
Oracle
Oracle was an early player in the enablement of HPC in public clouds. They
take a bare-metal approach to HPC in the public cloud, offering instance types
with ultra-low-latency RDMA networking. The resulting solution is close to
what one might expect from a dedicated private HPC cluster.
Dedicated private HPC clusters
Private clusters are a solid option for those looking to optimise for cost,
control, or particular data ownership and security requirements. There are
solutions that give users cloud-like management capabilities for local,
on-premise resources. The main challenge with private HPC clusters is the high
upfront investment and the expertise required. This can be mitigated by
working with partners such as Canonical, who give you access to expert
knowledge and solutions that make adoption more feasible. The foundations for
such clusters rely on cluster provisioning solutions such as MAAS, which we
covered in the cluster provisioning section above.
Hybrid HPC
Hybrid usage of private and public cloud resources has become very popular in
the HPC space. Hybrid clouds give users the best of both worlds: the cost
optimisation and control of on-premise servers, along with the extreme
scalability of public clouds. In a way, hybrid clouds deliver a complementary
solution where the negatives of one approach are mitigated by the positives of
the other. The main additional challenge of such a setup is increased
complexity, but overall it can bring greater resiliency. With solutions for
both public and private clouds, Canonical can help you manage that complexity.
Take your next steps in HPC with Canonical
Canonical can help you take the next steps in your HPC journey. Our solutions can
help you meet your HPC needs from the operating system layer to infrastructure
automation and more, across clouds and on-premise. Ubuntu is the ultimate
Linux distribution for high-performance computing. Some of the benefits Ubuntu
provides include:
• Recent kernel
• Extensive package repositories
• 2-year fixed release cadence for LTS releases with 5-year support
• Maintenance and bug fixes extendable to 10 years with Ubuntu Pro
This makes Ubuntu perfect for long-running environments: with Ubuntu Pro, you
can make sure your environment is supported throughout its lifetime.
For on-premise deployments, you can level up your server provisioning process
using MAAS, trusted by numerous organisations that depend on on-premise HPC
clusters. Its highly available architecture makes MAAS fault-tolerant and
ready to deploy at scale. No matter the size of your cluster, you can trust
MAAS to deliver a cloud-like experience in bare-metal cluster management, with
the performance and flexibility to match.
Juju, our solution for infrastructure automation, can help you get a
SLURM-based cluster up and running and ready for users, with day-2 operations
taken care of. Juju can be used across public cloud endpoints, and for
on-premise deployments with MAAS. To take advantage of cloud-native
deployments, consider Charmed Kubernetes from Canonical.
You can use a combination of solutions for a proper hybrid cloud strategy, and run
your workloads depending on your needs. Whatever your requirements are, and
no matter the computation size, Canonical has the solutions for you.
© Canonical Limited 2023. Ubuntu, Kubuntu, Canonical and their associated logos are the registered trademarks of Canonical Ltd. All
other trademarks are the properties of their respective owners. Any information referred to in this document may change without
notice and Canonical will not be held responsible for any such changes.
Canonical Limited, Registered in Isle of Man, Company number 110334C, Registered Office: 2nd Floor, Clarendon House, Victoria
Street, Douglas IM1 2LN, Isle of Man, VAT Registration: GB 003 2322 47