Intro HPC Linux Gent
Version 20171010.01
For Linux Users
Author:
Geert Borstlap (UAntwerpen)
Co-authors:
Kenneth Hoste, Toon Willems, Jens Timmerman (UGent)
Geert-Jan Bex (UHasselt and KU Leuven)
Mag Selwa (KU Leuven)
Stefan Becuwe, Franky Backeljauw, Bert Tijskens, Kurt Lust (UAntwerpen)
Balázs Hajgató (VUB)
Acknowledgement: VSCentrum.be
Audience:
This HPC Tutorial is designed for researchers at the UGent and affiliated institutes who are
in need of computational power (computer resources) and wish to explore and use the High
Performance Computing (HPC) core facilities of the Flemish Supercomputing Centre (VSC) to
execute their computationally intensive tasks.
The audience may be completely unaware of the HPC concepts but must have some basic understanding of computers and computer programming.
Contents:
The Beginners Part of this tutorial gives answers to the typical questions that a new HPC
user has. The aim is to learn how to make use of the HPC.
Beginners Part
Questions                                                                  Chapter
What is an HPC exactly? Can it solve my computational needs?               1 Introduction to HPC
How to get an account?                                                     2 Getting an HPC Account
How do I connect to the HPC and transfer my files and programs?            3 Connecting to the HPC
How to start background jobs?                                              4 Running batch jobs
How to start jobs with user interaction?                                   5 Running interactive jobs
Where do the input and output go? Where to collect my results?             6 Running jobs with input/output data
Can I speed up my program by exploring parallel programming techniques?    7 Multi core jobs/Parallel Computing
Can I start many jobs at once?                                             10 Multi-job submission
What are the rules and priorities of jobs?                                 9 HPC Policies
The Advanced Part focuses on in-depth issues. The aim is to assist the end-users in running
their own software on the HPC.
Advanced Part
Questions                                          Chapter
Can I compile my software on the HPC?              11 Compiling and testing your software on the HPC
What are the optimal Job Specifications?           8 Fine-tuning Job Specifications
Do you have more examples for me?                  12 Program examples
Any more advice?                                   13 Best Practices
The Annexes contain some useful reference guides.
Annex
Title                                              Chapter
HPC Quick Reference Guide                          A
TORQUE options                                     B
Useful Linux Commands                              C
Notification:
In this tutorial specific commands are separated from the accompanying text:
$ commands
These should be entered by the reader at a command line in a Terminal on the UGent-HPC.
They appear in all exercises preceded by a $ and printed in bold. You'll find those actions in a
grey frame.
Button is the notation for menus, buttons or drop-down boxes to be pressed or selected.
Directory is the notation for directories (called folders in Windows terminology) or specific
files (e.g., /user/home/gent/vsc400/vsc40000).
Text is the notation for text to be entered.
Tip: A Tip paragraph is used for remarks or tips.
More support:
Before starting the course, the example programs and configuration files used in this Tutorial
must be copied to your home directory, so that you can work with your personal copy. If you
have received a new VSC-account, all examples are present in the directory
/apps/gent/tutorials/Intro-HPC/examples.
$ cp -r /apps/gent/tutorials/Intro-HPC/examples ~/
$ cd
$ ls
They can also be downloaded from the VSC website at https://www.vscentrum.be. Apart
from this HPC Tutorial, the documentation on the VSC website will serve as a reference for all
the operations.
Tip: Users are advised to get self-organised. There are only limited resources available at
the HPC, and these are offered on a best-effort basis. The HPC staff cannot give support for code
fixing; user applications and self-developed software remain solely the responsibility of the end-user.
More documentation can be found on the VSC website (https://www.vscentrum.be).
This tutorial is intended for users who want to connect to and work on the HPC of the UGent.
This tutorial is available in a Windows, Mac or Linux version.
This tutorial is available for UAntwerpen, UGent, KU Leuven, UHasselt and VUB users.
Request your appropriate version at [email protected].
Contact Information:
We welcome your feedback, comments and suggestions for improving the HPC Tutorial (contact:
[email protected]).
For all technical questions, please contact the HPC staff:
1. Website: http://www.ugent.be/hpc
2. By e-mail: [email protected]
Mailing-lists:
Glossary
Compute Node The computational units on which batch or interactive jobs are processed. A
compute node is pretty much comparable to a single personal computer. It contains one
or more sockets, each holding a single processor or CPU. The compute node is equipped
with memory (RAM) that is accessible by all its CPUs.
CPU A single processing unit. A CPU is a consumable resource. Compute nodes typically
contain one or more CPUs.
Distributed memory system Computing system consisting of many compute nodes connected
by a network, each with their own memory. Accessing memory on a neighbouring node is
possible but requires explicit communication.
FTP File Transfer Protocol, used to copy files between distinct machines (over a network).
FTP is unencrypted, and as such blocked on certain systems. SFTP or SCP are secure
alternatives to FTP.
HPC High Performance Computing, high performance computing and multiple-task computing
on a supercomputer. The UGent-HPC is the HPC infrastructure at the UGent.
Infiniband A high speed switched fabric computer network communications link used in HPC.
Job constraints A set of conditions that must be fulfilled for the job to start.
L1d Level 1 data cache, often called primary cache, is a static memory integrated with the
processor core that is used to store data recently accessed by the processor and also data
which may be required by the next operations.
L2d Level 2 data cache, also called secondary cache, is a memory that is used to store recently
accessed data and also data which may be required for the next operations. The goal of
having the level 2 cache is to reduce data access time in cases where the same data was
already accessed before.
L3d Level 3 data cache. Extra cache level built into motherboards between the microprocessor
and the main memory.
LLC The Last Level Cache is the last level in the memory hierarchy before main memory. Any
memory requests missing here must be serviced by local or remote DRAM, with a significant
increase in latency compared with data serviced by the cache memory.
Login Node On HPC clusters, login nodes serve multiple functions. From a login node you
can submit and monitor batch jobs, analyse computational results, run editors, plots,
debuggers, compilers, do housekeeping chores such as adjusting shell settings, copy files and in
general manage your account. You connect to these servers when you want to start working on
the UGent-HPC.
Moab Moab is a job scheduler, which allocates resources to jobs that request them.
Modules HPC uses an open source software package called Environment Modules (Modules
for short) which allows you to add various path definitions to your shell environment.
MPI MPI stands for Message-Passing Interface. It supports a parallel programming method
designed for distributed memory systems, but can also be used well on shared memory
systems.
Node Typically, a machine, one computer. A node is the fundamental object associated with
compute resources.
PBS, TORQUE or OpenPBS are Open Source resource managers, which are responsible for
collecting status and health information from compute nodes and keeping track of jobs
running in the system. It is also responsible for spawning the actual executable that is
associated with a job, e.g., running the executable on the corresponding compute node.
Client commands for submitting and managing jobs can be installed on any host, but in
general are installed and used from the Login nodes.
Queues PBS/TORQUE queues, or classes as Moab refers to them, represent groups of com-
puting resources with specific parameters. A queue with a 12 hour runtime or walltime
would allow jobs requesting 12 hours or less to use this queue.
scp Secure Copy is a protocol to copy files between distinct machines. SCP or scp is used
extensively on HPC clusters to stage in data from outside resources.
Scratch Supercomputers generally have what is called scratch space: storage available for tem-
porary use. Use the scratch filesystem when, for example you are downloading and un-
compressing applications, reading and writing input/output data during a batch job, or
when you work with large datasets. Scratch is generally a lot faster than the Data or Home
filesystem.
sftp Secure File Transfer Protocol, used to copy files between distinct machines.
Shared memory system Computing system in which all of the processors share one global
memory space. However, access times from a processor to different regions of memory
are not necessarily uniform. This is called NUMA: Non-uniform memory access. Memory
closer to the CPU your process is running on will generally be faster to access than memory
that is closer to a different CPU. You can pin processes to a certain CPU to ensure they
always use the same memory.
SSH Secure Shell (SSH), sometimes known as Secure Socket Shell, is a Unix-based command
interface and protocol for securely getting access to a remote computer. It is widely used
by network administrators to control Web and other kinds of servers remotely. SSH is
actually a suite of three utilities - slogin, ssh, and scp - that are secure versions of the
earlier UNIX utilities, rlogin, rsh, and rcp. SSH commands are encrypted and secure in
several ways. Both ends of the client/server connection are authenticated using a digital
certificate, and passwords are protected by encryption. Popular implementations include
OpenSSH on Linux/Mac and PuTTY on Windows.
ssh-keys OpenSSH is a network connectivity tool, which encrypts all traffic including pass-
words to effectively eliminate eavesdropping, connection hijacking, and other network-level
attacks. SSH-keys are part of the OpenSSH bundle. On HPC clusters, ssh-keys allow
password-less access between compute nodes while running batch or interactive parallel
jobs.
Swap space A quantity of virtual memory available for use by batch jobs. Swap is a consumable
resource provided by nodes and consumed by jobs.
TLB Translation Look-aside Buffer, a table in the processor's memory that contains information
about the virtual memory pages the processor has accessed recently. The table
cross-references a program's virtual addresses with the corresponding absolute addresses in
physical memory that the program has most recently used. The TLB enables faster computing
because it allows address translation to take place independently of the normal
address-translation pipeline.
Walltime Walltime is the length of time specified in the job script for which the job will run
on a batch system; you can visualise walltime as the time measured by a wall-mounted
clock (or your digital wristwatch). This is a computational resource.
Contents
Glossary 5
I Beginners Guide 15
1 Introduction to HPC 16
1.1 What is HPC? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.2 What is the UGent-HPC? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.3 What is the HPC not! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.4 Is the HPC a solution for my computational needs? . . . . . . . . . . . . . . . . . 18
1.4.1 Batch or interactive mode? . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.4.2 Parallel or sequential programs? . . . . . . . . . . . . . . . . . . . . . . . 18
1.4.3 What programming languages can I use? . . . . . . . . . . . . . . . . . . . 18
1.4.4 What operating systems can I use? . . . . . . . . . . . . . . . . . . . . . . 19
1.4.5 What is the next step? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.2 Transfer Files to/from the HPC . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.2.1 Using scp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.2.2 Using sftp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2.3 Using a GUI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.3 Modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.3.1 Environment Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.3.2 Available modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.3.3 Organisation of modules in toolchains . . . . . . . . . . . . . . . . . . . . 34
3.3.4 Activating and de-activating modules . . . . . . . . . . . . . . . . . . . . . 35
3.3.5 Explicit version numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.3.6 Get detailed info . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
6.1.1 Default file names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
6.1.2 Filenames using the name of the job . . . . . . . . . . . . . . . . . . . . . 56
6.1.3 User-defined file names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
6.2 Where to store your data on the HPC . . . . . . . . . . . . . . . . . . . . . . . . 58
6.2.1 Pre-defined user directories . . . . . . . . . . . . . . . . . . . . . . . . . . 58
6.2.2 Your home directory ($VSC_HOME) . . . . . . . . . . . . . . . . . . . . 59
6.2.3 Your data directory ($VSC_DATA) . . . . . . . . . . . . . . . . . . . . . 59
6.2.4 Your scratch space ($VSC_SCRATCH) . . . . . . . . . . . . . . . . . . . 59
6.2.5 Pre-defined quotas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
6.3 Writing Output files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
6.4 Reading Input files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
6.5 How much disk space do I get? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
6.5.1 Quota . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
6.5.2 Check your quota . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
8.3.1 Number of processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
8.3.2 Monitoring the CPU-utilisation . . . . . . . . . . . . . . . . . . . . . . . . 86
8.3.3 Fine-tuning your executable and/or job-script . . . . . . . . . . . . . . . . 86
8.4 The system load . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
8.4.1 Optimal load . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
8.4.2 Monitoring the load . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
8.4.3 Fine-tuning your executable and/or job-script . . . . . . . . . . . . . . . . 88
8.5 Checking File sizes & Disk I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
8.5.1 Monitoring File sizes during execution . . . . . . . . . . . . . . . . . . . . 89
8.6 Specifying network requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
9 HPC Policies 91
II Advanced Guide 92
10 Multi-job submission 93
10.1 The worker Framework: Parameter Sweeps . . . . . . . . . . . . . . . . . . . . . . 94
10.2 The Worker framework: Job arrays . . . . . . . . . . . . . . . . . . . . . . . . . . 96
10.3 MapReduce: prologues and epilogue . . . . . . . . . . . . . . . . . . . . . . . . . 99
10.4 Some more on the Worker Framework . . . . . . . . . . . . . . . . . . . . . . . . 101
10.4.1 Using Worker efficiently . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
10.4.2 Monitoring a worker job . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
10.4.3 Time limits for work items . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
10.4.4 Resuming a Worker job . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
10.4.5 Further information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
12 Program examples 111
Part I
Beginners Guide
Chapter 1
Introduction to HPC
1 Wikipedia: http://en.wikipedia.org/wiki/Supercomputer
1.2 What is the UGent-HPC?
The HPC is a collection of computers with Intel processors, running a Linux operating system,
shaped like pizza boxes and stored above and next to each other in racks, interconnected with
copper and fiber cables. Their number crunching power is (presently) measured in hundreds of
billions of floating point operations per second (gigaflops) and even in teraflops.
Typically, the strength of a supercomputer comes from its ability to run a huge number of
programs (i.e., executables) in parallel without any user interaction in real time. This is what is
called running in batch mode.
It is also possible to run programs on the HPC that require user interaction (pushing buttons,
entering input data, etc.). Although technically possible, the HPC might not always be
the best and smartest option to run those interactive programs. Each time some user interaction
is needed, the computer will wait for user input. The available computer resources (CPU, storage,
network, etc.) might not be optimally used in those cases. A more in-depth analysis with the
HPC staff can unveil whether the HPC is the desired solution to run interactive programs.
Interactive mode is typically only useful for creating quick visualisations of your data without
having to copy your data to your desktop and back.
Parallel computing is a form of computation in which many calculations are carried out
simultaneously. It is based on the principle that large problems can often be divided into
smaller ones, which are then solved concurrently (in parallel).
Parallel computers can be roughly classified according to the level at which the hardware sup-
ports parallelism, with multicore computers having multiple processing elements within a single
machine, while clusters use multiple computers to work on the same task. Parallel computing
has become the dominant computer architecture, mainly in the form of multicore processors.
Parallel programs are more difficult to write than sequential ones, because concurrency in-
troduces several new classes of potential software bugs, of which race conditions are the most
common. Communication and synchronisation between the different subtasks are typically some
of the greatest obstacles to getting good parallel program performance.
It is perfectly possible to also run purely sequential programs on the HPC.
Running your sequential programs on the most modern and fastest computers in the HPC can
save you a lot of time. But it also might be possible to run multiple instances of your program
(e.g., with different input parameters) on the HPC, in order to solve one overall problem (e.g.,
to perform a parameter sweep). This is another form of running your sequential programs in
parallel.
You can use any programming language, any software package and any library provided it has a
version that runs on Linux, specifically, on the version of Linux that is installed on the compute
nodes, Red Hat Enterprise Linux.
For the most common programming languages, a compiler is available on CentOS 7.2 (phanpy,
golett, swalot) and Scientific Linux 6.7 (raichu, delcatty). Supported and common programming
languages on the HPC are C/C++, FORTRAN, Java, Perl, Python, MATLAB, R, etc.
All nodes in the HPC cluster run under CentOS 7.2 (phanpy, golett, swalot) or Scientific Linux 6.7
(raichu, delcatty), both of which are derived from Red Hat Enterprise Linux. This means that all
programs (executables) should be compiled for CentOS 7.2 (phanpy, golett, swalot) or Scientific
Linux 6.7 (raichu, delcatty).
Users can connect from any computer in the UGent network to the HPC, regardless of the
Operating System that they are using on their personal computer. Users can use any of the
common Operating Systems (such as Windows, macOS or any version of Linux/Unix/BSD) and
run and control their programs on the HPC.
A user does not need to have prior knowledge about Linux; all of the required knowledge is
explained in this tutorial.
When you think that the HPC is a useful tool to support your computational needs, we encourage
you to acquire a VSC-account (as explained in chapter 2), read chapter 3 on connecting to the HPC
and setting up your environment, and explore chapters 5 to 8, which will help you to transfer and
run your programs on the HPC cluster.
Do not hesitate to contact the HPC staff for any help.
Chapter 2
Getting an HPC Account
All users of AUGent can request an account on the HPC, which is part of the Flemish Super-
computing Centre (VSC).
See chapter 9 for more information on who is entitled to an account.
The VSC, abbreviation of Flemish Supercomputer Centre, is a virtual supercomputer centre.
It is a partnership between the five Flemish university associations: the Association KU Leuven, Ghent
University Association, Brussels University Association, Antwerp University Association and the
University Colleges-Limburg. The VSC is funded by the Flemish Government.
The UGent-HPC clusters use public/private key pairs for user authentication (rather than pass-
words). Technically, the private key is stored on your local computer and always stays there; the
public key is stored on the HPC. Access to the HPC is granted to anyone who can prove to have
access to the corresponding private key on their local computer.
2.1 Getting ready to request an account
Since all VSC clusters use Linux as their main operating system, you will need to get acquainted
with using the command-line interface and using the terminal.
Launch a terminal from your desktop application menu and you will see the bash shell. There
are other shells, but most Linux distributions use bash by default.
Before requesting an account, you need to generate a pair of ssh keys. One popular way to do
this on Linux is using the OpenSSH client included with Linux, which you can then also use to
log on to the clusters.
Secure Shell (ssh) is a cryptographic network protocol for secure data communication, remote
command-line login, remote command execution, and other secure network services between
two networked computers. In short, ssh provides a secure connection between two computers via
insecure channels (network, Internet, telephone lines, etc.).
Secure means that both ends of the connection are authenticated to each other and that all data
is encrypted during transfer.
OpenSSH is a FREE implementation of the SSH connectivity protocol. Linux comes with its
own implementation of OpenSSH, so you don't need to install any third-party software to use it.
Just open a terminal window and jump in!
On all popular Linux distributions, the OpenSSH software is readily available, and most often
installed by default. You can check whether the OpenSSH software is installed by opening a
terminal and typing:
$ ssh -V
OpenSSH_4.3p2, OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008
To access the clusters and transfer your files, you will use commands such as ssh, scp and sftp.
A key pair might already be present in the default location inside your home directory. Therefore,
we first check if a key is available with the list short (ls) command:
$ ls ~/.ssh
You can recognise a public/private key pair when a pair of files has the same name except for
the extension .pub added to one of them. In this particular case, the private key is id_rsa
and public key is id_rsa.pub. You may have multiple keys (not necessarily in the directory
~/.ssh) if you or your operating system requires this.
You will need to generate a new key pair when you do not have one yet, or when you want to
use a separate key pair dedicated to the VSC clusters.
For extra security, the private key itself can be encrypted using a passphrase, to prevent anyone
from using your private key even when they manage to copy it. You have to unlock the private
key by typing the passphrase. Be sure to never give away your private key, it is private and
should stay private. You should not even copy it to one of your other machines, instead, you
should create a new public/private key pair for each machine.
$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/user/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/user/.ssh/id_rsa.
Your public key has been saved in /home/user/.ssh/id_rsa.pub.
This will ask you for a file name to store the private and public key, and a passphrase to protect
your private key. It needs to be emphasised that you really should choose the passphrase wisely!
The system will ask you for it every time you want to use the private key, that is, every time you
want to access the cluster or transfer your files.
Without your key pair, you won't be able to apply for a personal VSC account.
Visit https://account.vscentrum.be/ from within the UGent campus network (or using
a VPN connection). You will be redirected to our WAYF (Where Are You From) service where
you have to select your Home Organisation.
2.2 Applying for the account
Select UGent in the dropdown box and optionally select Save my preference and permanently.
Click Confirm.
You will now be taken to the authentication page of your institute.
You will now have to log in with CAS using your UGent account, and then be requested to share
your Mail and Affiliation with KU Leuven (which actually means the VSC account page). Select
the Remember check-box and click Yes, continue.
After you log in using your UGent login and password, you will be asked to upload the file that
contains your public key, i.e., the file id_rsa.pub which you have generated earlier.
This file has been stored in the directory ~/.ssh/.
After you have uploaded your public key you will receive an e-mail with a link to confirm your
e-mail address. After confirming your e-mail address the VSC staff will review and if applicable
approve your account.
Within one day, you should receive a Welcome e-mail with your VSC account details.
Dear (Username),
Your VSC-account has been approved by an administrator.
Your vsc-username is vsc40000
Kind regards,
-- The VSC administrators
Now, you can start using the HPC. You can always look up your VSC id later by visiting
https://account.vscentrum.be.
2.3 Computation Workflow on the HPC
We'll take you through the different tasks one by one in the following chapters.
Chapter 3
Connecting to the HPC
Before you can really start using the HPC clusters, there are several things you need to do or
know:
1. You need to log on to the cluster using an SSH client to one of the login nodes. This
will give you command-line access. The software you'll need to use on your client system
depends on its operating system.
2. Before you can do some work, you'll have to transfer the files that you need from your
desktop computer to the cluster. At the end of a job, you might want to transfer some files
back.
3. Optionally, if you wish to use programs with a graphical user interface, you will need
an X-server on your client system and log in to the login nodes with X-forwarding enabled.
4. Often several versions of software packages and libraries are installed, so you need to
select the ones you need. To manage different versions efficiently, the VSC clusters use
so-called modules, so you will need to select and load the modules that you need.
3.1 First Time connection to the HPC
3.1.1 Connect
The HPC is only accessible from within the UGent network, but you can get external access
(e.g., from home) by using a VPN connection.
Open up a terminal and enter the following command to connect to the HPC.
$ ssh [email protected]
Here, user vsc40000 wants to make a connection to the hpcugent cluster at UGent via the login
node login.hpc.ugent.be, so replace vsc40000 with your own vsc id in the above command.
The first time you make a connection to the login node, you will be asked to verify the authenticity
of the login node, e.g.,
$ ssh [email protected]
The authenticity of host 'login.hpc.ugent.be (<IP address>)'
can't be established.
RSA key fingerprint is RSA: 2f:0c:f7:76:87:57:f7:5d:2d:7b:d1:a1:e1:86:19:f3 (MD5)
SHA256:k+eqH4D4mTpJTeeskpACyouIWf+60sv1JByxODjvEKE, ECDSA:
13:f0:11:d1:94:cb:ca:e5:ca:82:21:62:ab:9f:3f:c2 (MD5)
SHA256:1MNKFTfl1T9sm6tTWAo4sn7zyEfiWFLKbk/mlT+7S5s, , ED25519:
fa:23:ab:1f:f0:65:f3:0d:d3:33:ce:7a:f8:f4:fc:2a (MD5),
SHA256:5hnjlJLolblqkKCmRduiWA21DsxJcSlpVoww0GLlagc
Are you sure you want to continue connecting (yes/no)? yes
Congratulations, you're on the HPC infrastructure now! To find out where you have
landed you can print the current working directory:
$ pwd
/user/home/gent/vsc400/vsc40000
$ cd /apps/gent/tutorials
$ ls
Intro-HPC/
This directory currently contains all training material for the use of the HPC.
More relevant training material to work with the HPC can always be added later in this directory.
You can now explore the content of this directory with the ls -l (list long format) and the cd (change
directory) commands:
As we are interested in the use of the HPC, move further to Intro-HPC and explore the
contents up to 2 levels deep:
$ cd Intro-HPC
$ tree -L 2
.
-- examples
|-- Compiling-and-testing-your-software-on-the-HPC
|-- Fine-tuning-Job-Specifications
|-- Multi-core-jobs-Parallel-Computing
|-- Multi-job-submission
|-- Program-examples
|-- Running-batch-jobs
|-- Running-jobs-with-input
|-- Running-jobs-with-input-output-data
|-- example.pbs
-- example.sh
9 directories, 5 files
The examples sub-directory contains all the examples that you need in this tutorial, as
well as examples that might be useful for your specific applications.
$ cd examples
$ cp -r /apps/gent/tutorials/Intro-HPC/examples ~/
You will now make a doc directory in your home dir and copy the latest version of this document
to this directory. We will use this later in this tutorial.
$ mkdir ~/docs
$ cp -r /apps/gent/tutorials/intro-HPC-linux-gent/ ~/docs/
Go to your home directory, check your own private examples directory, . . . and start working.
$ cd
$ ls -l
Upon connecting you will see a login message containing your last login time stamp and a basic
overview of the current cluster utilisation.
Last login: Tue Jan 6 08:53:11 2015 from helios.ugent.be
STEVIN HPC-UGent infrastructure status on Mon, 19 Oct 2015 15:00:01
$ exit
logout
Connection to login.hpc.ugent.be closed.
$ locale
LANG=
LC_COLLATE="C"
LC_CTYPE="UTF-8"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=
A locale is a set of parameters that defines the user's language, country and any special variant
preferences that the user wants to see in their user interface. Usually a locale identifier consists
of at least a language identifier and a region identifier.
Open ~/.bashrc on your local machine with your favourite editor and add the following lines:
$ nano ~/.bashrc
...
export LANGUAGE="en_US.UTF-8"
export LC_ALL="en_US.UTF-8"
export LC_CTYPE="en_US.UTF-8"
export LANG="en_US.UTF-8"
...
Tip: vi: To start entering text in vi: move to the place you want to start entering text with the
arrow keys and type i to switch to insert mode. You can easily exit vi by entering: <esc>:wq
To exit vi without saving your changes, enter <esc>:q!
or alternatively (if you are not comfortable with the Linux editors), again on your local machine:
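The exact commands shown in the original grey frame are not reproduced in this extract; assuming a bash shell on your local machine, appending the same export lines from the command line could look like this:
$ echo 'export LANGUAGE="en_US.UTF-8"' >> ~/.bashrc
$ echo 'export LC_ALL="en_US.UTF-8"' >> ~/.bashrc
$ echo 'export LC_CTYPE="en_US.UTF-8"' >> ~/.bashrc
$ echo 'export LANG="en_US.UTF-8"' >> ~/.bashrc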
You can now log out, open a new terminal/shell on your local machine and reconnect to the
HPC, and you should not get these warnings anymore.
3.2 Transfer Files to/from the HPC
Before you can do some work, you'll have to transfer the files you need from your desktop or
department to the cluster. At the end of a job, you might want to transfer some files back.
The preferred way to transfer files is by using scp or sftp via the secure OpenSSH protocol.
Linux ships with an implementation of OpenSSH, so you don't need to install any third-party
software to use it. Just open a terminal window and jump in!
Secure copy or SCP is a tool (command) for securely transferring files between a local host (=
your computer) and a remote host (the HPC). It is based on the Secure Shell (SSH) protocol.
The scp command is the equivalent of the cp (i.e., copy) command, but can copy files to or
from remote machines.
Open an additional Terminal window and check that you're working on your local machine.
$ hostname
<local-machine-name>
If you're still using the terminal that is connected to the HPC, close the connection by typing
exit in the terminal window.
For example, we will copy the (local) file localfile.txt to your home directory on the HPC cluster.
We first generate a small dummy localfile.txt, which contains the word Hello. Use your own
<vsc-account>, which is something like vsc40000.
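The grey frame with the actual commands is not reproduced in this extract; using standard shell and scp syntax, generating the dummy file and copying it to your home directory on the HPC would look like this (run on your local machine):
$ echo "Hello" > localfile.txt
$ scp localfile.txt <vsc-account>@login.hpc.ugent.be: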
Connect to the HPC via another terminal, print the working directory (to make sure youre in
the home-directory) and check whether the file has arrived:
$ pwd
/user/home/gent/vsc400/vsc40000
$ ls -l
total 1536
drwxrwxr-x 2 vsc40000 131072 Sep 11 16:24 bin/
drwxrwxr-x 2 vsc40000 131072 Sep 17 11:47 docs/
drwxrwxr-x 10 vsc40000 131072 Sep 17 11:48 examples/
-rw-r--r-- 1 vsc40000 6 Sep 18 09:44 localfile.txt
$ cat localfile.txt
Hello
The scp command can also be used to copy files from the cluster to your local machine. Let us
copy the remote file intro-HPC-linux-gent.pdf from your docs sub-directory on the cluster to
your local computer.
First, we will confirm that the file is indeed in the docs sub-directory. On the Terminal on the
HPC, enter:
$ cd ~/docs
/user/home/gent/vsc400/vsc40000/docs
$ ls -l
total 1536
-rw-r--r-- 1 vsc40000 Sep 11 09:53 intro-HPC-linux-gent.pdf
Now we will copy the file to the local machine. On the Terminal on your own local computer,
enter:
$ scp [email protected]:./docs/intro-HPC-linux-gent.pdf .
intro-HPC-linux-gent.pdf 100% 725KB 724.6KB/s 00:01
$ ls -l
total 899
-rw-r--r-- 1 gborstlap staff 741995 Sep 18 09:53 intro-HPC-linux-gent.pdf
-rw-r--r-- 1 gborstlap staff 6 Sep 18 09:37 localfile.txt
The SSH File Transfer Protocol (also Secure File Transfer Protocol, or SFTP) is a
network protocol that provides file access, file transfer and file management functionalities over
any reliable data stream. It was designed as an extension of the Secure Shell protocol (SSH)
version 2.0. This protocol assumes that it is run over a secure channel, such as SSH, that the
server has already authenticated the client, and that the identity of the client user is available
to the protocol.
The sftp command is the equivalent of the ftp command, with the difference that it uses the secure
ssh protocol to connect to the clusters.
One easy way of starting an sftp session is:
$ sftp [email protected]
If you prefer a GUI to transfer files back and forth to the HPC, you can use your file browser.
Open your file browser and press
Ctrl + l
This should open up an address bar where you can enter a URL. Alternatively, look for the
connect to server option in your file browser's menu.
Enter sftp://[email protected]/ and press enter.
You should now be able to browse files on the HPC in your file browser.
3.3 Modules
Software installation and maintenance on an HPC cluster such as the VSC clusters poses a number
of challenges not encountered on a workstation or a departmental cluster. We therefore need a
system on the HPC which is able to easily activate or deactivate the software packages that you
require for your program execution.
The program environment on the HPC is controlled by pre-defined settings, which are stored in
environment (or shell) variables.
You can use shell variables to store data, set configuration options and customise the environment
on the HPC. The default shell under Scientific Linux on the HPC is Bash (Bourne Again Shell)
and can be used for the following purposes:
6. Run commands that you want to run whenever you log in or log out.
7. Set up aliases and/or shell functions to automate tasks and save typing and time.
The environment variables are typically set at login by a script, whenever you connect to the
HPC. These pre-defined variables usually impact the run time behaviour of the programs that
we want to run.
All the software packages that are installed on the HPC cluster require different settings. These
packages include compilers, interpreters, mathematical software such as MATLAB and SAS, as
well as other applications and libraries.
In order to administer the active software and their environment variables, a module package
has been developed, which:
2. Allows setting and unsetting of environment variables, including adding and deleting entries
from database-type environment variables.
4. Takes care of versioning aspects: For many libraries, multiple versions are installed and
maintained. The module system also takes care of the versioning of software packages in
case multiple versions are installed. For instance, it does not allow multiple versions to be
loaded at same time.
5. Takes care of dependencies: Another issue arises when one considers library versions and the
dependencies they create. Some software requires an older version of a particular library
to run correctly (or at all). Hence a variety of version numbers is available for important
libraries.
This is all managed with the module command, which is explained in the next sections.
A large number of software packages are installed on the HPC clusters. A list of all currently
available software can be obtained by typing:
$ module av
or
$ module available
As you might not know the exact capitalisation of a module name, it is handy to look for it
with a case-insensitive search (see the example below).
This gives a full list of software packages that can be loaded. Note that modules starting with a
capital letter are listed first.
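The search command from the original grey frame is not reproduced here; one portable way to do such a case-insensitive search, assuming the module tool writes its listing to standard error, is to filter the full list through grep:
$ module av 2>&1 | grep -i matlab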
The amount of modules on the VSC systems can be overwhelming, and it is not always im-
mediately clear which modules can be loaded safely together if you need to combine multiple
programs in a single job to get your work done.
Therefore the VSC has defined so-called toolchains on the newer VSC-clusters. A toolchain
contains a C/C++ and Fortran compiler, an MPI library and some basic math libraries for (dense
matrix) linear algebra and FFT. Two toolchains are defined on most VSC systems. One, the
intel toolchain, consists of the Intel compilers, MPI library and math libraries. The other one,
the foss toolchain, consists of Open Source components: the GNU compilers, OpenMPI, OpenBLAS
and the standard LAPACK and ScaLAPACK libraries for the linear algebra operations
and the FFTW library for FFT. The toolchains are refreshed twice a year, which is reflected in
their name. E.g., foss-2014b is the second version of the foss toolchain in 2014.
The toolchains are then used to compile a lot of the software installed on the VSC clusters. You
can recognise those packages easily as they all contain the name of the toolchain after the version
number in their name. Packages compiled with the same toolchain and toolchain version will
typically work together well without conflicts.
For some clusters, additional toolchains are defined, e.g., to take advantage of specific properties
of that cluster such as GPU accelerators or special interconnect features that require a vendor-
specific MPI-implementation.
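The two load commands referred to in the next paragraph are not reproduced in this extract; judging from the module list output shown below, they were presumably along these lines (the exact versions are assumptions):
$ module load MATLAB/2013b
$ module load Python/2.7.6-intel-2014b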
Obviously, you need to keep track of the modules that are currently loaded. If you executed the
two load commands stated above, you will get the following:
$ module list
Currently Loaded Modulefiles:
1) cluster/delcatty(default) 8) intel/2014b
2) MATLAB/2013b 9) bzip2/1.0.6-intel-2014b
3) GCC/4.8.3 10) zlib/1.2.7-intel-2014b
4) icc/2013.5.192-GCC-4.8.3 11) ncurses/5.9-intel-2014b
5) ifort/2013.5.192-GCC-4.8.3 12) libreadline/6.2-intel-2014b
6) impi/4.1.3.049-GCC-4.8.3 13) Python/2.7.6-intel-2014b
7) imkl/11.1.2.144-2013.5.192-GCC-4.8.3
It is important to note at this point that other modules (e.g., intel/2014b) are also listed, although
the user did not explicitly load them. This is because Python/2.7.6-intel-2014b depends on it
(as indicated in its name), and the system administrator specified that the intel/2014b module
should be loaded whenever the Python module is loaded. There are advantages and disadvantages
to this, so be aware of automatically loaded modules whenever things go wrong: they may have
something to do with it!
In fact, an easy way to check the components and version numbers of those components of a
toolchain is to simply load the toolchain and then list the modules that are loaded.
To unload a module, one can use the module unload command. It works consistently with the
load command, and reverses the latter's effect. However, the dependencies of the package are
NOT automatically unloaded; the user has to unload the packages one by one. One can however
unload automatically loaded modules manually, to debug some problem. When the Python
module is unloaded, only the following modules remain:
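The unload command itself and the resulting shortened module list are not reproduced in this extract; the command would simply be along the lines of:
$ module unload Python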
Notice that the version was not specified: the module system is sufficiently clever to figure out
what the user intends. However, checking the list of currently loaded modules is always a good
idea, just to make sure . . .
In order to unload all modules at once, and hence be sure to start in a clean state, you can use:
$ module purge
However, on some VSC clusters you may be left with a very empty list of available modules
after executing module purge. On those systems, module av will show you a list of modules
containing the name of a cluster or a particular feature of a section of the cluster, and loading
the appropriate module will restore the module list applicable to that particular system.
Modules need not be loaded one by one; the two load commands can be combined as follows:
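The combined command from the original frame is not shown here; assuming the same two modules as in the earlier example, it would look like:
$ module load MATLAB/2013b Python/2.7.6-intel-2014b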
As a rule, once a module has been installed on the cluster, the executables or libraries it comprises
are never modified. This policy ensures that the user's programs will run consistently, at least
if the user specifies a specific version. Failing to specify a version may result in unexpected
behaviour.
Consider the following example: the user decides to use the GSL library for numerical computations,
and at that point in time, just a single version 1.12, compiled with Intel, is installed on
the cluster. The user loads the library using:
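The two commands themselves are missing from this extract; the generic, version-less load would be:
$ module load GSL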
rather than
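an explicitly versioned load, for which the exact module name (including the toolchain suffix) is an assumption here:
$ module load GSL/1.12-intel-2014b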
Everything works fine, up to the point where a new version of GSL is installed, 1.13 compiled
for gcc. From then on, the user's load command will load the latter version, rather than the
intended one, which may lead to unexpected problems.
Let's now generate a version conflict with the ABAQUS module, and see what happens.
$ module av ABAQUS
ABAQUS/6.12.1-linux-x86_64
ABAQUS/6.13.5-linux-x86_64
ABAQUS/6.14.1-linux-x86_64
$ module load ABAQUS/6.12.1-linux-x86_64
$ module load ABAQUS/6.13.5-linux-x86_64
ABAQUS/6.13.5-linux-x86_64(12):ERROR:150: Module ABAQUS/6.13.5-linux-x86_64
conflicts with the currently loaded module(s) ABAQUS/6.12.1-linux-x86_64
ABAQUS/6.13.5-linux-x86_64(12):ERROR:102: Tcl command execution failed: conflict
ABAQUS
$ module swap ABAQUS/6.13.5-linux-x86_64
Note: A module swap command combines the appropriate module unload and module load
commands.
In order to know more about a certain package, and to know what environment variables will be
changed by a certain module, try:
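The first example command is not reproduced in this extract; with the modules system, the environment changes made by a module can be inspected with module show (the module name below is just an example):
$ module show MATLAB/2013b
A general overview of the module command and its subcommands can be obtained with: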
$ module help
Chapter 4
Running batch jobs
In order to have access to the compute nodes of a cluster, you have to use the job system. The
system software that handles your batch jobs consists of two pieces: the queue- and resource
manager TORQUE and the scheduler Moab. Together, TORQUE and Moab provide a suite of
commands for submitting jobs, altering some of the properties of waiting jobs (such as reordering
or deleting them), monitoring their progress and killing ones that are having problems or are no
longer needed. Only the most commonly used commands are mentioned here.
When you connect to the HPC, you have access to (one of) the login nodes of the cluster. There
you can prepare the work you want to get done on the cluster by, e.g., installing or compiling
programs, setting up data sets, etc. The computations however, should not be performed on this
login node. The actual work is done on the cluster's compute nodes. These compute nodes
are managed by the job scheduling software (Moab) and a Resource Manager (TORQUE), which
decides when and on which compute nodes the jobs can run. It is usually not necessary to log
on to the compute nodes directly. Users can (and should) monitor their jobs periodically as they
run, but do not have to remain logged in the entire time.
The documentation in this Running batch jobs section includes a description of the general
features of job scripts, how to submit them for execution and how to monitor their progress.
Usually, you will want to have your program running in batch mode, as opposed to interactively as
you may be accustomed to. The point is that the program must be able to start and run without
user intervention, i.e., without you having to enter any information or to press any buttons
during program execution. All the necessary input or required options have to be specified on
the command line, or need to be put in input or configuration files.
As an example, we will run a Perl script, which you will find in the examples subdirectory on the
HPC. When you received your account on the HPC, a subdirectory with examples was automatically
generated for you.
Remember that you have copied the contents of the HPC examples directory to your home
directory, so that you have your own personal copy (editable and over-writable) and that you
can start using the examples. If you haven't done so already, run these commands now:
$ cd
$ cp -r /apps/gent/tutorials/Intro-HPC/examples ~/
First go to the directory with the first examples by entering the command:
$ cd ~/examples/Running-batch-jobs
Each time you want to execute a program on the HPC you'll need two things:
The executable The program to execute from the end-user, together with its peripheral input
files, databases and/or command options.
A configuration script (also called a job-script), which will define the computer resource
requirements of the program and the required additional software packages, and which will
start the actual executable. The HPC needs to know which resources (e.g., the number of
nodes and cores, the amount of memory and the expected walltime) your job requires.
Later on, the HPC user will have to define (or adapt) his/her own configuration scripts.
For now, all required configuration scripts for the exercises are provided for you in the examples
subdirectories.
List and check the contents with:
$ ls -l
total 512
-rw-r--r-- 1 vsc40000 193 Sep 11 10:34 fibo.pbs
-rw-r--r-- 1 vsc40000 609 Sep 11 10:25 fibo.pl
In this directory you find a Perl script (named fibo.pl) and a job-script (named fibo.pbs).
1. The Perl script (fibo.pl) calculates the first 30 Fibonacci numbers.
2. The job-script is actually a standard Unix/Linux shell script that contains a few extra
comments at the beginning that specify directives to PBS. These comments all begin with
#PBS.
We will first execute the program locally (i.e., on your current login-node), so that you can see
what the program does.
On the command line, you would run this using:
$ ./fibo.pl
[0] -> 0
[1] -> 1
[2] -> 1
[3] -> 2
[4] -> 3
[5] -> 5
[6] -> 8
[7] -> 13
[8] -> 21
[9] -> 34
[10] -> 55
[11] -> 89
[12] -> 144
[13] -> 233
[14] -> 377
[15] -> 610
[16] -> 987
[17] -> 1597
[18] -> 2584
[19] -> 4181
[20] -> 6765
[21] -> 10946
[22] -> 17711
[23] -> 28657
[24] -> 46368
[25] -> 75025
[26] -> 121393
[27] -> 196418
[28] -> 317811
[29] -> 514229
Remark: Recall that you have now executed the Perl script locally on one of the login-nodes of
the HPC cluster. Of course, this is not our final intention; we want to run the script on any of
the compute nodes. Also, it is not considered good practice to abuse the login-nodes
for testing your scripts and executables. It will be explained later on how you can reserve your
own compute-node (by opening an interactive session) to test your software. But for the sake of
acquiring a good understanding of what is happening, you are pardoned for this example since
these jobs require very little computing power.
The job-script contains a description of the job by specifying the commands that need to be
executed on the compute node:
fibo.pbs
1 #!/bin/bash -l
2 cd $PBS_O_WORKDIR
3 ./fibo.pl
So, jobs are submitted as scripts (bash, Perl, Python, etc.), which specify the parameters related
to the jobs such as expected runtime (walltime), e-mail notification, etc. These parameters can
also be specified on the command line.
This job script that can now be submitted to the clusters job system for execution, using the
qsub (Queue SUBmit) command:
$ qsub fibo.pbs
123456.master15.delcatty.gent.vsc
The qsub command returns a job identifier on the HPC cluster. The important part is the
number (e.g., 123456); this is a unique identifier for the job and can be used to monitor and
manage your job.
Your job is now waiting in the queue for a free worker node to start on.
Go and drink some coffee . . . but not too long. If you get impatient you can start reading the
next section for more information on how to monitor jobs in the queue.
After your job was started, and ended, check the contents of the directory:
$ ls -l
total 768
-rw-r--r-- 1 vsc40000 vsc40000 44 Feb 28 13:33 fibo.pbs
-rw------- 1 vsc40000 vsc40000 0 Feb 28 13:33 fibo.pbs.e123456
-rw------- 1 vsc40000 vsc40000 1010 Feb 28 13:33 fibo.pbs.o123456
-rwxrwxr-x 1 vsc40000 vsc40000 302 Feb 28 13:32 fibo.pl
$ more fibo.pbs.o123456
$ more fibo.pbs.e123456
These files are used to store the standard output and error that would otherwise be shown in the
terminal window. By default, they have the same name as that of the PBS script, i.e., fibo.pbs
as base name, followed by the extension .o (output) and .e (error), respectively, and the job
number (123456 for this example). The error file will be empty, at least if all went well. If
not, it may contain valuable information to determine and remedy the problem that prevented
a successful run. The standard output file will contain the results of your calculation (here, the
output of the Perl script).
4.2 Monitoring and managing your job(s)
Using the job ID that qsub returned, there are various ways to monitor the status of your job,
e.g.,
To get the status information on your job:
$ qstat <jobid>
To show on which compute nodes your job is running, at least, when it is running:
$ qstat -n <jobid>
To remove a job from the queue so that it will not run, or to stop a job that is already running.
$ qdel <jobid>
When you have submitted several jobs (or you just forgot about the job ID), you can retrieve
the status of all your jobs that are submitted and are not yet finished using:
$ qstat
master15.delcatty.gent.vsc :
Job ID Name User Time Use S Queue
----------- ------- --------- -------- - -----
123456 .... mpi vsc40000 0 Q short
Here, the columns show the job ID, the job name, the user, the CPU time used so far, the job state (S) and the queue the job was submitted to.
As we learned above, Moab is the software application that actually decides when to run your
job and what resources your job will run on. For security reasons, it is not possible to see what
other users are doing on the clusters. As such, the PBS qstat command only gives information
about your own jobs that are queued or running, ordered by JobID.
However, you can get some idea of the load on the clusters by specifying the -q option to the
qstat command:
$ qstat -q
server: master15.delcatty.gent.vsc
In this example, 477 jobs are queued in the various queues whereas 127 jobs are effectively
running.
Without giving more information about your job upon submitting it with qsub, default values
will be assumed that are almost never appropriate for real jobs.
It is important to estimate the resources you need to successfully run your program, such as the
amount of time the job will require, the amount of memory it needs, the number of CPUs it will
run on, etc. This may take some work, but it is necessary to ensure your jobs will run properly.
The qsub command takes several options to specify the requirements, of which we list the most
commonly used ones below.
$ qsub -l walltime=2:30:00
For the simplest cases, only the amount of maximum estimated execution time (called walltime)
is really important. Here, the job will not require more than 2 hours, 30 minutes to complete.
As soon as the job would take more time, it will be killed (terminated) by the job scheduler.
There is absolutely no harm if you slightly overestimate the maximum execution time.
$ qsub -l mem=4gb
The job requests 4 GB of physical memory.
$ qsub -l nodes=5:ppn=2
The job requires 5 compute nodes with two cores on each node (ppn stands for processors per
node, where processor is used to refer to individual cores).
$ qsub -l nodes=1:westmere
The job requires just one node, but it should have an Intel Westmere processor. A list with
site-specific properties can be found in the next section or in the User Portal (Available hardware
section)1 of the VSC website.
These options can either be specified on the command line, e.g.
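(the original example command is not reproduced here; combining the resource requests shown above, it would be along these lines)
$ qsub -l nodes=1:ppn=1 -l mem=2gb fibo.pbs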
or in the job-script itself using the #PBS-directive, so fibo.pbs could be modified to:
1 #!/bin/bash -l
2 #PBS -l nodes=1:ppn=1
3 #PBS -l mem=2gb
4 cd $PBS_O_WORKDIR
5 ./fibo.pl
Note that the resources requested on the command line will override those specified in the PBS
file.
In order to guarantee fair-share access to the computer resources for all users, only a limited
number of jobs with certain walltimes are possible per user.
We therefore classify the submitted jobs in categories (confusingly also called queues), depending
on their walltime specification. A user is allowed to run up to a certain maximum number
of jobs in each of these walltime categories.
The currently defined walltime categories for the HPC are:
Queue category    Walltime from (value not included)    Walltime to (value included)    Max # jobs queuable    Max # jobs runnable
short             0                                     1 hour
long              0                                     72 hours
bshort            0                                     1 hour
debug             0                                     15 minutes
1 URL: https://www.vscentrum.be/infrastructure/hardware
The following table contains some node-specific properties that can be used to make sure the
job will run on nodes with a specific CPU or interconnect. Note that these properties may vary
over the different VSC sites.
To get a list of all properties defined for all nodes, enter
$ pbsnodes
This list will also contain properties referring to, e.g., network components, rack number, etc.
4.5 Job output and error files
At some point your job finishes, so you may no longer see the job ID in the list of jobs when you
run qstat (since it will only be listed for a few minutes after completion with state C). After
your job finishes, you should see the standard output and error of your job in two files, located
by default in the directory where you issued the qsub command.
When you navigate to that directory and list its contents, you should see them:
$ ls -l
total 1024
-rw-r--r-- 1 vsc40000 609 Sep 11 10:54 fibo.pl
-rw-r--r-- 1 vsc40000 68 Sep 11 10:53 fibo.pbs
-rw------- 1 vsc40000 52 Sep 11 11:03 fibo.pbs.e123456
-rw------- 1 vsc40000 1307 Sep 11 11:03 fibo.pbs.o123456
In our case, our job has created both output (fibo.pbs.o123456) and error files (fibo.pbs.e123456)
containing info written to stdout and stderr respectively.
Inspect the generated output and error files:
$ cat fibo.pbs.o123456
...
$ cat fibo.pbs.e123456
...
You can instruct the HPC to send an e-mail to your e-mail address whenever a job begins, ends
and/or aborts, by adding the following lines to the job-script fibo.pbs:
1 #PBS -m b
2 #PBS -m e
3 #PBS -m a
4 #PBS -M <your e-mail address>
or
1 #PBS -m abe
2 #PBS -M <your e-mail address>
These options can also be specified on the command line. Try it and see what happens:
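The example command is not reproduced in this extract; based on the directives above, it would look something like this (the e-mail address part is optional, see below):
$ qsub -m abe -M <your e-mail address> fibo.pbs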
You don't have to specify the e-mail address. The system will use the e-mail address which is
connected to your VSC account.
Chapter 5
Running interactive jobs
5.1 Introduction
Interactive jobs are jobs which give you an interactive session on one of the compute nodes.
Importantly, accessing the compute nodes this way means that the job control system guarantees
the resources that you have asked for.
Interactive PBS jobs are similar to non-interactive PBS jobs in that they are submitted to PBS
via the command qsub. Where an interactive job differs is that it does not require a job script;
the required PBS directives can be specified on the command line.
Interactive jobs can be useful to debug certain job scripts or programs, but should not be the
main use of the UGent-HPC. Waiting for user input takes a very long time in the life of a CPU
and does not make efficient usage of the computing resources.
The syntax for qsub for submitting an interactive PBS job is:
$ qsub -I <...pbs directives ...>
(Before you submit the job, you are still working on the login node login.hpc.ugent.be of the HPC cluster.)
The most basic way to start an interactive job is the following:
$ qsub -I
qsub: waiting for job 123456.master15.delcatty.gent.vsc to start
qsub: job 123456.master15.delcatty.gent.vsc ready
1. The qsub command (with the interactive -I flag) waits until a node is assigned to your
interactive session, connects to the compute node and shows you the terminal prompt on
that node.
2. You'll see that the directory structure of your home directory has remained the same. Your home directory is actually located on a shared storage system. This means that the exact same directory is available on all login nodes and all compute nodes of all clusters.
Note that we are now working on the compute node called node2001.delcatty.gent.vsc. This is the compute node that was assigned to us by the scheduler after issuing the qsub -I command.
Now, go to the directory of our second interactive example and run the program primes.py.
This program will ask you for an upper limit (> 1) and will print all the primes between 1 and
your upper limit:
$ cd ~/examples/Running-interactive-jobs
$ ./primes.py
This program calculates all primes between 1 and your upper limit.
Enter your upper limit (>1): 50
Start Time: 2013-09-11 15:49:06
[Prime#1] = 1
[Prime#2] = 2
[Prime#3] = 3
[Prime#4] = 5
[Prime#5] = 7
[Prime#6] = 11
[Prime#7] = 13
[Prime#8] = 17
[Prime#9] = 19
[Prime#10] = 23
[Prime#11] = 29
[Prime#12] = 31
[Prime#13] = 37
[Prime#14] = 41
[Prime#15] = 43
[Prime#16] = 47
End Time: 2013-09-11 15:49:06
Duration: 0 seconds.
Note that you can now use this allocated node for 1 hour. After this hour you will be auto-
matically disconnected. You can change this usage time by explicitly specifying a walltime,
i.e., the time that you want to work on this node. (Think of walltime as the time elapsed when
watching the clock on the wall.)
You can work for 3 hours by:
$ qsub -I -l walltime=03:00:00
If the walltime of the job is exceeded, the (interactive) job will be killed and your connection to the compute node will be closed. So make sure to provide adequate walltime and to save your data before your (wall)time is up! When you do not specify a walltime, you get a default walltime of 1 hour.
5.3 Interactive jobs, with graphical support
To display graphical applications from a Linux computer (such as the VSC clusters) on your
machine, you need to install an X Window server on your local computer. An X Window server
is packaged by default on most Linux distributions. If you have a graphical user interface this
generally means that you are using an X Window server.
The X Window system (commonly known as X11, based on its current major version being 11,
or shortened to simply X) is the system-level software infrastructure for the windowing GUI
on Linux, BSD and other UNIX-like operating systems. It was designed to handle both local
displays, as well as displays sent across a network. More formally, it is a computer software
system and network protocol that provides a basis for graphical user interfaces (GUIs) and rich
input device capability for networked computers.
In order to get the graphical output of your application (which is running on a compute node
on the HPC) transferred to your personal screen, you will need to reconnect to the HPC with
X-forwarding enabled, which is done with the -X option.
$ exit
$ ssh -X <vsc-account>@login.hpc.ugent.be
$ hostname -f
gligar01.gligar.gent.vsc
We first check whether the GUIs on the login node are properly forwarded to the screen of your local machine. An easy way to test this is by running a small X application on the login node. Type:
$ xclock
You can close your clock and connect further to a compute node, again with X-forwarding enabled:
$ qsub -I -X
qsub: waiting for job 123456.master15.delcatty.gent.vsc to start
qsub: job 123456.master15.delcatty.gent.vsc ready
$ hostname -f
node2001.delcatty.gent.vsc
$ xclock
We have developed a little interactive program that demonstrates the communication in two directions: it sends information to your local screen, but also asks you to click a button.
Now run the message program:
$ cd ~/examples/Running-interactive-jobs
$ ./message.py
-----------------------
< Enjoy the day! Mooh >
-----------------------
^__^
(oo)\_______
(__)\ )\/\
||----w |
|| ||
Chapter 6
Running jobs with input/output data
You have now learned how to start a batch job and how to start an interactive session. The next question is how to deal with input and output files, where your standard output and error messages go, and where you can collect your results.
6.1 The current directory and output and error files
$ cd ~/examples/Running-jobs-with-input-output-data
Now, let us inspect the contents of the first executable (which is just a Python script with execute permission).
file1.py
1 #!/usr/bin/env python
2 #
3 # VSC : Flemish Supercomputing Centre
4 # Tutorial : Introduction to HPC
5 # Description: Writing to the current directory, stdout and stderr
6 #
7 import sys
8
9 # Step #1: write to a local file in your current directory
10 local_f = open("Hello.txt", "w+")
11 local_f.write("Hello World!\n")
12 local_f.write("I am writing in the file:<Hello.txt>.\n")
13 local_f.write("in the current directory.\n")
14 local_f.write("Cheers!\n")
15 local_f.close()
16
17 # Step #2: Write to stdout
18 sys.stdout.write("Hello World!\n")
19 sys.stdout.write("I am writing to <stdout>.\n")
20 sys.stdout.write("Cheers!\n")
21
22 # Step #3: Write to stderr
23 sys.stderr.write("Hello World!\n")
24 sys.stderr.write("This is NO ERROR or WARNING.\n")
25 sys.stderr.write("I am just writing to <stderr>.\n")
26 sys.stderr.write("Cheers!\n")
file1a.pbs
1 #!/bin/bash
2
3 #PBS -l walltime=00:05:00
4
5 # go to the (current) working directory (optional, if this is the
6 # directory where you submitted the job)
7 cd $PBS_O_WORKDIR
8
9 # the program itself
10 echo Start Job
11 date
12 ./file1.py
13 echo End Job
You'll see that there are NO specific PBS directives for the placement of the output files. All output files are just written to the standard paths.
Submit it:
$ qsub file1a.pbs
After the job has finished, inspect the local directory again, i.e., the directory where you executed
the qsub command:
$ ls -l
total 3072
-rw-rw-r-- 1 vsc40000 90 Sep 13 13:13 Hello.txt
-rwxrwxr-x 1 vsc40000 693 Sep 13 13:03 file1.py*
-rw-rw-r-- 1 vsc40000 229 Sep 13 13:01 file1a.pbs
-rw------- 1 vsc40000 91 Sep 13 13:13 file1a.pbs.e123456
-rw------- 1 vsc40000 105 Sep 13 13:13 file1a.pbs.o123456
-rw-rw-r-- 1 vsc40000 143 Sep 13 13:07 file1b.pbs
-rw-rw-r-- 1 vsc40000 177 Sep 13 13:06 file1c.pbs
-rw-r--r-- 1 vsc40000 1393 Sep 13 10:41 file2.pbs
-rwxrwxr-x 1 vsc40000 2393 Sep 13 10:40 file2.py*
-rw-r--r-- 1 vsc40000 1393 Sep 13 10:41 file3.pbs
-rwxrwxr-x 1 vsc40000 2393 Sep 13 10:40 file3.py*
Some observations:
1. A new file Hello.txt was created in the current directory (this is the file that the Python script wrote to).
2. The file file1a.pbs.o123456 contains all the text that was written to the standard output stream (stdout).
3. The file file1a.pbs.e123456 contains all the text that was written to the standard error stream (stderr).
$ cat Hello.txt
$ cat file1a.pbs.o123456
$ cat file1a.pbs.e123456
$ rm Hello.txt file1a.pbs.o123456 file1a.pbs.e123456
Tip: Type cat H and press the TAB key, and it will expand into the full filename Hello.txt.
file1b.pbs
1 #!/bin/bash
2
3 # Specify the "name" of the job
4 #PBS -N my_serial_job
5
6 cd $PBS_O_WORKDIR
7 echo Start Job
8 date
9 ./file1.py
10 echo End Job
$ qsub file1b.pbs
$ ls
Hello.txt file1a.pbs file1c.pbs file2.pbs file3.pbs my_serial_job.e123456
file1.py* file1b.pbs file2.py* file3.py* my_serial_job.o123456
$ rm Hello.txt my_serial_job.*
Here, the option -N was used to explicitly assign a name to the job. This overwrote the
JOBNAME variable, and resulted in a different name for the stdout and stderr files. This name
is also shown in the second column of the qstat command. If no name is provided, it defaults
to the name of the job script.
You can also specify the name of stdout and stderr files explicitly by adding two lines in the
job-script, as in our third example:
file1c.pbs
1 #!/bin/bash
2
3 # redirect standard output (-o) and error (-e)
4 #PBS -o stdout.$PBS_JOBID
5 #PBS -e stderr.$PBS_JOBID
6
7 cd $PBS_O_WORKDIR
8 echo Start Job
9 date
10 ./file1.py
11 echo End Job
$ qsub file1c.pbs
$ ls
6.2 Where to store your data on the HPC
The HPC cluster offers its users several locations to store their data. Most of the data will
reside on the shared storage system, but all compute nodes also have their own (small) local
disk.
Three different pre-defined user directories are available, where each directory has been created
for different purposes. The best place to store your data depends on the purpose, but also the
size and type of usage of the data.
The following locations are available:

Long-term storage (slow filesystem, intended for smaller files):

$VSC_HOME: For your configuration files and other small files, see 6.2.2. The default directory is /user/gent/xxx/<vsc-account>. The same file system is accessible from all sites, i.e., you'll see the same contents in $VSC_HOME on all sites.

$VSC_DATA: A bigger workspace, for datasets, results, logfiles, etc., see 6.2.3. The default directory is /data/gent/xxx/<vsc-account>. The same file system is accessible from all sites.

Fast temporary storage:

$VSC_SCRATCH_NODE: For temporary or transient data on the local compute node, where fast access is important; see 6.2.4. This space is available per node. The default directory is /tmp. On different nodes, you'll see different content.

$VSC_SCRATCH: For temporary or transient data that has to be accessible from all nodes of a cluster (including the login nodes). The default directory is /scratch/gent/xxx/<vsc-account>. This directory is cluster- or site-specific: on different sites, and sometimes on different clusters on the same site, you'll get a different directory with different content.

$VSC_SCRATCH_SITE: Currently the same as $VSC_SCRATCH, but could be used for a scratch space shared across all clusters at a site in the future. See 6.2.4.

$VSC_SCRATCH_GLOBAL: Currently the same as $VSC_SCRATCH, but could be used for a scratch space shared across all clusters of the VSC in the future. See 6.2.4.

$VSC_SCRATCH_CLUSTER: The scratch filesystem closest to this cluster.

$VSC_SCRATCH_VO: The scratch filesystem for your VO storage; this can have a bigger quota than your personal scratch.
Since these directories are not necessarily mounted on the same locations over all sites, you
should always (try to) use the environment variables that have been created.
We elaborate more on the specific function of these locations in the following sections.
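As a minimal sketch (not one of the tutorial's example files; my_program and input.dat are placeholders), a job script can refer to these locations through the environment variables rather than through hard-coded paths:

#!/bin/bash
#PBS -l walltime=00:30:00
cd $PBS_O_WORKDIR
# stage the input data to the shared scratch space, where fast I/O is available
cp input.dat $VSC_SCRATCH/
cd $VSC_SCRATCH
# run the (placeholder) program from the submission directory, writing on scratch
$PBS_O_WORKDIR/my_program input.dat > results.txt
# afterwards, copy the results to the data directory for longer-term storage
cp results.txt $VSC_DATA/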
6.2.2 Your home directory ($VSC_HOME)
Your home directory is where you arrive by default when you login to the cluster. Your shell refers to it as ~ (tilde), and its absolute path is also stored in the environment variable $VSC_HOME.
Your home directory is shared across all clusters of the VSC.
The data stored here should be relatively small (e.g., no files or directories larger than a few
megabytes), and preferably should only contain configuration files. Note that various kinds of
configuration files are also stored here, e.g., by MATLAB, Eclipse, . . .
The operating system also creates a few files and folders here to manage your account. Examples
are:
.ssh/: This directory contains some files necessary for you to login to the cluster and to submit jobs on the cluster. Do not remove them, and do not alter anything if you don't know what you are doing!
.bash_profile: When you login (type username and password) remotely via ssh, .bash_profile is executed to configure your shell before the initial command prompt.
.bashrc: This script is executed every time you start a session on the cluster: when you login to the cluster and when a job starts.
.bash_history: This file contains the commands you typed at your shell prompt, in case you need them again.
6.2.3 Your data directory ($VSC_DATA)
In this directory you can store all other data that you need for longer terms (such as the results of previous jobs, . . . ). It is a good place for, e.g., storing big files like genome data.
The environment variable pointing to this directory is $VSC_DATA. This volume is shared across all clusters of the VSC. There are however no guarantees about the speed you will achieve on this volume. For guaranteed fast performance and very heavy I/O, you should use the scratch space instead. If you are running out of quota on your $VSC_DATA filesystem you can request a VO (Virtual Organisation). VO members get an extra DATA filesystem ($VSC_DATA_VO), and we can give higher quota values for these filesystems. Don't hesitate to contact the UGent HPC team staff.
6.2.4 Your scratch space ($VSC_SCRATCH)
To enable quick writing from your job, a few extra file systems are available on the compute nodes. These extra file systems are called scratch folders, and can be used for storage of temporary and/or transient data (temporary results, anything you just need during your job, or your batch of jobs).
You should remove any data from these file systems after your processing has finished. There are no guarantees about how long your data will be stored on these systems, and we plan to clean them automatically on a regular basis. The maximum allowed age of files on these scratch file systems depends on the type of scratch, and can be anywhere between a day and a few weeks. We don't guarantee that these policies will remain in place forever, and we may change them if this seems necessary for the healthy operation of the cluster.
Each type of scratch has its own use:
Node scratch ($VSC_SCRATCH_NODE). Every node has its own scratch space, which is completely separated from the other nodes. On some clusters, it will be on a local disk in the node, while on other clusters it will be emulated through another file server. Some drawbacks are that the storage can only be accessed on that particular node and that the capacity is often very limited (e.g., 100 GB). The performance will depend a lot on the particular implementation in the cluster: in many cases, it will be significantly slower than the cluster scratch as it typically consists of just a single disk. However, if that disk is local to the node (as on most clusters), the performance will not depend on what others are doing on the cluster.
Cluster scratch ($VSC_SCRATCH). To allow a job running on multiple nodes (or multiple
jobs running on separate nodes) to share data as files, every node of the cluster (including
the login nodes) has access to this shared scratch directory. Just like the home and data directories, every user has their own scratch directory. Because this scratch is also available
from the login nodes, you could manually copy results to your data directory after your job
has ended. Also, this type of scratch is usually implemented by running tens or hundreds
of disks in parallel on a powerful file server with fast connection to all the cluster nodes
and therefore is often the fastest file system available on a cluster.
You may not get the same file system on different clusters, i.e., you may see different content on different clusters at the same institute.
Site scratch ($VSC_SCRATCH_SITE). At the time of writing, the site scratch is just the
same volume as the cluster scratch, and thus contains the same data. In the future it may
point to a different scratch file system that is available across all clusters at a particular
site, which is in fact the case for the cluster scratch on some sites.
6.2.5 Pre-defined quotas
Quota is enabled on these directories, which means that the amount of data you can store there is limited. This holds for both the total size of all files as well as the total number of files that can be stored. The system works with a soft quota and a hard quota. You can temporarily exceed the soft quota, but you can never exceed the hard quota. You will get warnings as soon as you exceed the soft quota.
To see your current quota usage you can run the show_quota script on the UGent-HPC.
$ show_quota
User quota:
VSC_HOME: used 1.08 GiB (37%) quota 2.85 GiB (3 GiB hard limit)
VSC_DATA_VO: used 8.16 MiB (0%) quota 1.62 TiB (1.71 TiB hard limit)
VSC_DATA: used 0 B (0%) quota 23.8 GiB (25 GiB hard limit)
VSC_SCRATCH_GULPIN_VO: used 41.3 GiB (1%) quota 2.32 TiB (2.44 TiB hard limit)
VSC_SCRATCH_GULPIN: used 7.78 MiB (0%) quota 23.8 GiB (25 GiB hard limit)
VSC_SCRATCH_DELCATTY_VO: used 340 GiB (7%) quota 4.52 TiB (4.76 TiB hard limit)
VSC_SCRATCH_DELCATTY: used 11.4 GiB (47%) quota 23.8 GiB (25 GiB hard limit)
You can also visit the account page (https://account.vscentrum.be) to see a list of your current quota, and VO moderators can see a list of VO quota usage per member of their VO.
The rules are:
1. You will only receive a warning when you have reached the soft limit of either quota.
2. You will start losing data and get I/O errors when you reach the hard limit. In this case,
data loss will occur since nothing can be written anymore (this holds both for new files as
well as for existing files), until you free up some space by removing some files. Also note
that you will not be warned when data loss occurs, so keep an eye open for the general
quota warnings!
3. The same holds for running jobs that need to write files: when you reach your hard quota,
jobs will crash.
We do realise that quota are often perceived as a nuisance by users, especially if you're running low on them. However, they are an essential feature of a shared infrastructure. Quota ensure that a single user cannot accidentally take a cluster down (and break other users' jobs) by filling up the available disk space. And they help to guarantee a fair use of all available resources for all users. Quota also help to ensure that each folder is used for its intended purpose.
6.3 Writing Output files
In the next exercise, you will generate a file in the $VSC_SCRATCH directory. In order to generate some CPU and disk-I/O load, the program takes a random integer between 1 and 2000, calculates all primes up to that limit, and writes the results to a file (primes_1.txt) in the $VSC_SCRATCH directory.
Check the Python and the PBS file, and submit the job. Remember that this is already a more serious (disk-I/O and computationally intensive) job, which takes approximately 3 minutes on the HPC.
$ cat file2.py
$ cat file2.pbs
$ qsub file2.pbs
$ qstat
$ ls -l
$ echo $VSC_SCRATCH
$ ls -l $VSC_SCRATCH
$ more $VSC_SCRATCH/primes_1.txt
6.4 Reading Input files
Check the Python and the PBS file, and submit the job:
$ cat file3.py
$ cat file3.pbs
$ qsub file3.pbs
$ qstat
$ ls -l
$ more $VSC_SCRATCH/primes_2.txt
...
6.5 How much disk space do I get?
6.5.1 Quota
The available disk space on the HPC is limited. The actual disk capacity, shared by all users, can be found on the Available hardware page on the website (https://www.vscentrum.be/infrastructure/hardware). As explained in 6.2.5, this implies that there are also limits to the amount of data that each user can store.
The show_quota command has been developed to show you the status of your quota in a
readable format:
$ show_quota
VSC_DATA: used 81MB (0%) quota 25600MB
VSC_HOME: used 33MB (1%) quota 3072MB
VSC_SCRATCH: used 28MB (0%) quota 25600MB
VSC_SCRATCH_GLOBAL: used 28MB (0%) quota 25600MB
VSC_SCRATCH_SITE: used 28MB (0%) quota 25600MB
With this command, you can easily follow up the consumption of your total disk quota, as it is expressed in percentages. Depending on which cluster you are running the script, it may not be able to show the quota on all your folders. E.g., when running on the tier-1 system Muk, the script will not be able to show the quota on $VSC_HOME or $VSC_DATA if your account is a KU Leuven, UAntwerpen or VUB account.
Once your quota is (nearly) exhausted, you will want to know which directories are responsible
for the consumption of your disk space. You can check the size of all subdirectories in the current
directory with the du (Disk Usage) command:
$ du
256 ./ex01-matlab/log
1536 ./ex01-matlab
768 ./ex04-python
512 ./ex02-python
768 ./ex03-python
5632 .
This shows you first the aggregated size of all subdirectories, and finally the total size of the current directory "." (this includes the files stored directly in the current directory).
If you also want this size to be human readable (and not always the total number of kilobytes),
you add the parameter -h:
$ du -h
256K ./ex01-matlab/log
1.5M ./ex01-matlab
768K ./ex04-python
512K ./ex02-python
768K ./ex03-python
5.5M .
If the number of lower level subdirectories starts to grow too big, you may not want to see the
information at that depth; you could just ask for a summary of the current directory:
$ du -s
5632 .
$ du -s -h
5.5M .
If you want to see the size of any file or top-level subdirectory in the current directory, you could
use the following command:
$ du -s -h *
1.5M ex01-matlab
512K ex02-python
768K ex03-python
768K ex04-python
256K example.sh
1.5M intro-HPC.pdf
Finally, if you don't want to know the size of the data in your current directory, but in some other directory (e.g., your data directory), you just pass this directory as a parameter. The
command below will show the disk use in your home directory, even if you are currently in a
different directory:
$ du -h $VSC_HOME/*
22M /user/home/gent/vsc400/vsc40000/dataset01
36M /user/home/gent/vsc400/vsc40000/dataset02
22M /user/home/gent/vsc400/vsc40000/dataset03
3.5M /user/home/gent/vsc400/vsc40000/primes.txt
Chapter 7
Multi core jobs/Parallel Computing
7.1 Why Parallel Programming?
There are two important motivations for engaging in parallel programming:
1. Firstly, the need to decrease the time to solution: distributing your code over C cores
holds the promise of speeding up execution times by a factor C. All modern computers
(and probably even your smartphone) are equipped with multi-core processors capable of
parallel processing.
2. The second reason is problem size: distributing your code over N nodes increases the
available memory by a factor N, and thus holds the promise of being able to tackle problems
which are N times bigger.
On a desktop computer, this enables a user to run multiple programs and the operating system simultaneously. For scientific computing, this means you have the ability, in principle, to split up your computations into groups and run each group on its own core.
There are multiple ways to achieve parallel programming; besides problem-independent approaches, there are many problem-specific libraries that incorporate parallel capabilities. The next three sections explore three common problem-independent approaches: (raw) threads, OpenMP and MPI.
7.2 Parallel Computing with threads
Multi-threading is a widespread programming and execution model that allows multiple threads
to exist within the context of a single process. These threads share the process resources, but
are able to execute independently. The threaded programming model provides developers with
a useful abstraction of concurrent execution. Multi-threading can also be applied to a single
process to enable parallel execution on a multiprocessing system.
$ cd ~/examples/Multi-core-jobs-Parallel-Computing
T_hello.c
1 /*
2 * VSC : Flemish Supercomputing Centre
3 * Tutorial : Introduction to HPC
4 * Description: Showcase of working with threads
5 */
6 #include <stdio.h>
7 #include <stdlib.h>
8 #include <pthread.h>
9
10 #define NTHREADS 5
11
12 void *myFun(void *x)
13 {
14 int tid;
15 tid = *((int *) x);
16 printf("Hello from thread %d!\n", tid);
17 return NULL;
18 }
19
20 int main(int argc, char *argv[])
21 {
22 pthread_t threads[NTHREADS];
23 int thread_args[NTHREADS];
24 int rc, i;
25
26 /* spawn the threads */
27 for (i=0; i<NTHREADS; ++i)
28 {
29 thread_args[i] = i;
30 printf("spawning thread %d\n", i);
31 rc = pthread_create(&threads[i], NULL, myFun, (void *) &thread_args[i]);
32 }
33
34 /* wait for threads to finish */
35 for (i=0; i<NTHREADS; ++i) {
36 rc = pthread_join(threads[i], NULL);
37 }
38
39 return 1;
40 }
And compile it (whilst including the thread library) and run and test it on the login-node:
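For instance (a sketch; the exact module name for the GNU compiler may differ on your cluster):
$ module load GCC
$ gcc -o T_hello T_hello.c -lpthread
$ ./T_hello
Then submit the accompanying T_hello.pbs job script to run it on a compute node: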
$ qsub T_hello.pbs
123456.master15.delcatty.gent.vsc
$ more T_hello.pbs.o123456
spawning thread 0
spawning thread 1
spawning thread 2
Hello from thread 0!
Hello from thread 1!
Hello from thread 2!
spawning thread 3
spawning thread 4
Hello from thread 3!
Hello from thread 4!
Tip: If you plan to engage in parallel programming using threads, this book may prove useful: Professional Multicore Programming: Design and Implementation for C++ Developers. Cameron Hughes and Tracey Hughes. Wrox, 2008.
7.3 Parallel Computing with OpenMP
By using the private() and shared() clauses, you can specify variables within the parallel region
as being shared, i.e., visible and accessible by all threads simultaneously, or private, i.e., private
to each thread, meaning each thread will have its own local copy. In the code example below
for parallelising a for loop, you can see that we specify the thread_id and nloops variables as
private.
Parallelising for loops is really simple (see code below). By default, loop iteration counters in
OpenMP loop constructs (in this case the i variable) in the for loop are set to private variables.
omp1.c
1 /*
2 * VSC : Flemish Supercomputing Centre
3 * Tutorial : Introduction to HPC
4 * Description: Showcase program for OMP loops
5 */
6 /* OpenMP_loop.c */
7 #include <stdio.h>
8 #include <omp.h>
9
10 int main(int argc, char **argv)
11 {
12 int i, thread_id, nloops;
13
14 #pragma omp parallel private(thread_id, nloops)
15 {
16 nloops = 0;
17
18 #pragma omp for
19 for (i=0; i<1000; ++i)
20 {
21 ++nloops;
22 }
23 thread_id = omp_get_thread_num();
24 printf("Thread %d performed %d iterations of the loop.\n", thread_id, nloops );
25 }
26
27 return 0;
28 }
And compile it (whilst including the openmp library) and run and test it on the login-node:
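For instance (a sketch; the module name is again an assumption, and OMP_NUM_THREADS controls how many threads OpenMP starts):
$ module load GCC
$ gcc -fopenmp -o omp1 omp1.c
$ export OMP_NUM_THREADS=8
$ ./omp1
The same compile step applies to the omp2.c and omp3.c examples further on. To run it on a compute node instead, submit the accompanying PBS script: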
$ qsub omp1.pbs
$ cat omp1.pbs.o*
Thread 1 performed 125 iterations of the loop.
Thread 4 performed 125 iterations of the loop.
Thread 3 performed 125 iterations of the loop.
Thread 0 performed 125 iterations of the loop.
Thread 5 performed 125 iterations of the loop.
Thread 7 performed 125 iterations of the loop.
Thread 2 performed 125 iterations of the loop.
Thread 6 performed 125 iterations of the loop.
Using OpenMP you can specify something called a critical section of code. This is code that is performed by all threads, but is only performed by one thread at a time (i.e., in serial). This provides a convenient way of letting you do things like updating a global variable with local results from each thread, and you don't have to worry about things like other threads writing to that global variable at the same time (a collision).
omp2.c
1 /*
2 * VSC : Flemish Supercomputing Centre
3 * Tutorial : Introduction to HPC
4 * Description: OpenMP Test Program
5 */
6 #include <stdio.h>
7 #include <omp.h>
8
9 int main(int argc, char *argv[])
10 {
11 int i, thread_id;
12 int glob_nloops, priv_nloops;
13 glob_nloops = 0;
14
15 // parallelize this chunk of code
16 #pragma omp parallel private(priv_nloops, thread_id)
17 {
18 priv_nloops = 0;
19 thread_id = omp_get_thread_num();
20
21 // parallelize this for loop
22 #pragma omp for
23 for (i=0; i<100000; ++i)
24 {
25 ++priv_nloops;
26 }
27
28 // make this a "critical" code section
29 #pragma omp critical
30 {
31 printf("Thread %d is adding its iterations (%d) to sum (%d), ", thread_id,
priv_nloops, glob_nloops);
32 glob_nloops += priv_nloops;
33 printf("total is now %d.\n", glob_nloops);
34 }
35 }
36 printf("Total # loop iterations is %d\n", glob_nloops);
37 return 0;
38 }
And compile it (whilst including the openmp library) and run and test it on the login-node:
$ qsub omp2.pbs
$ cat omp2.pbs.o*
Thread 2 is adding its iterations (12500) to sum (0), total is now 12500.
Thread 0 is adding its iterations (12500) to sum (12500), total is now 25000.
Thread 1 is adding its iterations (12500) to sum (25000), total is now 37500.
Thread 4 is adding its iterations (12500) to sum (37500), total is now 50000.
Thread 7 is adding its iterations (12500) to sum (50000), total is now 62500.
Thread 3 is adding its iterations (12500) to sum (62500), total is now 75000.
Thread 5 is adding its iterations (12500) to sum (75000), total is now 87500.
Thread 6 is adding its iterations (12500) to sum (87500), total is now 100000.
Total # loop iterations is 100000
7.3.4 Reduction
Reduction refers to the process of combining the results of several sub-calculations into a final
result. This is a very common paradigm (and indeed the so-called map-reduce framework used
by Google and others is very popular). Indeed we used this paradigm in the code example above,
where we used the critical code directive to accomplish this. The map-reduce paradigm is so
common that OpenMP has a specific directive that allows you to more easily implement this.
omp3.c
1 /*
2 * VSC : Flemish Supercomputing Centre
3 * Tutorial : Introduction to HPC
4 * Description: OpenMP Test Program
5 */
6 #include <stdio.h>
7 #include <omp.h>
8
9 int main(int argc, char *argv[])
10 {
11 int i, thread_id;
12 int glob_nloops, priv_nloops;
13 glob_nloops = 0;
14
15 // parallelize this chunk of code
16 #pragma omp parallel private(priv_nloops, thread_id) reduction(+:glob_nloops)
17 {
18 priv_nloops = 0;
19 thread_id = omp_get_thread_num();
20
21 // parallelize this for loop
22 #pragma omp for
23 for (i=0; i<100000; ++i)
24 {
25 ++priv_nloops;
26 }
27 glob_nloops += priv_nloops;
28 }
29 printf("Total # loop iterations is %d\n", glob_nloops);
30 return 0;
31 }
And compile it (whilst including the openmp library) and run and test it on the login-node:
$ qsub omp3.pbs
$ cat omp3.pbs.o*
Total # loop iterations is 100000
There are a host of other directives you can issue using OpenMP.
Some other clauses of interest are:
1. barrier: each thread will wait until all threads have reached this point in the code, before proceeding;
3. schedule(type, chunk): allows you to specify how tasks are spawned out to threads in a for loop. There are three types of scheduling you can specify: static, dynamic and guided.
Tip: If you plan to engage in parallel programming using OpenMP, this book may prove useful: Using OpenMP - Portable Shared Memory Parallel Programming. By Barbara Chapman, Gabriele Jost and Ruud van der Pas. Scientific and Engineering Computation, 2005.
7.4 Parallel Computing with MPI
The Message Passing Interface (MPI) is a standard defining core syntax and semantics of library
routines that can be used to implement parallel programming in C (and in other languages as
well). There are several implementations of MPI such as Open MPI, Intel MPI, M(VA)PICH
and LAM/MPI.
In the context of this tutorial, you can think of MPI, in terms of its complexity, scope and
control, as sitting in between programming with Pthreads, and using a high-level API such as
OpenMP. For a Message Passing Interface (MPI) application, a parallel task usually consists of a single executable running concurrently on multiple processors, with communication between the processes.
The process numbers 0, 1 and 2 represent the process rank and have greater or less significance
depending on the processing paradigm. At the minimum, Process 0 handles the input/output
and determines what other processes are running.
The MPI interface allows you to manage allocation, communication, and synchronisation of a
set of processes that are mapped onto multiple nodes, where each node can be a core within a
single CPU, or CPUs within a single machine, or even across multiple machines (as long as they
are networked together).
One context where MPI shines in particular is the ability to easily take advantage not just of
multiple cores on a single machine, but to run programs on clusters of several machines. Even if
you don't have a dedicated cluster, you could still write an MPI program that runs in parallel across any collection of computers, as long as they are networked together.
Here is a Hello World program in MPI written in C. In this example, we send a Hello message
to each processor, manipulate it trivially, return the results to the main process, and print the
messages.
Study the MPI program and the PBS file:
mpi_hello.c
1 /*
2 * VSC : Flemish Supercomputing Centre
3 * Tutorial : Introduction to HPC
4 * Description: "Hello World" MPI Test Program
5 */
6 #include <stdio.h>
7 #include <mpi.h>
8
9 #include <mpi.h>
10 #include <stdio.h>
11 #include <string.h>
12
13 #define BUFSIZE 128
14 #define TAG 0
15
16 int main(int argc, char *argv[])
17 {
18 char idstr[32];
19 char buff[BUFSIZE];
20 int numprocs;
21 int myid;
22 int i;
23 MPI_Status stat;
24 /* MPI programs start with MPI_Init; all N processes exist thereafter */
25 MPI_Init(&argc,&argv);
26 /* find out how big the SPMD world is */
27 MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
28 /* and this processes rank is */
29 MPI_Comm_rank(MPI_COMM_WORLD,&myid);
30
31 /* At this point, all programs are running equivalently, the rank
32 distinguishes the roles of the programs in the SPMD model, with
33 rank 0 often used specially... */
34 if(myid == 0)
35 {
36 printf("%d: We have %d processors\n", myid, numprocs);
37 for(i=1;i<numprocs;i++)
38 {
39 sprintf(buff, "Hello %d! ", i);
40 MPI_Send(buff, BUFSIZE, MPI_CHAR, i, TAG, MPI_COMM_WORLD);
41 }
42 for(i=1;i<numprocs;i++)
43 {
44 MPI_Recv(buff, BUFSIZE, MPI_CHAR, i, TAG, MPI_COMM_WORLD, &stat);
45 printf("%d: %s\n", myid, buff);
46 }
47 }
48 else
49 {
50 /* receive from rank 0: */
51 MPI_Recv(buff, BUFSIZE, MPI_CHAR, 0, TAG, MPI_COMM_WORLD, &stat);
52 sprintf(idstr, "Processor %d ", myid);
53 strncat(buff, idstr, BUFSIZE-1);
54 strncat(buff, "reporting for duty", BUFSIZE-1);
55 /* send to rank 0: */
56 MPI_Send(buff, BUFSIZE, MPI_CHAR, 0, TAG, MPI_COMM_WORLD);
57 }
58
59 /* MPI programs end with MPI Finalize; this is a weak synchronization point */
60 MPI_Finalize();
61 return 0;
62 }
mpi_hello.pbs
1 #!/bin/bash
2
3 #PBS -N mpihello
4 #PBS -l walltime=00:05:00
5
6 # assume a 40 core job
7 #PBS -l nodes=2:ppn=20
8
9 # make sure we are in the right directory in case writing files
10 cd $PBS_O_WORKDIR
11
12 # load the environment
13
14 module load intel
15
16 mpirun ./mpi_hello
mpiicc is a wrapper of the Intel C++ compiler icc to compile MPI programs (see the chapter on
compilation for details).
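The compile step itself is not shown above; a minimal sketch (assuming the intel module loaded in the job script provides the mpiicc wrapper) would be:
$ module load intel
$ mpiicc -o mpi_hello mpi_hello.c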
Run the parallel program:
$ qsub mpi_hello.pbs
$ ls -l
total 1024
-rwxrwxr-x 1 vsc40000 8746 Sep 16 14:19 mpi_hello*
-rw-r--r-- 1 vsc40000 1626 Sep 16 14:18 mpi_hello.c
-rw------- 1 vsc40000 0 Sep 16 14:22 mpi_hello.e123456
-rw------- 1 vsc40000 697 Sep 16 14:22 mpi_hello.o123456
-rw-r--r-- 1 vsc40000 304 Sep 16 14:22 mpi_hello.pbs
$ cat mpi_hello.o123456
0: We have 16 processors
0: Hello 1! Processor 1 reporting for duty
0: Hello 2! Processor 2 reporting for duty
0: Hello 3! Processor 3 reporting for duty
0: Hello 4! Processor 4 reporting for duty
0: Hello 5! Processor 5 reporting for duty
0: Hello 6! Processor 6 reporting for duty
0: Hello 7! Processor 7 reporting for duty
0: Hello 8! Processor 8 reporting for duty
0: Hello 9! Processor 9 reporting for duty
0: Hello 10! Processor 10 reporting for duty
0: Hello 11! Processor 11 reporting for duty
0: Hello 12! Processor 12 reporting for duty
0: Hello 13! Processor 13 reporting for duty
0: Hello 14! Processor 14 reporting for duty
0: Hello 15! Processor 15 reporting for duty
The runtime environment for the MPI implementation used (often called mpirun or mpiexec)
spawns multiple copies of the program, with the total number of copies determining the number
of process ranks in MPI_COMM_WORLD, which is an opaque descriptor for communication
between the set of processes. A single program, multiple data (SPMD = Single Program, Multiple Data) programming model is thereby facilitated, but not required; many MPI implementations allow multiple, different, executables to be started in the same MPI job. Each process
has its own rank, the total number of processes in the world, and the ability to communicate
between them either with point-to-point (send/receive) communication, or by collective com-
munication among the group. It is enough for MPI to provide an SPMD-style program with
MPI_COMM_WORLD, its own rank, and the size of the world to allow algorithms to decide
what to do. In more realistic situations, I/O is more carefully managed than in this exam-
ple. MPI does not guarantee how POSIX I/O would actually work on a given system, but it
commonly does work, at least from rank 0.
MPI uses the notion of process rather than processor. Program copies are mapped to processors
by the MPI runtime. In that sense, the parallel machine can map to 1 physical processor, or
N where N is the total number of processors available, or something in between. For maximum
parallel speedup, more physical processors are used. This example adjusts its behaviour to the
size of the world N, so it also seeks to scale to the runtime configuration without compilation for
each size variation, although runtime decisions might vary depending on the absolute amount of concurrency available.
Tip: mpirun does not always do the optimal core pinning and requires a few extra arguments to be the most efficient possible on a given system. At Ghent we have a wrapper around mpirun called mympirun. Run mympirun --help for more information on how this works. You will generally just start an MPI program on the UGent-HPC by using mympirun instead of mpirun -n <nr of cores> <other settings> <other optimisations>.
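As a sketch, the mpirun line in mpi_hello.pbs above could be replaced as follows (the vsc-mympirun module name is an assumption; check module av mympirun on your cluster):
module load vsc-mympirun
mympirun ./mpi_hello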
Tip: If you plan to engage in parallel programming using MPI, this book may prove useful: Parallel Programming with MPI. Peter Pacheco. Morgan Kaufmann. 1996.
Chapter 8
Fine-tuning Job Specifications
As HPC system administrators, we often observe that the HPC resources are not optimally (or
wisely) used. For example, we regularly notice that several cores on a computing node are not
utilised, due to the fact that one sequential program uses only one core on the node. Or users
run I/O intensive applications on nodes with slow network connections.
Users often tend to run their jobs without specifying specific PBS Job parameters. As such,
their job will automatically use the default parameters, which are not necessarily (or rarely) the
optimal ones. This can slow down the run time of your application, but also block HPC resources
for other users.
Specifying the optimal Job Parameters requires some knowledge of your application (e.g., how many parallel threads does my application use, is there a lot of inter-process communication, how much memory does my application need) and also some knowledge about the HPC infrastructure (e.g., what kind of multi-core processors are available, which nodes have InfiniBand).
There are plenty of monitoring tools on Linux available to the user, which are useful to analyse
your individual application. The HPC environment as a whole often requires different techniques,
metrics and time goals, which are not discussed here. We will focus on tools that can help to
optimise your Job Specifications.
Determining the optimal computer resource specifications can be broken down into different
parts. The first is actually determining which metrics are needed and then collecting that data
from the hosts. Some of the most commonly tracked metrics are CPU usage, memory consump-
tion, network bandwidth, and disk I/O stats. These provide different indications of how well
a system is performing, and may indicate where there are potential problems or performance
bottlenecks. Once the data have actually been acquired, the second task is analysing the data
and adapting your PBS Job Specifications.
Another different task is to monitor the behaviour of an application at run time and detect
anomalies or unexpected behaviour. Linux provides a large number of utilities to monitor the
performance of its components.
This chapter shows you how to measure:
1. Walltime
2. Memory usage
3. CPU usage
5. Network bottlenecks
8.1 Specifying Walltime
One of the most important and also easiest parameters to measure is the duration of your program. This information is needed to specify the walltime.
The time utility executes and times your application. You can just add the time command in
front of your normal command line, including your command line options. After your executable
has finished, time writes the total time elapsed, the time consumed by system overhead, and
the time used to execute your executable to the standard error stream. The calculated times are
reported in seconds.
Test the time command:
$ time sleep 75
real 1m15.005s
user 0m0.001s
sys 0m0.002s
It is a good practice to correctly estimate and specify the run time (duration) of an application.
Of course, a margin of 10% to 20% can be taken to be on the safe side.
It is also wise to check the walltime on different compute nodes or to select the slowest compute
node for your walltime tests. Your estimate should be appropriate in case your application will run on the slowest (oldest) compute nodes.
The walltime can be specified in a job script as:
#PBS -l walltime=3:00:00:00
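The same limit can also be requested at submission time on the command line (a sketch, again using the fibo.pbs example script):
$ qsub -l walltime=3:00:00:00 fibo.pbs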
8.2 Specifying memory requirements
In many situations, it is useful to monitor the amount of memory an application is using. You need this information to determine the characteristics of the compute node on which that application should run. Estimating the amount of memory an application will use during execution is often non-trivial, especially when one uses third-party software.
The first point is to be aware of the available free memory in your computer. The free command displays the total amount of free and used physical and swap memory in the system, as well as the buffers used by the kernel. We also use the -m option to see the results expressed in megabytes, and the -t option to get totals.
$ free -m -t
total used free shared buffers cached
Mem: 16049 4772 11277 0 107 161
-/+ buffers/cache: 4503 11546
Swap: 16002 4185 11816
Total: 32052 8957 23094
Important is to note the total amount of memory available in the machine (i.e., 16 GB in this
example) and the amount of used and free memory (i.e., 4.7 GB is used and another 11.2 GB is
free here).
It is not a good practice to use swap-space for your computational applications. A lot of swap-
ping can increase the execution time of your application tremendously.
To monitor the memory consumption of a running application, you can use the top or the
htop command.
top provides an ongoing look at processor activity in real time. It displays a listing of the most
CPU-intensive tasks on the system, and can provide an interactive interface for manipu-
lating processes. It can sort the tasks by memory usage, CPU usage and run time.
htop is similar to top, but shows the CPU-utilisation for all the CPUs in the machine and allows
to scroll the list vertically and horizontally to see all processes and their full command lines.
$ top
$ htop
Once you have gained a good idea of the overall memory consumption of your application, you can define it in your job script. It is wise to foresee a margin of about 10%.
Sequential or single-node applications:
The maximum amount of physical memory used by the job can be specified in a job script as:
#PBS -l mem=4gb
Parallel or multi-node applications: the amount of physical memory needed per process can be specified in a job script as:
#PBS -l pmem=2gb
8.3 Specifying processors requirements
Users are encouraged to fully utilise all the available cores on a certain compute node. Once the required numbers of cores and nodes are decently specified, it is also good practice to monitor the CPU utilisation on these cores and to make sure that all the assigned nodes are working at full load.
The number of core and nodes that a user shall request fully depends on the architecture of the
application. Developers design their applications with a strategy for parallelisation in mind. The
application can be designed for a certain fixed number or for a configurable number of nodes
and cores. It is wise to target a specific set of compute nodes (e.g., Westmere, Harpertown) for
your computing work and then to configure your software to nicely fill up all processors on these
compute nodes.
The file /proc/cpuinfo stores info about your CPU architecture, like the number of CPUs, threads and cores, information about the CPU caches, the CPU family and model, and much more. So, if you want to detect how many cores are available on a specific machine:
$ less /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU E5420 @ 2.50GHz
stepping : 10
cpu MHz : 2500.088
cache size : 6144 KB
...
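If you only want the number of cores, you can count the processor entries (a small sketch using standard tools):
$ grep -c '^processor' /proc/cpuinfo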
Remark: Unless you want information about the login nodes, you'll have to issue these commands on one of the worker nodes. This is most easily achieved in an interactive job, see the chapter on Running interactive jobs.
In order to specify the number of nodes and the number of processors per node in your job script,
use:
#PBS -l nodes=N:ppn=M
$ qsub -l nodes=N:ppn=M
This specifies the number of nodes (nodes=N) and the number of processors per node (ppn=M)
that the job should use. PBS treats a processor core as a processor, so a system with eight cores
per compute node can have ppn=8 as its maximum ppn request. You can also use this statement
in your job script:
#PBS -l nodes=N:ppn=all
#PBS -l nodes=N:ppn=half
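Putting the directives of this chapter together, a request for a single node with 8 cores, 2 GB of memory per process and a 2-hour walltime could look like this (the values are purely illustrative):
#PBS -l walltime=02:00:00
#PBS -l nodes=1:ppn=8
#PBS -l pmem=2gb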
Note that unless your application is explicitly parallelised (e.g., using threads, OpenMP or MPI), requesting more than a single processor on a single node is usually wasteful and can impact the job start time.
On the compute node assigned to your job, the htop output looks something like this:
PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command
22350 vsc00000 20 0 1729M 1071M 704 R 98.0 1.7 27:15.59 bwa index
7703 root 0 -20 10.1G 1289M 70156 S 11.0 2.0 36h10:11 /usr/lpp/mmfs/bin
27905 vsc00000 20 0 123M 2800 1556 R 7.0 0.0 0:17.51 htop
The advantage of htop is that it shows you the cpu utilisation for all processors as well as the
details per application. A nice exercise is to start 4 instances of the cpu_eat program in 4
different terminals, and inspect the cpu utilisation per processor with monitor and htop.
If htop reports that your program is taking 75% CPU on a certain processor, it means that 75%
of the samples taken by top found your process active on the CPU. The rest of the time your
application was in a wait. (It is important to remember that a CPU is a discrete state machine.
It really can be at only 100%, executing an instruction, or at 0%, waiting for something to do.
There is no such thing as using 45% of a CPU. The CPU percentage is a function of time.)
However, it is likely that your applications rest periods include waiting to be dispatched on a
CPU and not on external devices. That part of the wait percentage is then very relevant to
understanding your overall CPU usage pattern.
It is good practice to perform a number of run time stress tests, and to check the CPU utilisation of your nodes. We (and all other users of the HPC) would appreciate it if you use the maximum of the CPU resources that are assigned to you and make sure that no CPUs in your node are left unutilised without reason.
But how can you maximise?
1. Configure your software (e.g., to exactly use the available amount of processors in a node).
2. Develop your parallel program in a smart way, so that it fully utilises the available processors.
3. Demand a specific type of compute node (e.g., Harpertown, Westmere), which has a specific number of cores.
8.4 The system load
On top of the CPU utilisation, it is also important to check the system load. The system load
is a measure of the amount of computational work that a computer system performs.
The system load is the number of applications running or waiting to run on the compute node.
In a system with for example four CPUs, a load average of 3.61 would indicate that there were,
on average, 3.61 processes ready to run, and each one could be scheduled into a CPU.
The load averages differ from CPU percentage in two significant ways:
1. load averages measure the trend of processes waiting to be run (and not only an instan-
taneous snapshot, as does CPU percentage); and
2. load averages include all demand for all resources, e.g. CPU and also I/O and network
(and not only how much was active at the time of measurement).
1. When you are running computationally intensive applications, one application per processor will generate the optimal load.
2. For I/O intensive applications (e.g., applications which perform a lot of disk I/O), a higher number of applications can generate the optimal load. While some applications are reading or writing data on disks, the processors can serve other applications.
How the cores are spread out over CPUs does not matter as far as the load is concerned: two quad-cores perform similarly to four dual-cores, which in turn perform similarly to eight single-cores. It's all eight cores for these purposes.
The load average represents the average system load over a period of time. It conventionally
appears in the form of three numbers, which represent the system load during the last one-,
five-, and fifteen-minute periods.
The uptime command will show us the average load
$ uptime
10:14:05 up 86 days, 12:01, 11 users, load average: 0.60, 0.41, 0.41
Now, start a few instances of the eat_cpu program in the background, and check the effect on
the load again:
$ ./eat_cpu&
$ ./eat_cpu&
$ ./eat_cpu&
$ uptime
10:14:42 up 86 days, 12:02, 11 users, load average: 2.60, 0.93, 0.58
It is good practice to perform a number of run time stress tests, and to check the system load of your nodes. We (and all other users of the HPC) would appreciate it if you use the maximum of the CPU resources that are assigned to you and make sure that no CPUs in your node are left unutilised without reason.
But how can you maximise?
2. Configure your software (e.g., to exactly use the available amount of processors in a node).
3. Develop your parallel program in a smart way, so that it fully utilises the available proces-
sors.
4. Demand a specific type of compute node (e.g., Harpertown, Westmere), which have a
specific number of cores.
8.5 Checking File sizes & Disk I/O
Some programs generate intermediate or output files, the size of which may also be a useful
metric.
Remember that your available disk space on the HPC online storage is limited, and that you have environment variables which point to these directories available (i.e., $VSC_HOME, $VSC_DATA and $VSC_SCRATCH). On top of those, you can also access some temporary storage (i.e., the /tmp directory) on the compute node, which is defined by the $VSC_SCRATCH_LOCAL environment variable.
It is important to be aware of the sizes of the files that will be generated, as the available disk space for each user is limited. We refer to section 6.5 on quota to check your quota, and for tools to find which files consumed it.
Several actions can be taken, to avoid storage problems:
1. Be aware of all the files that are generated by your program. Also check out the hidden
files.
4. First work (i.e., read and write) with your big files in the local /tmp directory. Once
finished, you can move your files once to the VSC_DATA directories.
5. Make sure your programs clean up their temporary files after execution.
7. Anyone can request more disk space to the HPC staff, but you will have to duly justify
your request.
8.6 Specifying network requirements
Users can examine their network activities with the htop command. When your processors are 100% busy, but you see a lot of red bars and only limited green bars in the htop screen, it is mostly an indication that they lose a lot of time with inter-process communication.
Whenever your application utilises a lot of inter-process communication (as is the case in most
parallel programs), we strongly recommend to request nodes with an Infiniband network. The
Infiniband is a specialised high bandwidth, low latency network that enables large parallel jobs
to run as efficiently as possible.
The parameter to add in your job-script would be:
#PBS -l ib
If, for some other reason, a user is fine with the Gigabit Ethernet network, he can specify:
#PBS -l gbe
Chapter 9
HPC Policies
Stub chapter
Part II
Advanced Guide
Chapter 10
Multi-job submission
Two typical use cases are:
1. parameter variations, i.e., many small jobs determined by a specific parameter set which is stored in a .csv (comma separated value) input file;
2. job arrays, i.e., each individual job gets a unique numeric identifier.
Both use cases often have a common root: the user wants to run a program with a large number
of parameter settings, and the program does not allow for aggregation, i.e., it has to be run once
for each instance of the parameter values.
However, the Worker Framework's scope is wider: it can be used for any scenario that can be reduced to a MapReduce approach.1
1
MapReduce: Map refers to the map pattern in which every item in a collection is mapped onto a new value
by applying a given function, while reduce refers to the reduction pattern which condenses or reduces a collection
of previously computed results to a single value.
10.1 The worker Framework: Parameter Sweeps
$ cd ~/examples/Multi-job-submission/par_sweep
Suppose the program the user wishes to run is called weather, and that it takes three parameters: a temperature, a pressure and a volume. A typical call of the program looks like:
$ ./weather -t 20 -p 1.05 -v 4.3
For the purpose of this exercise, the weather program is just a simple bash script, which prints the three parameters to the standard output and waits a bit:
weather par_sweep/weather
1 #!/bin/bash
2 # Here you could do your calculations
3 echo "T: $2 P: $4 V: $6"
4 sleep 100
A job-script that would run this as a job for the first parameters (p01) would then look like:
weather_p01.pbs par_sweep/weather_p01.pbs
1 #!/bin/bash
2
3 #PBS -l nodes=1:ppn=8
4 #PBS -l walltime=01:00:00
5
6 cd $PBS_O_WORKDIR
7 ./weather -t 20 -p 1.05 -v 4.3
When submitting this job, the calculation is performed for this particular instance of the parameters, i.e., temperature = 20, pressure = 1.05, and volume = 4.3.
To submit the job, the user would use:
$ qsub weather_p01.pbs
However, the user wants to run this program for many parameter instances, e.g., he wants to run
the program on 100 instances of temperature, pressure and volume. The 100 parameter instances
can be stored in a comma separated value file (.csv) that can be generated using a spreadsheet
program such as Microsoft Excel or RDBMS or just by hand using any text editor (do not use
a word processor such as Microsoft Word). The first few lines of the file data.csv would look
like:
$ more data.csv
temperature, pressure, volume
293, 1.0e5, 107
294, 1.0e5, 106
295, 1.0e5, 105
296, 1.0e5, 104
297, 1.0e5, 103
...
It has to contain the names of the variables on the first line, followed by 100 parameter instances
in the current example.
In order to make our PBS generic, the PBS file can be modified as follows:
weather.pbs par_sweep/weather.pbs
1 #!/bin/bash
2
3 #PBS -l nodes=1:ppn=8
4 #PBS -l walltime=04:00:00
5
6 cd $PBS_O_WORKDIR
7 ./weather -t $temperature -p $pressure -v $volume
8
9 # # This script is submitted to the cluster with the following 2 commands:
10 # module load worker
11 # wsub -data data.csv -batch weather.pbs
Note that:
1. the parameter values 20, 1.05, 4.3 have been replaced by the variables $temperature, $pressure and $volume respectively, which are specified on the first line of the data.csv file;
2. the number of processors per node has been increased to 8 (i.e., ppn=1 is replaced by
ppn=8);
3. the walltime has been increased to 4 hours (i.e., walltime=00:15:00 is replaced by wall-
time=04:00:00).
The walltime is calculated as follows: one calculation takes 15 minutes, so 100 calculations take 1500 minutes on one CPU. However, this job will use 8 CPUs, so the 100 calculations will be done in 1500/8 = 187.5 minutes, i.e., a little over 3 hours; we request 4 hours to be on the safe side.
The job can now be submitted as follows:
$ module load worker
$ wsub -batch weather.pbs -data data.csv
total number of work items: 41
123456.master15.delcatty.gent.vsc
Note that the PBS file is the value of the -batch option. The weather program will now be run for
all 100 parameter instances, 8 concurrently, until all computations are done. A computation
for such a parameter instance is called a work item in Worker parlance.
10.2 The Worker framework: Job arrays
$ cd ~/examples/Multi-job-submission/job_array
As a simple example, assume you have a serial program called myprog that you want to run on
various input files input[1-100].
The following bash script would submit these jobs one by one:
1 #!/bin/bash
2 for i in $(seq 1 100); do
3 qsub -o output$i -i input$i myprog.pbs
4 done
TORQUE's job array feature (qsub -t) offered a more convenient alternative, with (among other
advantages) the properties that:
2. individual jobs are referenced as jobid-number, and the entire array can be referenced as
jobid for easy killing etc.; and
3. each job has PBS_ARRAYID set to its number, which allows the script/program to specialise
for that job.
The effect was that rather than 1 job, the user would actually submit 100 jobs to the queue
system. This was a popular feature of TORQUE, but as this technique puts quite a burden on
the scheduler, it is not supported by Moab (the current job scheduler).
To support those users who used the feature and since it offers a convenient workflow, the worker
framework implements the idea of job arrays in its own way.
A typical job-script for use with job arrays would look like this:
job_array.pbs job_array/job_array.pbs
1 #!/bin/bash -l
2 #PBS -l nodes=1:ppn=1
3 #PBS -l walltime=00:15:00
4 cd $PBS_O_WORKDIR
5 INPUT_FILE="input_${PBS_ARRAYID}.dat"
6 OUTPUT_FILE="output_${PBS_ARRAYID}.dat"
7 my_prog -input ${INPUT_FILE} -output ${OUTPUT_FILE}
In our specific example, we have prefabricated 100 input files in the ./input subdirectory. Each
of those files contains a number of parameters for the test_set program, which will perform
some tests with those parameters.
Input for the program is stored in files with names such as input_1.dat, input_2.dat, . . . ,
input_100.dat in the ./input subdirectory.
$ ls ./input
...
$ more ./input/input_99.dat
This is input file #99
Parameter #1 = 99
Parameter #2 = 25.67
Parameter #3 = Batch
Parameter #4 = 0x562867
For the sole purpose of this exercise, we have provided a short test_set program, which reads
the input files and just copies them into a corresponding output file. We even add a few lines to
each output file. The corresponding output computed by our test_set program will be written
to the ./output directory as output_1.dat, output_2.dat, . . . , output_100.dat.
test_set job_array/test_set
1 #!/bin/bash
2
3 # Check if the output Directory exists
4 if [ ! -d "./output" ] ; then
5 mkdir ./output
6 fi
7
8 # Here you could do your calculations...
9 echo "This is Job_array #" $1
10 echo "Input File : " $3
11 echo "Output File: " $5
12 cat ./input/$3 | sed -e "s/input/output/g" | grep -v "Parameter" > ./output/$5
13 echo "Calculations done, no results" >> ./output/$5
Using the worker framework, a feature akin to job arrays can be used with minimal modifications
to the job-script:
test_set.pbs job_array/test_set.pbs
1 #!/bin/bash -l
2 #PBS -l nodes=1:ppn=8
3 #PBS -l walltime=04:00:00
4 cd $PBS_O_WORKDIR
5 INPUT_FILE="input_${PBS_ARRAYID}.dat"
6 OUTPUT_FILE="output_${PBS_ARRAYID}.dat"
7 ./test_set ${PBS_ARRAYID} -input ${INPUT_FILE} -output ${OUTPUT_FILE}
The job is submitted with the wsub command, using the -t option to specify the range of array
ids (see also the HPC Quick Reference Guide in Appendix A):
$ module load worker
$ wsub -t 1-100 -batch test_set.pbs
The test_set program will now be run for all 100 input files, 8 concurrently, until all
computations are done. Again, a computation for an individual input file, or, equivalently, an
array id, is called a work item in Worker speak.
Note that in contrast to TORQUE job arrays, a worker job array only submits a single job.
$ qstat
Job id Name User Time Use S Queue
--------------- ------------- --------- ---- ----- - -----
123456.master15.delcatty.gent.vsc test_set.pbs vsc40000 0 Q
10.3 MapReduce: prologues and epilogue
Many computations naturally fall apart into three phases:
1. a preparation phase in which the data is split up into smaller, more manageable chunks;
2. on these chunks, the same algorithm is applied independently (these are the work items);
and
3. the results of the computations on those chunks are aggregated into, e.g., a statistical
description of some sort.
The Worker framework directly supports this scenario by using a prologue (pre-processing) and
an epilogue (post-processing). The former is executed just once before work is started on the work
items, the latter is executed just once after the work on all work items has finished. Technically,
the master, i.e., the process that is responsible for dispatching work and logging progress, executes
the prologue and epilogue.
$ cd ~/examples/Multi-job-submission/map_reduce
The script pre.sh prepares the data by creating 100 different input-files, and the script post.sh
aggregates (concatenates) the data.
pre.sh map_reduce/pre.sh
1 #!/bin/bash
2
3 # Check if the input Directory exists
4 if [ ! -d "./input" ] ; then
5 mkdir ./input
6 fi
7
8 # Just generate all dummy input files
9 for i in {1..100};
10 do
11 echo "This is input file #$i" > ./input/input_$i.dat
12 echo "Parameter #1 = $i" >> ./input/input_$i.dat
13 echo "Parameter #2 = 25.67" >> ./input/input_$i.dat
14 echo "Parameter #3 = Batch" >> ./input/input_$i.dat
15 echo "Parameter #4 = 0x562867" >> ./input/input_$i.dat
16 done
post.sh map_reduce/post.sh
1 #!/bin/bash
2
3 # Check if the output Directory exists
4 if [ ! -d "./output" ] ; then
5 echo "The output directory does not exist!"
6 exit
7 fi
8
9 # Just concatenate all output files
10 touch all_output.txt
11 for i in {1..100};
12 do
13 cat ./output/output_$i.dat >> all_output.txt
14 done
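Putting it all together, such a map-reduce style job can be submitted with the prologue and epilogue options summarised in the HPC Quick Reference Guide (Appendix A); adapt the batch file and the array range to your own case:
$ module load worker
$ wsub -prolog pre.sh -batch test_set.pbs -epilog post.sh -t 1-100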
Note that the time taken for executing the prologue and the epilogue should be added to the
job's total walltime.
10.4 Some more on the Worker Framework
The Worker Framework is implemented using MPI, so it is not restricted to a single compute
node; it scales well to multiple nodes. However, remember that jobs requesting a large number
of nodes typically spend quite some time in the queue.
The Worker Framework will be effective when
1. work items, i.e., individual computations, are neither too short, nor too long (i.e., from a
few minutes to a few hours); and,
2. the number of work items is larger than the number of CPUs involved in the job
(e.g., more than 30 for 8 CPUs).
Since a Worker job will typically run for several hours, it may be reassuring to monitor its
progress. Worker keeps a log of its activity in the directory where the job was submitted. The
log's name is derived from the job's name and the job's ID, i.e., it has the form <jobname>.log<jobid>.
For the running example, this could be run.pbs.log123456, assuming the job's ID is 123456. To
keep an eye on the progress, one can use:
$ tail -f run.pbs.log123456
Alternatively, wsummarize, a Worker command that summarises a log file, can be used:
$ watch -n 60 wsummarize run.pbs.log123456
Sometimes, the execution of a work item takes longer than expected, or worse, some work items
get stuck in an infinite loop. This situation is unfortunate, since it implies that work items that
could successfully execute are not even started. Again, the Worker framework offers a simple
and yet versatile solution. If we want to limit the execution of each work item to at most 20
minutes, this can be accomplished by modifying the script of the running example:
1 #!/bin/bash -l
2 #PBS -l nodes=1:ppn=8
3 #PBS -l walltime=04:00:00
4 module load timedrun/1.0
5 cd $PBS_O_WORKDIR
6 timedrun -t 00:20:00 ./weather -t $temperature -p $pressure -v $volume
Note that it is trivial to set individual time constraints for work items by introducing a parameter,
and including the values of the latter in the CSV file, along with those for the temperature,
pressure and volume.
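As a sketch of that idea, the extra column could look as follows; note that the column name timelimit is an arbitrary choice for this illustration, not something prescribed by the Worker framework:
temperature, pressure, volume, timelimit
293, 1.0e5, 107, 00:20:00
294, 1.0e5, 106, 00:30:00
...
and the corresponding line in the job-script would then become:
timedrun -t $timelimit ./weather -t $temperature -p $pressure -v $volume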
Also note that timedrun is in fact offered in a module of its own, so it can be used outside the
Worker framework as well.
Unfortunately, it is not always easy to estimate the walltime for a job, and consequently, sometimes
the latter is underestimated. When using the Worker framework, this implies that not all work
items will have been processed. Worker makes it very easy to resume such a job without having
to figure out which work items did complete successfully, and which remain to be computed.
Suppose the job that did not complete all its work items had ID 445948; it can then be resumed
with Worker's wresume command, giving it that job ID. This will submit a new job that will start
to work on the work items that were not done yet. Note that it is possible to change almost all
job parameters when resuming, specifically the requested resources such as the number of cores
and the walltime.
Work items may fail to complete successfully for a variety of reasons, e.g., a data file that is
missing, a (minor) programming error, etc. Upon resuming a job, the work items that failed are
considered to be done, so resuming a job will only execute work items that did not terminate,
either successfully or with a failure. It is also possible to retry work items that failed
(preferably after the glitch that caused the failure has been fixed).
By default, a job's prologue is not executed when it is resumed, while its epilogue is. wresume
has options to modify this default behaviour.
This how-to introduces only Worker's basic features. The wsub command prints some usage
information when the -help option is specified:
$ wsub -help
### usage: wsub -batch <batch-file> \
# [-data <data-files>] \
# [-prolog <prolog-file>] \
# [-epilog <epilog-file>] \
# [-log <log-file>] \
# [-mpiverbose] \
# [-dryrun] [-verbose] \
# [-quiet] [-help] \
# [-t <array-req>] \
# [<pbs-qsub-options>]
#
# -batch <batch-file> : batch file template, containing variables to be
# replaced with data from the data file(s) or the
# PBS array request option
# -data <data-files> : comma-separated list of data files (default CSV
# files) used to provide the data for the work
# items
# -prolog <prolog-file> : prolog script to be executed before any of the
# work items are executed
# -epilog <epilog-file> : epilog script to be executed after all the work
# items are executed
# -mpiverbose : pass verbose flag to the underlying MPI program
# -verbose : feedback information is written to standard error
# -dryrun : run without actually submitting the job, useful
# -quiet : don't show information
# -help : print this help message
# -t <array-req> : qsub's PBS array request options, e.g., 1-10
# <pbs-qsub-options> : options passed on to the queue submission
# command
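For example, before submitting a large parameter sweep, you could combine the -dryrun and -verbose options listed above to check what would be generated without actually queueing a job; this is only a sketch, and the exact feedback depends on the Worker version installed:
$ wsub -dryrun -verbose -batch weather.pbs -data data.csv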
Chapter 11
Compiling and testing your software on the HPC
All nodes in the HPC cluster run CentOS 7.2 (phanpy, golett, swalot) or Scientific Linux 6.7
(raichu, delcatty) as their operating system, both of which are specific versions of Red Hat Linux.
This means that all the software programs (executables) that the end-user wants to run on the
HPC first must be compiled for CentOS 7.2 (phanpy, golett, swalot) or Scientific Linux 6.7 (raichu,
delcatty). It also means that you first have to install all the required external software packages
on the HPC.
Most commonly used compilers are already pre-installed on the HPC and can be used straight
away. Many popular external software packages, which are regularly used in the scientific
community, are also pre-installed.
In order to check all the available modules and their version numbers, which are pre-installed on
the HPC, enter:
$ module av
If you are not sure about the capitalisation of a module name, you can search for it case-insensitively
with the -i option.
When your required application is not available on the HPC, please contact any HPC staff member.
Be aware of potential license costs; open-source software is often preferred.
11.2 Porting your code
To port a software program is to translate it from the operating system in which it was developed
(e.g., Windows 7) to another operating system (e.g., Red Hat Enterprise Linux on our HPC)
so that it can be used there. Porting implies some degree of effort, but not nearly as much as
redeveloping the program in the new environment. It all depends on how portable you wrote
your code.
In the simplest case, the file or files may simply be copied from one machine to the other. However,
in many cases the software is installed on a computer in a way which depends upon its detailed
hardware, software and setup: with device drivers for particular devices, using installed operating
system and supporting software components, and using different directories.
In some cases, software (usually described as portable software) is specifically designed to run
on different computers with compatible operating systems and processors, without any machine-
dependent installation; it is sufficient to transfer specified directories and their contents. Hardware-
and software-specific information is often stored in configuration files in specified locations (e.g.,
the registry on machines running MS Windows).
Software which is not portable in this sense will have to be transferred with modifications to
support the environment on the destination machine.
Whilst programming, it would be wise to stick to certain standards (e.g., ISO/ANSI/POSIX).
This will ease the porting of your code to other platforms.
Porting your code to the CentOS 7.2 (phanpy, golett, swalot), Scientific Linux 6.7 (raichu,
delcatty) platform is the responsibility of the end-user.
11.3 Compiling and building on the HPC
Compiling refers to the process of translating code written in some programming language, e.g.,
Fortran, C, or C++, to machine code. Building is similar, but includes gluing together the
machine code resulting from different source files into an executable (or library). The text below
guides you through some basic problems typical for small software projects. For larger projects
it is more appropriate to use makefiles or even an advanced build system like CMake.
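As a minimal illustration of the difference between compiling and building, you could compile two source files separately and then link the resulting object files into one executable; the file names here are hypothetical:
$ gcc -c main.c utils.c          # compile: produces main.o and utils.o
$ gcc -o myprog main.o utils.o   # build/link: glue the object files into one executable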
All nodes of a given cluster run the same version of the operating system, i.e., CentOS 7.2 (phanpy,
golett, swalot) or Scientific Linux 6.7 (raichu, delcatty). So, it is sufficient to compile your program
on any compute node of the cluster you intend to use. Once you have generated an executable with
your compiler, this executable should be able to run on any other compute node of that cluster.
In order to compile and use your software on the HPC, you typically:
1. Copy your software to the login node of the HPC;
2. Request a compute node;
3. Compile it;
4. Test it locally;
and then submit it as a job. We assume you have already copied your software to the HPC. The
next step is to request your private compute node:
$ qsub -I
qsub: waiting for job 123456.master15.delcatty.gent.vsc to start
We now list the directory and explore the contents of the hello.c program:
$ ls -l
total 512
-rw-r--r-- 1 vsc40000 214 Sep 16 09:42 hello.c
-rw-r--r-- 1 vsc40000 130 Sep 16 11:39 hello.pbs*
-rw-r--r-- 1 vsc40000 359 Sep 16 13:55 mpihello.c
-rw-r--r-- 1 vsc40000 304 Sep 16 13:55 mpihello.pbs
hello.c
1 /*
2 * VSC : Flemish Supercomputing Centre
3 * Tutorial : Introduction to HPC
4 * Description: Print 500 numbers, whilst waiting 1 second in between
5 */
6 #include "stdio.h"
7 int main( int argc, char *argv[] )
8 {
9 int i;
10 for (i=0; i<500; i++)
11 {
12 printf("Hello #%d\n", i);
13 fflush(stdout);
14 sleep(1);
15 }
16 }
The hello.c program is a simple source file, written in C. It'll print Hello #<num> 500 times,
waiting one second between two printouts.
We first need to compile this C-file into an executable with the gcc-compiler.
First, check the command line options for gcc (GNU C-Compiler), then we compile and list
the contents of the directory again:
$ gcc --help
$ gcc -o hello hello.c
$ ls -l
total 512
-rwxrwxr-x 1 vsc40000 7116 Sep 16 11:43 hello*
-rw-r--r-- 1 vsc40000 214 Sep 16 09:42 hello.c
-rwxr-xr-x 1 vsc40000 130 Sep 16 11:39 hello.pbs*
A new file hello has been created. Note that this file has execute rights, i.e., it is an executable.
More often than not, calling gcc (or any other compiler for that matter) will provide you with
a list of errors and warnings referring to mistakes the programmer made, such as typos or syntax
errors. You will have to correct them first in order to make the code compile. Warnings pinpoint
less crucial issues that may relate to performance problems, the use of unsafe or obsolete language
features, etc. It is good practice to remove all warnings from a compilation process, even if they
seem unimportant, so that a code change that produces a warning does not go unnoticed.
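For example, recompiling hello.c with warnings enabled will typically flag the call to sleep, because the corresponding header <unistd.h> is not included in the listing above; adding that include makes the warning go away. The exact message depends on your gcc version:
$ gcc -Wall -o hello hello.c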
Let's test this program on the local compute node, which is at your disposal after the qsub -I
command:
$ ./hello
Hello #0
Hello #1
Hello #2
Hello #3
Hello #4
...
$ cd ~/examples/Compiling-and-testing-your-software-on-the-HPC
List the directory and explore the contents of the mpihello.c program:
$ ls -l
total 512
-rw-r--r-- 1 vsc40000 214 Sep 16 09:42 hello.c
-rw-r--r-- 1 vsc40000 130 Sep 16 11:39 hello.pbs*
-rw-r--r-- 1 vsc40000 359 Sep 16 13:55 mpihello.c
-rw-r--r-- 1 vsc40000 304 Sep 16 13:55 mpihello.pbs
mpihello.c
1 /*
2 * VSC : Flemish Supercomputing Centre
3 * Tutorial : Introduction to HPC
4 * Description: Example program, to compile with MPI
5 */
6 #include <stdio.h>
7 #include <mpi.h>
8
9 main(int argc, char **argv)
10 {
11 int node, i, j;
12 float f;
13
14 MPI_Init(&argc,&argv);
15 MPI_Comm_rank(MPI_COMM_WORLD, &node);
16
17 printf("Hello World from Node %d.\n", node);
18 for (i=0; i<=100000; i++)
19 f=i*2.718281828*i+i+i*3.141592654;
20
21 MPI_Finalize();
22 }
The mpihello.c program is a simple source file, written in C with MPI library calls.
Then, check the command line options for mpicc (GNU C-Compiler with MPI extensions),
compile and list the contents of the directory again:
$ mpicc -help
$ mpicc -o mpihello mpihello.c
$ ls -l
A new file mpihello has been created. Note that this program has execute rights. Let's test it on
the login node first, and then submit it to the cluster:
$ ./mpihello
Hello World from Node 0.
$ qsub mpihello.pbs
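To actually start several MPI processes interactively, for instance on the compute node obtained with qsub -I, you can launch the program through mpirun. This is a sketch assuming 4 processes; the order of the output lines is not deterministic:
$ mpirun -np 4 ./mpihello
Hello World from Node 0.
Hello World from Node 2.
Hello World from Node 1.
Hello World from Node 3.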
We will now compile the same program, but using the Intel Parallel Studio Cluster Edition
compilers. We stay in the examples directory for this chapter:
$ cd ~/examples/Compiling-and-testing-your-software-on-the-HPC
We will compile this C/MPI file into an executable with the Intel Parallel Studio Cluster Edition.
First, clear the modules (purge) and then load the latest intel module:
$ module purge
$ module load intel
Then, compile and list the contents of the directory again. The Intel equivalent of mpicc is
mpiicc:
$ mpiicc -o mpihello mpihello.c
$ ls -l
Note that the old mpihello file has been overwritten. Let's test this program on the login node
first:
$ ./mpihello
Hello World from Node 0.
$ qsub mpihello.pbs
Note: The AUGent only has a license for the Intel Parallel Studio Cluster Edition for a fixed
number of users. As such, it might happen that you have to wait a few minutes before a floating
license becomes available for your use.
Note: The Intel Parallel Studio Cluster Edition contains equivalent compilers for all GNU
compilers: icc corresponds to gcc (C), icpc to g++ (C++) and ifort to gfortran (Fortran), with
mpiicc, mpiicpc and mpiifort as the matching MPI compiler wrappers.
Chapter 12
Program examples
Go to our examples:
$ cd ~/examples/Program-examples
Here, we have put together a number of examples for your convenience. We made an effort to
put comments inside the source files, so the source code files are (or should be) self-explanatory:
1. 01_Python
2. 02_C_C++
3. 03_Matlab
4. 04_MPI_C
5. 05a_OMP_C
6. 05b_OMP_FORTRAN
7. 06_NWChem
8. 07_Wien2k
9. 08_Gaussian
10. 09_Fortran
11. 10_PQS
Chapter 13
Best Practices
2. Check your computer requirements upfront, and request the correct resources in your PBS
configuration script.
3. Check your jobs at runtime. You could log in to the node and check the proper execution
of your jobs with, e.g., top or vmstat. Alternatively you could run an interactive job
(qsub -I).
4. Try to benchmark the software for scaling issues when using MPI or for I/O issues.
5. Use the scratch file system ($VSC_SCRATCH_NODE, which is mapped to the local /tmp)
whenever possible. Local disk I/O is always much faster as it does not have to use the
network (see the sketch after this list).
6. When your job starts, it will log on to the compute node(s) and start executing the com-
mands in the job script. It will start in your home directory ($VSC_HOME), so going to
the current directory with cd $PBS_O_WORKDIR is the first thing which needs to
be done. You will have your default environment, so don't forget to load the software with
module load.
7. In case your job is not running, use checkjob. It will show why your job is not yet running.
Sometimes commands might time out when the scheduler is overloaded.
9. Submit small jobs by grouping them together. The Worker Framework has been designed
for these purposes.
10. The runtime is limited by the maximum walltime of the queues. For longer walltimes, use
checkpointing.
13. And above all . . . do not hesitate to contact the HPC staff. We're here to help you.
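As a minimal sketch of best practice 5 above, a job can stage its data to the node-local scratch space, run there, and copy the results back afterwards; the file names and the program myprog are hypothetical:
#!/bin/bash
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00
cd $PBS_O_WORKDIR
# Stage the input data to the fast node-local scratch space
cp input.dat $VSC_SCRATCH_NODE
cd $VSC_SCRATCH_NODE
# Run the (hypothetical) program on the local copy of the data
$PBS_O_WORKDIR/myprog -input input.dat -output result.dat
# Copy the results back to the directory the job was submitted from
cp result.dat $PBS_O_WORKDIR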
Important note: the PBS file on the HPC has to be in UNIX format; if it is not, your job will
fail and generate rather weird error messages.
If necessary, you can convert it using:
$ dos2unix file.pbs
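If you are not sure whether a file is in DOS/Windows format, the file command will usually tell you; the file name below is hypothetical, and the output shown is what you would see for a file with Windows line endings:
$ file job.pbs
job.pbs: ASCII text, with CRLF line terminators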
13.3.1 EasyBuild
(coming soon)
13.4.1 OpenFOAM
(coming soon)
Appendix A
HPC Quick Reference Guide
Login
Login ssh <vsc-account>@login.hpc.ugent.be
Where am I? hostname
Copy to HPC scp foo.txt <vsc-account>@login.hpc.ugent.be:
Copy from HPC scp <vsc-account>@login.hpc.ugent.be:foo.txt .
Setup ftp session sftp <vsc-account>@login.hpc.ugent.be
Modules
List all available modules module av
List loaded modules module list
Load module module load <name>
Unload module module unload <name>
Unload all modules module purge
Help on use of module module help
Jobs
Submit Job qsub <script.pbs>
Status of the job qstat <jobid>
Possible start time (not available everywhere) showstart <jobid>
Check job (not available everywhere) checkjob <jobid>
Show compute node qstat -n <jobid>
Delete job qdel <jobid>
Status of all your jobs qstat
Show all jobs on queue (not available everywhere) showq
Submit Interactive job qsub -I
Disk quota
Check your disk quota mmlsquota
Check disk quota nice show_quota.py
Local disk usage du -h .
Overall disk usage df -a
Worker Framework
Load worker module module load worker
Submit parameter sweep wsub -batch weather.pbs -data data.csv
Submit job array wsub -t 1-100 -batch test_set.pbs
Submit job array with prolog and epilog wsub -prolog pre.sh -batch test_set.pbs -epilog post.sh -t 1-100
Appendix B
TORQUE options
-V All Make sure that the environment in which the job runs is the
same as the environment in which it was submitted.
#PBS -V
Walltime All The maximum time a job can run before being stopped. If not
used, a default of a few minutes applies. Use this flag to prevent
jobs that go bad from running for hundreds of hours. Format is
HH:MM:SS
#PBS -l walltime=12:00:00
IMPORTANT!! All PBS directives MUST come before the first line of executable code in
your script, otherwise they will be ignored.
B.2 Environment Variables in Batch Job Scripts
When a batch job is started, a number of environment variables are created that can be used in
the batch job script. A few of the most commonly used variables are described here.
Variable Description
PBS_ENVIRONMENT set to PBS_BATCH to indicate that the job is a batch job; oth-
erwise, set to PBS_INTERACTIVE to indicate that the job is a
PBS interactive job.
PBS_JOBID the job identifier assigned to the job by the batch system. This
is the same number you see when you do qstat.
PBS_JOBNAME the job name supplied by the user
PBS_NODEFILE the name of the file that contains the list of the nodes assigned
to the job. Useful for parallel jobs if you want to refer to the
nodes or count them.
PBS_QUEUE the name of the queue from which the job is executed
PBS_O_HOME value of the HOME variable in the environment in which qsub
was executed
PBS_O_LANG value of the LANG variable in the environment in which qsub was
executed
PBS_O_LOGNAME value of the LOGNAME variable in the environment in which
qsub was executed
PBS_O_PATH value of the PATH variable in the environment in which qsub was
executed
PBS_O_MAIL value of the MAIL variable in the environment in which qsub was
executed
PBS_O_SHELL value of the SHELL variable in the environment in which qsub
was executed
PBS_O_TZ value of the TZ variable in the environment in which qsub was
executed
PBS_O_HOST the name of the host upon which the qsub command is running
PBS_O_QUEUE the name of the original queue to which the job was submitted
PBS_O_WORKDIR the absolute path of the current working directory of the qsub
command. This is the most useful one; use it in every job script.
The first thing to do is cd $PBS_O_WORKDIR after defining
the resource list, because PBS starts the job in your $HOME
directory.
PBS_O_NODENUM node offset number
PBS_O_VNODENUM vnode offset number
PBS_VERSION Version Number of TORQUE, e.g., TORQUE-2.5.1
PBS_MOMPORT active port for mom daemon
PBS_TASKNUM number of tasks requested
PBS_JOBCOOKIE job cookie
PBS_SERVER Server Running TORQUE
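As a small sketch that ties several of these variables together, the following job script simply prints a few of them and the list of assigned nodes; it is only meant to be illustrative:
#!/bin/bash
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:05:00
# Go to the directory from which the job was submitted
cd $PBS_O_WORKDIR
echo "Job ID   : $PBS_JOBID"
echo "Job name : $PBS_JOBNAME"
echo "Queue    : $PBS_QUEUE"
echo "Node file: $PBS_NODEFILE"
# Show the nodes assigned to this job
cat $PBS_NODEFILE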
Appendix C
Useful Linux Commands
All the HPC clusters run some variant of the Red Hat Enterprise Linux operating system. This
means that, when you connect to one of them, you get a command line interface, which looks
something like this:
vsc40000@ln01[203] $
When you see this, we also say you are inside a shell. The shell will accept your commands,
and execute them.
ls Shows you a list of files in the current directory
cd Change current working directory
rm Remove file or directory
nano Text editor
echo Prints its parameters to the screen
Most commands will accept or even need parameters, which are placed after the command,
separated by spaces. A simple example with the echo command:
$ echo This is a test
This is a test
Important here is the $ sign in front of the first line. This should not be typed, but is a
convention meaning the rest of this line should be typed at your shell prompt. The lines not
starting with the $ sign are usually the feedback or output from the command.
More commands will be used in the rest of this text, and will be explained then if necessary. If
not, you can usually get more information about a command, say the item or command ls, by
trying either of the following:
$ ls --help
$ man ls
$ info ls
(You can exit the last two manuals by using the q key.) For more exhaustive tutorials
about Linux usage, please refer to the following sites: http://www.linux.org/lessons/
http://linux.about.com/od/nwb_guide/a/gdenwb06.htm
C.2 How to get started with shell scripts
In a shell script, you will put the commands you would normally type at your shell prompt in
the same order. This will enable you to execute all those commands at any time by only issuing
one command: starting the script.
Scripts are basically non-compiled pieces of code: they are just text files. Since they don't
contain machine code, they are executed by what is called a parser or an interpreter. This
is another program that understands the commands in the script and converts them to machine
code. There are many kinds of scripting languages, including Perl and Python.
Another very common scripting language is shell scripting, which is what we will use here.
Typically, the following examples will have one command per line, although it is possible to put
multiple commands on one line. A very simple example of a script may be:
1 echo "Hello! This is my hostname:"
2 hostname
You can type both lines at your shell prompt, and the result will be the following:
$ echo "Hello! This is my hostname:"
Hello! This is my hostname:
$ hostname
gligar01.gligar.gent.vsc
Suppose we want to call this script foo. Open a new file for editing, name it foo, and edit it with
your favourite editor:
$ nano foo
The easiest way to run a script is to start the interpreter and pass the script to it as a parameter.
In the case of our script, the interpreter is either sh or bash (which are the same on the cluster):
$ bash foo
Hello! This is my hostname:
gligar01.gligar.gent.vsc
Congratulations, you just created and started your first shell script!
A more advanced way of executing your shell scripts is by making them executable on their own,
so without invoking the interpreter manually. The system cannot automatically detect which
interpreter you want, so you need to tell it in some way. The easiest way is by using the so-called
shebang notation, explicitly created for this purpose: you put the following line on top
of your shell script: #!/path/to/your/interpreter.
You can find this path with the which command. In our case, since we use bash as an interpreter,
we get the following path:
$ which bash
/bin/bash
1 #!/bin/bash
2 echo "Hello! This is my hostname:"
3 hostname
Note that the shebang must be the first line of your script! Now the operating system knows
which program should be started to run the script.
Finally, we tell the operating system that this script is now executable. For this we change its
file attributes:
$ chmod +x foo
$ ./foo
Hello! This is my hostname:
gligar01.gligar.gent.vsc
The same technique can be used for all other scripting languages, like Perl and Python.
Most scripting languages understand that lines beginning with # are comments, and should
be ignored. If the language you want to use does not ignore these lines, you may get strange
results . . .
C.3 Linux Quick Reference Guide
C.3.3 Editor
emacs
nano Nano's ANOther editor, an enhanced free Pico clone
vi A programmer's text editor
cat Read one or more files and print them to standard output
cmp Compare two files byte by byte
cp Copy files from a source to the same or different target(s)
du Estimate disk usage of each file and recursively for directories
find Search for files in directory hierarchy
grep Print lines matching a pattern
ls List directory contents
mv Move file to different targets
rm Remove files
sort Sort lines of text files
wc Print the number of new lines, words, and bytes in files
man Displays the manual page of a command with its name, synopsis, description,
author, copyright etc.