High Performance Computing
All content following this page was uploaded by Martyn F. Guest on 05 January 2015.
• Core MPI (and the most useful functions) is pretty easy to grasp with a bit of experience.
Compiling MPI Applications
• MPI applications have their own set of compilers - these handle the include locations and libraries without any additional user interaction.
• These will also compile your code for running on Myrinet and under OpenPBS/Torque.
• Compilers for C, C++ and Fortran are installed (only C is needed for the coursework unless you choose to completely rewrite the application!).
• Applications compiled with the MPI compilers cannot be run from the command line, because OpenMPI is not available there: you might get errors, or the code may only partially run. Either way, it is best to run your codes using the OpenPBS/Torque system described in the next section.

Compiling MPI Applications written in C
• To compile an application written in C, replace gcc with mpicc.
• Example:

mpicc -o hw-mpi -O2 helloworld-mpi.c

• If you want to know any OpenMPI installation information for your coursework, use the following command:

ompi_info
How to use the HPSG Cluster

Logging In
• You can collect your password by coming to the HPSG lab (CS2.04) in week 5. We will make an announcement via the Teaching Announcements and in lectures.
High Performance Systems Group Cluster
• During the coursework you will have access to the HPSG IBM Cluster; this is not a high-performance system, but it will be good for building performance models.
• 42 x dual-processor (Pentium III, 1.4GHz) nodes, 2GB system RAM per node, Myrinet fibre-optic interconnect.
• Various head nodes and system management machines.
• The system uses an OpenPBS/Torque queue to manage and batch jobs.
• Note: best performance is achieved when you densely pack your jobs (i.e. use both processors on the same node).

Checking the Current Runtime Queue
• To check the current execution queue:

qstat

• The current queue will be shown:

Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
5277.frankie              BOINC            sdh             612:08:2 R boinc
5278.frankie              BOINC            sdh             600:48:5 R boinc
5279.frankie              BOINC            sdh             553:19:1 R boinc
5280.frankie              BOINC            sdh             496:34:2 R boinc
5281.frankie              BOINC            sdh             584:17:1 R boinc
5282.frankie              BOINC            sdh             505:19:2 R boinc
5283.frankie              BOINC            sdh             501:20:0 R boinc
5284.frankie              BOINC            sdh             523:02:3 R boinc
5285.frankie              BOINC            sdh             507:42:1 R boinc
5286.frankie              BOINC            sdh             489:33:5 R boinc
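As a side note (not from the slides), the qstat listing above is plain whitespace-separated text, so it is easy to filter with awk. This sketch counts one user's running jobs using a few sample lines copied from the listing; on the cluster you would pipe the real command instead, e.g. `qstat | awk '$3 == "sdh" && $5 == "R"' | wc -l`.

```shell
# Sample lines taken from the qstat output above (fields: job id, name,
# user, time use, state, queue).
qstat_sample='5277.frankie BOINC sdh 612:08:2 R boinc
5278.frankie BOINC sdh 600:48:5 R boinc
5279.frankie BOINC sdh 553:19:1 R boinc'

# Count jobs owned by user "sdh" that are in the running (R) state.
running=$(printf '%s\n' "$qstat_sample" |
  awk '$3 == "sdh" && $5 == "R" { n++ } END { print n }')
echo "$running"
```

All three sample jobs belong to sdh and are running, so this prints 3.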
Job Status Information
• Jobs can be in one of many states:
• C - Job completed after having run (in tidy-up and completed state)
• E - Job is exiting after having run ("ending")
• H - Job is held by user or system
• Q - Job is queued, waiting for resources
• R - Job is running
• T - Job is being moved to a new queue/server
• W - Job is waiting for its scheduled execution time to be reached
• S - Job is suspended (not supported on our cluster)

Getting Something Run on the Cluster
• The cluster queuing system batches up jobs and runs them so that resource use is shared fairly between users.
• You can run jobs on the cluster in two ways:
• Via a submit script (a 'batch' job)
• Interactively
• Do not run jobs outside of the queue - this will give you incorrect results and cause unfair resource use. We will monitor this and disable accounts for users who do not use the queue correctly.
• Jobs are submitted using the qsub command.
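The one-letter codes above are what qstat prints in its S column. As a small illustration (the helper name is our own, not a PBS command), the mapping can be written as a shell case statement:

```shell
# Expand a one-letter PBS state code (qstat's S column) into a short
# description, following the list of states above.
describe_state() {
  case "$1" in
    C) echo "completed" ;;
    E) echo "exiting" ;;
    H) echo "held" ;;
    Q) echo "queued" ;;
    R) echo "running" ;;
    T) echo "being moved to a new queue/server" ;;
    W) echo "waiting" ;;
    S) echo "suspended" ;;
    *) echo "unknown" ;;
  esac
}

describe_state R   # the most common state you will see
```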
• To check the state of the compute nodes:

pbsnodes -a

• The node list will be shown:

vogon41.deepthought.hpsg.dcs.warwick.ac.uk
     state = free
     np = 2
     ntype = cluster
     status = opsys=linux,uname=Linux vogon41 2.6.24-vogon-stripped0 #5 SMP Thu Sep 18 10:19:48 BST 2008 i686,sessions=881 1061,nsessions=2,nusers=2,idletime=928290,totmem=4174136kb,availmem=4096888kb,physmem=2076496kb,ncpus=2,loadave=0.00,netload=696181053,state=free,jobs=,varattr=,rectime=1232878570

Interactive Jobs
• Interactive jobs allocate one (or more) nodes for you to use interactively - i.e. you can run commands on the node in a similar way to using a shell.
• Useful if you need to see the application executing (for debugging etc.).
• Can be used for X-windows jobs (if you really need to).
• Your job submission will block until a node becomes free for you to use.
• To request an interactive session on one node:

qsub -V -I
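Again as an aside, the pbsnodes output above follows a fixed shape (an unindented node name followed by indented attributes), so free nodes can be picked out with awk. The sample mirrors the listing above; on the cluster you would pipe `pbsnodes -a` itself.

```shell
# Sample fragment in the same shape as the pbsnodes -a listing above.
pbsnodes_sample='vogon41.deepthought.hpsg.dcs.warwick.ac.uk
     state = free
     np = 2'

# Remember the most recent unindented line (the node name); print it
# whenever a "state = free" attribute line follows.
free_nodes=$(printf '%s\n' "$pbsnodes_sample" |
  awk '/^[^ ]/ { node = $1 } /state = free/ { print node }')
echo "$free_nodes"
```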
Batch Jobs
• Batch jobs allow you to submit an application to the queue for execution and then leave it to run (i.e. you don't need to sit there typing commands in and watching the output).
• Requires you to write a submit script (to say what you want to be executed).

Sample OpenPBS Submit File:

#!/bin/bash
#PBS -V

cd $PBS_O_WORKDIR

• Submit the script with:

qsub -V submit.pbs
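The sample submit file above only exports the environment and changes into the submission directory; a real script also runs a program. A minimal hypothetical completion is sketched below (./hw-mpi is the binary from the compile example; the `-N` option and the mpirun launch line are our assumptions, so check the course notes for the exact launch command used on the cluster):

```shell
# Write a hypothetical complete submit script to submit.pbs.
# The heredoc is quoted so $PBS_O_WORKDIR is written literally and only
# expanded later, when PBS runs the script on a compute node.
cat > submit.pbs <<'EOF'
#!/bin/bash
#PBS -V
#PBS -N hw-mpi

cd $PBS_O_WORKDIR
mpirun ./hw-mpi
EOF

# On the cluster you would then submit it as shown above:
#   qsub -V submit.pbs
```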
Some more options...
• -N <name> (sets the job name)
• The output and error streams of your job will be written to a file after your job has completed.
• This may take a minute or two after completion - be patient!
• Default file names are <jobname>.o<jobid> and <jobname>.e<jobid>.

Submitting Parallel Jobs
• Write a script as before.
• You need additional parameters to tell the scheduler how many processors to allocate to the job.
• The MPI runtime environment automatically knows how many processors are allocated by looking this up in the PBS shell variables (and the TM interface).

qsub -l nodes=X:ppn=Y -V submit.pbs

• nodes=X requests a number of machines
• ppn=Y requests a number of processors per machine

Removing Jobs
• To remove a job from the current execution queue, get the job number from the queue (qstat) and pass it to qdel:

qdel <job id>
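The slides note that best performance comes from densely packing jobs onto both processors of a node. A quick sketch of what that means for the nodes/ppn request (P=8 is just an example process count):

```shell
# Dense packing on dual-processor nodes: for P MPI processes, request
# ppn=2 and nodes = ceil(P / ppn).
P=8
ppn=2
nodes=$(( (P + ppn - 1) / ppn ))

cmd="qsub -l nodes=${nodes}:ppn=${ppn} -V submit.pbs"
echo "$cmd"
```

For 8 processes this yields nodes=4:ppn=2, i.e. four fully packed nodes rather than eight half-used ones.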
• You can limit the resource requests for your job on time, memory, processors and user-specified attributes.
• The queues running on the cluster automatically apply defaults to your jobs during submission.

• To get extended information on jobs in the system:

qstat -a
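For the time and memory limits mentioned above, a hedged example of the usual Torque `-l` resource syntax (the specific names and values here are illustrative assumptions, not taken from the course queue definitions):

```shell
# Illustrative Torque resource request: 10 minutes of wall-clock time
# and 512MB of memory for the job.
req="walltime=00:10:00,mem=512mb"
echo "qsub -l $req -V submit.pbs"
```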
Resources
• Condor/HPSG Resources:
http://www2.warwick.ac.uk/fac/sci/dcs/people/research/csrcbc/hpcsystems/dthought/
• High Performance Computing coursework + MPI resources (FAQs, Common Errors etc.):
http://www2.warwick.ac.uk/fac/sci/dcs/people/research/csrcbc/teaching/hpcseminars/

End of Seminar
Thanks for coming - next week: How to Build a Performance Model
Final Notes...
• This is the first year students are running jobs on a cluster using OpenPBS - there will probably be some bugs and faults in our queue definitions.
• Email us as soon as it goes wrong and we will try to fix it (fingers crossed)!
• Do not try to run jobs directly; if we find them we will lock the account. Always use the queue - this ensures fair execution and more reliable results (which are crucial to a good performance model).
• The job queue will get busy, but your jobs will get run (in the end). Please be responsible: only submit what you need, and delete jobs which you know will go wrong.