PBS Professional™ 9.1
User's Guide
UNIX®, LINUX® and Windows®
Altair® PBS Professional™ 9.1, Updated: October 24, 2007
Edited by Anne Urban
For more information, copies of documentation, and sales, contact Altair at:
Table of Contents

Preface
Acknowledgements
1 Introduction
    Book Organization
    Supported Platforms
    What is PBS Professional?
    History of PBS
    About the PBS Team
    About Altair Engineering
    Why Use PBS?
2 Concepts and Terms
    PBS Components
    Defining PBS Concepts and Terms
3 Getting Started With PBS
    New Features in PBS Professional 9.1
    Deprecations
    Using PBS
    PBS Interfaces
    User's PBS Environment
    Usernames Under PBS
    Setting Up Your UNIX/Linux Environment
    Setting Up Your Windows Environment
    Environment Variables
    Temporary Scratch Space: TMPDIR
Preface
Intended Audience
PBS Professional is the professional workload management system from Altair that provides a unified queuing and job management interface to a set of computing resources. This document provides the user with the information required to use PBS Professional, including creating, submitting, and manipulating batch jobs; querying the status of jobs, queues, and systems; and otherwise making effective use of the computer resources under the control of PBS.
Related Documents
The following publications contain information that may also be useful to the user of PBS:
To order additional copies of this and other PBS publications, or to purchase additional
software licenses, contact an authorized reseller, or the PBS Sales Department. Contact
information is included on the copyright page of this document.
Document Conventions
command    This fixed width font is used to denote literal commands, filenames, error messages, and program output.
Acknowledgements
PBS Professional is the enhanced commercial version of the PBS software originally developed for NASA. The NASA version had a number of corporate and individual contributors over the years, for which the PBS developers and the PBS community are most grateful. Below we provide formal legal acknowledgements to corporate and government entities, then special thanks to individuals.
The NASA version of PBS contained software developed by NASA Ames Research Center, Lawrence Livermore National Laboratory, and MRJ Technology Solutions. In addition, it included software developed by the NetBSD Foundation, Inc., and its contributors, as well as software developed by the University of California, Berkeley and its contributors.
Other contributors to the NASA version of PBS include Bruce Kelly and Clark Streeter of NERSC; Kent Crispin and Terry Heidelberg of LLNL; John Kochmar and Rob Pennington of Pittsburgh Supercomputing Center; and Dirk Grunwald of University of Colorado, Boulder. The ports of PBS to the Cray T3e and the IBM SP SMP were funded by DoD USAERDC; the port of PBS to the Cray SV1 was funded by DoD MSIC.
No list of acknowledgements for PBS would possibly be complete without special recognition of the first two beta test sites. Thomas Milliman of the Space Sciences Center of the University of New Hampshire was the first beta tester. Wendy Lin of Purdue University was the second beta tester and holds the honor of submitting more problem reports than anyone else outside of NASA.
Chapter 1
Introduction
This book, the User's Guide to PBS Professional, is intended as your knowledgeable companion to the PBS Professional software. The information herein pertains to PBS in general, with specific information for PBS Professional 9.1.
1.1 Book Organization
This book is organized into 10 chapters, plus two appendices. Depending on your intended
use of PBS, some chapters will be critical to you, and others may be safely skipped.
Chapter 1    gives an overview of this book, PBS, and the PBS team.
Chapter 2    discusses the various components of PBS and how they interact, followed by definitions of terms used in PBS and in distributed workload management.
Chapter 3    introduces PBS, describing both user interfaces and suggested settings to the user's environment.
Chapter 4    describes the structure and components of a PBS job, and explains how to create and submit a PBS job.
Chapter 5    introduces the xpbs graphical user interface, and shows how to submit a PBS job using xpbs.
Chapter 8    describes and explains how to use the more advanced features of PBS.
Chapter 10   explains how PBS interacts with multi-vnode and parallel applications, and illustrates how to run such applications under PBS.
1.3 What is PBS Professional?
PBS Professional is the professional version of the Portable Batch System (PBS), a flexible workload management system, originally developed to manage aerospace computing resources at NASA. PBS has since become the leader in supercomputer workload management and the de facto standard on Linux clusters.
Today, growing enterprises often support hundreds of users running thousands of jobs across different types of machines in different geographical locations. In this distributed heterogeneous environment, it can be extremely difficult for administrators to collect detailed, accurate usage data, or to set system-wide resource priorities. As a result, many computing resources are left under-utilized, while others are over-utilized. At the same time, users are confronted with an ever expanding array of operating systems and platforms. Each year, scientists, engineers, designers, and analysts must waste countless hours learning the nuances of different computing environments, rather than being able to focus on their core priorities. PBS Professional addresses these problems for computing-intensive industries such as science, engineering, finance, and entertainment.
Now you can use the power of PBS Professional to better control your computing resources. This allows you to unlock the potential in the valuable assets you already have, while at the same time reducing dependency on system administrators and operators, freeing them to focus on other activities. PBS Professional can also help you effectively manage growth by tracking real usage levels across your systems and enhancing utilization of future purchases.
1.4 History of PBS
In the past, UNIX systems were used in a completely interactive manner. Background jobs were just processes with their input disconnected from the terminal. However, as UNIX moved onto larger and larger machines, the need to be able to schedule tasks based on available resources increased in importance. The advent of networked compute servers, smaller general systems, and workstations led to the requirement of a networked batch scheduling capability. The first such UNIX-based system was the Network Queueing System (NQS), funded by NASA Ames Research Center in 1986. NQS quickly became the de facto standard for batch queueing.
Over time, distributed parallel systems began to emerge, and NQS was inadequate to handle the complex scheduling requirements presented by such systems. In addition, computer system managers wanted greater control over their compute resources, and users wanted a single interface to the systems. In the early 1990s NASA needed a solution to this problem, but found nothing on the market that adequately addressed their needs. So NASA led an international effort to gather requirements for a next-generation resource management system. The requirements and functional specification were later adopted as an IEEE POSIX standard (1003.2d). Next, NASA funded the development of a new resource management system compliant with the standard. Thus the Portable Batch System (PBS) was born.
PBS was quickly adopted on distributed parallel systems and replaced NQS on traditional supercomputers and server systems. Eventually the entire industry evolved toward distributed parallel systems, taking the form of both special purpose and commodity clusters. Managers of such systems found that the capabilities of PBS mapped well onto cluster systems. (For information on converting from NQS to PBS, see Appendix B.)
The PBS story continued when MRJ-Veridian (the R&D contractor that developed PBS for NASA) released the Portable Batch System Professional Edition (PBS Pro), a commercial, enterprise-ready, workload management solution. Three years later, the MRJ-Veridian PBS Products business unit was acquired by Altair Engineering, Inc. Altair set up the PBS Products unit as a subsidiary company named Altair Grid Technologies, focused on PBS Professional and related Grid software. This unit then became part of Altair Engineering.
1.5 About the PBS Team
The PBS Professional product is developed by the same team that originally designed PBS for NASA. In addition to the core engineering team, Altair Engineering includes individuals who have supported PBS on computers around the world, including some of the largest supercomputers in existence. The staff includes internationally-recognized experts in resource management and job scheduling, supercomputer optimization, message-passing programming, parallel computation, and distributed high-performance computing. In addition, the PBS team includes co-architects of the NASA Metacenter (the first full-production geographically distributed meta-computing grid), co-architects of the Department of Defense MetaQueueing (prototype Grid) Project, co-architects of the NASA Information Power Grid, and co-chair of the Global Grid Forum's Scheduling Group.
1.7 Why Use PBS?
PBS Professional provides many features and benefits to both the computer system user
and to companies as a whole. A few of the more important features are listed below to give
the reader both an indication of the power of PBS, and an overview of the material that
will be covered in later chapters in this book.
Enterprise-wide Resource Sharing provides transparent job scheduling on any PBS system by any authorized user. Jobs can be submitted from any client system, both local and remote, crossing domains where needed.

Multiple User Interfaces provides a graphical user interface for submitting batch and interactive jobs; querying job, queue, and system status; and monitoring job progress. PBS also provides a traditional command line interface.

Security and Access Control Lists permit the administrator to allow or deny access to PBS systems on the basis of username, group, host, and/or network domain.

Job Accounting offers detailed logs of system activities for charge-back or usage analysis per user, per group, per project, and per compute host.

Automatic File Staging provides users with the ability to specify any files that need to be copied onto the execution host before the job runs, and any that need to be copied off after the job completes. The job will be scheduled to run only after the required files have been successfully transferred.

Parallel Job Support works with parallel programming libraries such as MPI, PVM and HPF. Applications can be scheduled to run within a single multi-processor computer or across multiple systems.

System Monitoring includes a graphical user interface for system monitoring, displaying vnode status, job placement, and resource utilization information for both stand-alone systems and clusters.

Distributed Clustering allows customers to utilize physically distributed systems and clusters, even across wide-area networks.

Common User Environment offers users a common view of job submission, job querying, system status, and job tracking over all systems.

Cross-System Scheduling ensures that jobs do not have to be targeted to a specific computer system. Users may submit their job and have it run on the first available system that meets their resource requirements.

Job Priority allows users the ability to specify the priority of their jobs; defaults can be provided at both the queue and system level.

Username Mapping provides support for mapping user account names on one system to the appropriate name on remote server systems. This allows PBS to fully function in environments where users do not have a consistent username across all hosts.

Fully Configurable. PBS was designed to be easily tailored to meet the needs of different sites. Much of this flexibility is due to the unique design of the scheduler module, which permits significant customization.

Broad Platform Availability is achieved through support of Windows 2000 and XP, and every major version of UNIX and Linux, from workstations and servers to supercomputers. New platforms are being supported with each new release.

Job Arrays are a mechanism for containerizing related work, making it possible to submit, query, modify and display a set of jobs as a single unit.
Chapter 2
Concepts and Terms
PBS is a distributed workload management system. As such, PBS handles the management and monitoring of the computational workload on a set of one or more computers. Modern workload management solutions like PBS Professional include the features of traditional batch queueing but offer greater flexibility and control than first generation batch systems (such as NQS).
Scheduling The process of selecting which jobs to run, when, and where,
according to a predetermined policy. Sites balance competing needs
and goals on the system(s) to maximize efficient use of resources
(both computer time and people time).
Monitoring The act of tracking and reserving system resources and enforcing
usage policy. This includes both software enforcement of usage
limits and user or administrator monitoring of scheduling policies
to see how well they are meeting stated goals.
2.1 PBS Components
PBS consists of two major component types: user-level commands and system daemons/services. A brief description of each is given here to help you understand how the pieces fit together, and how they affect you.
[Figure: PBS components — PBS commands, the Server, the Scheduler, MOM, and the kernel, cooperating to run batch jobs]
Commands    PBS supplies both command line programs that are POSIX 1003.2d conforming and a graphical interface. These are used to submit, monitor, modify, and delete jobs. These client commands can be installed on any system type supported by PBS and do not require the local presence of any of the other components of PBS.
Server    The Job Server daemon/service is the central focus for PBS. Within this document, it is generally referred to as the Server or by the execution name pbs_server. All commands and the other daemons/services communicate with the Server via an Internet Protocol (IP) network. The Server's main function is to provide the basic batch services such as receiving/creating a batch job, modifying the job, and running the job. Normally, there is one Server managing a given set of resources. However if the Server Failover feature is enabled, there will be two Servers.
Job Executor (MOM)    The Job Executor or MOM is the daemon/service which actually places the job into execution. This process, pbs_mom, is informally called MOM as it is the mother of all executing jobs. (MOM is a reverse-engineered acronym that stands for Machine Oriented Mini-server.) MOM places a job into execution when it receives a copy of the job from a Server. MOM creates a new session that is as identical to a user login session as is possible. (For example, under UNIX, if the user's login shell is csh, then MOM creates a session in which .login is run as well as .cshrc.) MOM also has the responsibility for returning the job's output to the user when directed to do so by the Server. One MOM runs on each computer which will execute PBS jobs.
2.2 Defining PBS Concepts and Terms
The following section defines important terms and concepts of PBS. The reader should
review these definitions before beginning the planning process prior to installation of
PBS. The terms are defined in an order that best allows the definitions to build on previous
terms.
If a host has more than one virtual processor, the VPs may be assigned to different jobs or used to satisfy the requirements of a single job (exclusive). This ability to temporarily allocate the entire host to the exclusive use of a single job is important for some multi-host parallel applications. Note that PBS enforces a one-to-one allocation scheme of cluster host VPs, ensuring that the VPs are not over-allocated or over-subscribed between multiple jobs. (See also vnode and virtual processors.)
Load Balance    A policy wherein jobs are distributed across multiple timeshared hosts to even out the workload on each host. Being a policy, the distribution of jobs across execution hosts is solely a function of the Job Scheduler.
Queue    A queue is a named container for jobs within a Server. There are two types of queues defined by PBS, routing and execution. A routing queue is a queue used to move jobs to other queues, including those that exist on different PBS Servers. A job must reside in an execution queue to be eligible to run and remains in an execution queue during the time it is running. In spite of the name, jobs in a queue need not be processed in queue order (first-come first-served or FIFO).
Vnode Attribute Vnodes have attributes associated with them that provide control
information. The attributes defined for vnodes are: state, the list of
jobs to which the vnode is allocated, properties, max_running,
max_user_run, max_group_run, and both assigned and
available resources (“resources_assigned” and
“resources_available”).
Virtual Processor (VP)    A vnode may be declared to consist of one or more virtual processors (VPs). The term virtual is used because the number of VPs declared does not have to equal the number of real processors (CPUs) on the physical vnode. The default number of virtual processors on a vnode is the number of currently functioning physical processors; the PBS Manager can change the number of VPs as required by local policy.
The remainder of this chapter provides additional terms, listed in alphabetical order.
Group ID (GID) This unique number represents a specific group (see Group).
Group    Group refers to a collection of system users (see Users). A user must be a member of a group and may be a member of more than one. Membership in a group establishes one level of privilege, and is also often used to control or limit access to system resources.
Job or Batch Job The basic execution object managed by the batch subsystem. A job
is a collection of related processes which is managed as a whole. A
job can often be thought of as a shell script running a set of tasks.
Operator A person authorized to use some but not all of the restricted
capabilities of PBS is an operator.
PBS_HOME Refers to the path under which PBS was installed on the local
system. Your local system administrator can provide the specific
location.
Rerunnable If a PBS job can be terminated and its execution restarted from the
beginning without harmful side effects, the job is rerunnable.
Stage In This process refers to moving a file or files to the execution host
prior to the PBS job beginning execution.
Stage Out This process refers to moving a file or files off of the execution
host after the PBS job completes execution.
Job Array A collection of jobs submitted under a single job id. These
jobs can be modified, queried and displayed as a set.
Chapter 3
Getting Started With PBS
This chapter introduces the user to PBS Professional. It describes new user-level features
in this release, explains the different user interfaces, introduces the concept of a PBS
“job”, and shows how to set up your environment for running batch jobs with PBS.
3.1 New Features in PBS Professional 9.1
The following is a list of new features and changes in PBS Professional release 9.1. More detail is given in the indicated sections.
Support has been added for SLES 10 on x86, x86_64, and IA64.
Important: The full list of new features in this release of PBS Professional is given in the PBS Professional Administrator's Guide.
3.2 Deprecations
The sort_priority option to job_sort_key has been replaced with the job_priority option.
3.3 Using PBS
From the user's perspective, a workload management system allows you to make more
efficient use of your time. You specify the tasks you need executed. The system takes care
of running these tasks and returning the results to you. If the available computers are full,
then the workload management system holds your work and runs it when the resources are
available.
With PBS you create a batch job which you then submit to PBS. A batch job is a file (a shell script under UNIX or a cmd batch file under Windows) containing the set of commands you want to run on some set of execution machines. It also contains directives which specify the characteristics (attributes) of the job, and resource requirements (e.g. memory or CPU time) that your job needs. Once you create your PBS job, you can reuse it if you wish, or you can modify it for subsequent runs. For example, here is a simple PBS batch job:
UNIX:
#!/bin/sh
#PBS -l walltime=1:00:00
#PBS -l mem=400mb,ncpus=4
./my_application

Windows:
#PBS -l walltime=1:00:00
#PBS -l mem=400mb,ncpus=4
my_application
Don’t worry about the details just yet; the next chapter will explain how to create a batch
job of your own.
3.4 PBS Interfaces
PBS provides two user interfaces: a command line interface (CLI) and a graphical user
interface (GUI). The CLI lets you type commands at the system prompt. The GUI is a
graphical point-and-click interface. The “user commands” are discussed in this book; the
“administrator commands” are discussed in the PBS Professional Administrator’s
Guide. The subsequent chapters of this book will explain how to use both the CLI and
GUI versions of the user commands to create, submit, and manipulate PBS jobs.
PBS Professional 9.1 17
User’s Guide
Table 1: PBS Professional User and Manager Commands
3.5 User's PBS Environment
In order to have your system environment interact seamlessly with PBS, there are several
items that need to be checked. In many cases, your system administrator will have already
set up your environment to work with PBS.
In order to use PBS to run your work, the following are needed:
User must have access to the resources/hosts that the site has configured for PBS
User must have a valid account (username and group) on the execution hosts
User must be able to transfer files between hosts (e.g. via rcp or scp)
The subsequent sections of this chapter discuss these requirements in detail, and provide
various setup procedures.
3.6 Usernames Under PBS
By default PBS will use your login identifier as the username under which to run your job.
This can be changed via the “-u” option to qsub. See section 4.13.14 “Specifying Job
User ID” on page 69. The user submitting the job must be authorized to run the job under
the execution user name (whether explicitly specified or not).
3.7 Setting Up Your UNIX/Linux Environment
A user's job may not run if the user's start-up files (i.e., .cshrc, .login, or .profile) contain commands which attempt to set terminal characteristics. Any such command sequence within these files should be skipped by testing for the environment variable PBS_ENVIRONMENT. This can be done as shown in the following sample .login:
...
setenv MANPATH /usr/man:/usr/local/man:$MANPATH
if ( ! $?PBS_ENVIRONMENT ) then
do terminal settings here
endif
PBS Professional 9.1 19
User’s Guide
You should also be aware that commands in your startup files should not generate output
when run under PBS. As in the previous example, commands that write to stdout should
not be run for a PBS job. This can be done as shown in the following sample .login:
...
setenv MANPATH /usr/man:/usr/local/man:$MANPATH
if ( ! $?PBS_ENVIRONMENT ) then
do terminal settings here
run command with output here
endif
When a PBS job runs, the “exit status” of the last command executed in the job is reported
by the job’s shell to PBS as the “exit status” of the job. (We will see later that this is impor-
tant for job dependencies and job chaining.) However, the last command executed might
not be the last command in your job. This can happen if your job’s shell is csh on the exe-
cution host and you have a .logout there. In that case, the last command executed is
from the .logout and not your job. To prevent this, you need to preserve the job’s exit
status in your .logout file, by saving it at the top, then doing an explicit exit at the
end, as shown below:
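A minimal .logout sketch follows (assuming csh is the login shell; EXITVAL is an arbitrary variable name chosen for illustration):

# ~/.logout
# Save the exit status of the last command at the very top:
set EXITVAL = $status
# ... normal .logout processing here ...
# Explicitly exit with the saved status at the very end:
exit $EXITVAL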
Likewise, if the user's login shell is csh the following message may appear in the standard output of a job:

This message is produced by many csh versions when the shell determines that its input is not a terminal. Short of modifying csh, there is no way to eliminate the message. Fortunately, it is just an informative message and has no effect on the job.
An interactive job comes complete with a pseudo-terminal suitable for running those commands that set terminal characteristics. Be aware, however, that starting something in the background that persists after you have exited from the interactive environment can cause trouble for some MOMs, which assume that once the interactive session terminates, all of the user's processes are gone with it. For example, applications like ssh-agent background themselves into a new session and would prevent a CPU set-enabled MOM from deleting the CPU set for the job. This in turn might cause subsequent attempts to run new jobs to fail, resulting in those jobs being placed in a held state.
The PBS "man pages" (UNIX manual entries) are installed on SGI systems under /usr/bsd, or for the Altix, in /usr/pbs/man. In order to find the PBS man pages, users will need to ensure that /usr/bsd is set within their MANPATH. The following example illustrates this for the C shell:
setenv MANPATH /usr/man:/usr/local/man:/usr/bsd:$MANPATH
3.8 Setting Up Your Windows Environment
This section discusses the setup steps needed for running PBS Professional in a Microsoft
Windows environment, including host and file access, passwords, and restrictions on
home directories.
Each Windows user is assumed to have a home directory (HOMEDIR) where his/her PBS job would initially be started. (The home directory is also the starting location of files when users specify relative path arguments to qsub/qalter -W stagein/stageout options.)
If a user has not been explicitly assigned a home directory, then PBS will use this Windows-assigned default as the base location for the user's default home directory. More specifically, the actual home path will be:
For instance, if a userA has not been assigned a home directory, it will default to a local
home directory of:
UserA’s job will use the above path as working directory. Any relative pathnames in
stagein, stageout, output, error file delivery will resolve to the above path.
PBS Professional 9.1 21
User’s Guide
Note that Windows can return as PROFILE_PATH one of the following forms:
A PBS job is run from a user account and the associated username string must conform to the POSIX-1 standard for portability. That is, the username must contain only alphanumeric characters, dot (.), underscore (_), and/or hyphen (-). The hyphen must not be the first letter of the username. If "@" appears in the username, then it will be assumed to be in the context of a Windows domain account: username@domainname. An exception to the above rule is the space character, which is allowed. If a space character appears in a username string, then it will be displayed quoted and must be specified in a quoted manner. The following example requests the job to run under account "Bob Jones".
The Windows rhosts file is located in the user's [PROFILE_PATH], for example:
\Documents and Settings\username\.rhosts, with the format:
hostname username
This file can also determine if a remote user is allowed to submit jobs to the local PBS Server, if the mapped user is an Administrator account. For example, the following entry in user susan's .rhosts file on the server would permit user susan to run jobs submitted from her workstation wks031:

wks031 susan
Furthermore, in order for Susan's output files from her job to be returned to her automatically by PBS, she would need to add an entry to her .rhosts file on her workstation naming the execution host Host1.
Host1 susan
If instead, Susan has access to several execution hosts, she would need to add all of them
to her .rhosts file:
Host1 susan
Host2 susan
Host3 susan
Note that Domain Name Service (DNS) on Windows may return different permutations for a full hostname, thus it is important to list all the names by which a host may be known. For instance, if Host4 is known as "Host4", "Host4.<subdomain>", or "Host4.<subdomain>.<domain>" you should list all three in the .rhosts file:
Host4 susan
Host4.subdomain susan
Host4.subdomain.domain susan
As discussed in the previous section, usernames with embedded white space must also be
quoted if specified in any hosts.equiv or .rhosts files, as shown below.
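For example, hypothetical .rhosts entries for the account "Bob Jones" would look like:

wks031 "Bob Jones"
Host1 "Bob Jones"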
In Windows XP (unlike Windows 2000), when you map a drive, it is mapped "locally" to
your session. The mapped drive cannot be seen by other processes outside of your session.
A drive mapped on one session cannot be un-mapped in another session even if it's the
same user. This has implications for running jobs under PBS. Specifically if you map a
drive, chdir to it, and submit a job from that location, the vnode that executes the job
may not be able to deliver the files back to the same location from which you issued
qsub. The workaround is to use the “-o” or “-e” options to qsub and specify a local
(non-mapped) directory location for the job output and error files. For details see section
4.13.2 “Redirecting Output and Error Files” on page 63.
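A sketch of this workaround (the paths and job name below are placeholders, not PBS defaults):

qsub -o C:\temp\my_job.out -e C:\temp\my_job.err my_job.cmd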
3.9 Environment Variables
There are a number of environment variables provided to the PBS job. Some are taken from the user's environment and carried with the job. Others are created by PBS. Still others can be explicitly created by the user for exclusive use by PBS jobs. All PBS-provided environment variable names start with the characters "PBS_". Some are then followed by a capital O ("PBS_O_") indicating that the variable is from the job's originating environment (i.e. the user's). Appendix A gives a full listing of all environment variables provided to PBS jobs and their meaning. The following short example lists some of the more useful variables, and typical values.
PBS_O_HOME=/u/user1
PBS_O_LOGNAME=user1
PBS_O_PATH=/usr/new/bin:/usr/local/bin:/bin
PBS_O_SHELL=/sbin/csh
PBS_O_HOST=cray1
PBS_O_WORKDIR=/u/user1
PBS_O_QUEUE=submit
PBS_JOBID=16386.cray1
PBS_QUEUE=crayq
PBS_ENVIRONMENT=PBS_INTERACTIVE
There are a number of ways that you can use these environment variables to make more
efficient use of PBS. In the example above we see PBS_ENVIRONMENT, which we used
earlier in this chapter to test if we were running under PBS. Another commonly used vari-
able is PBS_O_WORKDIR which contains the name of the directory from which the user
submitted the PBS job.
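For instance, a job script can branch on these variables. The following sketch is a hypothetical script, using only the variables listed above, with fallbacks so it also runs outside PBS; it runs from the submission directory and reports how it was started:

```shell
#!/bin/sh
# Hypothetical job script: run from the directory where qsub was invoked.
# PBS_O_WORKDIR is set by PBS; fall back to the current directory so the
# script can also be tried outside PBS.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

# PBS_ENVIRONMENT is PBS_BATCH or PBS_INTERACTIVE under PBS, unset otherwise.
case "${PBS_ENVIRONMENT:-}" in
    PBS_BATCH)       echo "running as a batch job" ;;
    PBS_INTERACTIVE) echo "running as an interactive job" ;;
    *)               echo "not running under PBS" ;;
esac
```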
There are also two environment variables that you can set to affect the behavior of PBS.
The environment variable PBS_DEFAULT defines the name of the default PBS Server.
Typically, it corresponds to the system name of the host on which the Server is running. If
PBS_DEFAULT is not set, the default is defined by an administrator-established file
(usually /etc/pbs.conf on UNIX, and [PBS Destination Folder]\pbs.conf
on Windows).
The environment variable PBS_DPREFIX determines the prefix string which identifies
directives in the job script. The default prefix string is “#PBS”; however, the Windows
user may wish to change this as discussed in section 4.11 “Changing the Job’s PBS
Directive” on page 57.
3.10 Temporary Scratch Space: TMPDIR
PBS creates an environment variable, TMPDIR, which contains the full path name to a
temporary “scratch” directory created for each PBS job. The directory will be removed
when the job terminates.
Under Windows, TMP will also be set to the value of %TMPDIR%. The temporary directory
will be created under either \winnt\temp or \windows\temp, unless an alternative
directory was specified by the administrator in the MOM configuration file.
Users can access the job-specific temporary space by changing directory to it inside their
job script. For example:
UNIX:
	...
	cd $TMPDIR
	...
Windows:
	...
	cd %TMPDIR%
	...
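A fuller sketch of the same idea on UNIX: work in scratch space, then copy results back before the job ends, since PBS removes the directory at job exit. The script body and file names are hypothetical, and a fallback lets the script be tried outside PBS:

```shell
#!/bin/sh
# Hypothetical UNIX job script that stages work through per-job scratch space.
# Under PBS, $TMPDIR is the job-specific scratch directory; the fallback lets
# the script run outside PBS as well.
scratch="${TMPDIR:-/tmp}/work.$$"
mkdir -p "$scratch"
workdir="${PBS_O_WORKDIR:-$PWD}"   # directory qsub was run from

cd "$scratch" || exit 1
# ... run the application here, writing intermediate files to scratch ...
echo "intermediate data" > step1.out

# Copy results back before the job ends: PBS removes $TMPDIR at job exit.
cp step1.out "$workdir"/
```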
Chapter 4
Submitting a PBS Job
This chapter describes virtual nodes, how to submit a PBS job, how to use resources for
jobs, how to place your job on vnodes, job attributes, and several related areas.
A virtual node, or vnode, is an abstract object representing a set of resources which form a
usable part of a machine. This could be an entire host, a nodeboard, or a blade. A single
host can be made up of multiple vnodes. Each vnode can be managed and scheduled
independently. PBS views hosts as being composed of one or more vnodes. Jobs run on
one or more vnodes. See the pbs_node_attributes(7B) man page.
A host is any computer. Execution hosts used to be called nodes. However, some
machines such as the Altix can be treated as if they are made up of separate pieces
containing CPUs, memory, or both. Each piece is called a vnode. Some hosts have a single
vnode and some have multiple vnodes. PBS treats all vnodes alike in most respects. Chunks
cannot be split across hosts, but they can be split across vnodes on the same host.
Resources that are defined at the host level are applied to vnodes. A host-level resource is
shared among the vnodes on that host. This sharing is managed by the MOM.
4.1.2 Vnode Types
What were called nodes are now called vnodes. All vnodes are treated alike, and are
treated the same as what were called “time-shared nodes”. The types “time-shared” and
“cluster” are deprecated. The :ts suffix is deprecated. It is silently ignored, and not
preserved during rewrite. The vnode attribute ntype is only used to distinguish between
PBS and Globus vnodes. It is read-only.
4.2 PBS Resources
Resources can be available on the server and queues, and on vnodes. Jobs can request
resources. Resources are allocated to jobs, and some resources such as memory are con-
sumed by jobs. The scheduler matches requested resources with available resources,
according to rules defined by the administrator. PBS can enforce limits on resource usage
by jobs.
PBS provides built-in resources, and in addition, allows the administrator to define custom
resources. The administrator can specify which resources are available on a given vnode,
as well as at the server or queue level (e.g. floating licenses). Vnodes can share resources.
The administrator can also specify default arguments for qsub. These arguments can
include resources. See the qsub(1B) man page.
Resources made available by defining them via resources_available at the server level are
only used as job-wide resources. These resources (e.g. walltime, server_dyn_res) are
requested using -l RESOURCE=VALUE. Resources made available at the host (vnode)
level are only used as chunk resources, and can only be requested within chunks using -l
select=RESOURCE=VALUE. Resources such as mem and ncpus can only be used at the
vnode level.
Resources are allocated to jobs both by explicitly requesting them and by applying
specified defaults. Jobs explicitly request resources either at the vnode level in chunks
defined in a selection statement, or in job-wide resource requests. See the
pbs_resources(7B) manual page.
Jobs are assigned limits on the amount of resources they can use. These limits apply to
how much the job can use on each vnode (per-chunk limit) and to how much the whole job
can use (job-wide limit). Limits are derived from both requested resources and applied
default resources.
Each chunk's per-chunk limits determine how much of any resource can be used in that
chunk. Per-chunk resource usage limits are the amount of per-chunk resources requested,
both from explicit requests and from defaults.
Job resource limits set a limit for per-job resource usage. Job resource limits are derived
in this order from:
explicitly requested job-wide resources (e.g. -l resource=value)
the select specification (e.g. -l select =...)
the queue’s default_resources.RES
the server’s default_resources.RES
the queue’s resources_max.RES
the server’s resources_max.RES
Various limit checks are applied to jobs. If a job's job resource limit exceeds queue or
server restrictions, it will not be put in the queue or accepted by the server. If, while
running, a job exceeds its limit for a consumable or time-based resource, it will be terminated.
A “consumable” resource is one that is reduced by being used, for example, ncpus,
licenses, or mem. A “non-consumable” resource is not reduced through use, for example,
walltime or a boolean resource.
Resources are tracked in server, queue, vnode and job attributes. Servers, queues and
vnodes have two attributes, resources_available.RESOURCE and
resources_assigned.RESOURCE. The resources_available.RESOURCE attribute tracks
the total amount of the resource available at that server, queue or vnode, without regard to
how much is in use. The resources_assigned.RESOURCE attribute tracks how much of
that resource has been assigned to jobs at that server, queue or vnode. Jobs have an
attribute called resources_used.RESOURCE which tracks the amount of that resource
used by that job.
The administrator can set server and queue defaults for resources used in chunks. See the
PBS Professional Administrator’s Guide and the pbs_server_attributes(7B) and
pbs_queue_attributes(7B) manual pages.
4.2.0.1 Unset Resources
When job resource requests are being matched with available resources, a numerical
resource that is unset on a server, queue or host is treated as if it were zero, and an unset
string cannot be matched. An unset Boolean resource is treated as if it is set to “False”.
Resource names must start with an alphabetic character and can contain alphanumeric,
underscore (“_”), and dash (“-”) characters.
If a string resource value contains spaces or shell metacharacters, enclose the string in
quotes, or otherwise escape the space and metacharacters. Be sure to use the correct
quotes for your shell and the behavior you want. If the string resource value contains
commas, the string must be enclosed in an additional set of quotes so that the command
(e.g. qsub, qalter) will parse it correctly. If the string resource value contains quotes, plus
signs, equal signs, colons or parentheses, the string resource value must be enclosed in yet
another set of additional quotes.
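As an illustration of the comma rule, the quoting layers can be previewed with echo in place of the real command; “colorlist” is a hypothetical string array resource:

```shell
# One layer of quotes protects the value from the shell; because the value
# contains commas, a second (inner) layer keeps the command from splitting
# it into separate resource=value pairs.
value="'red,green,blue'"

# Preview the command line with echo instead of qsub, so the quoting can be
# checked without a PBS installation:
echo qsub -l colorlist="$value" my_job
```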
b or w	Bytes or words.
kb or kw	Kilo (2^10, i.e. 1024) bytes or words.
string array	Comma-separated list of strings. Strings in string arrays may not
	contain commas. Non-consumable. Resource request will succeed
	if request matches one of the values. Resource request can
	contain only one string.
time	Specifies a maximum time period the resource can be used. Time is
	expressed in seconds as an integer, or in the form:
	[[hours:]minutes:]seconds[.milliseconds]
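The [[hours:]minutes:]seconds form reduces to integer seconds by repeated multiplication by 60. A small sketch of that arithmetic (the helper function is illustrative, not part of PBS):

```shell
# Convert a PBS-style time string ([[hours:]minutes:]seconds) to seconds:
# each colon-separated field multiplies the running total by 60.
to_seconds() {
    echo "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }'
}

to_seconds 90        # plain seconds: 90
to_seconds 2:30      # 2 minutes 30 seconds: 150
to_seconds 1:00:30   # 1 hour, 0 minutes, 30 seconds: 3630
```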
The table below lists the built-in resources that can be requested by PBS jobs on any
system.
Resource Description
arch System architecture. For use inside chunks only. One architecture can
be defined for a vnode. One architecture can be requested per vnode.
Allowable values and effect on job placement are site-dependent. Type:
string.
cput Amount of CPU time used by the job for all processes on all vnodes.
Establishes a job resource limit. Non-consumable. Type: time.
file Size of any single file that may be created by the job. Type: size.
host Name of execution host. For use inside chunks only. Automatically set
to the short form of the hostname in the Mom attribute. Cannot be
changed. Site-dependent. Type: string.
mem Amount of physical memory (i.e. workingset) allocated to the job, either
job-wide or vnode-level. Consumable. Type: size.
mpiprocs Number of MPI processes for this chunk. Defaults to 1 if ncpus > 0, 0
otherwise. For use inside chunks only. Type: integer.
ompthreads Number of OpenMP threads for this chunk. Defaults to ncpus if not
specified. For use inside chunks only. Type: integer.
For the MPI process with rank 0, the environment variables NCPUS
and OMP_NUM_THREADS are set to the value of ompthreads. For
other MPI processes, behavior is dependent on MPI implementation.
pcput Amount of CPU time allocated to any single process in the job.
Establishes a job resource limit. Non-consumable. Type: time.
pmem Amount of physical memory (workingset) for use by any single process
of the job. Establishes a job resource limit. Consumable. Type: size.
pvmem Amount of virtual memory for use by the job. Establishes a job resource
limit. Non-consumable. Type: size.
software Site-specific software specification. For use only in job-wide resource
requests. Allowable values and effect on job placement are
site-dependent. Type: string.
vmem Amount of virtual memory for use by all concurrent processes in the job.
Establishes a job resource limit, or when used within a chunk,
establishes a per-chunk limit. Consumable. Type: size.
vnode Name of virtual node (vnode) on which to execute. For use inside
chunks only. Site-dependent. Type: string. See the
pbs_node_attributes(7B) man page.
walltime Actual elapsed (wall-clock, except during Daylight Savings transitions)
time during which the job can run. Establishes a job resource limit.
Non-consumable. Default: 5 years. Type: time.
The "place" specification cannot be used without the "select" specification. See section
4.6 “Placing Jobs on Vnodes” on page 44.
A "select" specification cannot be used with -lncpus, -lmem, -lvmem, -larch, -lhost.
The built-in resource "software" is not a vnode-level resource. See “PBS Resources” on
page 26.
At the command line, the user can create a job script, and submit it. During submission it
is possible to override elements in the job script. Alternatively, PBS will read from input
typed at the command line.
UNIX Users: Since the job file under UNIX is a “shell script”, the first line of
the job file specifies which shell to use to execute the script.
The Bourne shell (sh) is the default, but you can change this to
your favorite shell. This first line can be omitted if it is
acceptable for the job file to be interpreted using the Bourne
shell. The remainder of the examples in this manual will assume
these conditions are true. If this is not true for your site,
simply add the shell specifier.
Windows Users: Windows does not use a shell specification. This line will not
appear for a Windows job.
PBS directives are at the top of the script file. They are used to request resources or set
attributes. A directive begins with the default string “#PBS”. Attributes can also be set
using options to the qsub command, which will override directives.
These can be programs or commands. This is where the user specifies an application to be
run.
Important: In Windows, if you use notepad to create a job script, the last line
does not automatically get newline-terminated. Be sure to add one
explicitly, otherwise the PBS job will get the following error message:
More?
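One portable way to guard against this, shown with a hypothetical script file named my_job, is to append a newline only when the last byte is not already one:

```shell
# Create a sample job script that lacks a final newline (file name hypothetical):
printf '#PBS -N testjob\n./myprogram' > my_job

# $(tail -c 1 file) is empty only when the last byte is a newline, because
# command substitution strips trailing newlines; append one if it is missing.
[ -n "$(tail -c 1 my_job)" ] && echo >> my_job
```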
Job attributes can be set either by using directives or by giving options to the qsub
command. These two methods have the same functionality. Options to the qsub command
will override PBS directives, which override defaults. Some job attributes have default
values preset in PBS. Some job attributes’ default values are set at the user’s site.
There are a few ways to submit a PBS job using the command line. The first is to create a
job script and submit it using qsub.
For example, with job script “myjob”, the user can submit it by typing:
qsub myjob
16387.foo.exampledomain
The second line above is the job identifier returned by PBS, which has the form
sequence-number.servername.
For a job array, the identifier has the form sequence-number[].servername.domain.
You’ll need the job identifier for any actions involving the job, such as checking job
status, modifying the job, tracking the job, or deleting the job.
If “my_job” contains the following, the user is naming the job “testjob”, and running a
program called “myprogram”.
#!/bin/sh
#PBS -N testjob
./myprogram
The largest possible job ID is the 7-digit number 9,999,999. After this has been reached,
job IDs start again at zero.
PBS directives in a script can be overridden by using the equivalent options to qsub. For
example, to override the PBS directive naming the job, and name it “newjob”, the user
could type
qsub -N newjob my_job
Jobs can also be submitted without specifying values for attributes. The simplest way to
submit a job is to type
qsub myjobscript <ret>
In this case, myjobscript might contain only the following, and all job attributes take
default values:
#!/bin/sh
./myapplication
It is possible to submit a job to PBS without first creating a job script file. If you run the
qsub command, with the resource requests on the command line, and then press “enter”
without naming a job file, PBS will read input from the keyboard. (This is often referred to
as a “here document”.) You can direct qsub to stop reading input and submit the job by
typing, on a line by itself, a control-d (UNIX) or a control-z followed by enter
(Windows).
Note that, under UNIX, if you enter a control-c while qsub is reading input, qsub
will terminate the process and the job will not be submitted. Under Windows, however,
the control-c sequence will often, depending on the command prompt used, cause
qsub to submit the job to PBS. In that case, a control-break sequence will usually
terminate the qsub command.
qsub <ret>
[directives]
[tasks]
ctrl-D
If you need to pass arguments to a job script, you can either use the -v option to qsub,
where you set and use environment variables, or use standard input. When using standard
input, any #PBS directives in the job script will be ignored. You can replace directives
with the equivalent options to qsub. To use standard input, run qsub with the desired
options but no script file name; you can then type #PBS directives on their own lines,
followed by the commands to run. If you do not use the -N option to qsub, or specify it
via a #PBS directive, the job will be named STDIN.
PBS provides built-in resources, and allows the administrator to define custom resources.
The administrator can specify which resources are available on a given vnode, as well as
at the queue or server level (e.g. floating licenses). See “PBS Resources” on page 26 for
a listing of built-in resources.
Resources defined at the queue or server level apply to an entire job. If they are defined
at the vnode level, they apply only to the part of the job running on that vnode.
Jobs request resources, which are allocated to the job, along with any defaults specified by
the administrator.
Custom resources are used for application licenses, scratch space, etc., and are defined by
the administrator. See “Customizing PBS Resources” on page 371 of the PBS
Professional Administrator’s Guide. Custom resources are used the same way built-in
resources are used.
Jobs request resources in two ways. They can use the select statement to define chunks
and specify the quantity of each chunk. A chunk is a set of resources that are to be allo-
cated as a unit. Jobs can also use a job-wide resource request, which uses
resource=value pairs. The -l nodes= form is deprecated, and if it is used, it will
be converted into a request for chunks and job-wide resources.
The qsub, qalter and pbs_rsub commands are used to request resources.
Most jobs submitted with "-lnodes" will continue to work as expected. These jobs will be
automatically converted to the new syntax. However, job tasks may execute in an
unexpected order, because vnodes may be assigned in a different order.
Jobs submitted with old syntax that ran successfully on versions of PBS Professional prior
to 8.0 can fail because a limit that was per-chunk is now job-wide. This is an example of a
job submitted using -l nodes=X -lmem=M that would fail because the mem limit is now
job-wide. If the following conditions are true:
a. PBS Professional 9.0 or later using standard MPICH
b. The job is submitted with qsub -lnodes=5 -lmem=10GB
c. The master process of this job tries to use more than 2GB
The job will be killed, whereas in 7.0 and earlier the master process could use 10GB
before being killed. 10GB is now a job-wide limit, divided up into a 2GB limit per chunk.
Resources are allocated to jobs both because jobs explicitly request them and because
specified default resources are applied to jobs. Jobs explicitly request resources either at
the vnode level in chunks defined in a selection statement, or in job-wide resource
requests. An explicit resource request can appear in the following, in order of precedence:
qalter
qsub
PBS job script directives
A chunk declares the value of each resource in a set of resources which are to be allocated
as a unit to a job. It is the smallest set of resources that will be allocated to a job.
All of a chunk must be taken from a single host. A chunk request is a vnode-level request.
Chunks are described in a selection statement, which specifies how many of each kind of
chunk. A selection statement has this form:
-l select=[N:]chunk[+[N:]chunk ...]
where each chunk is a colon-separated list of resource=value pairs, for example:
ncpus=2:mem=10GB:host=Host1
ncpus=1:mem=20GB:arch=linux
A complete selection statement might be:
-l select=2:ncpus=1:mem=10GB+3:ncpus=2:mem=8GB:arch=solaris
Each job submission can have only one “-l select” statement.
Host-level resources can only be requested as part of a chunk. Server or queue resources
cannot be requested as part of a chunk.
A job-wide resource request is for resource(s) at the server or queue level. Job-wide
resources are requested outside of a selection statement, in this form:
-l keyword=value[,keyword=value ...]
Job-wide resources are used for requesting floating licenses or other resources not tied to
specific vnodes, such as cput and walltime.
A resource request can specify whether a boolean resource should be true or false. For
example, if some vnodes have green=true and some are red=true, a selection statement for
two vnodes, each with one CPU, all green and no red, would be:
-l select=2:green=true:red=false:ncpus=1
The next example Windows script shows a job-wide request for walltime and a chunk
request for ncpus and memory.
#PBS -l walltime=1:00:00
#PBS -l select=ncpus=4:mem=400mb
#PBS -j oe
date /t
.\my_application
date /t
Keep in mind the difference between requesting a vnode-level boolean and a job-wide
boolean.
qsub -l select=1:green=True
will request a vnode with green set to True. However,
qsub -l green=True
will request green set to True on the server and/or queue.
Jobs get default resources, both job-wide and per-chunk; in each case queue defaults are
applied first, then server defaults.
For each chunk in the job's selection statement, first queue chunk defaults are applied,
then server chunk defaults are applied. If the chunk does not contain a resource defined in
the defaults, the default is added. For a resource RESOURCE, a chunk default is called
"default_chunk.RESOURCE".
For example, if the queue in which the job is enqueued has the following defaults defined:
default_chunk.ncpus=1
default_chunk.mem=2gb
then a job requesting
select=2:ncpus=4+1:mem=9gb
will have this specification after the default_chunk elements are applied:
select=2:ncpus=4:mem=2gb+1:ncpus=1:mem=9gb
The job-wide resource request is checked against queue resource defaults, then against
server resource defaults. If a default resource is defined which is not specified in the
resource request, it is added to the resource request.
Application licenses are set up as resources defined by the administrator. PBS doesn't
actually check out the licenses, the application being run inside the job's session does that.
PBS queries the license server to find out how many floating licenses are available at the
beginning of each scheduling cycle. If you wish to request a site-wide floating license, it
will typically have been set up as a server-level (job-wide) resource. To request an
application license called AppF, use a job-wide request of the form -l AppF=value.
If only certain hosts can run the application, they will typically have a host-level boolean
resource set to True. To request the application license and the vnodes on which to run the
application, combine the job-wide license request with a select statement requesting that
boolean in each chunk.
Per-host node-locked licenses are typically set up as a boolean resource on the
vnode(s) that are licensed for the application. The resource request should include one
license for each host. To request a host with a per-host node-locked license for AppA,
request the AppA resource in one chunk.
Per-use node-locked licenses are typically set up so that the host(s) that run the application
have the number of licenses that can be used at one time. The number of licenses the job
requests should be the same as the number of instances of the application that will be run.
To request a host with a per-use node-locked license for AppB, where you’ll run one
instance of AppB on two CPUs in one chunk:
qsub -l select=1:ncpus=2:AppB=1
Per-CPU node-locked licenses are set up so that the host has one license for each licensed
CPU. You must request one license for each CPU. To request a host with a node-locked
license for AppC, where you’ll run a job using two CPUs in one chunk:
qsub -l select=1:ncpus=2:AppC=2
Scratch space on a machine is set up as a host-level dynamic resource. The resource will
have a name such as “dynscratch”. To request 10MB of scratch space in one chunk, a
resource request would include:
-l select=1:ncpus=N:dynscratch=10MB
The default for walltime is 5 years. The scheduler uses walltime to predict when resources
will become available. Therefore it is useful to request a reasonable walltime for each job.
If neither a node specification nor a selection directive is specified, then a selection
directive will be created requesting 1 chunk with resources specified by the job, and with
those from the queue or server default resource list. These are: ncpus, mem, arch, host,
and software, as well as any other default resources specified by the administrator.
For example,
qsub -l ncpus=4:mem=123mb:arch=linux
is converted to:
select=1:ncpus=4:mem=123mb:arch=linux
Do not mix old style resource or node specification with the select and place statements.
Do not use one in a job script and the other on the command line. This will result in an
error.
If the job is moved from the current queue to a new queue, any default resources in the
job's resource list that were contributed by the current queue are removed. This includes a
select specification and place directive generated by the rules for conversion from the old
syntax. If a job's resource is unset (undefined) and there exists a default value at the new
queue or server, that default value is applied to the job's resource list. If either select or
place is missing from the job's new resource list, it will be automatically generated, using
any newly inherited default values.
Example:
Queue QA
resources_default.ncpus=2
default_chunk.mem=2gb
Queue QB
default_chunk.mem=1gb
no default for ncpus
The following examples illustrate the equivalent select specification for jobs submitted
into queue QA and then moved to (or submitted directly to) queue QB:
qsub -l ncpus=1
In QA: select=1:ncpus=1:mem=2gb
- Picks up 2gb from queue default chunk and 1 ncpus from qsub
In QB: select=1:ncpus=1:mem=1gb
- Picks up 1gb from queue default chunk and 1 ncpus from qsub
qsub -lmem=4gb
In QA: select=1:ncpus=2:mem=4gb
- Picks up 2 ncpus from queue level job-wide resource default
and 4gb mem from qsub
In QB: select=1:ncpus=1:mem=4gb
- Picks up 1 ncpus from server level job-wide default and 4gb mem from qsub
qsub -l nodes=4
In QA: select=4:ncpus=1:mem=2gb
- Picks up a queue level default memory chunk of 2gb.
(This is not 4:ncpus=2 because in prior versions, "nodes=x" implied
1 CPU per node unless otherwise explicitly stated.)
In QB: select=4:ncpus=1:mem=1gb
(In prior versions, "nodes=x" implied 1 CPU per node unless otherwise
explicitly stated, so the ncpus=1 is not inherited from the server default.)
A job’s resource request is converted from old-style to new according to various rules, one
of which is that the conversion is dependent upon where resources are defined. For
example: The boolean resource “Red” is defined on the server, and the boolean resource “Blue”
is defined at the host level. A job requests “qsub -l Blue=True”. This looks like an old-
style resource request, and PBS checks to see where Blue is defined. Since Blue is
defined at the host level, the request is converted into “-l select=1:Blue=True”. However,
if a job requests “qsub -l Red=True”, while this looks like an old-style resource request,
PBS does not convert it to a chunk request because Red is defined at the server.
Any job submitted with undefined resources, specified either with "-l select" or with "-l
nodes", will not be rejected at submission. The job will be aborted upon being enqueued
in an execution queue if the resources are still undefined. This preserves backward
compatibility.
Each chunk's per-chunk limits determine how much of any resource can be used in that
chunk. Per-chunk resource usage limits are established by per-chunk resources, both from
explicit requests and from defaults.
Job resource limits set a limit for per-job resource usage. Job resource limits are estab-
lished both by requesting job-wide resources and when per-chunk consumable resources
are summed. Job resource limits from sums of all chunks, including defaults, override
those from job-wide defaults and resource requests. Limits include both explicitly
requested resources and default resources.
If a job's job resource limit exceeds queue or server restrictions, it will not be put in the
queue or accepted by the server. If, while running, a job exceeds its limit for a
consumable or time-based resource, it will be terminated. See the PBS Professional
Administrator's Guide.
If both job resource limits and a selection directive are specified when a job is submitted,
the sum of the resources in the directive must not exceed the specified limits.
For example,
qsub -l ncpus=4:mem=200mb -lselect=2:ncpus=2:mem=100mb
is accepted because neither the sum of the number of CPUs nor the sum of the requested
memory exceeds the specified limits.
However, a similar request will be rejected if the requested number of CPUs, 3, is greater
than the specified limit of 2.
If a select directive is supplied and the corresponding job limits are not specified, then job
limits are created from the directive for each consumable resource.
For example,
qsub -lselect=2:ncpus=3:mem=4gb:arch=linux
creates job limits of ncpus=6 and mem=8gb from the sums over the two chunks.
The place statement controls how the job is placed on the vnodes from which resources
may be allocated for the job. The place statement can be specified, in order of precedence,
via qalter, the qsub command line, or PBS job script directives.
The place statement may not be used without the select statement.
Note that vnodes can have sharing attributes that override job placement requests. See the
pbs_node_attributes(7B) man page.
The nodes file contains the names of the vnodes allocated to a job. The nodes file's name
is given by the environment variable PBS_NODEFILE. The order in which hosts appear
in the file is the order in which chunks are specified in the selection directive. The order in
which hostnames appear in the file is hostA X times, hostB Y times, where X is the
number of MPI processes on hostA, Y is the number of MPI processes on hostB, etc. See the
definition of the resources “mpiprocs” and “ompthreads” in “PBS Resources” on page 26.
See also “The mpiprocs Resource” on page 170.
4.6.2 PBS_NODEFILE
The file containing the vnodes allocated to a job lists vnode names. This file's name is
given by the environment variable PBS_NODEFILE.
For jobs which request vnodes via the -lselect= option, the nodes file will contain the
names of the allocated vnodes with each name repeated M times, where M is the number
of mpiprocs specified for that vnode. For example, qsub -l select=3:ncpus=2:mpiprocs=3
-lplace=scatter will result in this PBS_NODEFILE:
vnodeA
vnodeB
vnodeC
vnodeA
vnodeA
vnodeB
vnodeB
vnodeC
vnodeC
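Inside a job, this file can be processed with ordinary shell tools, for example to count MPI ranks and distinct vnodes. A sketch that substitutes a stand-in file matching the listing above when $PBS_NODEFILE is not set:

```shell
#!/bin/sh
# Use the PBS-provided nodes file when present; otherwise build a stand-in
# matching the example listing, so the snippet can be tried outside PBS.
nodefile="${PBS_NODEFILE:-/tmp/fake_nodefile.$$}"
if [ ! -f "$nodefile" ]; then
    printf '%s\n' vnodeA vnodeB vnodeC vnodeA vnodeA vnodeB vnodeB vnodeC vnodeC > "$nodefile"
fi

nranks=$(awk 'END { print NR }' "$nodefile")          # one line per MPI rank
nhosts=$(sort -u "$nodefile" | awk 'END { print NR }') # distinct vnodes allocated
echo "ranks=$nranks hosts=$nhosts"
```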
For jobs which requested a set of nodes via the -lnodes=nodespec option to qsub, each
vnode allocated to the job will be listed N times, where N is the total number of CPUs
allocated from the vnode divided by the number of threads requested. For example, qsub
-lnodes=4:ncpus=3:ppn=2 will result in each of the four vnodes being written twice (6
CPUs divided by 3 from ncpus). The file will contain the name of the first vnode twice,
followed by the second vnode twice, etc.
The resources allocated from a vnode are only those specified in the job’s schedselect.
This job attribute is created internally by starting with the select specification and
applying any server and queue default_chunk resource defaults that are missing from the
select statement. The schedselect job attribute contains only vnode-level resources. The
exec_vnode job attribute shows which resources are allocated from which vnodes.
The Resource_List attribute is the list of resources requested via qsub, with job-wide
defaults applied. Vnode-level resources from Resource_List are used in the converted
select when the user doesn’t specify a select statement. The converted select statement is
used to fill in gaps in schedselect.
Values for ncpus or mem in the job's Resource_List come from three places:
(1) Resources specified via qsub,
(2) the sum of the values in the select specification (not including default_chunk),
or
(3) resources inherited from queue and/or server resources_default.
Case 3 applies only when the user does not specify -l select, but uses -lnodes or -lncpus
instead.
Examples (assuming default_chunk.mem=100mb and resources_default.mem=200mb):
A job requesting -l select=2:ncpus=2 will take the 100mb default_chunk value from each
vnode and have a job-wide limit of 200mb (2 * 100mb). The job's Resource_List.mem
will show 200mb.
A job requesting -l ncpus=2 will take 200mb (inherited from resources_default and used
to create the select spec) from one vnode and have a job-wide limit of 200mb. The job's
Resource_List.mem will show 200mb.
A job requesting -l nodes=2 will inherit the 200mb from resources_default.mem which
will be the job-wide limit. The memory will be taken from the two vnodes, half (100mb)
from each. The generated select spec is 2:ncpus=1:mem=100mb. The job's
Resource_List.mem will show 200mb.
Unless otherwise specified, the vnodes allocated to the job will be allocated as shared or
exclusive based on the setting of the vnode’s sharing attribute. Each of the following
shows how you would use -l select= and -l place=.
1. A job that will fit in a single host such as an Altix but not in any of the vnodes, packed
into the fewest vnodes:
-l select=1:ncpus=10:mem=20gb
-l place=pack
In earlier versions, this would have been:
-lncpus=10,mem=20gb
2. Request four chunks, each with 1 CPU and 4GB of memory taken from anywhere.
-l select=4:ncpus=1:mem=4GB
-l place=free
3. Allocate 4 chunks, each with 1 CPU and 2GB of memory from between one and four
vnodes which have an arch of “linux”.
-l select=4:ncpus=1:mem=2GB:arch=linux -l place=free
4. Allocate four chunks on 1 to 4 vnodes where each vnode must have 1 CPU, 3GB of
memory and 1 node-locked dyna license available for each chunk.
-l select=4:dyna=1:ncpus=1:mem=3GB -l place=free
5. Allocate four chunks on 1 to 4 vnodes, and 4 floating dyna licenses. This assumes
“dyna” is specified as a server dynamic resource.
-l dyna=4 -l select=4:ncpus=1:mem=3GB -l place=free
6. This selects exactly 4 vnodes where the arch is linux, and each vnode will be on a separate host. Each vnode will have 1 CPU and 2GB of memory allocated to the job.
-lselect=4:mem=2GB:ncpus=1:arch=linux -lplace=scatter
7. This will allocate 3 chunks, each with 1 CPU and 10GB of memory. It will also
reserve 100mb of scratch space if scratch is to be accounted for. Scratch is assumed to be
on a file system common to all hosts. The value of “place” depends on the default, which
is “place=free”.
-l scratch=100mb -l select=3:ncpus=1:mem=10GB
8. This will allocate 2 CPUs and 50GB of memory on a host named zooland. The value
of “place” depends on the default, which is “place=free”:
-l select=1:ncpus=2:mem=50gb:host=zooland
9. This will allocate 1 CPU and 6GB of memory and one host-locked swlicense from each
of two hosts:
-l select=2:ncpus=1:mem=6gb:swlicense=1
-lplace=scatter
11. Here is an odd-sized job that will fit on a single Altix, but not on any one node
board. We request an odd number of CPUs that are not shared, so they must be “rounded
up”:
-l select=1:ncpus=3:mem=6gb
-l place=pack:excl
12. Here is an odd-sized job that will fit on a single Altix, but not on any one node
board. We are asking for a small number of CPUs but a large amount of memory:
-l select=1:ncpus=1:mem=25gb
-l place=pack:excl
13. Here is a job that may be run across multiple Altix systems, packed into the fewest
vnodes:
-l select=2:ncpus=10:mem=12gb
-l place=free
14. Submit a job that must be run across multiple Altix systems, with each chunk on a
separate host:
-l select=2:ncpus=10:mem=12gb
-l place=scatter
20. Align a large job within one router, if it fits within a router:
-l select=1:ncpus=100:mem=200gb
-l place=pack:group=router
21. Fit large jobs that do not fit within a single router into as few available routers as
possible. Here, RES is the resource used for node grouping:
-l select=1:ncpus=300:mem=300gb
-l place=pack:group=<RES>
22. To submit an MPI job, specify one chunk per MPI task. For a 10-way MPI job
with 2gb of memory per MPI task:
-l select=10:ncpus=1:mem=2gb
23. To submit a non-MPI job (including a 1-CPU job or an OpenMP or shared-memory
job), use a single chunk. For a 2-CPU job requiring 10gb of memory:
-l select=1:ncpus=2:mem=10gb
24. Request CPUs and memory on a single host using old syntax:
-l ncpus=5,mem=10gb
will be converted into the equivalent:
-l select=1:ncpus=5:mem=10gb
-l place=pack
25. Request CPUs and memory on a named host, along with custom resources including a floating license, using old syntax:
-l ncpus=1,mem=5mb,host=origin3,opti=1,platform=IRIX64
is converted to the equivalent:
-l select=1:ncpus=1:mem=5mb:host=origin3:platform=IRIX64
-l place=pack
-l opti=1
26. Request one host with a certain property using old syntax:
-lnodes=1:property
is converted to the equivalent:
-l select=1:ncpus=1:property=True
-l place=scatter
27. Request 2 CPUs on each of four hosts with a given property using old syntax:
-lnodes=4:property:ncpus=2
is converted to the equivalent:
-l select=4:ncpus=2:property=True
-l place=scatter
28. Request 1 CPU on each of 14 hosts, asking for certain software, licenses, and a job-wide memory limit, using old syntax:
-lnodes=14:mpi-fluent:ncpus=1 -lfluent=1,fluent-all=1,fluent-par=13
-l mem=280mb
is converted to the equivalent:
-l select=14:ncpus=1:mem=20mb:mpi_fluent=True
-l place=scatter
-l fluent=1,fluent-all=1,fluent-par=13
32. Allocate 4 vnodes, each with 6 CPUs and 3 MPI processes per vnode, with each
vnode on a separate host. The memory allocated would be one-fourth of the memory
specified by the queue or server default, if one existed. This results in a different placement of the job from version 5.4:
-l nodes=4:ppn=3:ncpus=2
is converted to:
-l select=4:ncpus=6:mpiprocs=3 -l place=scatter
33. Allocate 4 vnodes, from 4 separate hosts, with the property blue. The amount of
memory allocated from each vnode is 2560MB (= 10GB / 4) rather than 10GB from each
vnode.
-l nodes=4:blue:ncpus=2 -l mem=10GB
is converted to:
-l select=4:blue=True:ncpus=2:mem=2560mb \
-lplace=scatter
When a nodespec is converted into a select statement, the job will have the environment
variables NCPUS and OMP_NUM_THREADS set to the value of ncpus in the first piece
of the nodespec. This may produce incompatibilities with prior versions when a complex
node specification using different values of ncpus and ppn in different pieces is converted.
-lnodes=[N:spec_list | spec_list]
[[+N:spec_list | +spec_list] ...]
[#suffix ...][-lncpus=Z]
where:
The node specification is converted into selection and placement directives as follows:
Each spec_list is converted into one chunk, so that N:spec_list is converted into N chunks.
If spec is hostname:
The chunk will include host=hostname
If spec matches any vnode's resources_available.host value:
The chunk will include host=hostname
If spec is property:
The chunk will include property=true
Property must be a site-defined vnode-level boolean resource.
If spec is ppn=P:
The chunk will include mpiprocs=P
If the nodespec is
-lnodes=N:ppn=P
It is converted to
-lselect=N:ncpus=P:mpiprocs=P
Example:
-lnodes=4:ppn=2
is converted into
-lselect=4:ncpus=2:mpiprocs=2
If property is a suffix:
All chunks will include property=true
If excl is a suffix:
The placement directive will be -lplace=scatter:excl
If shared is a suffix:
The placement directive will be -lplace=scatter:shared
Example:
-l nodes=3:green:ncpus=2:ppn=2+2:red
is converted to:
-l select=3:green=true:ncpus=4:mpiprocs=2+2 \
:red=true:ncpus=1
-l place=scatter
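The conversion rules above can be sketched in Python. This is an illustrative sketch only (the function name is hypothetical): it handles numbered pieces, boolean properties, ncpus, and ppn, and always emits -l place=scatter; real qsub also handles hostnames, the excl and shared suffixes, and resource defaults:

```python
def nodespec_to_select(nodespec):
    """Convert an old-style nodespec (without -lnodes=) into the
    equivalent select/place directives, per the rules sketched above."""
    chunks = []
    for piece in nodespec.split("+"):
        parts = piece.split(":")
        n = "1"
        if parts and parts[0].isdigit():
            n = parts.pop(0)              # leading count N of this piece
        ncpus, mpiprocs, extras = 1, None, []
        for spec in parts:
            if spec.startswith("ppn="):
                p = int(spec[4:])
                mpiprocs = p              # ppn=P becomes mpiprocs=P
                ncpus *= p                # and multiplies the chunk's ncpus
            elif spec.startswith("ncpus="):
                ncpus *= int(spec[6:])
            else:
                extras.append(spec + "=true")  # property -> boolean resource
        chunk = n + ":" + ":".join(extras + ["ncpus=%d" % ncpus])
        if mpiprocs:
            chunk += ":mpiprocs=%d" % mpiprocs
        chunks.append(chunk)
    return "-l select=" + "+".join(chunks) + " -l place=scatter"

# Reproduces the example above:
print(nodespec_to_select("3:green:ncpus=2:ppn=2+2:red"))
# → -l select=3:green=true:ncpus=4:mpiprocs=2+2:red=true:ncpus=1 -l place=scatter
```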
Node specification syntax for requesting properties is deprecated. The boolean resource
syntax "property=true" is only accepted in a selection directive. It is erroneous to mix old
and new syntax.
The resource specification is converted to select and place statements after any defaults
have been applied.
-lresource=value[:resource=value ...]
-lselect=1[:resource=value ...]
-lplace=pack
with one instance of resource=value for each of the following vnode-level resources in the
resource request:
The qsub command scans the lines of the script file for directives. Scanning continues
until the first executable line, that is, a line that is not blank, is not a directive line, and
does not have “#” as its first non-whitespace character. Any directives that occur after the
first executable line are ignored.
A line in the script file is processed as a directive to qsub if and only if the string of
characters starting with the first non-whitespace character on the line, and of the same
length as the directive prefix, matches the directive prefix (by default “#PBS”). The
remainder of the directive line consists of the options to qsub, in the same syntax as they
appear on the command line. Each option character must be preceded by the “-” character.
If an option is present both in a directive and on the command line, the command line
takes precedence and the directive's option and argument, if any, are ignored. If an
option is present in a directive but not on the command line, the option and its argument,
if any, are taken from the directive.
UNIX job script:
1 #!/bin/sh
2 #PBS -l walltime=1:00:00
3 #PBS -l select=mem=400mb
4 #PBS -j oe
5
6 date
7 ./my_application
8 date

Windows job script:
1
2 #PBS -l walltime=1:00:00
3 #PBS -l select=mem=400mb
4 #PBS -j oe
5
6 date /t
7 my_application
8 date /t
On line one in the example above, the Windows script does not show a shell directive.
(The default on Windows is the batch command language.) Also note that it is possible
under both Windows and UNIX to specify to PBS the scripting language to use to
interpret the job script (see the “-S” option to qsub in section 4.13.9 “Specifying
Scripting Language to Use” on page 66). The Windows script will be a .exe or .bat file.
Lines 2-8 of both files are almost identical. The primary differences will be in file and
directory path specification (such as the use of drive letters and slash vs. backslash as the
path separator).
Lines 2-4 are PBS directives. PBS reads down the shell script until it finds the first line
that is not a valid PBS directive, then stops. It assumes the rest of the script is the list of
commands or tasks that the user wishes to run. In this case, PBS sees lines 6-8 as being
user commands.
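The directive scan can be sketched in Python (an illustrative sketch of the rules described above, not PBS source code):

```python
def scan_directives(script_lines, prefix="#PBS"):
    """Collect qsub directive options from a job script: directive lines
    match the prefix; blank lines and '#' comment lines are skipped; the
    first executable line ends the scan."""
    directives = []
    for line in script_lines:
        stripped = line.lstrip()
        if stripped.startswith(prefix):
            directives.append(stripped[len(prefix):].strip())
        elif stripped == "" or stripped.startswith("#"):
            continue          # blank lines and comments do not end the scan
        else:
            break             # first executable line: stop scanning
    return directives

script = ["#!/bin/sh",
          "#PBS -l walltime=1:00:00",
          "#PBS -l select=mem=400mb",
          "#PBS -j oe",
          "",
          "date",
          "./my_application",
          "date"]
print(scan_directives(script))
# → ['-l walltime=1:00:00', '-l select=mem=400mb', '-j oe']
```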
The section “Job Submission Options” on page 61 describes how to use the qsub com-
mand to submit PBS jobs. Any option that you specify to the qsub command line (except
“-I”) can also be provided as a PBS directive inside the PBS script. PBS directives come
in two types: resource requirements and attribute settings.
In our example above, lines 2-3 specify the “-l” resource list option, followed by a
specific resource request. Specifically, lines 2-3 request 1 hour of wall-clock time as a
job-wide request, and 400 megabytes (MB) of memory in a chunk.
Line 4 requests that PBS join the stdout and stderr output streams of the job into a
single stream.
Finally, lines 6-8 are the command lines for executing the program(s) we wish to run.
You can specify as many programs, tasks, or job steps as you need.
By default, the text string “#PBS” is used by PBS to determine which lines in the job file
are PBS directives. The leading “#” symbol was chosen because it is a comment delimiter
to all shell scripting languages in common use on UNIX systems. Because directives look
like comments, the scripting language ignores them.
Under Windows, however, the command interpreter does not recognize the ‘#’ symbol as
a comment, and will generate a benign, non-fatal warning when it encounters each
“#PBS” string. While it does not cause a problem for the batch job, it can be annoying or
disconcerting to the user. Therefore Windows users may wish to specify a different PBS
directive, via either the PBS_DPREFIX environment variable, or the “-C” option to
qsub. For example, we can direct PBS to use the string “REM PBS” instead of “#PBS”
and use this directive string in our job script:
REM PBS -l walltime=1:00:00
REM PBS -l select=mem=400mb
REM PBS -j oe
date /t
.\my_application
date /t
Given the above job script, we can submit it to PBS in one of two ways: by setting the
PBS_DPREFIX environment variable to “REM PBS” before invoking qsub, or by giving
qsub the option -C "REM PBS" on the command line.
For additional details on the “-C” option to qsub, see section 4.13 “Job Submission
Options” on page 61.
Any .bat files that are to be executed within a PBS job script have to be prefixed with
"call" as in:
---[job_b.bat]----------
@echo off
call E:\step1.bat
call E:\step2.bat
------------------------
Without the "call", only the first .bat file gets executed and it doesn't return control to the
calling interpreter.
For example, this version of job_a.bat will stop after the first .bat file:
--[job_a.bat]---------
@echo off
E:\step1.bat
E:\step2.bat
------------------------
This corrected version runs both:
--[job_a.bat]---------
@echo off
call E:\step1.bat
call E:\step2.bat
------------------------
4.12.2 Passwords
When running PBS in a password-protected Windows environment, you will need to
specify to PBS the password needed in order to run your jobs. There are two methods of
doing this: (1) by providing PBS with a password once, to be used for all jobs (“single
signon method”), or (2) by specifying the password for each job when it is submitted
(“per job method”). Check with your system administrator to see which method was
configured at your site.
To provide PBS with a password to be used for all your PBS jobs, use the
pbs_password command. This command can be used whether or not you have jobs
enqueued in PBS. The command usage syntax is:
When no options are given to pbs_password, the password credential on the default PBS
server for the current user (i.e. the user who executes the command) is updated to the
password entered at the prompt. Any user jobs previously held due to an invalid password
are not released.
2. User user has given the current user explicit access via
the ruserok() mechanism:
a. The hostname of the machine from which the current
user is logged in appears in the server's hosts.equiv
file, or
b. The current user has an entry in user's
HOMEDIR\.rhosts file.
Note that pbs_password encrypts the password obtained from the user before sending
it to the PBS Server. The pbs_password command does not change the user's pass-
word on the current host, only the password that is cached in PBS.
The password specified will be shown on screen and will be passed on to the program,
which will then encrypt it and save it securely for use by the job. The password should be
enclosed in double quotes. If you type only the pair of double quotes, you will be
prompted for the password.
The password can also be specified in xpbs using the “SUBMIT-PASSWORD” entry box
in the Submit window. The password you type in will not be shown on the screen.
Important: Both the -Wpwd option to qsub, and the xpbs SUBMIT-PASSWORD entry
box, can only be used when submitting jobs to Windows. The UNIX qsub does not
support the -Wpwd option; and if you type a password into the xpbs SUBMIT-PASSWORD
entry box under UNIX, the job will be rejected.
Keep in mind that in a multi-host job, the password supplied will be propagated to all the
sister hosts. This requires that the password be the same on the user's accounts on all the
hosts. Using domain accounts for a multi-host job is ideal in this case.
Accessing network share drives/resources within a job session also requires that you sub-
mit the job with a password via qsub -W pwd="" or the “SUBMIT-PASSWORD” entry
box in xpbs.
Furthermore, if the job is submitted without a password, do not use the native rcp
command from within the job script, as it will generate the error: “unable to get user
name”. Instead, please use pbs_rcp.
There are many options to the qsub command. The table below gives a quick summary of
the available options; the rest of this chapter explains how to use each one.
The “-q destination” option to qsub allows you to specify a particular destination
to which you want the job submitted. The destination names a queue, a Server, or a queue
at a Server. The qsub command will submit the script to the Server defined by the
destination argument. If the destination is a routing queue, the job may be routed by the
Server to a new destination. If the -q option is not specified, the qsub command will
submit the script to the default queue at the default Server. (See also the discussion of
PBS_DEFAULT in “Environment Variables” on page 22.) The destination specification
takes the following form:
-q [queue[@host]]
PBS, by default, always copies the standard output (stdout) and standard error (stderr)
files back to $PBS_O_WORKDIR on the submission host when a job finishes. When
qsub is run, it sets $PBS_O_WORKDIR to the current working directory where the qsub
command is executed.
The “-o path” and “-e path” options to qsub allow you to specify the names of the
files to which the stdout and stderr streams should be written. The path argument is
of the form: [hostname:]path_name where hostname is the name of a host to which
the file will be returned and path_name is the path name on that host. You may specify
relative or absolute paths. If you specify only a file name, it is assumed to be relative to
your home directory. Do not use variables in the path. The following examples illustrate
these options.
#PBS -o /u/user1/myOutputFile
#PBS -e /u/user1/myErrorFile
Note that if the PBS client commands are used on a Windows host, then special characters
like spaces, backslashes (\), and colons (:) can be used in command line arguments such as
for specifying pathnames, as well as drive letter specifications. The following are allowed:
qsub -o \temp\my_out job.scr
qsub -e "host:e:\Documents and Settings\user\Desktop\output"
The error output of the above job is to be copied onto the e: drive on host using the
path "\Documents and Settings\user\Desktop\output". The quote marks
are required when arguments to qsub contain spaces.
The “-V” option declares that all environment variables in the qsub command’s
environment are to be exported to the batch job.
The “-v variable_list” option to qsub allows you to specify additional
environment variables to be exported to the job. variable_list names environment
variables from the qsub command environment which are made available to the job when
it executes. The variable_list is a comma-separated list of strings of the form variable or
variable=value. These variables and their values are passed to the job.
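The variable_list syntax can be illustrated with a short Python sketch (hypothetical helper; bare variable names are resolved from the submitting environment, as described above):

```python
import os

def parse_variable_list(variable_list):
    """Interpret a -v variable_list: a comma-separated list of 'variable'
    or 'variable=value' strings. A bare name takes its value from the
    submitting environment."""
    env = {}
    for item in variable_list.split(","):
        if "=" in item:
            name, value = item.split("=", 1)
            env[name] = value
        else:
            env[item] = os.environ.get(item, "")  # value from qsub's environment
    return env

# qsub -v DEBUG=1,OUTDIR=/tmp/run would pass these to the job:
print(parse_variable_list("DEBUG=1,OUTDIR=/tmp/run"))
# → {'DEBUG': '1', 'OUTDIR': '/tmp/run'}
```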
The “-m MailOptions” option defines the set of conditions under which the execution
server will send a mail message about the job. The MailOptions argument is a string
consisting of either the single character “n”, or one or more of the characters “a”, “b”,
and “e”. If no email notification is specified, the default behavior is the same as for
“-m a”.
The “-M user_list” option declares the list of users to whom mail is sent by the exe-
cution server when it sends mail about the job. The user_list argument is of the form:
user[@host][,user[@host],...]
If unset, the list defaults to the submitting user at the qsub host, i.e. the job owner.
Important: PBS on Windows can only send email to addresses that specify
an actual hostname that accepts port 25 (sendmail) requests. For
the above example on Windows you will need to specify:
qsub -M [email protected]
The “-N name” option declares a name for the job. The name specified may be up to and
including 15 characters in length. It must consist of printable, non-whitespace characters
with the first character alphabetic, and contain no “special characters”. If the -N option is
not specified, the job name will be the base name of the job script file specified on the
command line. If no script file name was specified and the script was read from the stan-
dard input, then the job name will be set to STDIN.
The “-r y|n” option declares whether the job is rerunnable. To rerun a job is to
terminate the job and requeue it in the execution queue in which the job currently resides.
The value argument is a single character, either “y” or “n”. If the argument is “y”, the job
is rerunnable. If the argument is “n”, the job is not rerunnable. The default value is “y”,
rerunnable.
The “-S path_list” option declares the path and name of the scripting language to be
used in interpreting the job script. The option argument path_list is in the form:
path[@host][,path[@host],...] Only one path may be specified for any host
named, and only one path may be specified without a corresponding host name. The path
selected will be the one whose host name matches the name of the execution host. If no
matching host is found, then the path specified without a host will be selected, if present.
If the -S option is not specified, if the option argument is the null string, or if no entry
from the path_list is selected, then PBS will use the user’s login shell on the execution
host.
#PBS -S /bin/bash@mars,/usr/bin/bash@jupiter
...
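The selection rule for -S can be sketched as follows (illustrative Python; the function name is hypothetical):

```python
def select_shell_path(path_list, exec_host):
    """Pick the interpreter path for a -S path_list: prefer the entry whose
    @host matches the execution host; otherwise fall back to an entry given
    without a host; otherwise None (the user's login shell is used)."""
    fallback = None
    for entry in path_list.split(","):
        if "@" in entry:
            path, host = entry.split("@", 1)
            if host == exec_host:
                return path       # exact host match wins
        else:
            fallback = entry      # host-less entry is the fallback
    return fallback

# With the directive above, a job executing on jupiter would use:
print(select_shell_path("/bin/bash@mars,/usr/bin/bash@jupiter", "jupiter"))
# → /usr/bin/bash
```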
The “-p priority” option defines the priority of the job. The priority argument must
be an integer between -1024 (lowest priority) and +1023 (highest priority) inclusive. The
default is no priority which is equivalent to a priority of zero.
This option allows users to specify a priority for their jobs. However, this option is
dependent upon the local scheduling policy. By default the “sort jobs by job-priority”
feature is disabled. If your local PBS administrator has enabled it, then all queued jobs
will be sorted based on the user-specified priority. (If you need an absolute ordering of
your own jobs, see “Specifying Job Dependencies” on page 131.)
The “-a date_time” option declares the time after which the job is eligible for
execution. The date_time argument is in the form: [[[[CC]YY]MM]DD]hhmm[.SS]
where CC is the first two digits of the year (the century), YY is the second two digits of
the year, MM is the two digits for the month, DD is the day of the month, hh is the hour,
mm is the minute, and the optional SS is the seconds. If the month, MM, is not specified,
it will default to the current month if the specified day, DD, is in the future. Otherwise,
the month will be set to next month. Likewise, if the day, DD, is not specified, it will
default to today if the time hhmm is in the future. Otherwise, the day will be set to
tomorrow. For example, if you submit a job at 11:15am with a time of “1110”, the job
will be eligible to run at 11:10am tomorrow.
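The default-day rule for a bare hhmm argument can be sketched in Python (illustrative only; the month-default rule is analogous):

```python
from datetime import datetime, timedelta

def eligible_time(hhmm, now):
    """Resolve a bare hhmm date_time: today if hhmm is still in the
    future, otherwise tomorrow."""
    target = now.replace(hour=int(hhmm[:2]), minute=int(hhmm[2:]),
                         second=0, microsecond=0)
    if target <= now:
        target += timedelta(days=1)   # time already passed today: tomorrow
    return target

# Submitting at 11:15 with -a 1110 makes the job eligible at 11:10 tomorrow:
now = datetime(2007, 10, 24, 11, 15)
print(eligible_time("1110", now))   # → 2007-10-25 11:10:00
```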
The “-h” option specifies that a user hold be applied to the job at submission time. The
job will be submitted, then placed in a hold state. The job will remain ineligible to run
until the hold is released. (For details on releasing a held job see “Holding and Releasing
Jobs” on page 120.)
The “-c interval” option defines the interval (in minutes) at which the job will be
checkpointed, if this capability is provided by the operating system (i.e. under SGI IRIX
and Cray Unicos). If the job executes upon a host which does not support checkpointing,
this option will be ignored. The interval argument is specified as:
n No checkpointing is to be performed.
If “-c” is not specified, the checkpoint attribute is set to the value “u”.
PBS requires that a user’s name be consistent across a server and its execution hosts, but
not across a submission host and a server. A user may have access to more than one
server, and may have a different username on each server. In this environment, if a user
wishes to submit a job to any of the available servers, the username for each server is
specified. The wildcard username will be used if the job ends up at yet another server not
specified, but only if that wildcard username is valid.
For example, our user is UserS on the submission host HostS, UserA on server ServerA,
and UserB on server ServerB, and is UserC everywhere else. Note that this user must be
UserA on all ExecutionA and UserB on all ExecutionB machines. Then our user can use
“qsub -u UserA@ServerA,UserB@ServerB,UserC” for the job. The job owner will
always be UserS.
The server’s flatuid attribute determines whether it assumes that identical usernames mean
identical users. If true, it assumes that if UserS exists on both the submission host and the
server host, then UserS can run jobs on that server. If not true, the server calls ruserok()
which uses /etc/hosts.equiv and .rhosts to authorize UserS to run as UserS.
Table 5: UNIX User ID and flatuid

Value of flatuid   Same username on submission      Different usernames
                   and server hosts (UserS/UserS)   (UserS/UserA)
True               Server assumes user has          Server checks whether UserS
                   permission to run job            can run job as UserA
Note that if different names are listed via the -u option, then they are checked regardless
of the value of flatuid.
Under Windows, if a user has a non-admin account, the server’s hosts.equiv file is used to
determine whether that user can run a job on a given server. For an admin account,
[PROFILE_PATH].\rhosts is used, and the server’s acl_roots attribute must be set to allow
job submissions. Usernames containing spaces are allowed as long as the username length
is no more than 15 characters, and the usernames are quoted when used in the command
line.
The “-W group_list=g_list” option defines the group name under which the job is
to run on the execution system. The g_list argument is of the form:
group[@host][,group[@host],...]
Only one group name may be given per specified host. Only one of the group
specifications may be supplied without the corresponding host specification. That group
name will be used for execution on any host not named in the argument list. If not set,
the group_list defaults to the primary group of the user under which the job will be run.
Under Windows, the primary group is the first group found for the user by PBS when
querying the accounts database.
The “-A account_string” option defines the account string associated with the job.
The account_string is an opaque string of characters and is not interpreted by the Server
which executes the job. This value is often used by sites to track usage by locally defined
account names.
The “-j join” option declares whether the standard error stream of the job will be
merged with the standard output stream of the job. A join argument value of oe directs
that the two streams will be merged, intermixed, as standard output. A join argument
value of eo directs that the two streams will be merged, intermixed, as standard error. If
the join argument is n or the option is not specified, the two streams will be written to two
separate files.
The “-k keep” option defines which (if either) of standard output (STDOUT) or stan-
dard error (STDERR) of the job will be retained on the execution host. If set, this option
overrides the path name for the corresponding file. If not set, neither file is retained on the
execution host. The argument is either the single letter “e” or “o”, or the letters “e” and “o”
combined in either order. Or the argument is the letter “n”. If “-k” is not specified, neither
file is retained.
The “-z” option directs the qsub command not to write the job identifier assigned to the
job to the command’s standard output.
PBS provides a special kind of batch job called interactive-batch. An interactive-batch job
is treated just like a regular batch job (in that it is queued up, and has to wait for resources
to become available before it can run). Once it is started, however, the user's terminal
input and output are connected to the job in a manner similar to a login session. It
appears that the user is logged into one of the available execution machines, and the
resources requested by the job are reserved for that job. Many users find this useful for
debugging their applications or for computational steering. The “-I” option declares that
the job is an interactive-batch job.
If the -I option is specified on the command line, the job is an interactive job. If a script is
given, it will be processed for directives, but any executable commands will be discarded.
When the job begins execution, all input to the job is from the terminal session in which
qsub is running. The -I option is ignored in a script directive.
When an interactive job is submitted, the qsub command will not terminate when the job
is submitted. qsub will remain running until the job terminates, is aborted, or the user
interrupts qsub with a SIGINT (the control-C key). If qsub is interrupted prior to job
start, it will ask whether the user wishes to exit. If the user responds “yes”, qsub exits
and the job is aborted.
Once the interactive job has started execution, input to and output from the job pass
through qsub. Keyboard-generated interrupts are passed to the job. Lines entered that
begin with the tilde ('~') character and contain special sequences are interpreted by qsub
itself. The recognized special sequences are:
~susp If running under the UNIX C shell, suspends the qsub pro-
gram. “susp” is the suspend character, usually CNTL-Z.
~asusp If running under the UNIX C shell, suspends the input half of
qsub (terminal to job), but allows output to continue to be
displayed. “asusp” is the auxiliary suspend character, usually
control-Y.
A PBS job has the following attributes, which may be set by the various options to qsub
(for details see section 4.13 “Job Submission Options” on page 61).
block When true, specifies that qsub will wait for the job to
complete, and return the exit value of the job. Default: false.
Set via the -W block option to qsub. If qsub receives one of
the signals SIGHUP, SIGINT, SIGQUIT or SIGTERM, it will
print the following message on stderr: qsub: wait for
job <jobid> interrupted by signal <signal>
Error_Path The final path name for the file containing the job’s standard
error stream. See the qsub and qalter command description
for more detail.
Execution_Time The time after which the job may execute. The time is
maintained in seconds since the Epoch. If this time has not yet
been reached, the job will not be scheduled for execution and
the job is said to be in the wait state.
Hold_Types The set of holds currently applied to the job. If the set is not
null, the job will not be scheduled for execution and is said to
be in the hold state. Note, the hold state takes precedence over
the wait state.
Job_Name The name assigned to the job by the qsub or qalter
command.
Join_Path If the Join_Path attribute is oe, then the job’s standard error
stream will be merged, intermixed, with the job’s standard
output stream and placed in the file determined by the
Output_Path attribute. The Error_Path attribute is
maintained, but ignored. However, if the Join_Path attribute
is eo, then the job’s standard output stream will be merged,
intermixed, with the job’s standard error stream and placed in
the file determined by the Error_Path attribute, and the
Output_Path attribute will be ignored.
Mail_Points Identifies when the Server will send email about the job.
Mail_Users The set of users to whom mail may be sent when the job makes
certain state changes.
no_stdio_sockets
Flag to indicate whether a multi-host job should have the
standard output and standard error streams of tasks running on
other hosts returned to mother superior via sockets. These
sockets may cause a job to be not checkpointable. Default:
false (sockets are created).
Output_Path The final path name for the file containing the job’s standard
output stream. See the qsub and qalter command
description for more detail.
Resource_List The resource list is a set of resources required by the job. The
value also establishes the limit of usage of that resource. If not
set, the value for a resource may be determined by a queue or
Server default established by the administrator.
Shell_Path_List
A set of absolute paths of the program to process the job’s script
file.
umask The initial umask of the job is set to the value of this
attribute when the job is created. This may be changed by
umask commands in the shell initialization files such as .profile
or .cshrc. Default value: 077
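A different umask can be requested at submission; the -W umask=nnn form shown here is our assumed qsub mapping (verify against your qsub(1) man page):

```shell
#!/bin/sh
# Request a job umask of 022 instead of the default 077, so that
# output files are group- and world-readable.
#PBS -W umask=022
umask 022   # same effect inside the script body, shown for illustration
umask       # prints the active mask
```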
Variable_List This is the list of environment variables passed with the Queue
Job batch request.
comment An attribute for displaying comments about the job from the
system. Visible to any client. Under Windows, comments can
contain only ASCII characters.
The following attributes are read-only; they are established by the Server and are visible to
the user but cannot be set or changed by a user.
alt_id For a few systems, such as IRIX 6.x running Array Services, the
session id is insufficient to track which processes belong to the
job. Where a different identifier is required, it is recorded in this
attribute. If set, it will also be recorded in the end-of-job
accounting record. For IRIX 6.x running Array Services, the
alt_id attribute is set to the Array Session Handle (ASH)
assigned to the job.
array_id string; applies to subjob; job array identifier for given subjob
array_indices_remaining
string; applies to job array; list of indices of subjobs still
queued. Range or list of ranges
array_indices_submitted
string; applies to job array; complete list of indices of subjobs
given at submission time. Given as a range.
array_state_count
string; applies to job array; lists number of subjobs in each state
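These attributes appear on jobs submitted with qsub’s -J option. The sketch below shows the directive form; PBS_ARRAY_INDEX is set by PBS for each subjob and is unset when the script runs outside PBS:

```shell
#!/bin/sh
# Submit as a job array of 10 subjobs:  qsub -J 1-10 thisscript
#PBS -J 1-10
# Each subjob sees its own index via PBS_ARRAY_INDEX.
echo "subjob index: ${PBS_ARRAY_INDEX:-unset}"
```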
etime The time that the job became eligible to run, i.e. in a queued
state while residing in an execution queue.
exec_host If the job is running, string set to the name of each vnode on
which the job is executing, along with the vnode-level, consum-
able resources allocated from that vnode.
Format:
”(vnode:ncpus=N:mem=M+vnode:ncpus=N:mem=M[+...])”,
where vnode is the name of a vnode, N is the number of CPUs
on that vnode allocated to the job and M is the amount of mem-
ory on that vnode allocated to the job. Other resources may
show up as well.
hashname The name used as a basename for various files, such as the job
file, script file, and the standard output and error of the job.
[This attribute is available only to the batch administrator.]
Job_Owner The login name on the submitting host of the user who submit-
ted the batch job.
mtime The time that the job was last modified, changed state, or
changed locations.
qtime The time that the job entered the current queue.
queue The name of the queue in which the job currently resides.
run_count The number of times the server has run the job. Format: inte-
ger.
schedselect This is set to the union of the "select" resource of the job and
the queue and server defaults for resources in a chunk. Visible
only to PBS Manager.
server The name of the server which is currently managing the job.
session_id If the job is running, this is set to the session id of the first
executing task.
stime The time when the job started execution. Set by the server.
Displayed in date/time format.
Chapter 5
Using the xpbs GUI
The PBS graphical user interface is called xpbs; it provides a user-friendly, point-and-click
interface to the PBS commands. xpbs is built on the Tcl/Tk graphics tool suite and
provides most of the same functionality as the PBS CLI commands. In this
chapter we introduce xpbs, and show how to create a PBS job using xpbs.
If PBS is installed on your local workstation, or if you are running under Windows, you
can launch xpbs by double-clicking on the xpbs icon on the desktop. You can also start
xpbs from the command line by typing the xpbs command.
Before running xpbs for the first time under UNIX, you may need to configure your work-
station for it. Depending on how PBS is installed at your site, you may need to allow
xpbs to be displayed on your workstation. However, if the PBS client commands are
installed locally on your workstation, you can skip this step. (Ask your PBS administrator
if you are unsure.)
The most secure method of running xpbs remotely and displaying it on your local X-Windows
session is to redirect the X-Windows traffic through ssh (secure shell), by setting
the "X11Forwarding yes" parameter in the sshd_config file. (Your local system
administrator can provide details on this process if needed.)
An alternative, but less secure, method is to direct your X-Windows session to permit the
xpbs client to connect to your local X-server. Do this by running the xhost command
with the name of the host from which you will be running xpbs, as shown in the example
below:
xhost + server.mydomain.com
Next, on the system from which you will be running xpbs, set your X-Windows DIS-
PLAY variable to your local workstation. For example, if using the C-shell:
setenv DISPLAY myWorkstation:0.0
However, if you are using the Bourne or Korn shell, type the following:
export DISPLAY=myWorkstation:0.0
The various panels, boxes, and regions (collectively called “widgets”) of xpbs and how
they are manipulated are described in the following sections. A listbox can be multi-select-
able (a number of entries can be selected/highlighted using a mouse click) or single-select-
able (one entry can be highlighted at a time).
An entry widget is brought into focus with a left-click. To manipulate this widget, simply
type in the text value. Use of arrow keys and mouse selection of text for deletion, over-
write, copying and pasting with sole use of mouse buttons are permitted. This widget has a
scrollbar for horizontally scanning a long text entry string.
A matrix of entry boxes is usually shown as several rows of entry widgets where a number
of entries (called fields) can be found per row. The matrix is accompanied by up/down
arrow buttons for paging through the rows of data, and each group of fields gets one
scrollbar for horizontally scanning long entry strings. Moving from field to field can be
done using the <Tab> (move forward), <Cntrl-f> (move forward), or <Cntrl-b> (move
backward) keys.
A spinbox is a combination of an entry widget and a horizontal scrollbar. The entry widget
will only accept values that fall within a defined list of valid values, and incrementing
through the valid values is done by clicking on the up/down arrows.
A button is a rectangular region appearing either raised or pressed that invokes an action
when clicked with the left mouse button. When the button appears pressed, hitting the
<RETURN> key will automatically select the button.
A text region is an editor-like widget. This widget is brought into focus with a left-click.
To manipulate this widget, simply type in the text. Use of arrow keys, backspace/delete
key, mouse selection of text for deletion or overwrite, and copying and pasting with sole
use of mouse buttons are permitted. This widget has a scrollbar for vertically scanning a
long entry.
5.3.1 xpbs Menu Bar
The Menu Bar is composed of a row of command buttons that signal some action with a
click of the left mouse button. The buttons are:
Manual Update forces an update of the information on hosts, queues, and jobs.
Auto Update sets an automatic update of information every user-specified number of minutes.
Track Job periodically checks for returned output files of jobs.
Preferences sets parameters such as the list of Server host(s) to query.
Help contains some help information.
About gives general information about the xpbs GUI.
Close exits xpbs, saving the current setup information.
5.3.2 xpbs Hosts Panel
The Hosts panel is composed of a leading horizontal HOSTS bar, a listbox, and a set of
command buttons. The HOSTS bar contains a minimize/maximize button, identified by a
dot or a rectangular image, for displaying or iconizing the Hosts region. The listbox dis-
plays information about favorite Server host(s), and each entry is meant to be selected via
a single left-click, shift-left-click for contiguous selection, or control-left-click for non-
contiguous selection.
To the right of the Hosts Panel are buttons that represent actions that can be performed on
selected host(s). Use of these buttons will be explained in detail below.
Important: Note that some buttons are only visible if xpbs is started with the
“-admin” option, which requires manager or operator privilege to
function.
The middle portion of the Hosts Panel has abbreviated column names indicating the infor-
mation being displayed, as the following table shows:
Heading Meaning
5.3.3 xpbs Queues Panel
The Queues panel is composed of a leading horizontal QUEUES bar, a listbox, and a set of
command buttons. The QUEUES bar lists the hosts that are consulted when listing queues;
the bar also contains a minimize/maximize button for displaying or iconizing the Queues
panel. The listbox displays information about queues managed by the Server host(s)
selected from the Hosts panel; each listbox entry can be selected as described above for
the Hosts panel.
To the right of the Queues Panel area are buttons for actions that can be performed on
selected queue(s).
The middle portion of the Queues Panel has abbreviated column names indicating the
information being displayed, as the following table shows:
Table 9: xpbs Queue Column Headings
Heading Meaning
5.3.4 xpbs Jobs Panel
The Jobs panel is composed of a leading horizontal JOBS bar, a listbox, and a set of
command buttons. The JOBS bar lists the queues that are consulted when listing jobs; the bar
also contains a minimize/maximize button for displaying or iconizing the Jobs region. The
listbox displays information about jobs that are found in the queue(s) selected from the
Queues listbox; each listbox entry can be selected as described above for the Hosts panel.
The region just above the Jobs listbox shows a collection of command buttons whose
labels describe criteria used for filtering the Jobs listbox contents. The list of jobs can be
selected according to the owner of jobs (Owners), job state (Job_States), name of the job
(Job_Name), type of hold placed on the job (Hold_Types), the account name associated
with the job (Account_Name), checkpoint attribute (Checkpoint), time the job is eligible
for queueing/execution (Queue_Time), resources requested by the job (Resources), prior-
ity attached to the job (Priority), and whether or not the job is rerunnable (Rerunnable).
The selection criteria can be modified by clicking on any of the appropriate command but-
tons to bring up a selection box. The criteria command buttons are accompanied by a
Select Jobs button, which when clicked, will update the contents of the Jobs listbox based
on the new selection criteria. Note that only jobs that meet all the selected criteria will be
displayed.
Finally, to the right of the Jobs panel are the following command buttons, for operating on
selected job(s):
The middle portion of the Jobs Panel has abbreviated column names indicating the infor-
mation being displayed, as the following table shows:
Heading Meaning
5.3.5 xpbs Info Panel
The Info panel shows the progress of the commands executed by xpbs. Any errors are
written to this area. The INFO panel also contains a minimize/maximize button for dis-
playing or iconizing the Info panel.
5.3.6 xpbs Keyboard Tips
There are a number of shortcuts and key sequences that can be used to speed up using
xpbs. These include:
5.4 Setting xpbs Preferences
The “Preferences” button is in the Menu Bar at the top of the main xpbs window. Clicking
it will bring up a dialog box that allows you to customize the behavior of xpbs:
1. Define Server hosts to query
2. Select wait timeout in seconds
3. Specify xterm command (for interactive jobs, UNIX only)
4. Specify which rsh/ssh command to use
5.5 Relationship Between PBS and xpbs
xpbs is built on top of the PBS client commands, such that all the features of the com-
mand line interface are available through the GUI. Each “task” that you perform using
xpbs is converted into the necessary PBS command and then run.
* Indicates command button is visible only if xpbs is started with the “-admin” option.
5.6 How to Submit a Job Using xpbs
First, from the HOSTS listbox in the main xpbs display, select the host to which you wish
to submit the job.
Next, click on the Submit button located next to the HOSTS panel. The Submit button
brings up the Submit Job Dialog box (see below) which is composed of four distinct
regions. The Job Script File region is at the upper left. The OPTIONS region containing
various widgets for setting job attributes is spread across the dialog box. The OTHER
OPTIONS region is located just below the Job Script File region, and the COMMAND BUTTONS
region is at the bottom.
The job script region is composed of a header box, the text box, FILE entry box, and two
buttons labeled load and save. If you have a script file containing PBS options and execut-
able lines, then type the name of the file on the FILE entry box, and then click on the load
button. Alternatively, you may click on the FILE button, which will display a File Selec-
tion browse window, from which you may point and click to select the file you wish to
open. The File Selection Dialog window is shown below. Clicking on the Select File but-
ton will load the file into xpbs, just as does the load button described above.
The various fields in the Submit window will get loaded with values found in the script
file. The script file text box will only be loaded with executable lines (non-PBS) found in
the script. The job script header box has a Prefix entry box that can be modified to specify
the PBS directive to look for when parsing a script file for PBS options.
If you don’t have an existing script file to load into xpbs, you can start typing the
executable lines of the job in the file text box.
Next, review the Destination listbox. This box shows the queues found in the host that you
selected. A special entry called “@host” refers to the default queue at the indicated host.
Select the appropriate destination queue for the job.
The resources specified in the “Resource List” section will be job-wide resources only. In
order to specify chunks or job placement, use a script.
To run an array job, use a script. You will not be able to query individual subjobs or the
whole job array using xpbs. Type the script into the “File:” entry box. Do not click the
“Load” button. Instead, use the “Submit” button.
Finally, review the optional settings to see if any should apply to this job.
For example:
o Use one of the buttons in the “Output” region to merge output
and error files.
o Use “Stdout File Name” to define the standard output file and
to redirect output.
o Use the “Environment Variables to Export” subwindow to have
current environment variables exported to the job.
o Use the “Job Name” field in the OPTIONS subwindow to give
the job a name.
o Use the “Notify email address” and one of the buttons in the
OPTIONS subwindow to have PBS send you mail when the job
terminates.
Now that the script is built you have four options of what to do next:
Reset clears all the information from the submit job dialog box, allowing you to create a
job from a fresh start.
Use the FILE: entry box (in the upper left corner) to define a filename for the script. Then press
the Save button. This will cause a PBS script file to be generated and written to the named
file.
Pressing the Confirm Submit button at the bottom of the Submit window will submit the
PBS job to the selected destination. xpbs will display a small window containing the job
identifier returned for this job. Clicking OK on this window will cause it and the Submit
window to be removed from your screen.
On UNIX systems (not Windows) you can alternatively submit the job as an interactive-
batch job, by clicking the Interactive button at the bottom of the Submit Job window.
Doing so will cause an X-terminal window (xterm) to be launched, and within that win-
dow a PBS interactive-batch job submitted. The path for the xterm command can be set
via the preferences, as discussed above in section 5.4 “Setting xpbs Preferences” on page
89. For further details on usage and restrictions, see “Interactive-batch Jobs” on page 73.
Click on the Close button located in the Menu bar to leave xpbs. If any settings have
been changed, xpbs will bring up a dialog box asking you to confirm saving the state
information. The settings will be saved in the .xpbsrc configuration file, and
will be used the next time you run xpbs, as discussed in the following section.
Upon exit, the xpbs state may be written to the .xpbsrc file in the user’s home direc-
tory. (See also section 3.8.1 “Windows User's HOMEDIR” on page 20.) Information
saved includes: the selected host(s), queue(s), and job(s); the different jobs listing criteria;
the view states (i.e. minimized/maximized) of the Hosts, Queues, Jobs, and INFO regions;
and all settings in the Preferences section. In addition, there is a system-wide xpbs con-
figuration file, maintained by the PBS Administrator, which is used in the absence of a
user’s personal .xpbsrc file.
The resources that can be set in the xpbs configuration file, ˜/.xpbsrc, are:
Chapter 6
Checking Job / System Status
This chapter introduces several PBS commands useful for checking status of jobs, queues,
and PBS Servers. Examples for use are included, as are instructions on how to accomplish
the same task using the xpbs graphical interface.
The qstat command is used to request the status of jobs, queues, and the PBS
Server. The requested status is written to the standard output stream (usually the user’s
terminal). When requesting job status, any jobs for which the user does not have view privilege
are not displayed. For detailed usage information, see the qstat(1B) man page or the PBS
Professional External Reference Specification.
Executing the qstat command without any options displays job information in the
default format. (An alternative display format is also provided, and is discussed below.)
The default display includes the following information:
Job States
State Description
B Job arrays only: job array has started
E Job is exiting after having run
H Job is held. A job is put into a held state by the server or by a user or
administrator. A job stays in a held state until it is released by a user or
administrator.
Q Job is queued, eligible to run or be routed
R Job is running
S Job is suspended by server. A job is put into the suspended state when a
higher priority job needs the resources.
T Job is in transition (being moved to a new location)
U Job is suspended due to workstation becoming busy
W Job is waiting for its requested execution time to be reached, or the job’s
specified stagein request has failed for some reason.
X Subjobs only: subjob is finished (expired).
qstat
Job id Name User Time Use S Queue
--------- ----------- ----------- -------- - -----
16.south aims14 user1 0 H workq
18.south aims14 user1 0 W workq
26.south airfoil barry 00:21:03 R workq
27.south airfoil barry 21:09:12 R workq
28.south myjob user1 0 Q workq
29.south tns3d susan 0 Q workq
30.south airfoil barry 0 Q workq
31.south seq_35_3 donald 0 Q workq
An alternative display (accessed via the “-a” option) is also provided that includes extra
information about jobs, including the following additional fields:
Session ID
Number of vnodes requested
Number of parallel tasks (or CPUs)
Requested amount of memory
Requested amount of wallclock time
Walltime or CPU time, whichever submitter specified, if job is run-
ning.
qstat -a
Req'd Elap
Job ID User Queue Jobname Ses NDS TSK Mem Time S Time
-------- ------ ----- ------- --- --- --- --- ---- - ----
16.south user1 workq aims14 -- -- 1 -- 0:01 H --
18.south user1 workq aims14 -- -- 1 -- 0:01 W --
51.south barry workq airfoil 930 -- 1 -- 0:13 R 0:01
52.south user1 workq myjob -- -- 1 -- 0:10 Q --
53.south susan workq tns3d -- -- 1 -- 0:20 Q --
54.south barry workq airfoil -- -- 1 -- 0:13 Q --
55.south donald workq seq_35_ -- -- 1 -- 2:00 Q --
Other options which utilize the alternative display are discussed in subsequent sections of
this chapter.
When requesting queue or Server status qstat will output information about each desti-
nation. The various options to qstat take as an operand either a job identifier or a desti-
nation. If the operand is a job identifier, it must be in the following form:
sequence_number[.server_name][@server]
If the operand is a destination identifier, it takes one of the following three forms:
queue
@server
queue@server
If queue is specified, the request is for status of all jobs in that queue at the default
Server. If the @server form is given, the request is for status of all jobs at that Server. If
a full destination identifier, queue@server, is given, the request is for status of all jobs
in the named queue at the named server.
Important: If a PBS Server is not specified on the qstat command line,
the default Server will be used. (See discussion of
PBS_DEFAULT in “Environment Variables” on page 22.)
The “-B” option to qstat displays the status of the specified PBS Batch Server. One line
of output is generated for each Server queried. The three letter abbreviations correspond to
various job limits and counts as follows: Maximum, Total, Queued, Running, Held, Wait-
ing, Transiting, and Exiting. The last column gives the status of the Server itself: active,
idle, or scheduling.
qstat -B
Server Max Tot Que Run Hld Wat Trn Ext Status
----------- --- ---- ---- ---- ---- ---- ---- ---- ------
fast.domain 0 14 13 1 0 0 0 0 Active
When querying jobs, Servers, or queues, you can add the “-f” option to qstat to change
the display to the full or long display. For example, the Server status shown above would
be expanded using “-f” as shown below:
qstat -Bf
Server: fast.mydomain.com
server_state = Active
scheduling = True
total_jobs = 14
state_count = Transit:0 Queued:13 Held:0 Waiting:0
Running:1 Exiting:0
managers = [email protected]
default_queue = workq
log_events = 511
mail_from = adm
query_other_jobs = True
resources_available.mem = 64mb
resources_available.ncpus = 2
resources_default.ncpus = 1
resources_assigned.ncpus = 1
resources_assigned.nodect = 1
scheduler_iteration = 600
pbs_version = PBSPro_9.1.41640
The “-Q” option to qstat displays the status of all (or any specified) queues at the
(optionally specified) PBS Server. One line of output is generated for each queue queried.
The three letter abbreviations correspond to limits, queue states, and job counts as follows:
Maximum, Total, Enabled Status, Started Status, Queued, Running, Held, Waiting,
Transiting, and Exiting. The last column gives the type of the queue: routing or execution.
qstat -Q
Queue Max Tot Ena Str Que Run Hld Wat Trn Ext Type
----- --- --- --- --- --- --- --- --- --- --- ---------
workq 0 10 yes yes 7 1 1 1 0 0 Execution
qstat -Qf
Queue: workq
queue_type = Execution
total_jobs = 10
state_count = Transit:0 Queued:7 Held:1 Waiting:1
Running:1 Exiting:0
resources_assigned.ncpus = 1
hasnodes = False
enabled = True
started = True
We saw above that the “-f” option could be used to display full or long information for
queues and Servers. The same applies to jobs. By specifying the “-f” option and a job
identifier, PBS will print all information known about the job (e.g. resources requested,
resource limits, owner, source, destination, queue, etc.) as shown in the following
example. (See “Job Attributes” on page 74 for a description of each attribute.)
qstat -f 89
Job Id: 89.south
Job_Name = tns3d
Job_Owner = [email protected]
resources_used.cput = 00:00:00
resources_used.mem = 2700kb
resources_used.ncpus = 1
resources_used.vmem = 5500kb
resources_used.walltime = 00:00:00
job_state = R
queue = workq
server = south
Checkpoint = u
ctime = Thu Aug 23 10:11:09 2004
Error_Path = south:/u/susan/tns3d.e89
exec_host = south/0
Hold_Types = n
Join_Path = oe
Keep_Files = n
Mail_Points = a
mtime = Thu Aug 23 10:41:07 2004
Output_Path = south:/u/susan/tns3d.o89
Priority = 0
qtime = Thu Aug 23 10:11:09 2004
Rerunnable = True
Resource_List.mem = 300mb
Resource_List.ncpus = 1
Resource_List.walltime = 00:20:00
session_id = 2083
Variable_List = PBS_O_HOME=/u/susan,PBS_O_LANG=en_US,
PBS_O_LOGNAME=susan,PBS_O_PATH=/bin:/usr/bin,
PBS_O_SHELL=/bin/csh,PBS_O_HOST=south,
PBS_O_WORKDIR=/u/susan,PBS_O_SYSTEM=Linux,
PBS_O_QUEUE=workq
euser = susan
egroup = myegroup
queue_type = E
comment = Job run on host south - started at 10:41
etime = Thu Aug 23 10:11:09 2004
6.1.6 List User-Specific Jobs
The “-u” option to qstat displays jobs owned by any of a list of user names specified.
The syntax of the list of users is:
user_name[@host][,user_name[@host],...]
Host names are not required, and may be wildcarded on the left end, e.g.
“*.mydomain.com”. A user_name without a “@host” is equivalent to “user_name@*”,
that is, at any host.
qstat -u user1
Req'd Elap
Job ID User Queue Jobname Sess NDS TSK Mem Time S Time
-------- ------ ----- ------- ---- --- --- --- ---- - ----
16.south user1 workq aims14 -- -- 1 -- 0:01 H --
18.south user1 workq aims14 -- -- 1 -- 0:01 W --
52.south user1 workq my_job -- -- 1 -- 0:10 Q --
qstat -u user1,barry
6.1.7 List Running Jobs
The “-r” option to qstat displays the status of all running jobs at the (optionally
specified) PBS Server. Running jobs include those that are running and suspended. One line of
output is generated for each job reported, and the information is presented in the alterna-
tive display.
6.1.8 List Non-Running Jobs
The “-i” option to qstat displays the status of all non-running jobs at the (optionally
specified) PBS Server. Non-running jobs include those that are queued, held, and waiting.
One line of output is generated for each job reported, and the information is presented in
the alternative display (see description above).
6.1.9 Display Size in Gigabytes
The “-G” option to qstat displays all jobs at the requested (or default) Server using the
alternative display, showing all size information in gigabytes (GB) rather than the default
of smallest displayable units. Note that if the size specified is less than 1 GB, then the
amount is rounded up to 1 GB.
6.1.10 Display Size in Megawords
The “-M” option to qstat displays all jobs at the requested (or default) Server using the
alternative display, showing all size information in megawords (MW) rather than the
default of smallest displayable units. A word is considered to be 8 bytes.
6.1.11 List Hosts Assigned to Jobs
The “-n” option to qstat displays the hosts allocated to any running job at the
(optionally specified) PBS Server, in addition to the other information presented in the alternative
display. The host information is printed immediately below the job (see job 51 in the
example below), and includes the host name and number of virtual processors assigned to
the job (i.e. “south/0”, where “south” is the host name, followed by the virtual pro-
cessor(s) assigned.). A text string of “--” is printed for non-running jobs. Notice the differ-
ences between the queued and running jobs in the example below:
qstat -n
Req'd Elap
Job ID User Queue Jobname Sess NDS TSK Mem Time S Time
-------- ------ ----- ------- ---- --- --- --- ---- - ----
16.south user1 workq aims14 -- -- 1 -- 0:01 H --
--
18.south user1 workq aims14 -- -- 1 -- 0:01 W --
--
51.south barry workq airfoil 930 -- 1 -- 0:13 R 0:01
south/0
52.south user1 workq my_job -- -- 1 -- 0:10 Q --
--
6.1.12 Display Job Comments
The “-s” option to qstat displays the job comments, in addition to the other informa-
tion presented in the alternative display. The job comment is printed immediately below
the job. By default the job comment is updated by the Scheduler with the reason why a
given job is not running, or when the job began executing. A text string of “--” is printed
for jobs whose comment has not yet been set. The example below illustrates the different
type of messages that may be displayed:
qstat -s
Req'd Elap
Job ID User Queue Jobname Sess NDS TSK Mem Time S Time
-------- ----- ----- ------- ---- --- --- --- ---- - ----
16.south user1 workq aims14 -- -- 1 -- 0:01 H --
Job held by user1 on Wed Aug 22 13:06:11 2004
18.south user1 workq aims14 -- -- 1 -- 0:01 W --
Waiting on user requested start time
51.south barry workq airfoil 930 -- 1 -- 0:13 R 0:01
Job run on host south - started Thu Aug 23 at 10:56
52.south user1 workq my_job -- -- 1 -- 0:10 Q --
Not Running: No available resources on nodes
57.south susan workq solver -- -- 2 -- 0:20 Q --
--
6.1.13 Display Queue Limits
The “-q” option to qstat displays any limits set on the requested (or default) queues.
Since PBS is shipped with no queue limits set, any visible limits will be site-specific. The
limits are listed in the format shown below.
qstat -q
server: south
The “-t” option to qstat will show the state of a job, a job array object, and all non-X sub-
jobs. In combination with “-J”, qstat will show only the state of subjobs.
6.1.15 Show state of Job Arrays
The “-J” option to qstat will show only the state of job arrays. In combination with “-t”,
qstat will show only the state of subjobs.
The “-p” option to qstat prints the default display, with a column for Percentage Com-
pleted. For a job array, this is the number of subjobs completed and deleted, divided by
the total number of subjobs.
If your site is using peer scheduling, your job may be moved to a server that is not your
default server. When that happens, you will need to give the job ID as an argument to
qstat. If you use only “qstat”, your job will not appear to exist. For example: you submit
a job to ServerA, and it returns the jobid as “123.ServerA”. Then 123.ServerA is moved
to ServerB. In this case, use
qstat 123
or
qstat 123.ServerA
to get information about your job. ServerA will query ServerB for the information. To list
all jobs at ServerB, you can use:
qstat @ServerB
The exec_vnode attribute displayed via qstat shows the allocated resources on each
vnode.
exec_vnode = hostA:ncpus=1
For example, a job requesting
-l select=2:ncpus=1:mem=1gb+1:ncpus=4:mem=2gb
would get an exec_vnode of
exec_vnode =
(VNA:ncpus=1:mem=1gb)+(VNB:ncpus=1:mem=1gb)
+(VNC:ncpus=4:mem=2gb)
Note that the vnodes and resources required to satisfy a chunk are grouped by parentheses.
In the example above, if two vnodes on a single host were required to satisfy the last
chunk, the exec_vnode might be:
exec_vnode = (VNA:ncpus=1:mem=1gb)+(VNB:ncpus=1:mem=1gb)
+(VNC1:ncpus=2:mem=1gb+VNC2:ncpus=2:mem=1gb)
The main display of xpbs shows a brief listing of all selected Servers, all queues on those
Servers, and any jobs in those queues that match the selection criteria (discussed below).
Servers are listed in the HOST panel near the top of the display.
To view detailed information about a given Server (i.e. similar to that produced by
“qstat -fB”) select the Server in question, then click the “Detail” button. Likewise, for
details on a given queue (i.e. similar to that produced by “qstat -fQ”) select the queue
in question, then click its corresponding “Detail” button. The same applies for jobs as well
(i.e. “qstat -f”). You can view detailed information on any displayed job by selecting
it, and then clicking on the “Detail” button. Note that the list of jobs displayed will be
dependent upon the Selection Criteria currently selected. This is discussed in the xpbs
portion of the next section.
The qselect command provides a method to list the job identifiers of those jobs, job
arrays, or subjobs which meet a list of selection criteria. Jobs are selected from those
owned by a single Server. When qselect successfully completes, it will have written to
standard output a list of zero or more job identifiers which meet the criteria specified by
the options. Each option acts as a filter restricting the number of jobs which might be
listed. With no options, the qselect command will list all jobs at the Server which the
user is authorized to list (query status of). The -u option may be used to limit the selection
to jobs owned by this user or other specified users.
PBS Professional 9.1 109
User’s Guide
When an option is specified with an optional op component to the option argument, then
op specifies a relation between the value of a certain job attribute and the value compo-
nent of the option argument. If an op is allowable on an option, then the description of the
option letter will indicate that op is allowable. The only acceptable strings for the op
component, and the relation the string indicates, are shown in the following list:
.eq. The value represented by the attribute of the job is equal to the
value represented by the option argument.
.ne. The value represented by the attribute of the job is not equal to
the value represented by the option argument.
.ge. The value represented by the attribute of the job is greater than
or equal to the value represented by the option argument.
.gt. The value represented by the attribute of the job is greater than
the value represented by the option argument.
.le. The value represented by the attribute of the job is less than or
equal to the value represented by the option argument.
.lt. The value represented by the attribute of the job is less than the
value represented by the option argument.
-a [op]date_time Restricts selection to jobs by comparing their Execution_Time
attribute with the given date and time, specified as:
[[CC]YY]MMDDhhmm[.SS]
where the MM is the two digits for the month, DD is the day of
the month, hh is the hour, mm is the minute, and the optional SS
is the seconds. CC is the century and YY the year. If op is not
specified, jobs will be selected for which the Execution_Time
and date_time values are equal.
-h hold_list Restricts the selection of jobs to those with a specific set of hold
types. Only those jobs will be selected whose Hold_Types
attribute exactly match the value of the hold_list argument. The
hold_list argument is a string consisting of one or more occur-
rences of the single letter n, or one or more of the letters u, o, p,
or s in any combination. If letters are duplicated, they are treated
as if they occurred once. The letters represent the hold types:
Letter Meaning
n none
u user
o operator
p bad password (Windows only)
s system
-l resource_list Restricts selection to jobs with specified resource amounts,
where resource_list is of the form:
resource_nameopvalue[,resource_nameopvalue,...]
-q destination Restricts selection to jobs at the specified destination, which
takes one of the forms:
queue
@server
queue@server
-s states Restricts job selection to those in the specified states. The states
argument is a character string which consists of any combina-
tion of the characters: B, E, H, Q, R, S, T, U, W and X. The
characters in the states argument list states shown in the table
titled “Job States” on page 98.
-u user_list Restricts selection to jobs owned by the specified users. The
user_list takes the form:
user_name[@host][,user_name[@host],...]
Host names may be wild carded on the left end, e.g. “*.mydo-
main.com”. User_name without a “@host” is equivalent to
“user_name@*”, i.e. at any host. Jobs will be selected which
are owned by the listed users at the corresponding hosts.
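The [[CC]YY]MMDDhhmm[.SS] datestamp used in date comparisons can be produced with ordinary date formatting. A small Python sketch (illustrative only; pbs_datestamp is our own name, not a PBS utility):

```python
# Build a [[CC]YY]MMDDhhmm[.SS] datestamp for use in date comparisons.
# Illustrative sketch only; pbs_datestamp is not a PBS command.
from datetime import datetime

def pbs_datestamp(dt, seconds=False):
    stamp = dt.strftime("%Y%m%d%H%M")        # CCYYMMDDhhmm
    if seconds:
        stamp += dt.strftime(".%S")          # optional .SS suffix
    return stamp

d = datetime(2007, 10, 24, 17, 30, 5)
print(pbs_datestamp(d))          # -> 200710241730
print(pbs_datestamp(d, True))    # -> 200710241730.05
```

Combined with an op such as .lt., such a stamp selects jobs whose Execution_Time falls before the given moment.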
For example, say you want to list all jobs owned by user “barry” that requested more than
16 CPUs. You could use the following qselect command:
qselect -u barry -l ncpus.gt.16
Notice that what is returned is the job identifiers of jobs that match the selection criteria.
This may or may not be enough information for your purposes. Many users will use shell
syntax to pass the list of job identifiers directly into qstat for viewing purposes, as
shown in the next example (necessarily different between UNIX and Windows).
UNIX:
qstat -a `qselect -u barry -l ncpus.gt.16`
Req'd Elap
Job ID User Queue Jobname Sess NDS TSK Mem Time S Time
-------- ----- ----- ------- ---- --- --- --- ---- - ----
121.south barry workq airfoil -- -- 32 -- 0:01 H --
133.south barry workq trialx -- -- 20 -- 0:01 W --
154.south barry workq airfoil 930 -- 32 -- 1:30 R 0:32
Windows (type the following at the cmd prompt, all on one line):
for /F "usebackq" %j in (`qselect -u barry -l ncpus.gt.16`) do
( qstat -a %j )
By itself, qselect returns just the matching job identifiers:
121.south
133.south
154.south
Note: This technique of using the output of the qselect command as input to qstat
can be used to supply input to other PBS commands as well.
The xpbs command provides a graphical means of specifying job selection criteria, offer-
ing the flexibility of the qselect command in a point and click interface. Above the
JOBS panel in the main xpbs display is the Other Criteria button. Clicking it will bring
up a menu that lets you choose and select any job selection criteria you wish.
The example below shows a user clicking on the Other Criteria button, then selecting Job
States, to reveal that all job states are currently selected. Clicking on any of these job
states would remove that state from the selection criteria.
You may specify as many or as few selection criteria as you wish. When you have com-
pleted your selection, click on the Select Jobs button above the HOSTS panel to have
xpbs refresh the display with the jobs that match your selection criteria. The selected cri-
teria will remain in effect until you change them again. If you exit xpbs, you will be
prompted if you wish to save your configuration information; this includes the job selec-
tion criteria.
The xpbs command includes a feature that allows you to track the progress of your jobs.
When you enable the Track Job feature, xpbs will monitor your jobs, looking for the out-
put files that signal completion of the job. The Track Job button will flash red on the
xpbs main display, and if you then click it, xpbs will display a list of all completed jobs
(that you were previously tracking). Selecting one of those jobs will launch a window con-
taining the standard output and standard error files associated with the job.
From this window you can name the users whose jobs you wish to monitor. You also need
to specify where you expect the output files to be: either local or remote (e.g. will the files
be retained on the Server host, or did you request them to be delivered to another host?).
Next, click the start/reset tracking button and then the close window button. Note that you
can disable job tracking at any time by clicking the Track Job button on the main xpbs
display, and then clicking the stop tracking button.
Chapter 7
Working With PBS Jobs
This chapter introduces the reader to various commands useful in working with PBS jobs.
Covered topics include: modifying job attributes, holding and releasing jobs, sending mes-
sages to jobs, changing order of jobs within a queue, sending signals to jobs, and deleting
jobs. In each section below, the command line method for accomplishing a particular task
is presented first, followed by the xpbs method.
Most attributes can be changed by the owner of the job (or a manager or operator) while
the job is still queued. However, once a job begins execution, the only resources that can
be modified are cputime and walltime. These can only be reduced.
When the qalter "-l" option is used to alter the resource list of a queued job, it is important
to understand the interactions between altering the select directive and job limits.
If the job was submitted with an explicit "-l select=", then vnode-level resources must be
qaltered using the "-l select=" form. In this case a vnode level resource RES cannot be
qaltered with the "-l RES" form.
For example:
Submit the job:
% qsub -l select=1:ncpus=2:mem=512mb jobscript
Job’s ID is 230
If the selection directive is altered, the job limits for any consumable resource in the
directive are also modified. For example, altering the job with
qalter -l select=3:ncpus=2:mem=6gb 230
changes the job-wide limits to ncpus=6 and mem=18gb.
However, if the job-wide limit is modified, the corresponding resources in the selection
directive are not modified. It would be impossible to determine where to apply the
changes in a compound directive.
Reducing a job-wide limit to a new value less than the sum of the resources in the directive
is strongly discouraged. This may produce a situation where the job is aborted during exe-
cution for exceeding its limits. The actual effect of such a modification is not specified.
If a job is queued, requested modifications must still fit within the queue's and server's job
resource limits. If a requested modification to a resource would exceed the queue's or
server's job resource limits, the resource request will be rejected.
Resources are modified by using the -l option, either in chunks inside of selection state-
ments, or in job-wide modifications using resource_name=value pairs. The selec-
tion statement is of the form:
-l select=[N:]chunk[+[N:]chunk ...]
where N specifies how many of that chunk, and a chunk is of the form:
resource_name=value[:resource_name=value ...]
Job-wide limits are modified using:
-l resource_name=value[,resource_name=value ...]
and placement is modified using:
-l place=modifier[:modifier]
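As noted above, the job-wide limit for each consumable resource is the sum over all chunks of the chunk count N times the per-chunk amount. A Python sketch of that arithmetic (illustrative only, handling just ncpus and mem; real PBS covers many more resources and unit forms):

```python
# Illustrative sketch: derive job-wide ncpus and mem limits from a selection
# directive by multiplying each chunk's resources by its count N and summing.
# Not PBS code; real PBS handles many more resources and unit forms.

UNITS = {"kb": 1024, "mb": 1024**2, "gb": 1024**3}

def mem_bytes(val):
    return int(val[:-2]) * UNITS[val[-2:].lower()]

def job_wide_limits(select):
    totals = {"ncpus": 0, "mem": 0}
    for chunk in select.split("+"):
        parts = chunk.split(":")
        if parts[0].isdigit():                  # leading [N:] chunk count
            n, parts = int(parts[0]), parts[1:]
        else:
            n = 1
        for res in parts:
            key, _, val = res.partition("=")
            if key == "ncpus":
                totals["ncpus"] += n * int(val)
            elif key == "mem":
                totals["mem"] += n * mem_bytes(val)
    return totals

print(job_wide_limits("3:ncpus=2:mem=6gb"))   # ncpus=6, mem=18gb (in bytes)
```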
The following examples illustrate how to use the qalter command. First we list all the
jobs of a particular user. Then we modify two attributes as shown (increasing the wall-
clock time from 20 to 25 minutes, and changing the job name from “airfoil” to “engine”):
qstat -u barry
Req'd Elap
Job ID User Queue Jobname Sess NDS TSK Mem Time S Time
-------- ------ ----- ------- ---- --- --- --- ---- - ----
51.south barry workq airfoil 930 -- 1 -- 0:16 R 0:01
54.south barry workq airfoil -- -- 1 -- 0:20 Q --
qalter -l walltime=25:00 -N engine 54
qstat -a 54
Req'd Elap
Job ID User Queue Jobname Sess NDS TSK Mem Time S Time
-------- ------ ----- ------- ---- --- --- --- ---- - ----
54.south barry workq engine -- -- 1 -- 0:25 Q --
To alter a job attribute via xpbs, first select the job(s) of interest, and then click on the modify
button. Doing so will bring up the Modify Job Attributes dialog box. From this window
you may set the new values for any attribute you are permitted to change. Then click on
the confirm modify button at the lower left of the window.
The qalter command can be used on job arrays, but not on subjobs or ranges of subjobs.
When used with job arrays, any job array identifiers must be enclosed in double quotes,
e.g.:
qalter -l walltime=25:00 "1234[].south"
PBS provides a pair of commands to hold and release jobs. To hold a job is to mark it as
ineligible to run until the hold on the job is “released”.
The qhold command requests that a Server place one or more holds on a job. A job that
has a hold is not eligible for execution. There are three types of holds: user, operator, and
system. A user may place a user hold upon any job the user owns. An “operator”, who is a
user with “operator privilege”, may place either a user or an operator hold on any job.
The PBS Manager may place any hold on any job. The usage syntax of the qhold com-
mand is:
qhold [ -h hold_list ] job_identifier ...
Note that for a job array the job_identifier must be enclosed in double quotes.
The hold_list defines the type of holds to be placed on the job. The hold_list
argument is a string consisting of one or more of the letters u, p, o, or s in any combina-
tion, or the letter n. The hold type associated with each letter is:
Letter Meaning
n none
u user
o operator
p bad password (Windows only)
s system
If no -h option is given, the user hold will be applied to the jobs described by the
job_identifier operand list. If the job identified by job_identifier is in the
queued, held, or waiting states, then all that occurs is that the hold type is added to the job.
The job is then placed into held state if it resides in an execution queue.
If the job is running, then the following additional action is taken to interrupt the execu-
tion of the job. If checkpoint/restart is supported by the host system, requesting a hold on a
running job will cause (1) the job to be checkpointed, (2) the resources assigned to the job
to be released, and (3) the job to be placed in the held state in the execution queue. If
checkpoint/restart is not supported, qhold will only set the requested hold attribute.
This will have no effect unless the job is requeued with the qrerun command.
The qhold command can be used on job arrays, but not on subjobs or ranges of subjobs.
On job arrays, the qhold command can be applied only in the ‘Q’, ‘B’ or ‘W’ states. This
will put the job array in the ‘H’, held, state. If any subjobs are running, they will run to
completion. Job arrays cannot be moved in the ‘H’ state if any subjobs are running.
Checkpointing is not supported for job arrays. Even on systems that support checkpoint-
ing, no subjobs will be checkpointed -- they will run to completion.
Similarly, the qrls command releases a hold on a job. However, the user executing the
qrls command must have the necessary privilege to release a given hold. The same rules
apply for releasing a hold as exist for setting a hold.
The qrls command can only be used with job array objects, not with subjobs or ranges.
The job array will be returned to its pre-hold state, which can be either ‘Q’, ‘B’, or ‘W’.
The following examples illustrate how to use both the qhold and qrls commands.
Notice that the state (“S”) column shows how the state of the job changes with the use of
these two commands.
qstat -a 54
Req'd Elap
Job ID User Queue Jobname Sess NDS TSK Mem Time S Time
-------- ------ ----- ------- ---- --- --- --- ---- - ----
54.south barry workq engine -- -- 1 -- 0:20 Q --
qhold 54
qstat -a 54
Req'd Elap
Job ID User Queue Jobname Sess NDS TSK Mem Time S Time
-------- ------ ----- ------- ---- --- --- --- ---- - ----
54.south barry workq engine -- -- 1 -- 0:20 H --
qrls -h u 54
qstat -a 54
Req'd Elap
Job ID User Queue Jobname Sess NDS TSK Mem Time S Time
-------- ------ ----- ------- ---- --- --- --- ---- - ----
54.south barry workq engine -- -- 1 -- 0:20 Q --
If you attempt to release a hold on a job which is not on hold, the request will be
ignored. If you use the qrls command to release a hold on a job that had been previously
running, and subsequently checkpointed, the hold will be released, and the job will return
to the queued (Q) state (and be eligible to be scheduled to run when resources become
available).
To hold (or release) a job using xpbs, first select the job(s) of interest, then click the hold
(or release) button.
PBS provides the qdel command for deleting jobs from the system. The qdel command
deletes jobs in the order in which their job identifiers are presented to the command. A job
that has been deleted is no longer subject to management by PBS. A batch job may be
deleted by its owner, a PBS operator, or a PBS administrator.
Example:
qdel 51
qdel 1234[].server
Mail is sent for each job deleted unless you specify otherwise. Use the following option to
qdel to prevent more email than you want from being sent:
-Wsuppress_email=<N>
N must be a non-negative integer. Make N the largest number of emails you wish to
receive. PBS will send one email for each deleted job, up to N. Note that a job array is
one job, so deleting a job array results in one email being sent.
To delete a job using xpbs, first select the job(s) of interest, then click the delete button.
To send a message to a job is to write a message string into one or more output files of the
job. Typically this is done to leave an informative message in the output of the job. Such
messages can be written using the qmsg command.
The -E option writes the message into the error file of the specified job(s). The -O option
writes the message into the output file of the specified job(s). If neither option is specified,
the message will be written to the error file of the job.
The first operand, message_string, is the message to be written. If the string contains
blanks, the string must be quoted. If the final character of the string is not a newline, a
newline character will be added when written to the job’s file. All remaining operands are
job_identifiers which specify the jobs to receive the message string. For example:
qmsg -E “output file may be incomplete” 54
To send a message to a job using xpbs, first select the job(s) of interest, then click the
msg button. Doing so will launch the Send Message to Job dialog box. From this window,
you may enter the message you wish to send and indicate whether it should be written to
the standard output or the standard error file of the job. Click the Send Message button to
complete the process.
The qsig command requests that a signal be sent to executing PBS jobs. The signal is
sent to the session leader of the job. Usage syntax of the qsig command is:
qsig [ -s signal ] job_identifier ...
Two special signal names, “suspend” and “resume” (note: all lower case), are used to sus-
pend and resume jobs. When suspended, a job continues to occupy system resources but is
not executing and is not charged for walltime. Manager or operator privilege is required to
suspend or resume a job.
The three examples below all send a signal 9 (SIGKILL) to job 34:
qsig -s SIGKILL 34
qsig -s KILL 34
qsig -s 9 34
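The three forms are equivalent because each resolves to the same signal number. The following Python sketch (illustrative only; signal_number is our own helper, built on the standard signal module on UNIX-like systems) shows the mapping:

```python
# Illustrative sketch: resolve a qsig-style signal spec ("SIGKILL", "KILL",
# or "9") to the signal number, using Python's standard signal module.
import signal

def signal_number(spec):
    if spec.isdigit():
        return int(spec)                       # numeric form, e.g. "9"
    name = spec if spec.startswith("SIG") else "SIG" + spec
    return int(getattr(signal, name))          # "KILL" and "SIGKILL" both work

print(signal_number("SIGKILL"), signal_number("KILL"), signal_number("9"))
```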
To send a signal to a job using xpbs, first select the job(s) of interest, then click the signal
button. Doing so will launch the Signal Running Job dialog box.
From this window, you may click on any of the common signals, or you may enter the sig-
nal number or signal name you wish to send to the job. Click the Signal button to complete
the process.
PBS provides the qorder command to change the order of two jobs, within or across
queues. To order two jobs is to exchange the jobs’ positions in the queue or queues in
which the jobs reside. If job1 is at position 3 in queue A and job2 is at position 4 in queue
B, qordering them will result in job1 being in position 4 in queue B and job2 being in posi-
tion 3 in queue A. The two jobs must be located at the same Server, and both jobs must be
owned by the user. No attribute of the job (such as priority) is changed. The impact of
changing the order within the queue(s) is dependent on local job scheduling policy; con-
tact your systems administrator for details.
qstat -u bob
Req'd Elap
Job ID User Queue Jobname Sess NDS TSK Mem Time S Time
-------- ------ ----- ------- ---- --- --- --- ---- - ----
54.south bob workq twinkie -- -- 1 -- 0:20 Q --
63[].south bob workq airfoil -- -- 1 -- 0:13 Q --
qorder 54 "63[]"
qstat -u bob
Req'd Elap
Job ID User Queue Jobname Sess NDS TSK Mem Time S Time
-------- ------ ----- ------- ---- --- --- --- ---- - ----
63[].south bob workq airfoil -- -- 1 -- 0:13 Q --
54.south bob workq twinkie -- -- 1 -- 0:20 Q --
To change the order of two jobs using xpbs, select the two jobs, and then click the order
button.
The qorder command can only be used with job array objects, not on subjobs or ranges.
This will change the queue order of the job array in association with other jobs or job
arrays in the queue.
PBS provides the qmove command to move jobs between different queues (even queues
on different Servers). To move a job is to remove the job from the queue in which it
resides and instantiate the job in another queue.
The destination operand of qmove takes one of the forms:
queue
@server
queue@server
If the destination operand describes only a queue, then qmove will move jobs into
the queue of the specified name at the job’s current Server. If the destination operand
describes only a Server, then qmove will move jobs into the default queue at that Server.
If the destination operand describes both a queue and a Server, then qmove will
move the jobs into the specified queue at the specified Server. All following operands are
job_identifiers which specify the jobs to be moved to the new destination.
To move jobs between queues or between Servers using xpbs, select the job(s) of inter-
est, and then click the move button. Doing so will launch the Move Job dialog box from
which you can select the queue and/or Server to which you want the job(s) moved.
The qmove command can only be used with job array objects, not with subjobs or ranges.
Job arrays can only be moved from one server to another if they are in the ‘Q’, ‘H’, or ‘W’
states, and only if there are no running subjobs. The state of the job array object is pre-
served in the move. The job array will run to completion on the new server.
As with jobs, a qstat on the server from which the job array was moved will not show the
job array. A qstat on the job array object will be redirected to the new server.
Note: The subjob accounting records will be split between the two servers.
The pbs_rsub command can be used to convert a normal job into a reservation job that
will run as soon as possible. PBS creates a reservation queue and a reservation, and
moves the job into the queue. Other jobs can also be moved into that queue via
qmove(1B) or submitted to that queue via qsub(1B).
The format for converting a normal job into a reservation job is:
pbs_rsub -W qmove=job_id
Example:
pbs_rsub -W qmove=54
pbs_rsub -W qmove="1234[].server"
The -R and -E options to pbs_rsub are disabled when using the -W qmove option.
For more information, see “Advance Reservation of Resources” on page 138, and the
pbs_rsub(1B), qsub(1B) and qmove(1B) manual pages.
A job’s default walltime is 5 years. Therefore an ASAP reservation’s start time can be up
to 5 years in the future, if all the jobs in the system have the default walltime.
Chapter 8
Advanced PBS Features
This chapter covers the less commonly used commands and more complex topics which
will add substantial functionality to your use of PBS. The reader is advised to read chap-
ters 5 - 7 of this manual first.
On UNIX systems, the exit status of a job is normally the exit status of the shell executing
the job script. If a user is using csh and has a .logout file in the home directory, the
exit status of csh becomes the exit status of the last command in .logout. This may
impact the use of job dependencies which depend on the job’s exit status. To preserve the
job’s exit status, the user may either remove .logout or edit it as shown in this example,
saving the exit status on entry and restoring it on exit:
set EXITVAL = $status
# ... existing .logout commands ...
exit $EXITVAL
Doing so will ensure that the exit status of the job persists across the invocation of the
.logout file.
The exit status of a job array is determined by the status of each of the completed subjobs.
It is only available when all valid subjobs have completed. The individual exit status of a
completed subjob is passed to the epilogue, and is available in the ‘E’ accounting log
record of that subjob. See “Job Array Exit Status” on page 167.
The “-W umask=nnn” option to qsub allows you to specify, on UNIX systems, what
umask PBS should use when creating and/or copying your stdout and stderr files,
and any other files you direct PBS to transfer on your behalf.
The following example illustrates how to set your umask to 022 (i.e. to have files created
with write permission for the owner only: -rw-r--r-- ):
qsub -W umask=022 jobscript
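The permission arithmetic behind this can be sketched in Python (illustrative only; PBS itself simply applies the umask when it creates the files): a newly created file starts from base mode 666, and the umask bits are cleared from it.

```python
# Illustrative sketch: why a umask of 022 produces -rw-r--r-- files.
# A newly created file starts from base mode 666; the umask bits are cleared.
import stat

umask = 0o022
mode = 0o666 & ~umask
print(oct(mode), stat.filemode(stat.S_IFREG | mode))   # 0o644 -rw-r--r--
```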
The “-W block=true” option to qsub allows you to specify that you want qsub to
wait for the job to complete (i.e. “block”) and report the exit value of the job. If job sub-
mission fails, no special processing will take place. If the job is successfully submitted,
qsub will block until the job terminates or an error occurs.
If qsub receives one of the signals: SIGHUP, SIGINT, or SIGTERM, it will print a mes-
sage and then exit with the exit status 2. If the job is deleted before running to completion,
or an internal PBS error occurs, an error message describing the situation will be printed
to the standard error stream and qsub will exit with an exit status of 3. Signals SIGQUIT and
SIGKILL are not trapped and thus will immediately terminate the qsub process, leaving
the associated job either running or queued. If the job runs to completion, qsub will exit
with the exit status of the job. (See also section 8.1 “UNIX Job Exit Status” on page 129
for further discussion of the job exit status.)
For job arrays, blocking qsub waits until the entire job array is complete, then returns the
exit status of the job array.
8.4 Specifying Job Dependencies
PBS allows you to specify dependencies between two or more jobs. Dependencies are use-
ful for a variety of tasks, such as specifying the order in which jobs in a set should execute,
requesting a job run only if an error occurs in another job, or holding jobs until a particular
job starts or completes execution. The dependency option to qsub has the form:
-W depend=type:arg_list[,type:arg_list ...]
where except for the on type, the arg_list is one or more PBS job IDs in the form:
jobid[:jobid ...]
after:arg_list
This job may be scheduled for execution at any point after all jobs
in arg_list have started execution.
afterok:arg_list
This job may be scheduled for execution only after all jobs in
arg_list have terminated with no errors. See "Warning about exit
status with csh" in EXIT STATUS.
afternotok:arg_list
This job may be scheduled for execution only after all jobs in
arg_list have terminated with errors. See "Warning about exit status
with csh" in EXIT STATUS.
afterany:arg_list
This job may be scheduled for execution after all jobs in
arg_list have finished execution, with or without errors.
before:arg_list
Jobs in arg_list may begin execution once this job has begun
execution.
beforeok:arg_list
Jobs in arg_list may begin execution once this job terminates
without errors.
beforenotok:arg_list
Jobs in arg_list may begin execution once this job terminates
with errors.
beforeany:arg_list
Jobs in arg_list may begin execution once this job terminates,
with or without errors.
on:count
This job may be scheduled for execution after count dependencies
on other jobs have been satisfied.
Job IDs in the arg_list of before types must have been submitted with a type of on.
To use the before types, the user must have the authority to alter the jobs in arg_list.
Otherwise, the dependency is rejected and the new job aborted.
Error processing of the existence, state, or condition of the job on which the newly submit-
ted job depends is a deferred service, i.e. the check is performed after the job is queued. If
an error is detected, the new job will be deleted by the server. Mail will be sent to the job
submitter stating the error.
Suppose you have three jobs (job1, job2, and job3) and you want job3 to start after job1
and job2 have ended. The first example below illustrates the options you would use on the
qsub command line to specify these job dependencies.
qsub job1
16394.jupiter
qsub job2
16395.jupiter
qsub -W depend=afterany:16394:16395 job3
16396.jupiter
As another example, suppose instead you want job2 to start only if job1 ends with no
errors (i.e. it exits with a no error status):
qsub job1
16397.jupiter
qsub -W depend=afterok:16397 job2
16396.jupiter
Similarly, you can use before dependencies. Note that unlike after dependencies,
before dependencies require the use of the on dependency.
To transfer output files or to transfer staged-in or staged-out files to/from a remote destina-
tion, PBS uses either rcp or scp depending on the configuration options. The version of
rcp used by PBS always exits with a non-zero exit status for any error. Thus MOM
knows if the file was delivered or not. The secure copy program, scp, is also based on this
version of rcp and exits with the proper exit status.
If using rcp, the copy of output or staged files can fail for (at least) two reasons.
If using Secure Copy (scp), then PBS will first try to deliver output or stagein/out files
using scp. If scp fails, PBS will try again using rcp (assuming that scp might not exist
on the remote host). If rcp also fails, the above cycle will be repeated after a delay, in
case the problem is caused by a temporary network problem. All failures are logged in
MOM’s log, and an email containing the errors is sent to the job owner.
For delivery of output files on the local host, PBS uses the cp command (UNIX) or the
xcopy command (Windows). Local and remote delivery of output may fail for additional
reasons as well.
File staging is a way to specify which files should be copied onto the execution host
before the job starts, and which should be copied off the execution host when it completes.
The “-W stagein=file_list” and “-W stageout=file_list” options to
qsub specify which files are staged (copied) in before the job starts or staged out after the
job completes execution. On completion of the job, all staged-in and staged-out files are
removed from the execution system. The file_list is in the form:
local_file@hostname:remote_file[,...]
regardless of the direction of the copy. Note that the ‘@’ character is used for separating
the local specification from the remote specification. The name local_file is the name of
the file on the system where the job executes. It may be an absolute path or relative to the
home directory of the user. The name remote_file is the destination name on the host spec-
ified by hostname. The name may be absolute or relative to the user’s home directory on
the destination host. Thus for stagein, the direction of travel is:
local_file <- remote_host:remote_file
and for stageout, the direction of travel is:
local_file -> remote_host:remote_file
Note that all relative paths are relative to the user’s home directory on the respective hosts.
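The local/remote split in a file_list entry can be sketched as follows (Python, illustrative only; split_stage_spec is our own name, and this simplified parse assumes the local name contains no ‘@’):

```python
# Illustrative sketch: split a stagein/stageout specification of the form
# local_file@hostname:remote_file. Simplified -- assumes no '@' in local_file.

def split_stage_spec(spec):
    local, _, rest = spec.partition("@")     # '@' separates local from remote
    host, _, remote = rest.partition(":")    # ':' separates host from path
    return local, host, remote

print(split_stage_spec("dat1@server:/u/user1/grid.dat"))
print(split_stage_spec(r"C:\temp\dat1@hostB:grid.dat"))
```

Note that the Windows drive-letter colon sits in the local part, before the ‘@’, so it does not confuse the split.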
The following example shows how to stagein a file named grid.dat located in the
directory /u/user1 of the computer called server. The staged-in file is requested to
be placed relative to the user’s home directory under the name of dat1. (Note that the
example uses UNIX-style path separators “/”.)
#PBS -W stagein=dat1@server:/u/user1/grid.dat
...
Note that under Windows, pathnames may contain special characters such as spaces, back-
slashes (\), colons (:), and drive letter specifications. For example, the following will stage
in the grid.dat file at hostB to a local file (“dat1”) on drive C:
qsub -W stagein=C:\temp\dat1@hostB:grid.dat
In Windows the stagein and stageout string must be contained in double quotes when
using ^array_index^.
Example of a stagein:
qsub -W stagein="foo.^array_index^
@host-3:C:\WINNT\Temp\foo.^array_index^"
-J 1-5 stage_script
Example of a stageout:
qsub -W stageout="C:\WINNT\Temp\foo.^array_index^
@vmwhost-3:Q:\pbsuser31\foo.^array_index^_out"
-J 1-5 stage_script
PBS uses rcp or scp (or cp if the remote host is the local host) to perform the transfer.
Hence, stagein and stageout are just ordinary rcp, scp, or cp copies between the hosts.
As with rcp, the remote_file and local_file portions for both stagein and stage-
out may name a directory. For stagein, if remote_file is a directory, then
local_file must also be a directory. Likewise, for stage out, if local_file is a
directory, then remote_file must be a directory. If local_file on a stageout direc-
tive is a directory, that directory on the execution host, including all files and subdirecto-
ries, will be copied. At the end of the job, the directory, including all files and
subdirectories, will be deleted. Users should be aware that this may create a problem if
multiple jobs are using the same directory. The same requirements and hints discussed
above in regard to delivery of output apply to staging files in and out. Wildcards should
not be used in either the local_file or the remote_file name. PBS does not
expand the wildcard character on the local system. If wildcards are used in the
remote_file name, since rcp is launched by rsh to the remote system, the expansion
will occur. However, at job end, PBS will attempt to delete the file whose name actually
contains the wildcard character and will fail to find it. This will leave all the staged-in files
in place (undeleted).
File staging is supported for job arrays. See “File Staging” on page 156.
Using xpbs to set up file staging directives may be easier than using the command line.
On the Submit Job window, in the miscellany options section (far left, center of window)
click on the file staging button. This will launch the File Staging dialog box (shown
below) in which you will be able to set up the file staging you desire.
The File Selection Box will be initialized with your current working directory. If you wish
to select a different directory, double-click on its name, and xpbs will list the contents of
the new directory in the File Selection Box. When the correct directory is displayed, sim-
ply click on the name of the file you wish to stage (in or out). Its name will be written in
the File Selected area.
Next, click either of the Add file selected... buttons to add the named file to the stagein or
stageout list. Doing so will write the file name into the corresponding area on the lower
half of the File Staging window. Now you need to provide location information. For
stagein, type in the path and filename where you want the named file placed. For stageout,
specify the hostname and pathname where you want the named file delivered. You may
repeat this process for as many files as you need to stage.
When stagein fails, the job is placed in a 30-minute wait to allow the user time to fix the
problem. Typically this is a missing file or a network outage. Email is sent to the job
owner when the problem is detected. Once the problem has been resolved, the job owner
or the Operator may remove the wait by resetting the time after which the job is eligible to
be run via the -a option to qalter. The server will update the job’s comment with infor-
mation about why the job was put in the wait state. When the job is eligible to run, it may
run on different vnodes.
When stageout encounters an error, there are three retries. PBS waits 1 second and tries
again, then waits 11 seconds and tries a third time, then finally waits another 21 seconds
and tries a fourth time. Email is sent to the job owner if all attempts fail.
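In other words, relative to the initial failure, the retry attempts occur after cumulative delays of 1, 12, and 33 seconds:

```shell
# Cumulative delay in seconds before the 2nd, 3rd, and 4th stageout attempts
echo "$((1)) $((1+11)) $((1+11+21))"   # prints "1 12 33"
```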
The pbsdsh command allows you to distribute and execute a task on each of the vnodes
assigned to your job. (pbsdsh uses the PBS Task Manager API, see tm(3), to distribute
the program on the allocated vnodes.)
Note that the double dash must come after the options and before the program and argu-
ments. The double dash is only required for Linux.
-v Verbose output about error messages and task exit status is pro-
duced.
When run without the -c or the -n option, pbsdsh will spawn the program on all
vnodes allocated to the PBS job. Execution takes place concurrently: all copies of the
task execute at (about) the same time.
The following example shows the pbsdsh command inside of a PBS batch job. The
options indicate that the user wants pbsdsh to run the myapp program with one argu-
ment (app-arg1) on all four vnodes allocated to the job (i.e. the default behavior).
#!/bin/sh
#PBS -l select=4:ncpus=1
#PBS -l walltime=1:00:00
pbsdsh -- myapp app-arg1
The pbsdsh command runs one task for each line in the PBS_NODEFILE. Each MPI
rank will get a single line in the PBS_NODEFILE, so if you are running multiple MPI
ranks on the same host, you will still get multiple pbsdsh tasks on that host.
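pbsdsh itself only runs inside a PBS job, but the one-task-per-PBS_NODEFILE-line rule can be sketched with a stand-in node file (hostnames hypothetical):

```shell
# Two MPI ranks on hostA and two on hostB give four node-file lines,
# so pbsdsh would spawn four tasks.
NODEFILE=$(mktemp)
printf 'hostA\nhostA\nhostB\nhostB\n' > "$NODEFILE"
NTASKS=$(grep -c . "$NODEFILE")   # one task per non-empty line
echo "pbsdsh would spawn $NTASKS tasks"
rm -f "$NODEFILE"
```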
An Advance Reservation is a set of resources with availability limited to a specific user (or
group of users), a specific start time, and a specified duration. The user submits an
advance reservation with the pbs_rsub command. PBS will then confirm that the reser-
vation can be met, or else reject the request. Once the scheduler has confirmed the reserva-
tion, the queue that was created to support this reservation will be enabled, allowing jobs
to be submitted to it. The queue will have a user level access control list set to the user
who submitted the reservation and any other users the owner specified. The queue will
accept jobs in the same manner as normal queues. When the reservation start time is
reached, the queue will be started. Once the reservation is complete, any jobs remaining in
the queue (running or not) will be deleted, and the reservation removed from the Server.
When a reservation is requested and confirmed, it means that a check was made to see if
the reservation would conflict with currently running jobs, other confirmed reservations,
and dedicated time. A reservation request that fails this check is denied by the Scheduler.
If the submitter did not indicate that the submission command should wait for confirma-
tion or rejection (-I option), he will have to periodically query the Server about the status
of the reservation (via pbs_rstat) or wait for a mail message regarding its denial or
confirmation.
Vnodes that have been configured to accept jobs only from a specific queue (vnode-queue
restrictions) cannot be used for advance reservations. See your local PBS Administrator to
determine if this affects your site.
Leave enough time between reservations for the reservations and jobs in them to clean up.
A job consumes reservations even while it is in the “E” or exiting state. This can take
longer when large files are being staged. If the job is still running when the reservation
ends, it may take up to two minutes to be cleaned up. The reservation itself cannot finish
cleaning up until its jobs are cleaned up. This will delay the start time of jobs in the next
reservation unless there is enough time between the reservations for cleanup.
Although a confirmed resource reservation will accept jobs into its queue at any time, the
scheduler is not allowed to schedule jobs from the queue before the reservation period
arrives. Once the reservation period arrives, these jobs will begin to run, but in aggregate
they will not use more resources than the reservation requested.
A reservation job is started only if its requested walltime will fit within the reservation.
For example, if the reservation runs from 10:00 to 11:00 and the job’s walltime is 4
hours, the job will not be started.
The pbs_rsub command returns an ID string to use in referencing the reservation and an
indication of its current status. The actual specification of resources is done in the same
way as it is for submission of a job. Following is a list and description of options to the
pbs_rsub command.
-R datetime Specifies the reservation start time. The datetime argument has the form:
[[[[CC]YY]MM]DD]hhmm[.SS]
If the day, DD, is not specified, it will default to today if the time
hhmm is in the future. Otherwise, the day will be set to tomor-
row. For example, if you submit a reservation having a specifi-
cation -R 1110 at 11:15am, it will be interpreted as being for
11:10am tomorrow. If the month portion, MM, is not specified, it
defaults to the current month provided that the specified day
DD, is in the future. Otherwise, the month will be set to next
month. Similar comments apply to the two other optional, left
hand components.
-E datetime Specifies the reservation end time. See the -R flag for a
description of the datetime string. If start time and duration are
the only times specified, the end time value is calculated.
-m mail_points Specifies whether mail is sent to the mail_list users, and when. The
argument mail_points is a string: either “n”, for no mail, or a
string composed of any combination of “a”, “b”, “e”, or “c”.
Default is “ac”. Must be enclosed in double quotes.
-M mail_list Specifies the list of users to whom the Server will attempt to send a
mail message whenever the reservation transitions to one of the
mail states specified in the -m option. Default: reservation’s owner
-N reservation_name
Declares a name for the reservation. The name specified may be
up to 15 characters in length. It must consist of printable, non-
white space characters with the first character alphabetic.
-W other-attributes=value...
This allows a site to define any extra attributes for the reserva-
tion.
-W qmove=jobid
Converts the normal job with job ID jobid into a reservation job that
will run as soon as possible. Creates the reservation and reservation
queue and places the job in the queue. Uses the resources requested
by the job to create the reservation.
The following example shows the submission of a reservation asking for 1 vnode, 30 min-
utes of wall-clock time, and a start time of 11:30. Note that since an end time is not speci-
fied, PBS will calculate the end time based on the reservation start time and duration:
pbs_rsub -R 1130 -D 30:00 -l nodes=1
A reservation queue named “R226” was created on the local PBS Server. Note that the
reservation is currently unconfirmed. Email will be sent to the reservation owner either
confirming the reservation, or rejecting it. Upon confirmation, the owner of the reserva-
tion can submit jobs against the reservation using the qsub command, naming the reser-
vation queue on the command line with the -q option, e.g.:
qsub -q R226 my_job_script
When the user requests an advance reservation of resources via the pbs_rsub command,
an option (-I n) is available to wait for the confirmation response. The value n specified
is the number of seconds the command is willing to wait, and may be positive or negative.
A non-negative value means that the Server/Scheduler response is needed within n
seconds; after that time the submitter will need to use pbs_rstat or some other means
to discern the success or failure of the request. For a negative value, the command will
likewise wait up to |n| seconds for the request to be either confirmed or denied; if the
response does not come back in that time, the Server will automatically delete the request
from the system.
The pbs_rstat command is used to show the status of all the reservations on the PBS
Server. There are three different output formats: brief, short (default), and full. The fol-
lowing examples illustrate these three options.
The short option (-S) will show all the reservations in a short concise form. (This is the
default display if no options are given.) The information provided is the identifier of the
reservation, name of the queue that got created for the reservation, user who owns the res-
ervation, the state, the start time, duration in seconds, and the end time.
pbs_rstat -S
Name Queue User State Start / Duration / End
---------------------------------------------------------
R226 R226 user1 CO Today 11:30 / 1800 / Today 12:00
R302 R302 barry CO Today 15:50 / 1800 / Today 16:20
R304 R304 user1 CO Today 15:46 / 1800 / Today 16:16
The full option (-f) will print out the name of the reservation followed by all the
attributes of the reservation.
pbs_rstat -f R226
Name: R226.south
Reserve_Owner = user1@south
reserve_type = 2
reserve_state = RESV_CONFIRMED
reserve_substate = 2
reserve_start = Fri Aug 24 11:30:00 2004
reserve_end = Fri Aug 24 12:00:00 2004
reserve_duration = 1800
queue = R226
Resource_List.ncpus = 1
Resource_List.mem = 500kb
Resource_List.nodes = 1
Resource_List.walltime = 00:30:00
Authorized_Users = user1@south
server = south
ctime = Fri Aug 24 06:30:53 2004
mtime = Fri Aug 24 06:30:53 2004
Variable_List = PBS_O_LOGNAME=user1,PBS_O_HOST=south
euser = user1
egroup = group1
The brief option (-B) will only show the identifiers of all the reservations:
pbs_rstat -B
Name: R226.south
Name: R302.south
Name: R304.south
The following table shows the list of possible states for an advance reservation. The ones
most commonly seen are CO, UN, BD, and RN.
The pbs_rdel command deletes reservations in the order in which their reservation
identifiers are presented to the command. A reservation may be deleted by its owner, or a
PBS operator/manager. Note that when a reservation is deleted, all jobs belonging to the
reservation are deleted as well, regardless of whether or not they are currently running.
pbs_rdel R304
8.8.5 Accounting
Accounting records for advance resource reservations are available in the Server's job
accounting file. The format of such records closely follows the format that exists for job
records. In addition, any job that belongs to an advance reservation will have the reserva-
tion ID recorded in the accounting records for the job.
8.8.6 Access Control
A site administrator can inform the Server as to those hosts, groups, and users whose
advance resource reservation requests are (or are not) to be considered. The philosophy in
this regard is the same as that which currently exists for jobs.
In a similar vein, the user who submits the advance resource reservation request can spec-
ify to the system those other parties (user(s) or group(s)) that are authorized to submit jobs
to the reservation queue that's to be created.
When this queue is instantiated, these specifications will supply the values for the queue's
user/group access control lists. Likewise, the party who submits the reservation can, if
desired, control the username and group name at the Server that the Server associates with
the reservation.
Dedicated time is one or more specific time periods defined by the administrator. These
are not repeating time periods. Each one is individually defined.
During dedicated time, the only jobs PBS starts are those in special dedicated time queues.
PBS schedules non-dedicated jobs so that they will not run over into dedicated time. Jobs
in dedicated time queues are also scheduled so that they will not run over into non-dedi-
cated time. PBS will attempt to backfill around the dedicated-non-dedicated time borders.
PBS uses walltime to schedule within and around dedicated time. If a job is submitted
without a walltime to a non-dedicated-time queue, it will not be started until all dedicated
time periods are over. If a job is submitted to a dedicated-time queue without a walltime,
it will never run.
To submit a job to be run during dedicated time, use the -q <queue name> option to qsub
and give the name of the dedicated-time queue you wish to use as the queue name.
Queues are created by the administrator; see your administrator for queue name(s).
PBS supports Comprehensive System Accounting (CSA) on SGI Altix machines that are
running SGI’s Pro Pack 2.4, 3.0, 3.2 or 4.0 and have the Linux job container facility avail-
able. CSA provides accounting information about user jobs, called user job accounting.
CSA works the same with and without PBS. To run user job accounting, either the user
must specify the file to which raw accounting information will be written, or an environ-
ment variable must be set. The environment variable is “ACCT_TMPDIR”. This is the
directory where a temporary file of raw accounting data is written.
To run user job accounting, the user issues the CSA command “ja <filename>” or, if
the environment variable “ACCT_TMPDIR” is set, “ja”. In order to have an accounting
report produced, the user issues the command “ja -<options>” where the options
specify that a report will be written and what kind. To end user job accounting, the user
issues the command “ja -t”; the -t option can be included in the previous set of options.
See the manpage on ja for details.
The starting and ending ja commands must be used before and after any other commands
the user wishes to monitor.
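The guide’s original examples are not reproduced here; what follows is a minimal sketch of the intended pattern, assuming CSA’s ja command is installed (the filenames and the -c report flag are illustrative; check the ja man page for the exact options on your system):

```shell
#!/bin/sh
#PBS -l select=1:ncpus=1
# Begin user job accounting, writing raw records to a job-specific file
ja /tmp/rawacct.$$
# ... the commands to be monitored go here ...
./myprog
# Produce a report and terminate accounting (-t); -c is illustrative
ja -ct /tmp/rawacct.$$
```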
PBS Professional includes optional support for UNIX-based DCE. (By optional, we mean
that the customer may acquire a copy of PBS Professional with the standard security and
authentication module replaced with the DCE module.)
There are two -W options available with qsub which will enable a dcelogin context to
be set up for the job when it eventually executes. The user may specify either an encrypted
password or a forwardable/renewable Kerberos V5 TGT.
Specify the “-W cred=dce” option to qsub if a forwardable, renewable, Kerberos V5,
TGT (ticket granting ticket) with the user as the listed principal is what is to be sent with
the job. If the user has an established credentials cache and a non-expired, forwardable,
renewable, TGT is in the cache, that information is used.
The other choice, “-W cred=dce:pass”, causes the qsub command to interact with
the user to generate a DES encryption of the user's password. This encrypted password is
sent to the PBS Server and MOM processes, where it is placed in a job-specific file for
later use by pbs_mom in acquiring a DCE login context for the job. The information is
destroyed if the job terminates, is deleted, or aborts.
Important: The “-W pwd=''” option to qsub has been superseded by the
above two options, and therefore should no longer be used.
Any acquired login contexts and accompanying DCE credential caches established for the
job get removed on job termination or deletion.
Important: The “-W cred” option to qsub is not available under Win-
dows.
PBS Professional includes optional support for Kerberos-only (i.e. no DCE) environment.
(By optional, we mean that the customer may acquire a copy of PBS Professional with the
standard security and authentication module replaced with the KRB5 module.) This is
not supported under Windows.
A process running as part of a job can use large pages. The memory reported in
resources_used.mem may be larger with large page sizes.
For more information see the man page for setpcred. This can be viewed with the
command "man setpcred" on an AIX machine.
You can run a job that requests large page memory in "mandatory mode":
% qsub
export LDR_CNTRL="LARGE_PAGE_DATA=M"
/path/to/exe/bigprog
^D
You can run a job that requests large page memory in "advisory mode":
% qsub
export LDR_CNTRL="LARGE_PAGE_DATA=Y"
/path/to/exe/bigprog
^D
Chapter 9
Job Arrays
This chapter describes job arrays and their use. A job array represents a collection of jobs
which only differ by a single index parameter. The purpose of a job array is twofold. It
offers the user a mechanism for grouping related work, making it possible to submit,
query, modify and display the set as a single unit. Second, it offers a way to possibly
improve performance, because the batch system can use certain known aspects of the col-
lection for speedup.
9.1 Definitions
Subjob Individual entity within a job array (e.g. 1234[7], where 1234[] is
the job array itself, and 7 is the index) which has many properties of
a job as well as additional semantics (defined below.)
Sequence_number The numeric part of a job or job array identifier, e.g. 1234.
Subjob index The unique index which differentiates one subjob from another.
This must be a non-negative integer.
Job array identifier
The identifier returned upon success when submitting a job array.
The format is sequence_number[] or
sequence_number[].server.domain.com.
Job array range A set of subjobs within a job array. When specifying a range,
indices used must be valid members of the job array’s indices.
9.1.1 Description
A job array is a compact representation of one or more jobs, called subjobs when part of a
Job array, which have the same job script, and have the same values for all attributes and
resources, with the following exceptions:
• each subjob has a unique index
• Job Identifiers of subjobs only differ by their indices
• the state of subjobs can differ
All subjobs within a job array have the same scheduling priority.
A job array is submitted through a single command which returns, on success, a “job array
identifier” with a server-unique sequence number. Subjob indices are specified at submis-
sion time. These can be:
• a contiguous range, e.g. 1 through 100
• a range with a stepping factor, e.g. every second entry in
1 through 100 (1, 3, 5, ... 99)
Examples:
The sequence number (1234 in 1234[].server) is unique, so that jobs and job arrays cannot
share a sequence number.
Note: Since some shells, for example csh and tcsh, read “[“ and “]” as shell metacharac-
ters, job array names and subjob names will need to be enclosed in double quotes for all
PBS commands.
Example:
qdel “1234.myhost[5]”
qdel “1234.myhost[]”
Single quotes will work, except where you are using shell variable substitution.
To submit a job array, qsub is used with the option -J range, where range is of the form
X-Y[:Z]. X is the starting index, Y is the ending index, and Z is the optional stepping
factor. X and Y must be whole numbers, and Z must be a positive integer. Y must be
greater than X. If Y is not X plus a multiple of the stepping factor (i.e. Y itself would not
be used as an index value), the highest index used will be the largest qualifying value
below Y. For example, 1-100:2 gives 1, 3, 5, ... 99.
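The index set produced by a stepping factor can be previewed with seq, whose START STEP END arguments mirror the X-Y:Z form; for -J 1-100:2:

```shell
# seq 1 2 100 generates 1, 3, 5, ..., 99 -- the same indices that
# qsub -J 1-100:2 would use; the last index does not exceed the upper bound.
seq 1 2 100 | tr '\n' ' '
```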
Blocking qsub waits until the entire job array is complete, then returns the exit status of
the job array.
Examples:
Job arrays and subjobs have all of the attributes of a job. In addition, they have the follow-
ing when appropriate. These attributes are read-only.
Job array states map closely to job states except for the ‘B’ state. The ‘B’ state applies to
job arrays and indicates that at least one subjob has left the queued state and is running or
has run, but not all subjobs have run. Job arrays will never be in the ‘R’, ‘S’ or ‘U’ states.
State Indication
B Begun; at least one subjob has started
Q Queued
R Running
E Ending
X Expired or deleted; subjob has completed execution or been deleted
S Suspended
U Suspended by keyboard activity
Environment Variable Name   Used For   Description
PBS_ARRAY_INDEX             Subjobs    The subjob’s index within the job array
PBS_ARRAY_ID                Subjobs    The identifier of the job array to which the subjob belongs
File staging for job arrays is like that for jobs, with an added placeholder used to specify
the subjob index: ^array_index^. This placeholder is replaced by the actual array index.
The stdout and stderr files follow the naming convention for jobs, but include the
identifier of the job array, which includes the subscripted index. As with jobs, the
stagein and stageout keywords require the -W option to qsub.
9.6.1 Job Array File Staging Syntax on UNIX
stagein = local_path@host:remote_path
stageout = local_path@host:remote_path
Examples:
Remote_path: /film
Data files used as input: frame1, frame2, frame3
Host: store
Local_path: /tmp
Executable: a.out
#PBS -W stagein=/tmp/in/frame^array_index^@store:/film/frame^array_index^
#PBS -W stageout=/tmp/out/frame^array_index^.out \
@store:/film/frame^array_index^.out
#PBS -J 1-3
a.out frame$PBS_ARRAY_INDEX /tmp/in /tmp/out
Note that the stageout statement is all one line, broken here for readability.
The result will be that the user’s directory named “film” contains the original files frame1,
frame2, frame3, plus the new files frame1.out, frame2.out and frame3.out.
9.6.1.1 Scripts
Example 1
In this example, we have a script named ArrayScript which calls scriptlet1 and scriptlet2.
All three scripts are located in /homedir/testdir.
#!/bin/sh
#PBS -N ArrayExample
#PBS -J 1-2
echo "Main script: index " $PBS_ARRAY_INDEX
/homedir/testdir/scriptlet$PBS_ARRAY_INDEX
In our example, scriptlet1 and scriptlet2 simply echo their names. We run ArrayScript
using the qsub command:
qsub ArrayScript
Example 2
In this example, we have a script called StageScript. It takes two input files, dataX and
extraX, and makes an output file, newdataX, as well as echoing which iteration it is
on. The dataX and extraX files will be staged from inputs to work, then new-
dataX will be staged from work to outputs.
#!/bin/sh
#PBS -N StagingExample
#PBS -J 1-2
#PBS -W stagein=/homedir/work/data^array_index^@host1:/homedir/inputs/data^array_index^
#PBS -W stagein=/homedir/work/extra^array_index^@host1:/homedir/inputs/extra^array_index^
#PBS -W stageout=/homedir/work/newdata^array_index^@host1:/homedir/outputs/newdata^array_index^
echo "Main script: index " $PBS_ARRAY_INDEX
cd /homedir/work
cat data$PBS_ARRAY_INDEX extra$PBS_ARRAY_INDEX >> newdata$PBS_ARRAY_INDEX
qsub StageScript
It will run in /homedir, our home directory, which is why the line
“cd /homedir/work” is in the script.
9.6.1.2 Output Filenames
The name of the job array will default to the script name if no name is given via qsub -N.
For example, if the sequence number were 1234,
#PBS -N fixgamma
would give stdout for index number 7 the name fixgamma.o1234.7 and stderr the name
fixgamma.e1234.7.
The name of the job array can also be given through stdin.
In Windows the stagein and stageout string must be contained in double quotes when
using ^array_index^.
Example of a stagein:
qsub -W stagein="C:\WINNT\Temp\foo.^array_index^@host-1:Q:\my_username\foo.^array_index^" -J 1-5 stage_script
Example of a stageout:
qsub -W stageout="C:\WINNT\Temp\foo.^array_index^@host-1:Q:\my_username\foo.^array_index^_out" -J 1-5 stage_script
Note: Some shells such as csh and tcsh use the square bracket (“[“, “]”) as a metacharac-
ter. When using one of these shells, and a PBS command taking subjobs, job arrays or job
array ranges as arguments, the subjob, job array or job array range must be enclosed in
double quotes.
The following table shows PBS commands that take job arrays, subjobs or ranges as
arguments. The cells in the table indicate which objects are acted upon.
                       Argument to Command
Command    Array[]   Array[Range]   Array[Index]
qdel       yes       yes            yes
qsig       yes       yes            yes
qrerun     yes       yes            yes
qrun       no        yes            yes
qalter     yes       no             no
qorder     yes       no             no
qmove      yes       no             no
qhold      yes       no             no
qrls       yes       no             no
tracejob   yes       no             yes
The qstat command is used to query the status of a Job Array. The default output is to list
the Job Array in a single line, showing the Job Array Identifier. Options can be combined.
To show the state of all running subjobs, use -t -r. To show the state only of subjobs, not
job arrays, use -t -J.
Option   Result
-J       Lists only job arrays
-t       Lists job arrays and subjobs
-J -t    Lists only subjobs
-p       Shows percentage completed; for a job array, the percentage of subjobs completed
-r       Lists only running jobs and subjobs
Examples:
We run an example job and an example job array, on a machine with 2 processors:
demoscript:
#!/bin/sh
#PBS -N JobExample
sleep 60
arrayscript:
#!/bin/sh
#PBS -N ArrayExample
#PBS -J 1-5
sleep 60
qstat -J
Job id Name User Time Use S Queue
----------- ------------ ---------- -------- - -----
1235[].host ArrayExample user1 0 B workq
qstat -p
Job id Name User % done S Queue
----------- ------------ ---------- ------- - -----
1235[].host ArrayExample user1 0 B workq
1236.host JobExample user1 -- Q workq
qstat -t
Job id Name User Time Use S Queue
----------- ------------ ---------- -------- - -----
1235[].host ArrayExample user1 0 B workq
1235[1].host ArrayExample user1 00:00:00 R workq
1235[2].host ArrayExample user1 00:00:00 R workq
1235[3].host ArrayExample user1 0 Q workq
1235[4].host ArrayExample user1 0 Q workq
1235[5].host ArrayExample user1 0 Q workq
1236.host JobExample user1 0 Q workq
qstat -Jt
Job id Name User Time Use S Queue
------------ ------------ ----- -------- - -----
1235[1].host ArrayExample user1 00:00:00 R workq
1235[2].host ArrayExample user1 00:00:00 R workq
1235[3].host ArrayExample user1 0 Q workq
1235[4].host ArrayExample user1 0 Q workq
1235[5].host ArrayExample user1 0 Q workq
After the first two subjobs finish:
qstat -Jtp
Job id Name User % done S Queue
------------ ------------ ----- ------ - -----
1235[1].host ArrayExample user1 100 X workq
1235[2].host ArrayExample user1 100 X workq
1235[3].host ArrayExample user1 -- R workq
1235[4].host ArrayExample user1 -- R workq
1235[5].host ArrayExample user1 -- Q workq
qstat -pt
Job id Name User % done S Queue
------------ ------------ ----- ------ - -----
1235[].host ArrayExample user1 40 B workq
1235[1].host ArrayExample user1 100 X workq
1235[2].host ArrayExample user1 100 X workq
1235[3].host ArrayExample user1 -- R workq
1235[4].host ArrayExample user1 -- R workq
1235[5].host ArrayExample user1 -- Q workq
1236.host JobExample user1 -- Q workq
qstat -Jrt
host:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
----------- -------- ----- --------- ------ --- --- ------ ----- - -----
1235[5].host user1 workq ArrayExamp 048 -- 1 -- -- R 00:01
9.7.3 qdel: Deleting a Job Array
The qdel command will take a job array identifier, subjob identifier or job array range.
The indicated object(s) are deleted, including any currently running subjobs. Running
subjobs are treated like running jobs. Subjobs not running will be deleted and never run.
Only one email is sent per deleted job array, so deleting a job array of 5000 subjobs results
in one email being sent.
The qalter command can only be used on a job array object, not on subjobs or ranges. Job
array attributes are the same as for jobs.
The qorder command can only be used with job array objects, not on subjobs or ranges.
This will change the queue order of the job array in association with other jobs or job
arrays in the queue.
The qmove command can only be used with job array objects, not with subjobs or ranges.
Job arrays can only be moved from one server to another if they are in the ‘Q’, ‘H’, or ‘W’
states, and only if there are no running subjobs. The state of the job array object is pre-
served in the move. The job array will run to completion on the new server.
As with jobs, a qstat on the server from which the job array was moved will not show the
job array. A qstat on the job array object will be redirected to the new server.
Note: The subjob accounting records will be split between the two servers.
The qhold command can only be used with job array objects, not with subjobs or ranges.
A hold can be applied to a job array only from the ‘Q’, ‘B’ or ‘W’ states. This will put the
job array in the ‘H’, held, state. If any subjobs are running, they will run to completion.
No queued subjobs will be started while in the ‘H’ state.
The qrls command can only be used with job array objects, not with subjobs or ranges. If
the job array was in the ‘Q’ or ‘B’ state, it will be returned to that state. If it was in the
‘W’ state, it will be returned to that state unless its waiting time was reached, in which
case it will go to the ‘Q’ state.
The qrerun command will take a job array identifier, subjob identifier or job array range.
If a job array identifier is given as an argument, it is returned to its initial state at submis-
sion time, or to its altered state if it has been qaltered. All of that job array’s subjobs are
requeued, including those that are currently running, completed, or deleted. If a subjob
or range is given, those subjobs are requeued as jobs would be.
The qrun command takes a subjob or a range of subjobs, not a job array object. If a single
subjob is given as the argument, it is run as a job would be. If a range of subjobs is given
as the argument, the non-running subjobs within that range will be run.
The tracejob command can be run on job arrays and individual subjobs. When tracejob is
run on a job array or a subjob, the same information is displayed as for a job, with addi-
tional information for a job array. Note that subjobs do not exist until they are running, so
tracejob will not show any information until they are. When tracejob is run on a job array,
the information displayed is only that for the job array object, not the subjobs. Job arrays
themselves do not produce any MOM log information. Running tracejob on a job array
will give information about why a subjob did not start.
If a job array object, subjob or job array range is given to qsig, all currently running sub-
jobs within the specified set will be sent the signal.
The default behavior of qselect is to return the job array identifier, without returning sub-
job identifiers.
Note: qselect will not return any job arrays when the state selection (-s) option restricts the
set to ‘R’, ‘S’, ‘T’ or ‘U’, because a job array will never be in any of these states. How-
ever, qselect can be used to return a list of subjobs by using the -t option.
Options to qselect can be combined. For example, to restrict the selection to subjobs, use
both the -J and the -T options. To select only running subjobs, use -J -T -sR.
Jobs and subjobs are treated the same way by job run limits. For example, if
max_user_run is set to 5, a user can have a maximum of 5 subjobs and/or jobs running.
9.10.2 Starving
A job array’s starving status is based on the queued portion of the array: if any queued
subjob is starving, the job array is starving. A running subjob retains the starving status
it had when it was started.
Note: Job dependencies are not supported for subjobs or ranges of subjobs.
9.10.4 Accounting
Job accounting records for job arrays and subjobs are the same as for jobs. When a job
array has been moved from one server to another, the subjob accounting records are split
between the two servers, except that there will be no ‘Q’ records for subjobs.
9.10.5 Checkpointing
Checkpointing is not supported for job arrays. On systems that support checkpointing,
subjobs are not checkpointed; instead, they run to completion.
If defined, prologues and epilogues will run at the beginning and end of each subjob, but
not for job arrays.
The exit status of a job array is determined by the status of each of the completed subjobs.
It is only available when all valid subjobs have completed. The individual exit status of a
completed subjob is passed to the epilogue, and is available in the ‘E’ accounting log
record of that subjob.
The exit status of the job array as a whole takes one of the following values:
0 All subjobs of the job array returned an exit status of 0 and no PBS error
occurred. Deleted subjobs are not considered.
1 At least one subjob returned a non-zero exit status and no PBS error occurred.
2 A PBS error occurred.
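The exit-status rule above can be sketched as a small shell function (an illustration only, not PBS source; status 2, a PBS error, is detected by PBS itself and cannot be derived from subjob statuses alone, so it is outside this sketch):

```shell
#!/bin/sh
# Toy illustration of the job-array exit status rule described above.
# Arguments: the exit statuses of the completed subjobs; "X" marks a
# deleted subjob, which is not considered.
array_exit_status() {
    status=0
    for s in "$@"; do
        [ "$s" = "X" ] && continue   # deleted subjobs are ignored
        [ "$s" -ne 0 ] && status=1   # any non-zero subjob => 1
    done
    echo "$status"
}

array_exit_status 0 2 0   # prints 1
```

For example, array_exit_status 0 2 0 prints 1, while array_exit_status 0 X 0 prints 0 because the deleted subjob is skipped.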
All subjobs within a job array have the same scheduling priority.
9.10.8.1 Preemption
9.10.8.3 Fairshare
Subjobs are treated like jobs with respect to fairshare ordering, fairshare accounting and
fairshare limits. If running enough subjobs of a job array causes the priority of the owning
entity to change, additional subjobs from that job array may not be the next to start.
All nodes associated with a single subjob should belong to the same placement set or node
group. Different subjobs can be put on different placement sets or node groups.
Chapter 10
Multiprocessor Jobs
Placement sets allow partitioning by multiple resources, so that a vnode may be in one set
whose members share a value for one resource, and in another set whose members share a
different value for a different resource. See the PBS Professional Administrator’s Guide.
If a job requests grouping by a resource, i.e. place=group=resource, then the chunks are
placed as requested and complex-wide node grouping is ignored.
If a job is to use node grouping but the required number of vnodes is not defined in any
one group, grouping is ignored. This behavior is unchanged.
To submit a job which should run on one host and which requires a certain number of CPUs
and amount of memory, submit the job with a single chunk. For example, substituting your
own values for N and X:
qsub -l select=1:ncpus=N:mem=Xgb
The preferred method for submitting an MPI job is by specifying one chunk per MPI task.
For example, for a 10-way MPI job with 2gb of memory per MPI task, you would use:
qsub -l select=10:ncpus=1:mem=2gb
If you have a cluster of small systems with, for example, 2 CPUs each, and you wish to sub-
mit an MPI job that will run on four separate hosts, then submit:
qsub -l select=4:ncpus=1 -l place=scatter
The PBS_NODEFILE file will contain one entry for each of the hosts allocated to the job.
In the example above, it would contain 4 lines. The variables NCPUS and
OMP_NUM_THREADS will be set to one.
If you do not care where the four MPI processes are run, you may submit:
qsub -l select=4:ncpus=1 -l place=free
For this example, PBS_NODEFILE will contain 4 entries: either four separate hosts, or 3
hosts with one repeated, or 2 hosts, etc. NCPUS and OMP_NUM_THREADS will be set
to 1 or 2 depending on the number of CPUs allocated from the first listed host.
The number of MPI processes for a job is controlled by the value of the resource
mpiprocs. The mpiprocs resource controls the contents of the PBS_NODEFILE on the
host which executes the top PBS task for the PBS job (the one executing the PBS job
script.) See “Built-in Resources” on page 29. The PBS_NODEFILE contains one line per
MPI process with the name of the host on which that process should execute. The number
of lines in PBS_NODEFILE is equal to the sum of the values of mpiprocs over all chunks
requested by the job. For each chunk with mpiprocs=P, (where P > 0), the host name (the
value of the allocated vnode's resources_available.host) is written to the PBS_NODEFILE
exactly P times.
If a user wishes to run two MPI processes on each of 3 hosts and have them "share" a sin-
gle processor on each host, the user would request:
-l select=3:ncpus=1:mpiprocs=2
The PBS_NODEFILE would then contain:
VnodeA
VnodeA
VnodeB
VnodeB
VnodeC
VnodeC
If you want 3 chunks, each with 2 CPUs and running 2 MPI processes, use:
-l select=3:ncpus=2:mpiprocs=2
The PBS_NODEFILE would then contain:
VnodeA
VnodeA
VnodeB
VnodeB
VnodeC
VnodeC
The OMP_NUM_THREADS value can be set explicitly by using the ompthreads pseudo-
resource for any chunk within the select statement. If ompthreads is not used, then
OMP_NUM_THREADS is set to the value of the ncpus resource of that chunk. If neither
ncpus nor ompthreads is used within the select statement, then OMP_NUM_THREADS is
set to 1.
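The precedence rule above can be summarized in a short sketch (illustrative only; omp_num_threads is a made-up helper, not part of PBS):

```shell
#!/bin/sh
# Sketch of the OMP_NUM_THREADS rule described above (not PBS source).
# $1 = the chunk's ompthreads value, or "" if unset
# $2 = the chunk's ncpus value, or "" if unset
omp_num_threads() {
    if [ -n "$1" ]; then
        echo "$1"          # ompthreads takes precedence
    elif [ -n "$2" ]; then
        echo "$2"          # otherwise the chunk's ncpus value
    else
        echo 1             # neither given: default to 1
    fi
}

omp_num_threads "" 2   # prints 2
```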
An OpenMP job is submitted as a single chunk. For a 2-CPU job requiring 10gb of mem-
ory, you would use:
qsub -l select=1:ncpus=2:mem=10gb
You might be running an OpenMP application on a host and wish to run fewer threads
than the number of CPUs requested. This might be because the threads need exclusive
access to shared resources in a multi-core processor system, such as to a cache shared
between cores, or to the memory shared between cores. If you want one chunk, with 16
CPUs and 8 threads:
qsub -l select=1:ncpus=16:ompthreads=8
You might be running an OpenMP application on a host and wish to run more threads than
the number of CPUs requested (because each thread is I/O bound perhaps). If you want
one chunk, with eight CPUs and 16 threads:
qsub -l select=1:ncpus=8:ompthreads=16
For jobs that are both MPI and multi-threaded, the number of threads per chunk, for all
chunks, is set to the number of threads requested (explicitly or implicitly) in the first
chunk, except for MPIs that have been integrated with the PBS TM API. For these MPIs
(LAM MPI), you can specify the number of threads separately for each chunk. This
means that for most MPIs, OMP_NUM_THREADS and NCPUS will default to the num-
ber of ncpus requested on the first chunk, and for integrated MPIs, you can set the
ompthreads resource separately for each chunk.
Should you have a job that is both MPI and multi-threaded, you can request one chunk for
each MPI process, or set mpiprocs to the number of MPI processes you want on each
chunk.
For example, to request 4 chunks, each with 1 MPI process, 2 CPUs and 2 threads:
qsub -l select=4:ncpus=2
or
qsub -l select=4:ncpus=2:ompthreads=2
To request 4 chunks, each with 2 CPUs and 4 threads:
qsub -l select=4:ncpus=2:ompthreads=4
To request 16 chunks, each with 2 CPUs:
qsub -l select=16:ncpus=2
To request two chunks, each with 8 CPUs and 8 MPI tasks and four threads:
qsub -l select=2:ncpus=8:mpiprocs=8:ompthreads=4
Example:
qsub -l select=4:ncpus=2
This request is satisfied by 4 CPUs from VnodeA, 2 from VnodeB and 2 from VnodeC, so
the following is written to the PBS_NODEFILE:
VnodeA
VnodeA
VnodeB
VnodeC
The OpenMP environment variables are set (for the 4 PBS tasks corresponding to the 4
MPI processes) as follows:
For PBS task #1 on VnodeA: OMP_NUM_THREADS=2 NCPUS=2
For PBS task #2 on VnodeA: OMP_NUM_THREADS=2 NCPUS=2
For PBS task #3 on VnodeB: OMP_NUM_THREADS=2 NCPUS=2
For PBS task #4 on VnodeC: OMP_NUM_THREADS=2 NCPUS=2
Example:
qsub -l select=3:ncpus=2:mpiprocs=2:ompthreads=1
This is satisfied by 2 CPUs from each of three vnodes (VnodeA, VnodeB, and VnodeC),
so the following is written to the PBS_NODEFILE:
VnodeA
VnodeA
VnodeB
VnodeB
VnodeC
VnodeC
The OpenMP environment variables are set (for the 6 PBS tasks corresponding to the 6
MPI processes) as follows:
For PBS task #1 on VnodeA: OMP_NUM_THREADS=1 NCPUS=1
For PBS task #2 on VnodeA: OMP_NUM_THREADS=1 NCPUS=1
For PBS task #3 on VnodeB: OMP_NUM_THREADS=1 NCPUS=1
For PBS task #4 on VnodeB: OMP_NUM_THREADS=1 NCPUS=1
For PBS task #5 on VnodeC: OMP_NUM_THREADS=1 NCPUS=1
For PBS task #6 on VnodeC: OMP_NUM_THREADS=1 NCPUS=1
To run two threads on each of N chunks, each chunk running one process, all on the same
Altix:
qsub -l select=N:ncpus=2 -l place=pack
This starts N processes on a single host, with two OpenMP threads per process, because
OMP_NUM_THREADS=2.
For most implementations of the Message Passing Interface (MPI), you would use the
mpirun command to launch your application. For example, here is a sample PBS script
for an MPI job:
#PBS -l select=arch=linux
#
mpirun -np 32 -machinefile $PBS_NODEFILE a.out
For users of PBS with MPICH on Linux, the mpirun command has been changed
slightly. The syntax and arguments are the same except for one option, which should not
be set by the user:
-machinefile file PBS supplies the machinefile. If the user tries to specify it, PBS
will print a warning that it is replacing the machinefile.
#PBS -l select=arch=linux
#
mpirun a.out
The pbs_mpilam command follows the convention of LAM's mpirun. The “nodes”
here are LAM nodes. LAM's mpirun has two syntax forms: one takes a <where> argu-
ment, a set of node and/or CPU identifiers indicating where to start <program>; the other
takes an application schema file describing the programs to run.
The first form is fully supported by PBS: all user MPI processes are tracked. The second
form is supported, but user MPI processes are not tracked.
CAUTION: Keep in mind that if the <where> argument and the global option -np or -c are
not specified on the command line, then pbs_mpilam will expect an ASCII schema file
as its argument.
PBS users of AIX machines running IBM’s Parallel Operating Environment, or POE, can
run jobs on the HPS using either IP or US mode. PBS will manage the HPS.
Under PBS, the poe command is slightly different. The syntax and arguments are the
same except for the following:
Options:
-hostfile <file> PBS supplies the hostfile to POE. Any specification for hostfile
will be ignored.
-euilib {ip | us} If the command line option -euilib is set, it will take precedence
over the MP_EUILIB environment variable. If the -euilib
option is set to us, user mode is set for the job. If the option is
set to any other value, that value is passed to poe.
-msg_api This option can only take the values "MPI" or "LAPI".
Environment variables:
MP_MSG_API This variable can only take the values "MPI" or "LAPI".
Notes:
1 Since PBS is tracking tasks started by poe, these tasks are counted towards a
user’s run limits.
2 Running multiple poe jobs in the background will not work. Instead, run poe
jobs one after the other or submit separate jobs. Otherwise HPS windows will be used by
more than one task.
3 The tracejob command will show any of various error messages.
1. Using IP mode, run a single executable poe job with 4 ranks on hosts spread across
the PBS-allocated nodes listed in $PBS_NODEFILE:
% cat $PBS_NODEFILE
host1
host2
host3
host4
% cat job.script
poe /path/mpiprog -euilib ip
% qsub -l select=4:ncpus=1 -lplace=scatter job.script
2. Using US mode, run a single executable poe job with 4 ranks on hosts spread
across the PBS-allocated nodes listed in $PBS_NODEFILE:
% cat $PBS_NODEFILE
host1
host2
host3
host4
% cat job.script
poe /path/mpiprog -euilib us
% qsub -l select=4:ncpus=1 -lplace=scatter job.script
3. Using IP mode, run executables prog1 and prog2 with 2 ranks of prog1 on host1, 2
ranks of prog2 on host2 and 2 ranks of prog2 on host3.
% cat $PBS_NODEFILE
host1
host1
host2
host2
host3
host3
% cat job.script
echo prog1 > /tmp/poe.cmd
echo prog1 >> /tmp/poe.cmd
echo prog2 >> /tmp/poe.cmd
echo prog2 >> /tmp/poe.cmd
echo prog2 >> /tmp/poe.cmd
echo prog2 >> /tmp/poe.cmd
poe -cmdfile /tmp/poe.cmd -euilib ip
rm /tmp/poe.cmd
% qsub -l select=3:ncpus=2:mpiprocs=2 \
-l place=scatter job.script
4. Using US mode, run executables prog1 and prog2 with 2 ranks of prog1 on host1,
2 ranks of prog2 on host2 and 2 ranks of prog2 on host3.
% cat $PBS_NODEFILE
host1
host1
host2
host2
host3
host3
% cat job.script
echo prog1 > /tmp/poe.cmd
echo prog1 >> /tmp/poe.cmd
echo prog2 >> /tmp/poe.cmd
echo prog2 >> /tmp/poe.cmd
echo prog2 >> /tmp/poe.cmd
echo prog2 >> /tmp/poe.cmd
poe -cmdfile /tmp/poe.cmd -euilib us
rm /tmp/poe.cmd
% qsub -l select=3:ncpus=2:mpiprocs=2 \
-l place=scatter job.script
If your complex contains machines that are not on the HPS, and you wish to run on the
HPS, you must specify machines on the HPS. Your administrator will define a resource
on each host on the HPS; in this example the resource is called “hps”. To specify machines
on the HPS, request the hps resource in your select statement.
Using place=scatter: When "scatter" is used, the 4 chunks are on different hosts so each
host has 1 hps resource:
% qsub -l select=4:ncpus=2:hps=1
Using place=pack: When "pack" is used, all the chunks are put on one host so a chunk
with no resources and one "hps" must be specified:
% qsub -l select=4:ncpus=2+1:ncpus=0:hps=1
This ensures that the hps resource is only counted once. You could also use this:
% qsub -l select=1:ncpus=8:hps=1
For two chunks of 4 CPUs, one on one machine and one on another, you would use:
% qsub -l select=2:ncpus=4:hps=1 -l place=scatter
PBS is more tightly integrated with the mpirun command on HP-UX so that resources
can be tracked and processes managed. When running a PBS MPI job, you can use the
same arguments to the mpirun command as you would outside of PBS. The -h host
and -l user options will be ignored, and the -np number option will be modified to
fit the available resources.
You use the same command as you would use outside of PBS, either “mpirun.ch_gm” or
“mpirun”.
10.6.5.1 Options
Inside a PBS job script, all of the options to the PBS interface are the same as
mpirun.ch_gm except for the following:
-machinefile <file>  The file argument contents are ignored and replaced by the
contents of the $PBS_NODEFILE.
-pg  The use of the -pg option, for having multiple executables on
multiple hosts, is allowed, but it is up to the user to make sure only
PBS hosts are specified in the process group file; MPI processes
spawned on non-PBS hosts are not guaranteed to be under the
control of PBS.
10.6.5.2 Examples
Run a single-executable MPICH-GM job with 64 processes spread out across the PBS-
allocated hosts listed in $PBS_NODEFILE:
PBS_NODEFILE:
pbs-host1
pbs-host2
pbs-host3
qsub -l select=3:ncpus=1
mpirun.ch_gm -np 64 /path/myprog.x 1200
^D
<job-id>
Run an MPICH-GM job with multiple executables on multiple hosts listed in the process
group file “procgrp”:
qsub -l select=3:ncpus=1
echo "host1 1 user1 /x/y/a.exe arg1 arg2" > procgrp
echo "host2 1 user1 /x/x/b.exe arg1 arg2" >> procgrp
mpirun.ch_gm -pg procgrp /path/myprog.x
rm -f procgrp
^D
<job-id>
When the job runs, mpirun.ch_gm will give this warning message:
warning: “-pg” is allowed but it is up to user to make sure only PBS hosts are
specified; MPI processes spawned are not guaranteed to be under the control of
PBS.
The warning is issued because if any of the hosts listed in procgrp are not under
the control of PBS, then the processes on those hosts will not be under the control
of PBS.
You use the same command as you would use outside of PBS, either “mpirun.ch_mx” or
“mpirun”.
10.6.6.1 Options
Inside a PBS job script, all of the options to the PBS interface are the same as
mpirun.ch_mx except for the following:
-machinefile <file>  The file argument contents are ignored and replaced by the
contents of the $PBS_NODEFILE.
-pg  The use of the -pg option, for having multiple executables on
multiple hosts, is allowed, but it is up to the user to make sure only
PBS hosts are specified in the process group file; MPI processes
spawned on non-PBS hosts are not guaranteed to be under the
control of PBS.
10.6.6.2 Examples
Run a single-executable MPICH-MX job with 64 processes spread out across the PBS-
allocated hosts listed in $PBS_NODEFILE:
PBS_NODEFILE
pbs-host1
pbs-host2
pbs-host3
qsub -l select=3:ncpus=1
mpirun.ch_mx -np 64 /path/myprog.x 1200
^D
<job-id>
Run an MPICH-MX job with multiple executables on multiple hosts listed in the process
group file “procgrp”:
qsub -l select=2:ncpus=1
echo "pbs-host1 1 username /x/y/a.exe arg1 arg2" \
> procgrp
echo "pbs-host2 1 username /x/x/b.exe arg1 arg2" \
>> procgrp
mpirun.ch_mx -pg procgrp /path/myprog.x
rm -f procgrp
^D
<job-id>
warning: “-pg” is allowed but it is up to user to make sure only PBS hosts are
specified; MPI processes spawned are not guaranteed to be under PBS-control
The warning is issued because if any of the hosts listed in procgrp are not under
the control of PBS, then the processes on those hosts will not be under the control
of PBS.
You use the same command as you would use outside of PBS, either “mpirun.mpd” or
“mpirun”. If the MPD daemons are not already running, the PBS interface will take care
of starting them for you.
10.6.7.1 Options
Inside a PBS job script, all of the options to the PBS interface are the same as
mpirun.mpd with MPD except for the following:
-m <file>  The file argument contents are ignored and replaced by the
contents of the $PBS_NODEFILE.
-pg  The use of the -pg option, for having multiple executables on
multiple hosts, is allowed, but it is up to the user to make sure only
PBS hosts are specified in the process group file; MPI processes
spawned on non-PBS hosts are not guaranteed to be under the
control of PBS.
10.6.7.2 MPD Startup and Shutdown
The script starts MPD daemons on each of the unique hosts listed in $PBS_NODEFILE,
using either the rsh or ssh method, based on the value of the environment variable
RSHCOMMAND. The default is rsh. The script also takes care of shutting down the MPD
daemons at the end of a run.
If the MPD daemons are not running, the PBS interface to mpirun.mpd will start GM's
MPD daemons as this user on the allocated PBS hosts. The MPD daemons may have been
started already by the administrator or by the user. MPD daemons are not started inside a
PBS prologue script since it won't have the path of mpirun.mpd that the user executed
(GM or MX), which would determine the path to the MPD binary.
10.6.7.3 Examples
Run a single-executable MPICH-GM job with 64 processes spread out across the PBS-
allocated hosts listed in $PBS_NODEFILE:
PBS_NODEFILE:
pbs-host1
pbs-host2
pbs-host3
qsub -l select=3:ncpus=1
[MPICH-GM-HOME]/bin/mpirun.mpd -np 64 \
/path/myprog.x 1200
^D
<job-id>
If the GM MPD daemons are not running, the PBS interface to mpirun.mpd
will start them as this user on the allocated PBS hosts. The daemons may
have been previously started by the administrator or the user.
Run an MPICH-GM job with multiple executables on multiple hosts listed in the process
group file “procgrp”:
Job script:
qsub -l select=3:ncpus=1
echo "host1 1 user1 /x/y/a.exe arg1 arg2" \
> procgrp
echo "host2 1 user1 /x/x/b.exe arg1 arg2" \
>> procgrp
When the job runs, mpirun.mpd will give the warning message:
warning: “-pg” is allowed but it is up to user to make sure only PBS hosts are
specified; MPI processes spawned are not guaranteed to be under
PBS-control.
The warning is issued because if any of the hosts listed in procgrp are not under
the control of PBS, then the processes on those hosts will not be under the control
of PBS.
You use the same command as you would use outside of PBS, either “mpirun.mpd” or
“mpirun”. If the MPD daemons are not already running, the PBS interface will take care
of starting them for you.
10.6.8.1 Options
Inside a PBS job script, all of the options to the PBS interface are the same as
mpirun.ch_gm with MPD except for the following:
-m <file>  The file argument contents are ignored and replaced by the
contents of the $PBS_NODEFILE.
-pg  The use of the -pg option, for having multiple executables on
multiple hosts, is allowed, but it is up to the user to make sure only
PBS hosts are specified in the process group file; MPI processes
spawned on non-PBS hosts are not guaranteed to be under the
control of PBS.
10.6.8.2 MPD Startup and Shutdown
The PBS mpirun interface starts MPD daemons on each of the unique hosts listed in
$PBS_NODEFILE, using either the rsh or ssh method, based on the value of the environ-
ment variable RSHCOMMAND. The default is rsh. The interface also takes care of shutting
down the MPD daemons at the end of a run.
If the MPD daemons are not running, the PBS interface to mpirun.mpd will start MX's
MPD daemons as this user on the allocated PBS hosts. The MPD daemons may already
have been started by the administrator or by the user. MPD daemons are not started inside
a PBS prologue script since it won't have the path of mpirun.mpd that the user executed
(GM or MX), which would determine the path to the MPD binary.
10.6.8.3 Examples
Run a single-executable MPICH-MX job with 64 processes spread out across the PBS-
allocated hosts listed in $PBS_NODEFILE:
PBS_NODEFILE:
pbs-host1
pbs-host2
pbs-host3
qsub -l select=3:ncpus=1
[MPICH-MX-HOME]/bin/mpirun.mpd -np 64 \
/path/myprog.x 1200
^D
<job-id>
If the MPD daemons are not running, the PBS interface to mpirun.mpd will
start GM's MPD daemons as this user on the allocated PBS hosts. The MPD
daemons may be already started by the administrator or by the user.
Run an MPICH-MX job with multiple executables on multiple hosts listed in the process
group file “procgrp”:
qsub -l select=2:ncpus=1
echo "pbs-host1 1 username /x/y/a.exe arg1 arg2" \
> procgrp
echo "pbs-host2 1 username /x/x/b.exe arg1 arg2"\
>> procgrp
mpirun.mpd -pg procgrp /path/myprog.x
rm -f procgrp
When the job runs, mpirun.mpd will give the warning message:
warning: “-pg” is allowed but it is up to user to make sure only PBS hosts
are specified; MPI processes spawned are not guaranteed to be under
PBS-control
The warning is issued because if any of the hosts listed in procgrp are not under
the control of PBS, then the processes on those hosts will not be under the control
of PBS.
PBS provides an interface to MPICH2’s mpirun. If executed inside a PBS job, this
allows for PBS to track all MPICH2 processes so that PBS can perform accounting and
have complete job control. If executed outside of a PBS job, it behaves exactly as if stan-
dard MPICH2's mpirun had been used.
You use the same “mpirun” command as you would use outside of PBS.
When submitting PBS jobs that invoke the pbsrun wrapper script for MPICH2's mpirun,
be sure to explicitly specify the actual number of ranks or MPI tasks in the qsub select
specification. Otherwise, jobs will fail to run with "too few entries in the machinefile".
For example, the following will not work:
#PBS -l select=1:ncpus=1:host=hostA+1:ncpus=2:host=hostB
mpirun -np 3 /tmp/mytask
because $PBS_NODEFILE will contain only two entries:
hostA
hostB
which would conflict with the "-np 3" specification in mpirun, as only 2 MPD daemons
will be started. Instead, use either of these select specifications:
a) #PBS -l select=1:ncpus=1:host=hostA+2:ncpus=1:host=hostB
b) #PBS -l select=1:ncpus=1:host=hostA+1:ncpus=2:host=hostB:mpiprocs=2
so that $PBS_NODEFILE contains:
hostA
hostB
hostB
10.6.9.1 Options
If executed inside a PBS job script, all of the options to the PBS interface are the same as
MPICH2's mpirun except for the following:
-host, -ghost  For specifying the execution host to run on. Ignored.
-machinefile <file>  The file argument contents are ignored and replaced by the con-
tents of the $PBS_NODEFILE.
-localonly <x>  For specifying the <x> number of processes to run locally. Not
supported. The user is advised instead to use the equivalent
arguments "-np <x> -localonly".
-np  If the user does not specify a -np option, then no default value
is provided by the PBS wrapper scripts. It is up to the local
mpirun to decide what the reasonable default value should be,
which is usually 1. The maximum number of ranks that can be
launched is the number of entries in $PBS_NODEFILE.
10.6.9.2 MPD Startup and Shutdown
The interface ensures that the MPD daemons are started on each of the hosts listed in the
$PBS_NODEFILE. It also ensures that the MPD daemons are shut down at the end of
MPI job execution.
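The bookkeeping involved can be illustrated with ordinary shell tools (a sketch only, using a fabricated nodefile; none of this is PBS source): one MPD daemon per unique host in $PBS_NODEFILE, and at most one rank per nodefile entry:

```shell
#!/bin/sh
# Illustrative only: derive MPD-related counts from a PBS_NODEFILE-style
# file. The PBS wrapper does this internally; the file contents are made up.
nodefile=$(mktemp)
printf 'hostA\nhostA\nhostB\nhostB\nhostC\nhostC\n' > "$nodefile"

max_ranks=$(wc -l < "$nodefile" | tr -d ' \t')   # one rank per entry
mpd_hosts=$(sort -u "$nodefile")                 # one MPD per unique host
num_mpds=$(printf '%s\n' "$mpd_hosts" | wc -l | tr -d ' \t')

echo "max ranks: $max_ranks, MPD daemons: $num_mpds"
rm -f "$nodefile"
```

For the fabricated six-entry nodefile above, this reports a maximum of 6 ranks and 3 MPD daemons.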
10.6.9.3 Examples
Run a single-executable MPICH2 job with 6 processes spread out across the PBS-allo-
cated hosts listed in $PBS_NODEFILE:
PBS_NODEFILE:
pbs-host1
pbs-host2
pbs-host3
pbs-host1
pbs-host2
pbs-host3
Job.script:
mpirun -np 6 /path/mpiprog
<job-id>
Run an MPICH2 job with multiple executables on multiple hosts using $PBS_NODEFILE
and mpirun command-line arguments:
PBS_NODEFILE:
hostA
hostA
hostB
hostB
hostC
hostC
Job script:
#PBS -l select=3:ncpus=2
mpirun -np 2 /tmp/mpitest1 : \
-np 2 /tmp/mpitest2 : \
-np 2 /tmp/mpitest3
Run job:
qsub job.script
Run an MPICH2 job with multiple executables on multiple hosts using mpirun -con-
figfile option and $PBS_NODEFILE:
PBS_NODEFILE:
hostA
hostA
hostB
hostB
hostC
hostC
Job script:
#PBS -l select=3:ncpus=2
echo "-np 2 /tmp/mpitest1" > my_config_file
echo "-np 2 /tmp/mpitest2" >> my_config_file
echo "-np 2 /tmp/mpitest3" >> my_config_file
mpirun -configfile my_config_file
rm -f my_config_file
Run job:
qsub job.script
10.6.10 PBS Jobs with Intel MPI's mpirun
PBS provides an interface to Intel MPI’s mpirun. If executed inside a PBS job, this
allows for PBS to track all Intel MPI processes so that PBS can perform accounting and
have complete job control. If executed outside of a PBS job, it behaves exactly as if stan-
dard Intel MPI's mpirun was used.
You use the same “mpirun” command as you would use outside of PBS.
When submitting PBS jobs that invoke the pbsrun wrapper script for Intel MPI, be sure to
explicitly specify the actual number of ranks or MPI tasks in the qsub select specification.
Otherwise, jobs will fail to run with "too few entries in the machinefile".
For example, the following will not work:
#PBS -l select=1:ncpus=1:host=hostA+1:ncpus=2:host=hostB
mpirun -np 3 /tmp/mytask
because $PBS_NODEFILE will contain only two entries:
hostA
hostB
which would conflict with the "-np 3" specification in mpirun, as only 2 MPD daemons
will be started. Instead, use either of these select specifications:
a) #PBS -l select=1:ncpus=1:host=hostA+2:ncpus=1:host=hostB
b) #PBS -l select=1:ncpus=1:host=hostA+1:ncpus=2:host=hostB:mpiprocs=2
10.6.10.1 Options
If executed inside a PBS job script, all of the options to the PBS interface are the same as
for Intel MPI’s mpirun except for the following:
-host, -ghost  For specifying the execution host to run on. Ignored.
-machinefile <file>  The file argument contents are ignored and replaced by the con-
tents of the $PBS_NODEFILE.
-np  If the user does not specify a -np option, then no default value
is provided by the PBS interface. It is up to the local mpirun to
decide what the reasonable default value should be, which is
usually 1. The maximum number of ranks that can be
launched is the number of entries in $PBS_NODEFILE.
10.6.10.2 MPD Startup and Shutdown
Intel MPI's mpirun takes care of starting/stopping the MPD daemons. The PBS interface
to Intel MPI’s mpirun always passes the arguments -totalnum=<number of
mpds to start> and -file=<mpd_hosts_file> to the actual mpirun, taking
its input from unique entries in $PBS_NODEFILE.
10.6.10.3 Examples
Run a single-executable Intel MPI job with 6 processes spread out across the PBS-allo-
cated hosts listed in $PBS_NODEFILE:
PBS_NODEFILE:
pbs-host1
pbs-host2
pbs-host3
pbs-host1
pbs-host2
pbs-host3
Job.script:
# mpirun takes care of starting the MPD daemons
# on unique hosts listed in $PBS_NODEFILE,
# and also runs 6 processes mapped to each host
# listed in $PBS_NODEFILE; mpirun takes care of
# shutting down MPDs.
mpirun /path/myprog.x 1200
Run an Intel MPI job with multiple executables on multiple hosts using
$PBS_NODEFILE and mpiexec arguments to mpirun:
$PBS_NODEFILE
hostA
hostA
hostB
hostB
hostC
hostC
Job script:
# mpirun runs MPD daemons on hosts listed
# in $PBS_NODEFILE
# mpirun runs 2 instances of mpitest1
# on hostA; 2 instances of mpitest2 on
# hostB; 2 instances of mpitest3 on
# hostC.
# mpirun takes care of shutting down the
# MPDs at the end of MPI job run.
mpirun -np 2 /tmp/mpitest1 : \
-np 2 /tmp/mpitest2 : \
-np 2 /tmp/mpitest3
Run an Intel MPI job with multiple executables on multiple hosts via the -configfile
option and $PBS_NODEFILE:
$PBS_NODEFILE:
hostA
hostA
hostB
hostB
hostC
hostC
Job script:
echo "-np 2 /tmp/mpitest1" > my_config_file
echo "-np 2 /tmp/mpitest2" >> my_config_file
echo "-np 2 /tmp/mpitest3" >> my_config_file
mpirun -configfile my_config_file
# cleanup
rm -f my_config_file
10.6.11 PBS Jobs with MVAPICH1's mpirun
You use the same “mpirun” command as you would use outside of PBS.
10.6.11.1 Options
If executed inside a PBS job script, all of the options to the PBS interface are the same as
MVAPICH1's mpirun except for the following:
-np If the user does not specify a -np option, then PBS uses the
number of entries found in the $PBS_NODEFILE. The maxi-
mum number of ranks that can be launched is the number of
entries in $PBS_NODEFILE.
10.6.11.2 Examples
Run a single-executable MVAPICH1 job with 6 ranks spread out across the PBS-allocated
hosts listed in $PBS_NODEFILE:
PBS_NODEFILE:
pbs-host1
pbs-host1
pbs-host2
pbs-host2
pbs-host3
pbs-host3
Job.script:
# mpirun runs 6 processes mapped to each host listed
# in $PBS_NODEFILE
mpirun -np 6 /path/myprog
<job-id>
10.6.12 PBS Jobs with MVAPICH2's mpiexec
You use the same “mpiexec” command as you would use outside of PBS.
The maximum number of ranks that can be launched is the number of entries in
$PBS_NODEFILE.
10.6.12.1 Options
If executed inside a PBS job script, all of the options to the PBS interface are the same as
MVAPICH2's mpiexec except for the following:
10.6.12.2 MPD Startup and Shutdown
The interface ensures that the MPD daemons are started on each of the hosts listed in the
$PBS_NODEFILE. It also ensures that the MPD daemons are shut down at the end of
MPI job execution.
10.6.12.3 Examples
Run a single-executable MVAPICH2 job with processes spread out across the PBS-allo-
cated hosts listed in $PBS_NODEFILE:
PBS_NODEFILE:
pbs-host1
pbs-host2
pbs-host3
Job.script:
mpiexec -np 6 /path/mpiprog
<job-id>
Launch an MVAPICH2 MPI job with multiple executables on multiple hosts listed in the
default file "mpd.hosts". Here, run executables prog1 and prog2 with 2 ranks of prog1 on
host1, 2 ranks of prog2 on host2 and 2 ranks of prog2 on host3 all specified on the com-
mand line.
PBS_NODEFILE:
pbs-host1
pbs-host2
pbs-host3
Job.script:
mpiexec -n 2 prog1 : -n 2 prog2 : -n 2 prog2
<job-id>
Launch an MVAPICH2 MPI job with multiple executables on multiple hosts listed in the
default file "mpd.hosts". Run executables prog1 and prog2 with 2 ranks of prog1 on
host1, 2 ranks of prog2 on host2 and 2 ranks of prog2 on host3 all specified using the
-configfile option.
PBS_NODEFILE:
pbs-host1
pbs-host2
pbs-host3
Job.script:
echo "-n 2 -host host1 prog1" > /tmp/jobconf
echo "-n 2 -host host2 prog2" >> /tmp/jobconf
echo "-n 2 -host host3 prog2" >> /tmp/jobconf
mpiexec -configfile /tmp/jobconf
rm /tmp/jobconf
<job-id>
In order to override the default rsh, set PBS_RSHCOMMAND in your job script:
export PBS_RSHCOMMAND=<rsh_cmd>
10.7 MPI Jobs on the Altix
PBS has its own mpiexec for the Altix running ProPack 4 or greater. The PBS mpiexec
has the standard mpiexec interface. The PBS mpiexec does require proper configuration
of the Altix. See your administrator to find out whether your system is configured for the
PBS mpiexec.
You can launch an MPI job on a single Altix, or across multiple Altixes. PBS will manage
and track the processes. You can use CSA, if it is configured, to collect accounting infor-
mation on your jobs. PBS will run the MPI tasks in the cpusets it manages.
You can run MPI jobs in the placement sets chosen by PBS. When a job is finished, PBS
will clean up after it.
For MPI jobs across multiple Altixes, PBS will manage the multihost jobs. For example,
if you have two Altixes named Alt1 and Alt2, and want to run two applications called
mympi1 and mympi2 on them, you can put this in your job script:
mpiexec -host Alt1 -n 4 mympi1 : -host Alt2 -n 8 mympi2
You can specify the name of the array to use via the PBS_MPI_SGIARRAY environment
variable.
To verify how many CPUs are included in a cpuset created by PBS, use:
% cpuset -d <set name> | egrep cpus
This will work either from within a job or not.
The alt_id returned by MOM has the form cpuset=<name>. <name> is the name of the
cpuset, which is the $PBS_JOBID.
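Extracting the cpuset name from such an alt_id string is a one-line shell operation (the job ID below is made up for illustration):

```shell
#!/bin/sh
# Parse a MOM alt_id of the form "cpuset=<name>"; the value is illustrative.
alt_id="cpuset=1234.server1"
name=${alt_id#cpuset=}     # strip the "cpuset=" prefix
echo "$name"               # the cpuset name, i.e. the job ID
```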
Jobs will share cpusets if the jobs request sharing and the cpusets’ sharing attribute is not
set to force_excl. Jobs can share the memory on a nodeboard if they have a CPU from that
nodeboard. To fit as many small jobs as possible onto vnodes that already have shared
jobs on them, request sharing in the job resource requests.
PBS will try to put a job that will fit on a single nodeboard on just one nodeboard. How-
ever, if the only CPUs available are on separate nodeboards, and those vnodes are not allo-
cated exclusively to existing jobs, and the job can share a vnode, then the job will be run
on the separate nodeboards.
If a job is suspended, its processes will be moved to the global cpuset. When the job is
restarted, they are restored.
You can launch MPI jobs on a single Altix running ProPack 2/3. Submit MPI jobs using
SGI’s mpirun.
A Blue Gene system is made up of one service node, one or more front-end nodes, a
shared storage location (referred to as the CWFS, or cluster-wide file system), dozens or
hundreds of I/O nodes, thousands of compute nodes, and various networks that tie
everything together. The front-end node, service node, and I/O nodes run the SUSE Linux
Enterprise 9 OS; the compute nodes run a lightweight OS called OSK.
A compute node is made up of 2 CPUs, each supporting only one thread of execution; no
dynamic creation of processes is allowed. Jobs run on the compute nodes in one of two
modes: 1) co-processor (CO) mode, where only one CPU is used for computation while the
other is used for communication, and the job must fit in 512 MB of memory; 2) virtual
node (VN) mode, where both CPUs are used for computation and interact through a
shared, non-cached area of memory (the scratchpad), and each process must fit in 256 MB
of memory.
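The two modes trade task count against per-process memory; the constraints can be sketched as follows (the function and its return convention are illustrative, not a PBS or Blue Gene API):

```python
def bluegene_mode_limits(compute_nodes, mode):
    """Return (max MPI tasks, per-process memory cap in MB) for a set of
    Blue Gene compute nodes under the given execution mode."""
    if mode == "CO":   # co-processor: one CPU computes, one handles communication
        return compute_nodes, 512
    if mode == "VN":   # virtual node: both CPUs compute
        return 2 * compute_nodes, 256
    raise ValueError("mode must be 'CO' or 'VN'")

print(bluegene_mode_limits(512, "CO"))  # (512, 512)
print(bluegene_mode_limits(512, "VN"))  # (1024, 256)
```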
Each compute node is connected to six other nearest neighbors in a torus manner (there's a
wraparound edge linked to similar edge processors). Every compute node is associated
with at least 1 I/O node, which performs all I/O duties. A set of compute nodes and 1 I/O
node make up what is termed a “PSET”.
Each base partition is connected to 3 switches - one for each dimension in (x, y, z). Some
switches can be used as “pass through” switches in mesh configurations. Each network
switch has 6 ports where 2 ports (P0, P1) connect to a midplane, while the other ports
(P2…P5) connect to other switches.
A Blue Gene job contains an executable, its arguments, and an owner (the user who
submitted the job). It runs exclusively on a 3-D, rectangular, contiguous, isolated set of
compute nodes called a partition or bglblock. Valid partition sizes are as follows:
The system administrator must completely partition the system in advance, to accommo-
date users' requirements. PBS will take care of finding these previously-defined partitions
and scheduling jobs on these partitions.
This hierarchy is used for job submission examples later. It looks like this:
All of the functionality of PBS on Linux is the same, except that preemption and floating
PBS licenses are not supported. The suspend/resume feature of PBS jobs is not supported;
attempts to suspend a PBS job return "No support for requested service".
The hold/release feature of PBS, either through check_abort or restart action scripts, is
supported.
The user invokes mpirun in the job script to actually run the executable on some assigned
partition, as specified in environment variable MPIRUN_PARTITION:
#PBS -l select=128:ncpus=2
mpirun <executable> <args>
The user specifies the compute node execution mode, which can be either “co-processor”
or “virtual node”, via mpirun inside the job script:
#PBS -l select=1024:ncpus=2
mpirun -mode CO -np 1024 <executable> <args>
qsub -l select=512:ncpus=2
mpirun -np 300 -mapfile coord_file <executable> <args>
Based on the example hierarchy, the job is assigned partition R111, but only 300 tasks are
spawned in a pool of 512 compute nodes.
Compute nodes can be under-allocated but not over-allocated. mpirun will reject a
request if the number of tasks specified is greater than the available cpus on the assigned
partition.
IBM's mpirun takes care of instantiating a user's executable on the assigned partition.
Example (a user-specified partition is ignored):
qsub -l select=ncpus=512
mpirun -partition R100 my_exec
WARNING: ignoring option -partition
PBS supplies an mpirun that calls the Blue Gene mpirun, translating some arguments.
The user’s script arguments to mpirun are translated as follows:
If the user specified an mpirun -mode value, then that is what mpirun will use as the node
mode value.
Example:
qsub -l select=512:ncpus=2
mpirun -mode VN my_exec -->
mpirun -mode VN -np 1024 my_exec
If the user specified an mpirun -np value, then that is what mpirun will use as the number
of mpirun tasks to spawn.
Example:
qsub -l select=1024:ncpus=2
mpirun -np 768 my_exec -->
mpirun -mode CO -np 768 my_exec
If the user did not specify a -mode value, mpirun puts in a default of -mode CO for co-pro-
cessor mode.
Example:
qsub -l select=1024:ncpus=2
mpirun -np 1024 my_exec -->
mpirun -mode CO -np 1024 my_exec
An exception to this is if the user requested ncpus=1 in the qsub line (e.g. qsub -l
select=256:ncpus=1) along with the mpirun -np specification in the job script. In this
case a default mode of VN for virtual node will be put in by the wrapped mpirun.
Example:
qsub -l select=256:ncpus=1
mpirun -np 160 my_exec translates to -->
mpirun -mode VN -np 160 my_exec
If mpirun resolves the node mode to CO (co-processor), and the user did not specify a -np
value, then the value for the number of tasks is either
1. the number of compute nodes (X) if the PBS job was submitted as
qsub -l select=X:ncpus=2, or
2. "number of cpus divided by 2" (Y/2) if the PBS job was submitted as
qsub -l select=Y:ncpus=1.
Examples:
qsub -l select=129:ncpus=2
mpirun my_exec -->
mpirun -mode CO -np 129 my_exec
qsub -l select=1024:ncpus=1
mpirun my_exec -->
mpirun -mode CO -np 512 my_exec
qsub -l select=129:ncpus=1
mpirun my_exec -->
mpirun -mode CO -np 64 my_exec
where (129/2) = 64 rounded down
If mpirun resolves the node mode to VN (virtual node), and the user did not specify a -np
value, then the value for the number of tasks is either
1. "number of compute nodes times 2" (X*2) if the user submitted the PBS job as
qsub -l select=X:ncpus=2, or
2. number of cpus if the user submitted the PBS job as
qsub -l select=Y:ncpus=1.
Examples:
qsub -l select=129:ncpus=2
mpirun -mode VN my_exec -->
mpirun -mode VN -np 258 my_exec
qsub -l select=1024:ncpus=1
mpirun -mode VN my_exec -->
mpirun -mode VN -np 1024 my_exec
If the user specifies neither a -np value nor the number of CPUs requested, mpirun adds:
-np $MPIRUN_PARTITION_SIZE
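Taken together, the defaulting rules above can be modeled in a short Python sketch (illustrative only; the final $MPIRUN_PARTITION_SIZE fallback is omitted because it depends on the assigned partition):

```python
def resolve_mpirun_args(nodes, ncpus, mode=None, np=None):
    """Model how the PBS-wrapped mpirun fills in -mode and -np.

    nodes, ncpus : from qsub -l select=<nodes>:ncpus=<ncpus>
    mode, np     : values the user passed to mpirun, if any.
    """
    if mode is None:
        # Default is CO, except when ncpus=1 was requested together
        # with an explicit -np: then the default is VN.
        mode = "VN" if (ncpus == 1 and np is not None) else "CO"
    if np is None:
        if mode == "CO":
            np = nodes if ncpus == 2 else nodes // 2   # rounded down
        else:  # VN
            np = nodes * 2 if ncpus == 2 else nodes
    return mode, np

print(resolve_mpirun_args(129, 2))             # ('CO', 129)
print(resolve_mpirun_args(1024, 1))            # ('CO', 512)
print(resolve_mpirun_args(129, 2, mode="VN"))  # ('VN', 258)
print(resolve_mpirun_args(256, 1, np=160))     # ('VN', 160)
```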
The PBS mpirun attaches "</dev/null" to the actual mpirun command line to empty out
standard input. This is needed for mpirun to run properly inside a batch job, since
mpirun tries to read from stdin, which fails in a batch environment where there is no tty.
The MOM expects a job attribute called “pset” that is set by the scheduler to the name of
the partition chosen to run the job. This attribute is visible to the user. The format is
pset = partition=<host_short_name>-<partition_identifier>
For instance, if the scheduler chose partition R001 on a machine called BG1, then a run-
ning job would have the following set:
pset = partition=BG1-R001
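A script can split the pset value back into its parts; a minimal illustrative helper (not a PBS API):

```python
def parse_pset(pset_value):
    """Split 'partition=<host_short_name>-<partition_identifier>'
    into (host, partition)."""
    label, _, rest = pset_value.partition("=")
    if label != "partition":
        raise ValueError("unexpected pset format: %r" % pset_value)
    host, _, partition = rest.partition("-")
    return host, partition

print(parse_pset("partition=BG1-R001"))  # ('BG1', 'R001')
```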
The $PBS_NODEFILE contains the set of vnodes forming a legal partition that has been
assigned to the job (e.g. R001, R010).
Since only user-level checkpointing is available for processes running on the BG/L nodes,
the user is responsible for performing periodic checkpoints on their applications. The
application must be compiled against the Blue Gene checkpoint library, so that the appli-
cation can do its own checkpointing.
All pre-defined partitions (containing midplanes) will uniformly have either “torus” or
“mesh” as their connection type. Therefore, users do not need to specify a connection type
when submitting jobs.
In the job's executing environment, the following environment variables are always set:
MPIRUN_PARTITION=<partition_name>
MPIRUN_PARTITION_SIZE=<# of ncpus>
On a hold request of a running job, the Blue Gene job associated with the PBS job is
canceled.
The Blue Gene administrator must configure each partition to mount the shared file sys-
tem, otherwise mpirun calls will fail with a “login failed:” message.
Running MPI jobs on Blue Gene depends on the shared location in the cluster-wide
filesystem (CWFS) that has been set up for a site. This shared location is mounted on the
partition as it boots up, and is accessible by the Blue Gene I/O nodes for creation and
duplication of input/output/error files. It is recommended that users write their MPI
programs in such a way that input is read, and output/error files are created, under this
shared location.
Users submitting jobs on the front-end host should not expect PBS to return output/error
files to the submission host; that is, pbs_mom on the service node does not copy files back
to the front-end node.
Users are encouraged to submit jobs with the "-k oe" option, which keeps output/error
files on the service node.
Mpitest.c:
#include <stdio.h>
#include <mpi.h>
#include <errno.h>

int main(int argc, char *argv[]) {
    int rank;
    MPI_Init(&argc, &argv);          /* minimal MPI test */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("Hello from task %d\n", rank);
    MPI_Finalize();
    return 0;
}
Job script:
#PBS -l select=ncpus=1024
#PBS -N midplanejob
#PBS -k oe
#PBS -q midplane
#PBS -V
Example 1: Request 512 nodes; the job will run on one of the 512-node partitions first if
available: {R000, R001, R010, R011, R100, R101, R110, R111}:
qsub -l select=512:ncpus=2
qsub -l select=1024:ncpus=1 (512 * 2 cpus)
Example 2: Request 1024 nodes; the job will run on a 1024-node partition first if avail-
able: {R00, R01, R10, R11}:
qsub -l select=1024:ncpus=2
qsub -l select=2048:ncpus=1 (1024 * 2 cpus)
Example 3: Request 2048 nodes; the job will run on one of the 2048-node partitions first if
available: {R0, R1}
qsub -l select=2048:ncpus=2
qsub -l select=4096:ncpus=1 (2048 * 2 cpus)
Example 4: Request the whole system; the job will run on the R_32 partition:
qsub -l select=4096:ncpus=2
qsub -l select=8192:ncpus=1 (4096 * 2 cpus)
Example 5: Request 32 compute nodes; the job will still end up in one of the 128-node
small partitions:
qsub -l select=32:ncpus=2
qsub -l select=64:ncpus=1 (32 * 2 cpus)
Example 6: Request 129 compute nodes; the job will still end up in one of the 512-node
partitions:
qsub -l select=129:ncpus=2
qsub -l select=258:ncpus=1 (129 * 2 cpus)
Example 7: Request 600 compute nodes; the job will end up in the 1024 node partition:
qsub -l select=600:ncpus=2
qsub -l select=1200:ncpus=1 ( 600 * 2 cpus)
Example 8: Request 1500 compute nodes; the job will end up in one of the 2048-node par-
titions:
qsub -l select=1500:ncpus=2
qsub -l select=3000:ncpus=1 (1500 * 2 cpus)
Example 9: Request 2049 compute nodes; the job will have to run on the full system parti-
tion:
qsub -l select=2049:ncpus=2
qsub -l select=4098:ncpus=1 (2049 * 2 cpus)
Example 10: The following request is unsatisfiable since it exceeds the number of com-
pute nodes in a full system partition:
qsub -l select=5000:ncpus=2
qsub -l select=10000:ncpus=1
Example 11: Complex requests (spanning multiple select chunks), though unnecessary,
can be satisfied.
qsub -l select=128:ncpus=2+512:ncpus=2
causes the scheduler to find a partition that can accommodate at least 640 compute nodes.
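The pattern in these examples, where the job lands on the smallest pre-defined partition that can hold it, can be sketched as follows (the partition sizes are those of the example hierarchy; a real site's partitioning may differ):

```python
# Partition sizes (in compute nodes) from the example hierarchy.
PARTITION_SIZES = [128, 512, 1024, 2048, 4096]

def smallest_partition(compute_nodes):
    """Return the smallest example partition size that can hold the
    requested number of compute nodes, or None if unsatisfiable."""
    for size in PARTITION_SIZES:
        if compute_nodes <= size:
            return size
    return None

print(smallest_partition(32))    # 128
print(smallest_partition(129))   # 512
print(smallest_partition(600))   # 1024
print(smallest_partition(2049))  # 4096
print(smallest_partition(5000))  # None
```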
On a typical system, to execute a Parallel Virtual Machine (PVM) program you can use
the pvmexec command. The pvmexec command expects a “hostfile” argument for the
list of hosts on which to spawn the parallel job.
#PBS -N pvmjob
#
pvmexec a.out -inputfile data_in
To start the PVM daemons on the hosts listed in $PBS_NODEFILE, start the PVM
console on the first host in the list, and print the hosts to the standard output file named
“jobname.o<PBS jobID>”, use “echo conf | pvm $PBS_NODEFILE”. To quit the PVM
console but leave the PVM daemons running, use “quit”. To stop the PVM daemons,
restart the PVM console, and quit, use “echo halt | pvm”.
#PBS -N pvmjob
#PBS -V
cd $PBS_O_WORKDIR
Jobs are suspended or checkpointed on the Altix using the PBS suspend and checkpoint features.
There is no OS-level checkpoint. Suspended or checkpointed jobs will resume on the
original nodeboards.
Under Irix 6.5 and later, MPI parallel jobs as well as serial jobs can be checkpointed and
restarted on SGI systems provided certain criteria are met. SGI’s checkpoint system call
cannot checkpoint processes that have open sockets. Therefore it is necessary to tell
mpirun to not create or to close an open socket to the array services daemon used to start
the parallel processes. One of two options to mpirun must be used:
-cpr This option directs mpirun to close its connection to the array
services daemon when a checkpoint is to occur.
-miser This option directs mpirun to directly create the parallel process
rather than use the array services. This avoids opening the
socket connection at all.
The -miser option appears the better choice as it avoids the socket in the first place. If
the -cpr option is used, the checkpoint will work, but will be slower because the socket
connection must be closed first. Note that interactive jobs or MPMD jobs (more than one
executable program) cannot be checkpointed in any case. Both use sockets (and TCP/IP)
to communicate, outside of the job for interactive jobs and between programs in the
MPMD case.
PBS does not support cpusets with multi-host jobs on IRIX.
On IRIX, the cpuset name is the first 8 characters of the job ID. If there is already a cpuset
by that name, the last character in the name is replaced by a,b,c...z,A,...,Z until a unique
name is found.
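The naming scheme can be sketched in Python (illustrative; PBS performs this internally):

```python
import string

def irix_cpuset_name(job_id, existing):
    """First 8 characters of the job ID; on collision, replace the
    last character with a, b, ..., z, A, ..., Z until unique."""
    name = job_id[:8]
    if name not in existing:
        return name
    for c in string.ascii_lowercase + string.ascii_uppercase:
        candidate = name[:-1] + c
        if candidate not in existing:
            return candidate
    raise RuntimeError("no unique cpuset name available")

print(irix_cpuset_name("1234.irixhost", set()))         # 1234.iri
print(irix_cpuset_name("1234.irixhost", {"1234.iri"}))  # 1234.ira
```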
The NEC job feature creates an NEC jobid for each PBS task. This jobid acts as an
inescapable session on a single host. PBS can track MPI processes as long as they are all
on one NEC machine.
There is no vmem resource (NEC SX-8 machines do not use virtual memory).
The pbs_probe command will work the same except for the following:
No files or directories related to Tcl/Tk will exist.
Permissions for PBS_EXEC and PBS_HOME will have the group write bit set.
Appendix A: PBS
Environment Variables
Table 21: PBS Environment Variables
Variable        Meaning
PBS_O_QUEUE     The original queue name to which the job was submitted.
PBS_O_SHELL     Value of SHELL from the submission environment.
PBS_O_SYSTEM    The operating system name where qsub was executed.
PBS_O_TZ        Value of TZ from the submission environment.
PBS_O_WORKDIR   The absolute path of the directory where qsub was executed.
PBS_QUEUE       The name of the queue from which the job is executed.
PBS_TASKNUM     The task (process) number for the job on this vnode.
TMPDIR          The job-specific temporary directory for this job.
Appendix B: Converting
From NQS to PBS
For those converting to PBS from NQS or NQE, PBS includes a utility called nqs2pbs
which converts an existing NQS job script so that it will work with PBS. (In fact, the
resulting script will be valid for both NQS and PBS.) The existing script is copied and PBS
directives (“#PBS”) are inserted prior to each NQS directive (either “#QSUB” or “#@$”)
in the original script.
Important: Converting NQS date specifications to the PBS form may result in a
warning message and an incomplete converted date. PBS does not
support date specifications of “today”, “tomorrow”, or the names of
the days of the week such as “Monday”. If any of these are encoun-
tered in a script, the PBS specification will contain only the time
portion of the NQS specification (i.e. #PBS -a hhmm[.ss]). It
is suggested that you specify the execution time on the qsub com-
mand line rather than in the script. All times are taken as local time.
If any unrecognizable NQS directives are encountered, an error
message is displayed. The new PBS script will be deleted if any
errors occur.
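The date-handling caveat can be illustrated with a small Python sketch; the input format assumed here (an optional date word followed by an hhmm[.SS] time) is a simplification for illustration, not nqs2pbs's actual parser:

```python
import re

# Date words PBS's "-a" option does not support (per the nqs2pbs caveat).
UNSUPPORTED_DATES = {"today", "tomorrow", "monday", "tuesday", "wednesday",
                     "thursday", "friday", "saturday", "sunday"}

def convert_nqs_time(spec):
    """Keep only the hhmm[.SS] time portion of an NQS time spec when it
    uses a date word PBS does not understand; otherwise pass it through."""
    if any(w in UNSUPPORTED_DATES for w in spec.lower().split()):
        m = re.search(r"\d{4}(\.\d{2})?", spec)
        if m is None:
            raise ValueError("no time portion to keep in %r" % spec)
        return m.group(0)   # e.g. "#PBS -a 1030" instead of the full date
    return spec

print(convert_nqs_time("tomorrow 1030"))  # 1030
```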
A queue complex in NQS was a grouping of queues within a batch Server. The purpose of
a complex was to provide additional control over resource usage. The advanced schedul-
ing features of PBS eliminate the requirement for queue complexes.
Appendix C: License
Agreement
Altair Engineering, Inc.
Software License Agreement
This License Agreement is a legal agreement between Altair Engineering, Inc. (“Altair”)
and you (“Licensee”) governing the terms of use of the Altair Software. Before you may
download or use the Software, your consent to the following terms and conditions is
required by clicking on the “I Accept” button. If you do not have the authority to bind your
organization to these terms and conditions, you must click on the button that states “I do
not accept” and then have an authorized party in your organization consent to these terms.
In the event that your organization and Altair have a master software license agreement,
mutually agreed upon in writing, in place at the time of your execution of this agreement,
the terms of the master agreement shall govern.