
Running your job

Once you have compiled and tested your code on the login node (user), you are ready to try it on the Palmetto cluster.

The scheduler and resource manager

You will run your job on Palmetto through a scheduling system called Maui and a resource manager called Torque. The scheduler and resource manager work together to make efficient use of the resources while sharing them fairly among users.

Note: Torque is an open-source version of PBS, so we will refer to it as PBS in the remainder of this material.

In a nutshell, you will build a list of instructions for PBS. These instructions are written in "PBS lingo" and tell PBS where your code and data are, how many Palmetto resources you need, and how you want to view logs and results. You place these instructions in a file and submit that file to a PBS routing queue. PBS determines the appropriate execution queue for your job based on your instructions. As the term "queue" implies, your job is queued with other jobs so that Palmetto's resources are shared fairly and efficiently. If the resources are available when your job arrives, PBS launches it immediately. Otherwise, your job is held until resources become available as the jobs ahead of it in line complete.

The queues

There are two routing queues: test and main. The test queue has 5 nodes and, as the name says, is used for testing. The main queue has all of the rest and is the default queue. main automatically routes each job to the proper execution queue based on the number of nodes requested and the specified walltime. A job's starting priority is weighted slightly toward jobs that request more nodes, so that those jobs get a chance to start. MaxRun is the maximum number of jobs each user can have running in a given queue at one time, and MaxQueuable is the maximum number of jobs each user can have in that queue at one time. The table below describes the main queue.

Nodes                     0 - 2 hours (quick)   2 - 24 hours (short)   24 - 72 hours (long)
1 - 10 nodes (tiny)       1000 / 30 / 100       500 / 30 / 100         250 / 30 / 100
11 - 50 nodes (small)     1000 / 5 / 50         500 / 5 / 50           250 / 5 / 50
51 - 256 nodes (medium)   1500 / 2 / 5          1000 / 2 / 5           500 / 2 / 5
257 - 763 nodes (large)   2000 / 1 / 5          1500 / 1 / 5           1000 / 1 / 5

Each cell lists Priority / MaxRun / MaxQueuable for that queue.
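As a rough illustration of the routing rules in the table, the bash sketch below maps a node count and a walltime to a queue name built from the two parenthetical terms (for example small_long). The function names walltime_to_seconds and queue_name are hypothetical, and treating each range's upper bound as inclusive is an assumption; the qstat commands below show the limits actually configured on the system.

```bash
#!/usr/bin/env bash
# Hypothetical helper: guess which execution queue "main" routes a job to,
# using the node and walltime ranges from the table above.
# Assumption: each range includes its upper bound (exactly 2 hours -> quick).

# Convert HH:MM:SS to seconds; 10# keeps values like "08" from being read as octal.
walltime_to_seconds() {
  local h m s
  IFS=: read -r h m s <<< "$1"
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

queue_name() {
  local nodes=$1 secs size duration
  secs=$(walltime_to_seconds "$2") || return 1

  if (( nodes < 1 || nodes > 763 )); then
    echo "no main queue accepts $nodes nodes" >&2
    return 1
  elif (( nodes <= 10 ));  then size=tiny
  elif (( nodes <= 50 ));  then size=small
  elif (( nodes <= 256 )); then size=medium
  else                          size=large
  fi

  if   (( secs <= 2 * 3600 ));  then duration=quick
  elif (( secs <= 24 * 3600 )); then duration=short
  elif (( secs <= 72 * 3600 )); then duration=long
  else
    echo "walltime $2 exceeds 72 hours" >&2
    return 1
  fi

  echo "${size}_${duration}"
}

queue_name 16 30:00:00   # prints small_long
```

Similarly, queue_name 4 01:30:00 prints tiny_quick, while a request for more than 763 nodes or more than 72 hours is reported as an error.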

The queue names are formed by joining the node-count term and the walltime term (the parenthetical terms in the table) with an underscore, for example "tiny_long", "medium_short", or "large_long". To see the whole list on the system, enter

qstat -Q

which will show you the names of all queues. From there you can enter the queue name

qstat -Qf <queue_name>

to find out the specifics for that queue. For example

qstat -Qf medium_long

will return output like the following:

Queue: medium_long
    queue_type = Execution
    Priority = 100
    max_user_queuable = 10
    total_jobs = 0
    state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0 
    from_route_only = True
    resources_max.nodect = 168
    resources_max.walltime = 72:00:00
    resources_min.nodect = 85
    resources_default.nodes = 1
    resources_default.walltime = 24:30:00
    mtime = 1210792345
    resources_assigned.nodect = 0
    max_user_run = 5
    keep_completed = 30
    enabled = True
    started = True

This shows that the queue has a priority of 100, allows each user at most 10 queued jobs (max_user_queuable) and 5 running jobs (max_user_run), accepts jobs of 85 to 168 nodes, and so forth.
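Because qstat -Qf prints one "name = value" pair per line, a single attribute is easy to pull out with awk. The sketch below is a hypothetical helper (queue_attr is not a PBS command); it reads the qstat text on standard input, so on the system you could pipe live output into it, for example qstat -Qf medium_long | queue_attr resources_max.walltime.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value of one attribute from "qstat -Qf" output.
# It reads the text on stdin, so it works on live output or on a saved copy.
# Returns a non-zero status if the attribute is not present.
queue_attr() {
  awk -F' = ' -v key="$1" '
    { sub(/^[ \t]+/, "", $1) }              # drop the leading indentation
    $1 == key { print $2; found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Demo on an excerpt of the sample output shown above (prints 5):
queue_attr max_user_run <<'EOF'
Queue: medium_long
    Priority = 100
    max_user_queuable = 10
    max_user_run = 5
EOF
```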

A test run

As mentioned, the first step in running your job is to put your instructions to PBS into a PBS script file. Your script file will look like this:

Serial Job

#PBS -N Pade
#PBS -l nodes=1
#PBS -q test
#PBS -l walltime=01:00:00
#PBS -k oe
#PBS -m abe
source ~/.bashrc
cd ~/atmos3
pade

Note that all instructions to PBS begin with "#PBS". After the PBS instructions come the Linux commands required to run the code. To explain each line:

#PBS -N Pade

Optional: This defines the name for the job. In this case, the name "Pade" will be shown on all qstat and xpbs displays. (Default: name of the file which contains the PBS script)

#PBS -l nodes=1

Optional: This tells PBS how many nodes you will use. Since this is a serial job, you will request one. (Default: 1)

#PBS -q test

Optional: This tells PBS which routing queue to submit your job to. (Default: main)

#PBS -l walltime=01:00:00

Recommended: The maximum wall-clock time for your job. Here the job will run for at most 1 hour. The format for walltime is HH:MM:SS, and minutes and seconds are limited to two digits each. The default wall-clock time is 30 minutes.
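If you generate walltime values in a script, a quick format check before calling qsub can catch mistakes such as 1:5:00. The bash sketch below is a hypothetical helper; requiring exactly two digits (with values below 60) for minutes and seconds is an assumption based on the rule above and on the zero-padded form used throughout these examples.

```bash
#!/usr/bin/env bash
# Hypothetical pre-submission check for a walltime string in HH:MM:SS form.
# Hours may have any number of digits; minutes and seconds must be two digits.
# Assumption: minutes and seconds must also be below 60.
valid_walltime() {
  local re='^[0-9]+:[0-5][0-9]:[0-5][0-9]$'
  [[ $1 =~ $re ]]
}

for wt in 01:00:00 0:10:00 72:00:00 1:5:00 01:90:00; do
  if valid_walltime "$wt"; then
    echo "$wt ok"
  else
    echo "$wt rejected"
  fi
done
```

Run as is, it reports 01:00:00, 0:10:00, and 72:00:00 as ok and rejects 1:5:00 and 01:90:00.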

#PBS -l cput=3:00:00,mem=512mb

Optional: This tells PBS specifically how many other resources your job will need. See the Common PBS Directives section below for more resource options. In this example, the job requires 3 hours of CPU time and 0.5 GB of memory. (Default: as listed for the routing queue specified on the #PBS -q directive)
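The resource list that follows -l is a comma-separated list of name=value pairs, which is why cput and mem share one directive above. As a small sketch, the hypothetical bash function below joins separate requests into that form; the same string can also be passed to qsub -l on the command line instead of appearing in the script.

```bash
#!/usr/bin/env bash
# Hypothetical helper: join resource requests into the comma-separated
# "name=value,name=value" form that -l expects.
resource_list() {
  local IFS=,          # "$*" joins the arguments with the first character of IFS
  echo "$*"
}

resource_list cput=3:00:00 mem=512mb   # prints cput=3:00:00,mem=512mb

# The same list given on the qsub command line instead of in the script:
#   qsub -l "$(resource_list cput=3:00:00 mem=512mb)" pade.pbs
```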

#PBS -k oe

This tells PBS which output streams to keep; "o" refers to standard output (stdout) and "e" refers to standard error (stderr). Both files will appear in your home directory. In this example, standard output will be in Pade.onnnn and standard error in Pade.ennnn, where nnnn is the numeric job ID that PBS gives you at submission time. In general, the .onnnn and .ennnn suffixes are appended to the job name. You can combine the two streams into one file with #PBS -j eo. (Default: keep neither file)

#PBS -m abe

Optional: This instructs PBS to send mail to you when the job begins running (b), when it has stopped running (e), and if it has aborted (a). (Default: send mail only to owner and only in case of job abort)

source ~/.bashrc

This ensures that the environment variables are set for the compiler that you used.

cd ~/atmos3

Optional: Move to the directory that contains the executable. Alternatively, you can specify the full path to the executable. (Default: your home directory)

pade

Required: Run the serial code, specifying it by name.

To submit the job you would enter:

qsub pade.pbs

where pade.pbs is the name of the file which contains your PBS script. The response you get will be in the form

34945.pbs001.palmetto.clemson.edu

where 34945 is the job ID (you will use it to track the job later) and pbs001 is the name of the scheduling node that determines which compute nodes will be assigned to your job.
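Scripts that submit jobs often need the job number on its own, for example to build the Pade.o and Pade.e file names. A minimal sketch using bash parameter expansion, assuming the response always has the number.server.domain form shown above (job_number and job_server are hypothetical names):

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the string qsub prints, e.g.
# "34945.pbs001.palmetto.clemson.edu".
job_number() { echo "${1%%.*}"; }   # everything before the first dot

job_server() {
  local rest=${1#*.}                # drop the job number and its dot
  echo "${rest%%.*}"                # keep everything up to the next dot
}

# On palmetto you would capture the real response: jobid=$(qsub pade.pbs)
jobid="34945.pbs001.palmetto.clemson.edu"
echo "job $(job_number "$jobid") was accepted by $(job_server "$jobid")"
echo "output: Pade.o$(job_number "$jobid")  errors: Pade.e$(job_number "$jobid")"
```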

You can track your jobs via the qstat command.

qstat -f 34945

will give you complete information about this job like how much cpu time and memory it has used, which nodes it is running on, and much more.

qstat -f -u <myuserid>

will give you complete information about all jobs you have running.

When your job completes, you will find its output (stdout) in Pade.o34945 and any errors (stderr) in Pade.e34945 in your home directory.

In the event that your job has gone awry, you can delete it with

qdel 34945

For more information on qstat and qdel, see their respective man pages.

Selecting your nodes

The palmetto cluster is made up of several different node architectures. You can see the nodes by entering

checkproperties

which will show something like

Palmetto node properties:
 
    255 batch,amd,opteron,2356
    252 batch,intel,xeon,e5345
    256 batch,intel,xeon,e5410
      1 test,amd,opteron,2356
      5 test,intel,xeon,e5345
      2 test,intel,xeon,e5410

If you do not request specific nodes, you will be given the first available nodes as listed in the PBS server's node file. Currently, this file is ordered from node0001 up to the highest-numbered node, grouped as follows:

node0001-node0257 are xeon 5345
node0258-node0515 are xeon 5410
node0516-node0771 are amd 2356

and will be assigned in that order as available.
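Given the node ranges above, a node name can be mapped back to its architecture, which can help when checking where a job actually ran (for example, from the node list shown by qstat -f). The bash function below is a hypothetical sketch that hard-codes the current ranges, so it would need updating if the node file changes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report the architecture of a node from its name,
# using the node ranges listed above (node0001-node0771).
node_arch() {
  local n=${1#node}          # "node0300" -> "0300"
  n=$((10#$n))               # force base 10 so leading zeros are not octal
  if   (( n >= 1   && n <= 257 )); then echo "intel xeon e5345"
  elif (( n >= 258 && n <= 515 )); then echo "intel xeon e5410"
  elif (( n >= 516 && n <= 771 )); then echo "amd opteron 2356"
  else
    echo "unknown node $1" >&2
    return 1
  fi
}

node_arch node0300   # prints: intel xeon e5410
```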

You can select particular node types by adding a node property to the #PBS -l directive. For example

#PBS -l nodes=16:ppn=8

requests 16 of the next available nodes (any architecture) with 8 processors each, whereas

#PBS -l nodes=16:ppn=8:intel

will specify that they be Intel xeon nodes and

#PBS -l nodes=16:ppn=8:amd

will specify that they be AMD opteron nodes.

Taking it one step further,

#PBS -l nodes=16:ppn=8:e5345

and

#PBS -l nodes=16:ppn=8:e5410

will specify which Intel xeon nodes to use.

PBS scripts for MPI and OpenMP

To complete our examples, we also show scripts for running our parallel jobs:

MPI Job

#PBS -N Array-decomp
#PBS -l nodes=2:ppn=8
#PBS -l walltime=0:10:00
#PBS -k oe
#PBS -m abe
source ~/.bashrc
cd ~/array
/usr/bin/mpiexec -n 16 array-decomp

OpenMP Job

#PBS -N workshare
#PBS -l nodes=1:ppn=8:amd
#PBS -l walltime=0:10:00
#PBS -k oe
#PBS -m abe
source ~/.bashrc
cd ~/work
export OMP_NUM_THREADS=8
workshare

Here we note the use of mpiexec to run the MPI code on 16 processors (-n 16), matching the 2 nodes with 8 processors each requested by the script. In our OpenMP script, we specify the use of 8 processors via the environment variable OMP_NUM_THREADS. Because OpenMP parallelizes only across the processors of a single node, that script requests nodes=1:ppn=8.

(Note: If you're running MPI jobs on Palmetto using the MPICH2 library, we highly recommend that you add the line

export MPD_CON_EXT=ext_${PBS_JOBID}

to your PBS submit script. This should solve some of the mpdboot errors some users have been experiencing.)
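Rather than hard-coding 16 processes or 8 threads, a script can count the processors it was given. Torque writes the allocated nodes to the file named by $PBS_NODEFILE, one line per processor slot, so counting lines gives the totals. The bash sketch below uses hypothetical helper names; the OpenMP helper assumes every node received the same ppn, and the demo at the end fakes a node file so it can run outside a job.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for use inside a PBS job. Torque writes the allocated
# nodes to $PBS_NODEFILE, one line per processor slot (ppn) on each node.

# Total processor slots across all nodes (e.g. nodes=2:ppn=8 -> 16).
total_slots() {
  [ -r "${PBS_NODEFILE-}" ] || { echo "no readable PBS_NODEFILE" >&2; return 1; }
  echo $(( $(wc -l < "$PBS_NODEFILE") ))
}

# Slots on the first listed node (assumes every node got the same ppn).
slots_per_node() {
  [ -r "${PBS_NODEFILE-}" ] || { echo "no readable PBS_NODEFILE" >&2; return 1; }
  grep -cxF "$(head -n 1 "$PBS_NODEFILE")" "$PBS_NODEFILE"
}

# In the scripts above, the hard-coded counts could then become:
#   /usr/bin/mpiexec -n "$(total_slots)" array-decomp
#   export OMP_NUM_THREADS=$(slots_per_node)

# Local demo with a stand-in node file for nodes=2:ppn=8:
PBS_NODEFILE=$(mktemp)
for node in node0001 node0002; do
  for _ in 1 2 3 4 5 6 7 8; do echo "$node"; done
done > "$PBS_NODEFILE"
echo "total=$(total_slots) per_node=$(slots_per_node)"   # total=16 per_node=8
rm -f "$PBS_NODEFILE"
```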

Common PBS directives

Directive Description
-d directory Specifies the working directory. (Default: your home directory.)
-h Specifies that a hold will be applied to the job at submission time.
-I Declares that the job will be run interactively.
-j Declares that the standard error stream will be merged with the standard output stream of the job.
-k Defines which (if either) of the standard output or standard error streams will be retained. The keep argument can take the following values:

e - standard error is retained on the executing host
o - standard output is retained on the executing host
eo - both streams are retained
oe - both streams are retained
n - neither stream is retained

An optional path parameter can be included to direct the output and errors to a different location (as in #PBS -k eo /home/myuserid/Pade/results).

-l resource_list Defines the resources that are required by the job and establishes a limit on the amount of resource that can be consumed. The resource_list argument is of the form:

resource_name[=[value]][,resource_name[=[value]],...]

Resources monitored by PBS

nodes The number of nodes required for this run. Units: nodes
ppn The number of processors required per node. Units: processors
architecture The node architecture required for this run, given as a node property (e.g. nodes=16:ppn=8:amd). Values: intel, amd, e5345, or e5410
walltime Maximum amount of real time during which the job can be in the running state. Units: time (HH:MM:SS)
-m Defines a set of conditions under which the execution server will send a mail message about the job. The mail options argument can take one or more of the following values:

a - mail is sent when the job is aborted
b - mail is sent when the job begins execution
e - mail is sent when the job terminates
n - no mail is sent
-N name Defines a name for the job.
-q destination Defines a routing queue destination. See the table at the beginning of this section for more information on routing queues.
-v variable_list Expands the set of environment variables that are exported to the job.
-V Declares that all environment variables from the qsub command's environment are to be exported to the batch job.
-W additional_attributes The -W directive allows the specification of additional job attributes. It is useful for synchronizing concurrent and subsequent jobs. You should consult the PBS External Reference Specification for a full list of the synchronization directives.
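For example, to hold one job until another has finished successfully, Torque accepts a dependency of the form depend=afterok:jobid[:jobid...] through -W. The bash sketch below builds that string; afterok is a hypothetical helper name, and the script names in the comments are placeholders. See the qsub man page for the other dependency types.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the -W dependency attribute that holds a job until
# every listed job has finished successfully (the Torque "afterok" type).
afterok() {
  local IFS=:          # "$*" joins the job IDs with colons
  echo "depend=afterok:$*"
}

afterok 34945 34946   # prints depend=afterok:34945:34946

# A two-step chain on palmetto might look like this (script names are placeholders):
#   first=$(qsub preprocess.pbs)
#   qsub -W "$(afterok "$first")" analyze.pbs
```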

PBS documentation

More material on PBS can be found at

man qsub
man qstat
man qdel
man qalter
man 7 pbs_resources

Introduction to Palmetto I (Seminar slides)
TORQUE Administrator's Manual - 2.1 Job Submission



Maintained by CITI web services                    Copyright ©2008 Clemson University, Clemson, S.C. 29634, (864) 656-331