Running your job
Once you have compiled and tested your code on user, you are ready to try it on the palmetto cluster.
The scheduler and resource manager
You will run your job on palmetto through a scheduling system called Maui and resource manager called Torque. The scheduler and resource manager work together to ensure optimal use of the resources along with fair sharing of those resources amongst the users.
Note: Torque is an open source version of PBS, so we refer to it as PBS in the remainder of this material.
In a nutshell, you build a list of instructions for PBS. These instructions, written as PBS directives, tell PBS where your code and data are, how many palmetto resources you need, and how you want to view logs and results. You place these instructions in a file and submit that file to a PBS routing queue. PBS determines the appropriate execution queue for your job based on your instructions. As the term "queue" implies, your job is queued with other jobs in order to share the palmetto resources in a fair and optimal manner. If the resources are available when your job arrives, PBS launches it immediately. Otherwise your job is held until the jobs ahead of it in line complete and the resources become available.
There are two routing queues: test and main. The test queue has 5 nodes and, as the name says, is used for testing. The main queue has all of the rest and is the default queue. The main queue automatically routes each job to the proper execution queue based on the number of nodes requested and the specified walltime. The starting priority of a job is slightly weighted towards larger node jobs so that those jobs get a chance to start. MaxRun is the number of jobs each user can have running in a queue at one time, and MaxQueuable is the number of jobs each user can have queued there. The table below describes the main queue.
| | 0 - 2 hours (quick) | 2 - 24 hours (short) | 24 - 72 hours (long) |
| 1 - 10 nodes (tiny) | Priority = 1000, MaxRun = 30, MaxQueuable = 100 | Priority = 500, MaxRun = 30, MaxQueuable = 100 | Priority = 250, MaxRun = 30, MaxQueuable = 100 |
| 11 - 50 nodes (small) | Priority = 1000, MaxRun = 5, MaxQueuable = 50 | Priority = 500, MaxRun = 5, MaxQueuable = 50 | Priority = 250, MaxRun = 5, MaxQueuable = 50 |
| 51 - 256 nodes (medium) | Priority = 1500, MaxRun = 2, MaxQueuable = 5 | Priority = 1000, MaxRun = 2, MaxQueuable = 5 | Priority = 500, MaxRun = 2, MaxQueuable = 5 |
| 257 - 763 nodes (large) | Priority = 2000, MaxRun = 1, MaxQueuable = 5 | Priority = 1500, MaxRun = 1, MaxQueuable = 5 | Priority = 1000, MaxRun = 1, MaxQueuable = 5 |
The queue names are made up by concatenating the parenthetical terms. For example "tiny_long", "medium_short", "large_long". To see the whole list online, enter
qstat -Q
which will show you the names of all queues. From there you can pass a queue name to
qstat -Qf <queue_name>
to find out the specifics for that queue. For example
qstat -Qf medium_long
will return the information
Queue: medium_long
queue_type = Execution
Priority = 100
max_user_queuable = 10
total_jobs = 0
state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0
from_route_only = True
resources_max.nodect = 168
resources_max.walltime = 72:00:00
resources_min.nodect = 85
resources_default.nodes = 1
resources_default.walltime = 24:30:00
mtime = 1210792345
resources_assigned.nodect = 0
max_user_run = 5
keep_completed = 30
enabled = True
started = True
where we see that this queue has a priority of 100, maximum queuable jobs of 10, maximum nodes of 168 and so forth.
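The routing rule described by the table can be sketched as a small shell function. This is only an illustration of how the queue names are put together: the function name queue_name is hypothetical, the walltime is given in minutes for simplicity, and each upper bound is assumed to be inclusive (the table does not say which queue gets a job of exactly 2 or 24 hours). The limits reported by qstat -Qf remain the authoritative ones.

```shell
# Hypothetical helper: guess the execution queue name from the table above.
# $1 = number of nodes, $2 = walltime in minutes
# Assumption: upper bounds are inclusive (exactly 120 minutes counts as "quick").
queue_name() {
    if [ "$1" -lt 1 ] || [ "$1" -gt 763 ]; then
        echo "no execution queue for $1 nodes" >&2
        return 1
    elif [ "$1" -le 10 ]; then size=tiny
    elif [ "$1" -le 50 ]; then size=small
    elif [ "$1" -le 256 ]; then size=medium
    else size=large
    fi

    if [ "$2" -lt 1 ] || [ "$2" -gt 4320 ]; then
        echo "no execution queue for $2 minutes" >&2
        return 1
    elif [ "$2" -le 120 ]; then length=quick
    elif [ "$2" -le 1440 ]; then length=short
    else length=long
    fi

    # Queue names concatenate the two parenthetical terms.
    echo "${size}_${length}"
}

queue_name 16 600    # 16 nodes for 10 hours
```

The resulting name can then be passed to qstat -Qf to look up the actual limits for that queue.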
As mentioned, the first step of running your job is to put your instructions to PBS into something called a PBS script file. Your script file will look like
Serial Job
#PBS -N Pade
#PBS -l nodes=1
#PBS -q test
#PBS -l walltime=01:00:00
#PBS -k oe
#PBS -m abe
source ~/.bashrc
cd ~/atmos3
pade
Note that instructions to PBS all begin with "#PBS". After the instructions to PBS, we include the sequence of Linux commands required to run the code. Each line is explained below.
#PBS -N Pade
Optional: This defines the name for the job. In this case, the name "Pade" will be shown on all qstat and xpbs displays. (Default: name of the file which contains the PBS script)
#PBS -l nodes=1
Optional: This tells PBS how many nodes you will use. Since this is a serial job, you will request one. (Default: 1)
#PBS -q test
Optional: This tells PBS to which routing queue you want to submit your job. (Default: main)
#PBS -l walltime=01:00:00
Recommended: The maximum wall clock time for your job. Here the job will run for at most 1 hour. The format for walltime is HH:MM:SS. Only two digits can be used for minutes and seconds. The default wall clock time is 30 minutes.
#PBS -l cput=3:00:00,mem=512mb
Optional: This tells PBS specifically how many other resources your job will need. See the Common PBS Directives section below for more resource options. In this example, the job requires 3 hours of CPU time and 0.5 GB of memory. (Default: as listed for the routing queue specified on the #PBS -q directive)
#PBS -k oe
This tells PBS which output to keep: "e" refers to standard error (stderr) and "o" refers to standard output (stdout). Both of these files will appear in your home directory. In this example, standard error will be in the file Pade.ennnn and standard output will be in Pade.onnnn, where nnnn is the numeric job identifier given to you by PBS at job submission time. In general, the ennnn and onnnn suffixes are appended to the job name. You can combine the two files into one by using #PBS -j eo. (Default: keep neither file)
#PBS -m abe
Optional: This instructs PBS to send mail to you when the job begins running (b), when it has stopped running (e), and if it has aborted (a). (Default: send mail only to owner and only in case of job abort)
source ~/.bashrc
This ensures that the environment variables are set for the compiler that you used.
cd ~/atmos3
Optional: Move to the directory which contains the executable. Alternatively, you can specify the full path to the executable. (Default: your home directory)
pade
Required: Run the serial code, specifying it by name.
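As an illustration, the serial script above could be generated from a shell function, with the walltime built in the HH:MM:SS format. This is a sketch, not a site-provided tool: the names walltime_from_minutes and write_serial_script are hypothetical, and the job directory ~/atmos3 and program pade are simply taken from the example.

```shell
# Hypothetical helper: convert whole minutes to the HH:MM:SS walltime format.
walltime_from_minutes() {
    printf '%02d:%02d:00\n' $(( $1 / 60 )) $(( $1 % 60 ))
}

# Hypothetical helper: write a serial PBS script.
# $1 = job name, $2 = walltime in minutes, $3 = file to write
write_serial_script() {
    cat > "$3" <<EOF
#PBS -N $1
#PBS -l nodes=1
#PBS -l walltime=$(walltime_from_minutes "$2")
#PBS -k oe
#PBS -m abe
source ~/.bashrc
cd ~/atmos3
pade
EOF
}

walltime_from_minutes 90              # 90 minutes as HH:MM:SS
write_serial_script Pade 60 pade.pbs  # same content as the example above, without -q test
```

Leaving out the -q directive sends the job to the default routing queue (main).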
To submit the job you would enter:
qsub pade.pbs
where pade.pbs is the name of the file which contains your PBS script. The response you get will be in the form
34945.pbs001.palmetto.clemson.edu
where 34945 is the Job ID (you will use this for tracking the job later) and pbs001 is the name of the scheduling node that determines which compute nodes will be assigned to your job.
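If you submit from a script, the Job ID can be split off the response with ordinary shell parameter expansion. The sketch below works on the example response string rather than calling qsub, and the variable names are only illustrative. It also prints the output file names that follow from the Pade.onnnn and Pade.ennnn naming described for the -k directive.

```shell
# Example response from qsub, copied from the text above.
response="34945.pbs001.palmetto.clemson.edu"
job_name="Pade"

job_id=${response%%.*}    # everything before the first dot
server=${response#*.}     # everything after the first dot

echo "Job ID: $job_id"
echo "Server: $server"
echo "stdout: ${job_name}.o${job_id}"
echo "stderr: ${job_name}.e${job_id}"
```

In a real submission the first line would instead capture the command output, for example response=$(qsub pade.pbs).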
You can track your jobs via the qstat command.
qstat -f 34945
will give you complete information about this job like how much cpu time and memory it has used, which nodes it is running on, and much more.
qstat -f -u <myuserid>
will give you complete information about all jobs you have running.
When your job completes, you will find your output (e.g. stdout) in Pade.o34945 and any errors (e.g. stderr) in Pade.e34945 in your home directory.
In the event that your job has gone awry, you can delete it with
qdel 34945
For more information on qstat and qdel, see their respective man pages.
The palmetto cluster is made up of several different node architectures. You can see the nodes by entering
checkproperties
which will show something like
Palmetto node properties:
255 batch,amd,opteron,2356
252 batch,intel,xeon,e5345
256 batch,intel,xeon,e5410
1 test,amd,opteron,2356
5 test,intel,xeon,e5345
2 test,intel,xeon,e5410
If you do not request any specific nodes, you will be given the first available nodes as listed in the PBS server's node file. Currently this file is ordered starting at node0001 up through the maximum. For example, the nodes are identified as
node0001-node0257 are xeon 5345
node0258-node0515 are xeon 5410
node0516-node0771 are amd 2356
and will be assigned in that order as available.
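The node counts in the sample checkproperties output can be cross-checked against these node ranges by summing the batch and test nodes per processor model. The sketch below does this with awk on the sample output shown above; the function name count_by_model is hypothetical, and it assumes the processor model is always the fourth comma-separated property.

```shell
# Sample checkproperties output, copied from the text above.
sample='255 batch,amd,opteron,2356
252 batch,intel,xeon,e5345
256 batch,intel,xeon,e5410
1 test,amd,opteron,2356
5 test,intel,xeon,e5345
2 test,intel,xeon,e5410'

# Hypothetical helper: sum node counts per processor model.
# Lines that do not start with a number (such as the header) are skipped.
count_by_model() {
    awk '$1 ~ /^[0-9]+$/ { split($2, p, ","); total[p[4]] += $1 }
         END { for (m in total) print m, total[m] }' | sort
}

printf '%s\n' "$sample" | count_by_model
```

The totals (256 for the 2356, 257 for the e5345, 258 for the e5410) match the sizes of the three node ranges listed above. On the cluster, the same function could be fed from the real command, assuming its output has the same layout as the sample.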
You can select particular nodes by including an additional node property on the #PBS -l directive. For example
#PBS -l nodes=16:ppn=8
requests 16 of the next available nodes (any architecture) whereas
#PBS -l nodes=16:ppn=8:intel
will specify that they be Intel xeon nodes and
#PBS -l nodes=16:ppn=8:amd
will specify that they be AMD opteron nodes.
Taking it one step further,
#PBS -l nodes=16:ppn=8:e5345
and
#PBS -l nodes=16:ppn=8:e5410
will specify which Intel xeon nodes to use.
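The node requests above all follow the same pattern, which can be sketched as a small function that assembles the value from its parts. The function name nodes_request is hypothetical, and the list of accepted properties is limited to the four used in this section (intel, amd, e5345, e5410).

```shell
# Hypothetical helper: build the value of a "#PBS -l" node request.
# $1 = number of nodes, $2 = processors per node, $3 = optional node property
nodes_request() {
    case "$3" in
        ""|intel|amd|e5345|e5410) ;;
        *) echo "unknown node property: $3" >&2; return 1 ;;
    esac
    # Append ":property" only when a property was given.
    printf 'nodes=%s:ppn=%s%s\n' "$1" "$2" "${3:+:$3}"
}

echo "#PBS -l $(nodes_request 16 8)"
echo "#PBS -l $(nodes_request 16 8 e5410)"
```

Rejecting unknown properties in the helper catches a typo before submission rather than leaving the job waiting for nodes that do not exist.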
PBS scripts for MPI and OpenMP
To complete our examples, we show scripts for running our parallel jobs as well
MPI Job
#PBS -N Array-decomp
#PBS -l nodes=2:ppn=8
#PBS -l walltime=0:10:00
#PBS -k oe
#PBS -m abe
source ~/.bashrc
cd ~/array
/usr/bin/mpiexec -n 16 array-decomp
OpenMP Job
#PBS -N workshare
#PBS -l nodes=1:ppn=8:amd
#PBS -l walltime=0:10:00
#PBS -k oe
#PBS -m abe
source ~/.bashrc
cd ~/work
export OMP_NUM_THREADS=8
workshare
Here we note the use of mpiexec to run the MPI code on 16 processors (-n 16), that is, 2 nodes with 8 processors each. In our OpenMP script, we specify the use of 8 processors via the environment variable "OMP_NUM_THREADS". (OpenMP will parallelize only across the processors on the same node.)
(Note: If you're running MPI jobs on Palmetto using the MPICH2 library, we highly recommend that you add the line
export MPD_CON_EXT=ext_${PBS_JOBID}
to your PBS submit script. This should solve some of the mpdboot errors some users have been experiencing.)
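The process count given to mpiexec should match the resources requested: in the MPI script above, 2 nodes with 8 processors each gives 16. A minimal sketch of that arithmetic, using a hypothetical function total_procs that parses a nodes=N:ppn=M request string and treats a missing value as 1:

```shell
# Hypothetical helper: compute nodes * ppn from a "#PBS -l" node request.
# $1 = request string such as "nodes=2:ppn=8" or "nodes=1:ppn=8:amd"
total_procs() {
    nodes=$(printf '%s\n' "$1" | sed -n 's/^nodes=\([0-9][0-9]*\).*/\1/p')
    ppn=$(printf '%s\n' "$1" | sed -n 's/.*:ppn=\([0-9][0-9]*\).*/\1/p')
    # Assumption: a missing node or ppn count means 1.
    echo $(( ${nodes:-1} * ${ppn:-1} ))
}

total_procs "nodes=2:ppn=8"        # value for mpiexec -n in the MPI job
total_procs "nodes=1:ppn=8:amd"    # value for OMP_NUM_THREADS in the OpenMP job
```

Deriving the count from the request string keeps the mpiexec argument and the #PBS -l line from drifting apart when one of them is edited.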
Common PBS Directives
-d directory
Specifies the working directory. (Default: your home directory.)
-h
Specifies that a hold will be applied to the job at submission time.
-I
Declares that the job will be run interactively.
-j
Declares that the standard error stream will be merged with the standard output stream of the job.
-k
Defines which (if either) of the standard output or standard error streams will be retained. The keep argument can take one of the following values:
  e - standard error is retained on the executing host
  o - standard output is retained on the executing host
  eo - both streams are retained
  oe - both streams are retained
  n - neither stream is retained
An optional path parameter can be included to direct the output and errors to a different location (as in #PBS -k -eo /home/myuserid/Pade/results).
-l resource_list
Defines the resources that are required by the job and establishes a limit on the amount of each resource that can be consumed. The resource_list argument is of the form:
resource_name[=[value]][,resource_name[=[value]],...]
Resources monitored by PBS:
  nodes - The number of nodes required for this run. Units: nodes
  ppn - The number of processors required per node. Units: processors
  architecture - The architecture required for this run. Value: intel, amd, e5345 or e5410
  walltime - Maximum amount of real time during which the job can be in the running state. Units: time
-m
Defines a set of conditions under which the execution server will send a mail message about the job. The mail options argument can take one or more of the following values:
  a - mail is sent when the job is aborted
  b - mail is sent when the job begins execution
  e - mail is sent when the job terminates
  n - no mail is sent
-N name
Defines a name for the job.
-q destination
Defines a routing queue destination. See the table at the beginning of this section for more information on routing queues.
-v variable_list
Expands the set of environment variables that are exported to the job.
-V
Declares that all environment variables from the qsub command's environment are to be exported to the batch job.
-W additional_attributes
Allows for the specification of additional job attributes. This directive is useful for synchronizing concurrent and subsequent jobs. You should consult the PBS External Reference Specification for a full list of the synchronization directives.
More material on PBS can be found at
man qsub
man qstat
man qdel
man qalter
man 7 pbs_resources
Introduction to Palmetto I (Seminar slides)
TORQUE Administrator's Manual - 2.1 Job Submission