Using the Batch Queues.

Batch queues are implemented through the Distributed Queuing System (DQS). At this writing, DQS version 3.2.7 is being used.
  • Setting up a DQS Job
  • What Types of Queues are Available?
  • Common DQS Commands
  • Miscellaneous Information
  • How to Get More Information

    Setting up a DQS Job

    To run a job on the batch queues you should create a DQS script file for your job (e.g. myjob.dqs). This is simply a script written in tcsh, containing the commands that you want to execute in batch.

    You can include batch options in your script, as shown in the following example.  These options control what sort of information you wish passed to the job, what type of computer to run the job on, and what to do when the job finishes.  While there are a great many options that can be used, it is unlikely that anyone in these labs will need to use very many of them.  The batch queuing software recognizes an option when the line begins with the character sequence "#$ ".  The options are written in the same fashion as Unix command line options: they are single letters introduced with a minus sign.  You can read about all the available options in the qsub documentation.

    Contents of the file "myjob.dqs":
    #!/bin/csh
    #
    #$ -cwd
    #$ -j y
    #
    source /etc/csh.login
    source ~/.login
    #
    # (your normal script goes here)
    tnt 1 30
    #
    The "-cwd" option says that the script should execute in the same directory it was submitted from.  (Okay, the options are not all single letters.  Since when has Unix been consistent?)  The "-j y" option says that the job should produce a single log file; traditionally, Unix separates error messages from normal messages, and this option merges the two.  These two options are quite useful and probably should be in all DQS scripts.  You could type them on the command line whenever you submit a script to the batch system, but that would require that you remember to do so.  If you want these options all the time, you might as well put them in the script.
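
    To make the "#$ " convention concrete, here is a minimal Python sketch of how embedded options could be picked out of a script.  It only illustrates the prefix rule described above; it is not DQS's own parser, and the function name embedded_options is made up for this example.

```python
def embedded_options(script_text):
    """Return the option strings from lines that start with '#$ '.

    Illustration only: this mimics the prefix rule described in the text,
    not the actual DQS implementation.
    """
    options = []
    for line in script_text.splitlines():
        if line.startswith("#$ "):
            # Drop the '#$ ' marker and keep the option text
            options.append(line[3:].strip())
    return options


# The example script from above, without the placeholder line
myjob = """#!/bin/csh
#
#$ -cwd
#$ -j y
#
source /etc/csh.login
source ~/.login
#
tnt 1 30
"""

print(embedded_options(myjob))
# prints: ['-cwd', '-j y']
```

    Ordinary comment lines (a "#" alone, or "#" followed by text) are ignored, which is why the script can mix comments and options freely.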

    The two "source" commands recreate the environment that you are used to working in.  Normally these files are executed whenever you log in, but for some reason DQS does not execute them for a batch job.  If you did not include these statements in your script, csh would not recognize the "tnt" command.

    DQS recognizes an option called "-V" which copies your current environment variables and aliases to the batch job.  While you may think that this option can replace the two "source" lines in the example, using it that way is not a good idea.  Quite a number of environment variables are specific to the particular computer or computer type you are using.  If you use the "-V" option and your batch job is sent to a computer of a different type, your job is likely to fail with very obscure and confusing error messages.

    There is an additional option, "-o", which controls the name of the log file.  Since a script may be run many times, it is best not to hardwire the name of the log file, lest the file be overwritten.  We recommend that this option be specified on the command line when the job is submitted.  The script in the previous example would be submitted with the command
     

    qsub -o myjob.log myjob.dqs
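
    One way to avoid overwriting an earlier log is to put the date and time into the log file name.  The Python sketch below only builds the qsub command line and prints it; nothing is submitted.  The naming pattern and the helper name qsub_command are assumptions made for illustration, and only the "-o" option shown above is used.

```python
from datetime import datetime


def qsub_command(script, when=None):
    """Build a qsub argument list with a time-stamped log file name."""
    when = when or datetime.now()
    # Hypothetical naming pattern: myjob.dqs -> myjob.19990811-110137.log
    base = script[:-4] if script.endswith(".dqs") else script
    log = "{}.{}.log".format(base, when.strftime("%Y%m%d-%H%M%S"))
    return ["qsub", "-o", log, script]


# Print the command instead of running it
print(" ".join(qsub_command("myjob.dqs")))
```

    The printed line can be typed or pasted at the prompt in place of the fixed "myjob.log" form.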

    What Types of Queues are Available?

    When you submit a job to DQS, it selects a queue on a particular host based on what DQS thinks will get your job done the quickest.  In the unusual case where your job has some special need, you can inform DQS of that need by specifying a required "resource".  These resources indicate what kind of computer your job is able to run on, what software licenses are available, or how much memory is present.  Available queue groups of the first kind are sgi, linux, and alpha.  We have only two groups that define software licenses; they are named sharp and buster.

    The "-l" option (That's a lower case letter L, by the way.) is used to specify a resource.  You can either specify a required resource inside the script or on the command line when you type the qsub command.  Since the requirement for a resource is intrinsic to a job, one should define it inside the script.

    If your job can only run on an SGI computer you should add to your script the line

    #$ -l sgi
    If your job runs the program Sharp it will have to execute on a computer licensed for that software.  This is done by adding the line
    #$ -l sharp
    The other major consideration is the amount of memory your job requires.  Some jobs require a great deal of memory, and some of our computers have more memory than others.  You do not have to remember which computer has how much memory; the batch queuing system keeps track of that.  All you have to know is how much memory your job requires.  You can find this out by reading the program's documentation, asking someone in the lab who has run the program before, or running the job once and watching it with the top command.

    Once you know how much memory the job needs you can add a line to your script which looks like

    #$ -l mem.gt.800
    This line will cause the job to run only on computers with more than 800 MB of memory.  The largest amount of memory in any computer right now is 2000 MB, so if you ask for more than that, the job will wait until we buy a bigger computer.

    You can combine resource requirements on a single line if more than one is needed.  For example:

    #$ -l mem.lt.400,sgi,sharp
    This line will cause your job to run on a computer which has less than 400 MB of memory, is an SGI, and has a license to run Sharp.
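
    The comma-separated form is easy to get wrong when a script is edited by hand.  As a small illustration, this Python sketch joins a list of requirements into one "#$ -l" line.  The helper name resource_line is hypothetical, and the sketch does not check that the named resources exist on our systems.

```python
def resource_line(requirements):
    """Join resource requirements into a single '#$ -l' script line."""
    cleaned = [r.strip() for r in requirements if r.strip()]
    if not cleaned:
        raise ValueError("at least one resource requirement is needed")
    # Requirements are separated by commas with no spaces between them
    return "#$ -l " + ",".join(cleaned)


print(resource_line(["mem.lt.400", "sgi", "sharp"]))
# prints: #$ -l mem.lt.400,sgi,sharp
```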

    In the utmost emergency (i.e. you can't think of any other way to do what you want) you can send the job to a particular computer.  This is done with the "-q" option.  For example, if you create a batch job that writes to tape, it will have to run on Zinc.  In that case you add to your script the line

    #$ -q zinc
    In the vast majority of cases your job will run properly on any of the computers in our lab.  You should not specify a resource limitation for your job unless that job really needs to run on a particular kind of machine.  In general, the automatic scheduling system is better at placing jobs than you are.

    Common DQS Commands

    Summary Examples

    qsub  test.dqs
    qstat -f
    qdel  67
    See the DQS documentation or man pages for a list of all possible options. Generally, options can be included on the command line, or in your command file, as shown in the example in the section on Setting up a DQS Job.

    Discovering queue status -- qstat

    qstat
    qstat -f 
    qstat -f -l sharp
    Typical output from qstat -f -l sharp
    Queue Name      Queue Type    Quan  Load          State
    ----------      ----------    ----  ----          -----
    helium          batch         1/1   0.95  er      UP
      sharp    ingo.Rhai         4798   0:1   r       RUNNING   08/11/99 11:01:37
    hydrogen        batch         0/1   0.00  er      UP
    lithium         batch         0/1   1.02  er      UP
    sodium          batch         1/1   1.03  er      UP
      sharp    ingo.Rhai         5011   0:2   r       RUNNING   08/16/99 11:12:22
    The output shows that there are four queues that accept sharp jobs and that two of them are running jobs: the queues on helium and sodium. Sodium has an average of 1.03 processes competing for its CPU; hydrogen has no processes active on it. The symbols er indicate that the queues are enabled and running. If the symbols read eru, the queue is in an unknown state, i.e., communication has been lost; notify the systems administrator. ALARM in the state column indicates that the load on that system has exceeded a threshold limit; normal queuing will resume automatically when the load diminishes.
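
    If you check the queues often, the same reading of the output can be done by a short program.  The Python sketch below parses the sample output shown above (pasted in as a string rather than obtained by running qstat) and reports which queues are busy and which are in an unknown state.  It assumes that the Quan column means "jobs in the queue / job slots" and that queue lines are the ones that are not indented; both are guesses from the sample, not statements from the DQS documentation.

```python
SAMPLE = """\
Queue Name      Queue Type    Quan  Load          State
----------      ----------    ----  ----          -----
helium          batch         1/1   0.95  er      UP
  sharp    ingo.Rhai         4798   0:1   r       RUNNING   08/11/99 11:01:37
hydrogen        batch         0/1   0.00  er      UP
lithium         batch         0/1   1.02  er      UP
sodium          batch         1/1   1.03  er      UP
  sharp    ingo.Rhai         5011   0:2   r       RUNNING   08/16/99 11:12:22
"""


def parse_queues(text):
    """Return one dict per queue line of 'qstat -f' style output."""
    queues = []
    for line in text.splitlines()[2:]:       # skip the two header lines
        if not line.strip() or line[0].isspace():
            continue                         # blank line or indented job line
        name, qtype, quan, load, flags, state = line.split()[:6]
        used, total = quan.split("/")
        queues.append({
            "name": name,
            "type": qtype,
            "used": int(used),    # assumed: jobs currently in the queue
            "total": int(total),  # assumed: job slots in the queue
            "load": float(load),
            "flags": flags,       # e.g. 'er', or 'eru' for an unknown state
            "state": state,
        })
    return queues


queues = parse_queues(SAMPLE)
busy = [q["name"] for q in queues if q["used"] > 0]
unknown = [q["name"] for q in queues if "u" in q["flags"]]
print(len(queues), busy, unknown)
# prints: 4 ['helium', 'sodium'] []
```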

    Deleting a job -- qdel

    Each DQS job has a unique job-id. The job id is reported when you submit the job, and can be discovered at any time using the command qstat -f. To delete a job, use the command
    qdel jobid

    Miscellaneous Information

    DQS Scheduling Algorithm

    Presently, DQS Version 3.2.7 is implemented using the nice priority scheme. Under this scheme, the queues are set to execute at a nice CPU scheduling priority of 19.

    How to Get More Information

    There is a document in the alcove titled DQS USER GUIDE. It tells you much more than you wanted to know. However, section 2.3 describes all the options in all the commands. There are also man pages available.


    Based on Art Perlo (perlo@csb.yale.edu)
    webmaster@uoxray.uoregon.edu