Using the DQS Batch Queues.

Batch queues are implemented through the Distributed Queueing System (DQS). At this writing, DQS version 3.2.7 is in use.

The Distributed Queueing System (DQS) is used to run non-interactive jobs in the UOxray Computing Laboratory.

  • What Types of Queues are Available?
  • Setting up a DQS Job
  • Common DQS Commands
  • Miscellaneous Information
  • How to Get More Information

    What Types of Queues are Available?

    When you submit a job to the DQS, it selects a queue on a particular host based on what you tell it about the resources your job needs.
    These resources indicate what kind of computer your job is able to run on. Available queue groups are: r10k, sgi, alpha and sharp.
    Specify r10k to run on an R10000 SGI; specify sgi to run on any SGI; specify alpha for a DEC Alpha running Digital Unix.
    To submit a job named test.com to a queue on any SGI computer, you type:

    qsub -l sgi test.com

    For other examples, see the section, Common DQS Commands.

    It is important to specify the resource(s) that you need correctly. For example, if you type qsub -l DEC ... instead of qsub -l alpha ..., your job will wait for a queue with the resource DEC to become available. Since no such queue exists, your job will never be started.
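
    A guard against this kind of typo can be sketched as a small sh function that checks the resource name before the job is handed to qsub. The function name check_resource is hypothetical, and the hard-coded list is simply copied from the queue groups named above; it is an assumption that this list matches the current DQS configuration.

```shell
#!/bin/sh
# Hypothetical helper (not part of DQS): succeed only for the queue
# groups named in this document.
check_resource() {
    case "$1" in
        r10k|sgi|alpha|sharp)
            return 0 ;;
        *)
            echo "unknown resource '$1'; expected r10k, sgi, alpha or sharp" >&2
            return 1 ;;
    esac
}

# Usage sketch: submit only when the resource name is valid.
# check_resource sgi && qsub -l sgi test.com
```

    With this in place, a mistyped resource such as DEC is rejected immediately instead of leaving a job waiting in the queue.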
     

    Setting up a DQS Job

    There are three methods for submitting a job to the queue:


    1. Create a DQS script file for your job (myjob.dqs). This is simply a script for your favorite shell, containing the commands that you want to execute in batch. Unless you specify the -cwd option, your job will be run from your home directory, so define your file paths accordingly.

    a) You can include qsub command line options in your script, as shown in the following example:

    Note: If you use this script directly, remove the trailing comments (# text ...) before submitting the job.

    The following script would be submitted as: qsub myjob.dqs
    #!/bin/csh 
    #$ -cwd         # set working directory to where you qsub'd from
    # -m bea        # send mail when job begins, ends or aborts
    #$ -N myjob     # job name prefix
    #$ -j y         # join standard output (-o) and standard error (-e) in one file (-o)
    #$ -o myjob.log # direct std output to myjob.log instead of the default 
    #$ -e myjob.err # direct error output (stderr) to the file myjob.err instead of the default
    #$ -l r10k      # run the job on a queue on the SGI R10000 host with the lowest current load average
    #$ -V           # export all environment variables to the context of the job
    # or, instead of -V:
    #$ -v PATH      # export only the named environment variables with the job
    (your normal script goes here)
    /usr/local/tnt/bin/tnt 1 30
    b) Or you can give the same options on the command line:
    The following script would be submitted as: qsub -cwd -N myjob -j y -o myjob.log -l r10k -V myjob.dqs
    #!/bin/csh
    (your normal script goes here)
    /usr/local/tnt/bin/tnt 1 30
    
    
    2. Redirect the standard input (STDIN) of qsub, here using a here-document (<<EOF):
    qsub -cwd -N myjob -j y -o myjob.log -l r10k -V <<EOF
    (your normal script goes here)
    /usr/local/tnt/bin/tnt 1 30
    EOF
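
    The embedded-option layout from method 1a can also be generated instead of typed by hand, which avoids leaving explanatory comments in the file. The following is a minimal sketch: write_job_script is a hypothetical helper, not a DQS command, and it uses only the options shown in the examples above.

```shell
#!/bin/sh
# Hypothetical helper: print a DQS job script with embedded qsub options.
# Arguments: job name, resource, then the command line to run in batch.
write_job_script() {
    name=$1
    resource=$2
    shift 2
    printf '%s\n' \
        '#!/bin/csh' \
        '#$ -cwd' \
        "#\$ -N ${name}" \
        '#$ -j y' \
        "#\$ -o ${name}.log" \
        "#\$ -l ${resource}" \
        '#$ -V' \
        "$*"
}

# Usage sketch:
# write_job_script myjob r10k /usr/local/tnt/bin/tnt 1 30 > myjob.dqs
# qsub myjob.dqs
```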

    Common DQS Commands

    Summary Examples

    qsub  -cwd -l alpha test.com
    qstat -f -l sgi
    qdel  67

    See the DQS documentation or man pages for a list of all possible options. Generally, options can be included on the command line, or in your command file, as shown in the example in the section on Setting up a DQS Job.

    Discovering queue status -- qstat

    qstat
    qstat -f 
    qstat -f -l sharp
    Typical output from qstat -f -l sharp:
    Queue Name      Queue Type    Quan  Load          State
    ----------      ----------    ----  ----          -----
    helium_2        batch         1/1   0.95  er      UP
      sharp    ingo.Rhai         4798   0:1   r       RUNNING   08/11/99 11:01:37
    hydrogen_2      batch         0/1   0.00  er      UP
    lithium_2       batch         0/1   1.02  er      UP
    sodium_2        batch         1/1   1.03  er      UP
      sharp    ingo.Rhai         5011   0:2   r       RUNNING   08/16/99 11:12:22
    The output shows that four queues accept sharp jobs and that two of them, on helium and sodium, are running jobs. Sodium has an average of 1.03 processes competing for its CPU; hydrogen has no processes active on it. The symbols er indicate that the queues are enabled and running. If the symbols read eru, the queue is in an unknown state, i.e., communication has been lost; notify the systems administrator. ALARM in the State column indicates that the load on that system has exceeded a threshold limit; normal queueing will resume automatically when the load diminishes.
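
    Reading this listing by eye gets tedious when there are many queues. As a rough sketch, the idle queues can be picked out with awk. The function name idle_queues is hypothetical, and the filter assumes that the Quan column reads as running jobs over available slots and that the last column of a queue line is its state, which matches the sample above but is not taken from the DQS documentation.

```shell
#!/bin/sh
# Hypothetical helper: read "qstat -f" output on standard input and print
# the names of queues that are UP and have no job running (Quan "0/N").
# Header, separator and job lines do not match the Quan pattern and are
# skipped.
idle_queues() {
    awk '$3 ~ /^0\/[0-9]+$/ && $NF == "UP" { print $1 }'
}

# Usage sketch:
# qstat -f -l sharp | idle_queues
```

    For the sample listing above, this prints hydrogen_2 and lithium_2.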
     

    Deleting a job -- qdel

    Each DQS job has a unique job id. The job id is reported when you submit the job, and can be discovered at any time using the command qstat -f. To delete a job, use the command:
    qdel jobid
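
    Picking job ids out of a qstat -f listing can be sketched as follows. The function name list_job_ids is hypothetical, and the filter assumes that the job id is the third field of each job line and is purely numeric, as in the sample listing in the qstat section. The sketch only prints the ids, so that nothing is deleted by accident.

```shell
#!/bin/sh
# Hypothetical helper: read "qstat -f" output on standard input and print
# one job id per line. Queue lines carry "N/M" in the third field and the
# header carries text there, so only job lines match.
list_job_ids() {
    awk '$3 ~ /^[0-9]+$/ { print $3 }'
}

# Usage sketch: list the ids, then delete the one you want.
# qstat -f | list_job_ids
# qdel 4798
```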

    Miscellaneous Information

    DQS Initialization Issues

    When DQS starts your job, it initially uses sh instead of csh or tcsh. Therefore, it does not execute the system-wide install.csh file. As a result, all aliases that are defined in this file are lost, and commands must be specified by their full paths; e.g., use /usr/local/tnt/bin/tnt 1 30 instead of tnt 1 30.
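
    One way to catch a missing alias or a mistyped path early is to check each command before running it. The following sh sketch uses a hypothetical helper, require_exec, that accepts only absolute paths to executable files; it is an illustration, not something DQS provides.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the argument is an absolute path
# to an executable file, so the job does not depend on aliases or PATH.
require_exec() {
    case "$1" in
        /*) ;;
        *)  echo "not an absolute path: $1" >&2
            return 1 ;;
    esac
    [ -x "$1" ] || { echo "not executable: $1" >&2; return 1; }
}

# Usage sketch inside a job script:
# require_exec /usr/local/tnt/bin/tnt && /usr/local/tnt/bin/tnt 1 30
```

    A job that fails this check stops with a clear message in its log file rather than a "command not found" error partway through the run.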

    DQS Scheduling Algorithm

    Presently, DQS Version 3.2.7 is implemented using the nice priority scheme. Under this scheme, the queues are set to execute at a nice CPU scheduling priority of 19.

    How to Get More Information

    There is a document in the alcove titled DQS USER GUIDE. It tells you much more than you wanted to know. However, section 2.3 describes all the options in all the commands. There are also man pages available.


    Based on Art Perlo (perlo@csb.yale.edu)
    dale@uoxray.uoregon.edu