IT-Service Faculty of Physics

Use of the computing cluster at the CIP pool

The Simple Linux Utility for Resource Management (SLURM) is installed in the CIP pool. It is a powerful scheduler that allows the workstations in the CIP to be used as a computing cluster. For users, this means that batch jobs can be sent to this cluster and monitored there.

Writing jobs

Technically, a job is simply an executable file. This can be a bash script as well as an ordinary binary. However, bash scripts are preferable, because directives for the scheduler can be embedded in them.

Such a script may look like this:

#! /bin/bash
#
# SLURM options
#
# name of the job in the queue is "MyJob"
#SBATCH --job-name MyJob
#
# brief comment so that the admins know what your job is actually doing
#SBATCH --comment "No one will read this anyway..."
#
# the job should only start at 8 pm, after the CIP has closed
#SBATCH --begin 20:00
#
# limit for the demanded CPU time: 12 hours
#SBATCH --time=12:00:00
# ...
#
# call the actual program
echo "Hello World!"
sleep 30

Of course, SLURM also supports the use of MPI libraries for inter-process communication.
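
A minimal MPI job script could look like the following sketch; the module name openmpi and the program name ./mpi_program are assumptions and depend on the local installation:

#! /bin/bash
#
# name of the job in the queue
#SBATCH --job-name MyMpiJob
#
# request 8 MPI tasks
#SBATCH --ntasks=8
#
# limit for the requested run time: 1 hour
#SBATCH --time=01:00:00
#
# load an MPI implementation (module name is an assumption)
module load openmpi
#
# srun starts one process per requested task
srun ./mpi_program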

Starting jobs

Interaction with the CIP cluster is carried out via the user commands of the SLURM module. This means that, before anything can be done with the cluster, the SLURM module must be loaded in the user's shell. This is done with the following command:

module load slurm

Only now are the SLURM commands such as sbatch(1), scancel(1), sinfo(1), squeue(1), ... and the corresponding man pages available. The current shell is then ready for interaction with the cluster.
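
Whether the module has been loaded can be verified, for example, with module list, which shows all modules currently loaded in the shell:

module list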

If the desired options for the job are written into a bash script, as suggested above, the commands can be kept short.
Submitting the job to the queue, for example, reduces to:

Ben.Utzer@cipXY:~$ sbatch myjob.sh
Submitted batch job 3

If the job is a single binary, the options have to be passed to sbatch directly:

sbatch -J MyJob ... ./JobBinary
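
For example, the directives from the script above roughly correspond to a call like the following; the binary name ./JobBinary is just a placeholder:

sbatch --job-name MyJob --comment "short description" --begin=20:00 --time=12:00:00 ./JobBinary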

The results of the calculation are written to the current working directory of the job. SLURM creates the file "slurm-%jobid.out", where "%jobid" is replaced by the job's ID as shown by squeue. In the example above, a file called "slurm-3.out" is created in the home directory,
with the content:

Hello World!
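
If a different name or location for the output file is preferred, it can be requested in the job script with the --output directive; the path below is only an example:

# write the job output to a custom file; %j is replaced by the job ID
#SBATCH --output /path/to/myjob-%j.out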

Monitoring your jobs

To check the state of the cluster, simply use the command sinfo:

Ben.Utzer@cipXY:~$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
cip         up   infinite     1 alloc cip2-1
cip         up   infinite    24  idle cip[51-56],cip2-[2-19]

Note that the example job from above only uses the resources of one machine (cip2-1). The exact state of the queue and the state of your own jobs can be examined via squeue:

Ben.Utzer@cipXY:~$ squeue
JOBID PARTITION     NAME      USER ST TIME NODES NODELIST(REASON)
    3       cip myjob.sh Ben.Utzer  R 0:17     1 cip2-1
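
On a busy cluster it can be helpful to restrict the output to your own jobs via the --user option, for example:

Ben.Utzer@cipXY:~$ squeue -u Ben.Utzer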

The output of this command also provides the job ID of your job, which can be used to address the job with further SLURM commands. For example, scancel can be used to send a signal to the processes of the job

scancel --signal=SIGNAL_NAME %jobid

or to cancel the job outright:

scancel %jobid
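
For the example job above (job ID 3), the two calls could look like this; SIGUSR1 is only an example signal:

Ben.Utzer@cipXY:~$ scancel --signal=USR1 3
Ben.Utzer@cipXY:~$ scancel 3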

 

Further information

Detailed information on the commands and programs can be found in the man pages, for example

man squeue

or in the official documentation.