Use of the computing cluster at the CIP pool
The Simple Linux Utility for Resource Management (SLURM) is installed in the CIP pool. It is a powerful scheduler that allows the workstations in the CIP to be used as a computing cluster. For users, this means that batch jobs can be sent to this cluster and monitored there.
Technically, a job is simply an executable file. This can be a bash script as well as a plain binary. However, bash scripts are preferable, because directives for the scheduler can be embedded in them.
Such a script may look like this:
#!/bin/bash
# SLURM options
# name of the job in the queue is "MyJob"
#SBATCH --job-name MyJob
# brief comment, so that the admins know what your job is actually doing
#SBATCH --comment "No one will read this anyway..."
# the job should only start at 8 pm, after the CIP is closed
#SBATCH --begin 20:00
# limit for the requested CPU time: 12 hours
#SBATCH --time 12:00:00
# the actual program
echo "Hello World!"
Naturally, SLURM also supports the use of MPI libraries for interprocess communication.
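As a sketch, an MPI job script could look like the following. The task count and the program name mpi_hello are purely illustrative, not part of this CIP setup:

```shell
#!/bin/bash
# hypothetical MPI job; names and counts are illustrative
#SBATCH --job-name MpiJob
# request 8 MPI tasks; SLURM decides on which nodes they run
#SBATCH --ntasks 8
# srun launches one copy of the program per task and
# hands process placement over to SLURM
srun ./mpi_hello
```

Using srun inside the batch script lets SLURM, rather than mpirun, manage where the processes are started.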
Interaction with the CIP cluster is carried out via the user commands of the SLURM module. This means that before anything can be done with the cluster, the SLURM module must be loaded in the user shell. This can be done with the following command:
module load slurm
If the desired options for the job are written into the bash script, as suggested above, the commands can be kept short.
For example, submitting to a queue reduces to the command:
Ben.Utzer@cipXY:~$ sbatch myjob.sh
Submitted batch job 3
If the job is a single binary, the options have to follow the command sbatch directly:
sbatch -J MyJob ... ./JobBinary
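For example, the directives from the script above could be passed on the command line instead. The flags are standard sbatch options; JobBinary stands in for an arbitrary executable:

```shell
# same options as in the script, given directly to sbatch;
# ./JobBinary is a placeholder for your own program
sbatch --job-name MyJob \
       --comment "No one will read this anyway..." \
       --begin 20:00 \
       --time 12:00:00 \
       ./JobBinary
```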
The results of the calculation are written to the current working directory of the job. SLURM creates the file "slurm-%jobid.out", where "%jobid" is replaced by the job's ID as reported by squeue. In the example above, a file called "slurm-3.out" is created in the home directory,
with the content:
Hello World!
Monitoring your jobs
To check on the state of the cluster, simply use the command sinfo:
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
cip up infinite 1 alloc cip2-1
cip up infinite 24 idle cip[51-56],cip2-[2-19]
Note that the job from the example above only uses the resources of one machine (cip2-1). The exact state of the queue, including your own job, can be examined via squeue:
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
3 cip myjob.sh Ben.Utzer R 0:17 1 cip2-1
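On a busy cluster, squeue can be narrowed down with filters; the options below are standard squeue flags, the job ID 3 is taken from the example:

```shell
# show only your own jobs
squeue --user "$USER"
# show a single job by its ID
squeue --jobs 3
```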
The output of this command also shows your job's ID, which can be used to address the job with SLURM commands. For example, scancel can send optional signals to the processes of the job
scancel --signal=SIGNAL_NAME %jobid
or cancel the job outright:
scancel %jobid
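Concretely, for job 3 from the example, this could look as follows. SIGUSR1 is chosen purely for illustration; whether the job reacts to it depends entirely on the program:

```shell
# ask the job's processes to react, assuming they handle SIGUSR1
scancel --signal=USR1 3
# remove the job from the queue entirely
scancel 3
```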
Detailed information on the commands and programs can be found in the man pages, for example via man sbatch, or in the official SLURM documentation.