Parallel Processing and Job Arrays
Introduction
Below are example SLURM scripts for jobs employing parallel processing. In general, parallel jobs can be separated into four categories:
- Distributed memory programs that include explicit support for message passing between processes (e.g. MPI). These processes execute across multiple CPU cores and/or nodes.
- Multithreaded programs that include explicit support for shared memory processing via multiple threads of execution (e.g. Posix Threads or OpenMP) running across multiple CPU cores.
- Embarrassingly parallel analysis in which multiple instances of the same program execute on multiple data files simultaneously, with each instance running independently from others on its own allocated resources (i.e. CPUs and memory). SLURM job arrays offer a simple mechanism for achieving this.
- GPU (graphics processing unit) programs including explicit support for offloading to the device via languages like CUDA or OpenCL.
It is important to understand the capabilities and limitations of an application in order to fully leverage the parallel processing options available on the ACCRE cluster. For instance, many popular scientific computing languages like Python , R , and Matlab now offer packages that allow for GPU or multithreaded processing, especially for matrix and vector operations.
MPI Jobs
Jobs running MPI (Message Passing Interface) code require special attention within SLURM. SLURM allocates and launches MPI jobs differently depending on the version of MPI used (e.g. OpenMPI or Intel MPI). We recommend using the most recent version of OpenMPI or Intel MPI available through Lmod to compile code and then using SLURM’s srun
command to launch parallel MPI jobs. The example below runs MPI code compiled by GCC+ OpenMPI:
#!/bin/bash #SBATCH --mail-user=myemail@vanderbilt.edu #SBATCH --mail-type=ALL #SBATCH --nodes=3 #SBATCH --tasks-per-node=8 # 8 MPI processes per node #SBATCH --time=7-00:00:00 #SBATCH --mem=4G # 4 GB RAM per node #SBATCH --output=mpi_job_slurm.log module load GCC OpenMPI echo $SLURM_JOB_NODELIST srun ./test # srun is SLURM's version of mpirun/mpiexec
This example requests 3 nodes and 8 tasks (i.e. processes) per node, for a total of 24 MPI tasks. By default, SLURM allocates 1 CPU core per process, so this job will run across 24 CPU cores. Note that srun
accepts many of the same arguments as mpirun / mpiexec (e.g. -n <number cpus>) but also allows increased flexibility for task affinity, memory, and many other features. Type man srun
for a list of options.
More information about running MPI jobs within SLURM can be found here here: http://slurm.schedmd.com/mpi_guide.html
Feel free to open a help desk ticket if you require assistance with your MPI job.
srun
This command is used to launch a parallel job step. Typically, srun
is invoked from a SLURM job script to launch a MPI job (much in the same way that mpirun
or mpiexec
are used). Please note that your application must include MPI code in order to run in parallel across multiple CPU cores using srun
. Invoking srun
on a non-MPI command or executable will result in this program being independently run X times on each of the CPU cores in the allocation.
Alternatively, srun
can be run directly from the command line on a gateway, in which case srun
will first create a resource allocation for running the parallel job. The -n [CPU_CORES]
option is passed to specify the number of CPU cores for launching the parallel job step. For example, running the following command from the command line will obtain an allocation consisting of 16 CPU cores and then run the command hostname
across these cores:
srun -n 16 hostname
For more information about srun
see: http://www.schedmd.com/slurmdocs/srun.html
Multithreaded Jobs
Multithreaded programs are applications that are able to execute in parallel across multiple CPU cores within a single node using a shared memory execution model. In general, a multithreaded application uses a single process (i.e. “task” in SLURM) which then spawns multiple threads of execution. By default, SLURM allocates 1 CPU core per task. In order to make use of multiple CPU cores in a multithreaded program, one must include the --cpus-per-task
option.Below is an example of a multithreaded program requesting 4 CPU cores per task. The program itself is responsible for spawning the appropriate number of threads.
#!/bin/bash #SBATCH --mail-user=myemail@vanderbilt.edu #SBATCH --mail-type=ALL #SBATCH --nodes=1 #SBATCH --ntasks=1 #SBATCH --cpus-per-task=4 # 4 threads per task #SBATCH --time=02:00:00 # two hours #SBATCH --mem=4G #SBATCH --output=multithread.out #SBATCH --job-name=multithreaded_example # Load the most recent version of GCC available through Lmod module load GCC # Run multi-threaded application ./hello
Job Arrays
Job arrays are useful for submitting and managing a large number of similar jobs. As an example, job arrays are convenient if a user wishes to run the same analysis on 100 different files. SLURM provides job array environment variables that allow multiple versions of input files to be easily referenced. In the example below , three input files called vectorization_0.py
, vectorization_1.py
, and vectorization_2.py
are used as input for three independent Python jobs:
#!/bin/bash #SBATCH --mail-user=myemail@vanderbilt.edu #SBATCH --mail-type=ALL #SBATCH --ntasks=1 #SBATCH --time=2:00:00 #SBATCH --mem=2G #SBATCH --array=0-2 #SBATCH --output=python_array_job_slurm_%A_%a.out echo "SLURM_JOBID: " $SLURM_JOBID echo "SLURM_ARRAY_TASK_ID: " $SLURM_ARRAY_TASK_ID echo "SLURM_ARRAY_JOB_ID: " $SLURM_ARRAY_JOB_ID # Load Anaconda distribution of Python module load Anaconda2 python vectorization_${SLURM_ARRAY_TASK_ID}.py
The #SBATCH --array=0-2
line specifies the array size (3) and array indices (0, 1, and 2). These indices are referenced through the SLURM_ARRAY_TASK_ID
environment variable in the final line of the SLURM batch script to independently analyze the three input files. Each Python instance will receive its own resource allocation; in this case, each instance is allocated 1 CPU core (and 1 node), 2 hours of wall time, and 2 GB of RAM.
The --array=
option is flexible in terms of the index range and stride length. For instance, --array=0-10:2
would give indices of 0, 2, 4, 6, 8, and 10.
The %A
and %a
variables provide a method for directing standard output to separate files. %A
references the SLURM_ARRAY_JOB_ID
while %a
references SLURM_ARRAY_TASK_ID
. SLURM treats job ID information for job arrays in the following way: each task within the array has the same SLURM_ARRAY_JOB_ID
, and its own unique SLURM_JOBID
and SLURM_ARRAY_TASK_ID
. The JOBID shown from squeue
is formatted by SLURM_ARRAY_JOB_ID
followed by an underscore and the SLURM_ARRAY_TASK_ID
.
While the previous example provides a relatively simple method for running analyses in parallel, it can at times be inconvenient to rename files so that they may be easy indexed from within a job array. The following example provides a method for analyzing files with arbitrary file names, provided they are all stored in a sub-directory named data :
#!/bin/bash #SBATCH --mail-user=myemail@vanderbilt.edu #SBATCH --mail-type=ALL #SBATCH --ntasks=1 #SBATCH --time=2:00:00 #SBATCH --mem=2G #SBATCH --array=1-5 # In this example we have 5 files to analyze #SBATCH --output=python_array_job_slurm_%A_%a.out arrayfile=`ls data/ | awk -v line=$SLURM_ARRAY_TASK_ID '{if (NR == line) print $0}'` module load Anaconda2 # load Anaconda distribution of Python python data/$arrayfile
More information can be found here: http://slurm.schedmd.com/job_array.html