GPUs at ACCRE
GPU types
All GPU types are placed into one of two partitions: the batch_gpu partition for resources where bursting and high throughput are desired, and the interactive_gpu partition for resources where immediate access is desired at the cost of the ability to burst. By default, all GPU nodes are placed in the batch_gpu partition.
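If you want to confirm which nodes currently make up these two partitions, the standard Slurm sinfo command can be used (this is generic Slurm usage, not an ACCRE-specific tool, and the exact node lists you see will differ):

# List the nodes, their state, and time limits for the two GPU partitions
sinfo -p batch_gpu,interactive_gpu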
Accessing GPU Resources in the batch_gpu Partition
GPU resources in the batch_gpu partition are accessed using your group name with the _acc suffix. For example, if your group is accre_lab, then the Slurm account to use is accre_lab_acc. To see which accounts with the _acc suffix are available to you, run the slurm_resources command:
[bob@gw344 accre]$ slurm_resources

Accounts to use for accessing the batch (default) partition:

Account
------------------
accre
fe_accre_lab
accre_guests

Accounts and GPU types for accessing the batch_gpu partition:

Accounts              GPU Type
--------------------  ----------------------------
fe_accre_lab_acc      nvidia_geforce_rtx_2080_ti
fe_accre_lab_acc      nvidia_rtx_a6000
fe_accre_lab_acc      nvidia_titan_x
accre_guests_acc      nvidia_geforce_rtx_2080_ti
accre_guests_acc      nvidia_rtx_a6000

You have access to accelerated GPU resources in the batch_gpu partition.
As a usage example, if you wanted to request 2 GPUs of type "nvidia_rtx_a6000"
for a job with account "accre_guests_acc" on the partition "batch_gpu", then
you would add the following lines to your SLURM script:

#SBATCH --account=accre_guests_acc
#SBATCH --partition=batch_gpu
#SBATCH --gres=gpu:nvidia_rtx_a6000:2

Note that you may not request more GPUs than your account is allowed.
The partition must be set to batch_gpu. This partition includes all GPU resources intended for batch usage (optimized for throughput rather than latency, which allows bursting). You must also select a specific GRES GPU type: instead of a generic #SBATCH --gres=gpu:4 for 4 GPUs, request the exact type, for example #SBATCH --gres=gpu:nvidia_rtx_a6000:4 for 4 A6000 GPUs. A job submission that does not specify a GPU type will be rejected.
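If you are unsure which GRES GPU types are configured on the nodes in the partition, one way to check is with standard Slurm output formatting (the %N and %G fields print the node list and its generic resources); the exact output depends on the current cluster configuration:

# Show the GPU (GRES) types configured on each node in the batch_gpu partition
sinfo -p batch_gpu -o "%N %G"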
| GPU Type | Architecture | GPU Memory (GB) | GRES Type | CPU Cores per GPU | Available System Memory per GPU (GB) |
|---|---|---|---|---|---|
| Nvidia H100 NVL | Hopper | 94 | nvidia_h100_nvl | 32 | 561 |
| Nvidia L40S | Ada Lovelace | 48 | nvidia_l40s | 8 | 187 |
| Nvidia A100 (80GB) | Ampere | 80 | nvidia_a100_80gb | 32 | 374 |
| Nvidia RTX A6000 | Ampere | 48 | nvidia_rtx_a6000 | 16 | 124 |
| Nvidia RTX A4000 | Ampere | 16 | nvidia_rtx_a4000 | 4 | 30 |
| Nvidia Quadro RTX 6000 | Turing | 24 | quadro_rtx_6000 | 6 | 92 |
| Nvidia RTX 2080 Ti | Turing | 11 | nvidia_geforce_rtx_2080_ti | 6 | 92 |
| Nvidia Titan X (or Xp) | Pascal | 12 | nvidia_titan_x | 2 | 60 |
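As an illustration of how the table translates into a job request, the sketch below asks for one H100 NVL GPU along with the matching CPU cores and system memory from the table; the account name is a placeholder and must be replaced with one of your own _acc accounts:

#SBATCH --account=my_lab_acc           # placeholder; substitute your own _acc account
#SBATCH --partition=batch_gpu
#SBATCH --gres=gpu:nvidia_h100_nvl:1   # GRES type from the table above
#SBATCH --cpus-per-task=32             # up to 32 CPU cores per H100 GPU
#SBATCH --mem=500G                     # stays within the ~561 GB of system memory per GPU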
Here is an example script requesting 3 A6000 GPUs for a user in the group accre_lab, which has access to A6000 GPUs in the batch_gpu partition:
#!/bin/bash
#SBATCH --mem=8G
#SBATCH --cpus-per-task=2
#SBATCH --time=00:30:00
#SBATCH --job-name=r9_gpu_test
#SBATCH --account=accre_lab_acc
#SBATCH --gres=gpu:nvidia_rtx_a6000:3
#SBATCH --partition=batch_gpu

setup_accre_software_stack
module load cuda/12.6 python/3.12.4

./my_analysis_code.py
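To confirm which GPUs Slurm actually assigned to a job like this, lines such as the following could be added to the script before launching the analysis (nvidia-smi ships with the NVIDIA driver, and CUDA_VISIBLE_DEVICES is typically set by Slurm for GPU jobs; both are assumptions about the node environment rather than ACCRE-specific guarantees):

# Print the GPUs visible to this job
echo "Allocated GPUs: ${CUDA_VISIBLE_DEVICES}"
nvidia-smi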
Accessing GPU Resources in the interactive_gpu Partition
GPU resources in the interactive_gpu partition are accessed using your group name with the _iacc suffix. For example, if your group is accre_lab, then the Slurm account to use is accre_lab_iacc. To see which accounts with the _iacc suffix are available to you, run the slurm_resources command:
[bob@gw344 accre]$ slurm_resources
...snip...
Accounts and QOSs for accessing the interactive gpu partition:

Account              QOS                  CPU limit   Memory Limit (GiB)   GPU Type             GPU Limit
-------------------  -------------------  ----------  -------------------  -------------------  ----------
accre_iacc           debug_iacc           24          372.0                nvidia_rtx_a4000     4
accre_iacc           fe_accre_lab_iacc    8           60.0                 nvidia_rtx_a4000     1
accre_iacc           fe_accre_lab_iacc    8           60.0                 nvidia_titan_x       1

Use the "qosstate [QOS_NAME]" command to get current usage for the QOS specified by QOS_NAME, with a breakdown of usage by user and Slurm account.

Note that the "debug_iacc" QOS is a special QOS available to all cluster users for quick debugging of Slurm scripts requiring a GPU. On this special QOS there are additional restrictions: a maximum wall clock time of 00:30:00, and each user can use a maximum of cpu=6,gres/gpu:nvidia_rtx_a4000=1,gres/gpu=1,mem=93G at one time.

As an example, to submit a job to the interactive_gpu partition using the account "accre_iacc" and QOS "debug_iacc", you would add the following lines to your Slurm script:

#SBATCH --account=accre_iacc
#SBATCH --partition=interactive_gpu
#SBATCH --qos=debug_iacc
#SBATCH --gres=gpu:nvidia_rtx_a4000:1
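If you would rather work interactively than submit a batch script, the same account, partition, QOS, and GRES options can be passed to salloc, the standard Slurm command for interactive allocations; the time and resource sizes below are illustrative and must fit within the QOS limits shown above:

# Request a 30-minute interactive shell with one RTX A4000 under the debug QOS
salloc --account=accre_iacc \
       --partition=interactive_gpu \
       --qos=debug_iacc \
       --gres=gpu:nvidia_rtx_a4000:1 \
       --cpus-per-task=2 \
       --mem=8G \
       --time=00:30:00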
Example of Running a GPU job
For this simple GPU job, the example uses PyTorch installed in a Python virtual environment built on top of the Digital Research Alliance of Canada software stack, and an Nvidia RTX A4000 GPU, to which all users have short (30 minute) access via the debug_iacc Slurm QOS. An example Python script that trains a sequential neural network has been adapted from the example code at Vanderbilt University's Machine Learning Training Facility.
First, create the Python training script as a file named train_neural_network.py with the following contents:
""" This example trains a sequential neural network Adapted from example code from Vanderbilt University's Machine Learning Training Facility https://docs.mltf.vu """ import torch from torch.utils.data import Dataset from torchvision import datasets from torchvision.transforms import ToTensor from torch.utils.data import DataLoader import torch.nn as nn import torch.nn.functional as F import torch.optim as optim class SeqNet(nn.Module): def __init__(self, input_size, hidden_size1, hidden_size2, output_size): super(SeqNet, self).__init__() self.lin1 = nn.Linear(input_size, hidden_size1) self.lin2 = nn.Linear(hidden_size1, hidden_size2) self.lin3 = nn.Linear(hidden_size2, output_size) def forward(self, x): x = torch.flatten(x,1) x = self.lin1(x) x = F.sigmoid(x) x = self.lin2(x) x = F.log_softmax(x, dim=1) out = self.lin3(x) return out def train(model, train_loader, loss_function, optimizer, num_epochs): # Transfer model to device model.to(device) for epoch in range(num_epochs): running_loss = 0.0 model.train() for i ,(images,labels) in enumerate(train_loader): images = torch.div(images, 255.) # Transfer data tensors to device images, labels = images.to(device), labels.to(device) optimizer.zero_grad() outputs = model(images) loss = loss_function(outputs, labels) loss.backward() optimizer.step() running_loss += loss.item() average_loss = running_loss / len(train_loader) print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {average_loss:.4f}') print("Training finished.") input_size = 784 hidden_size1 = 200 hidden_size2 = 200 output_size = 10 num_epochs = 10 batch_size = 100 lr = 0.01 device = torch.device("cuda") print("Training on device: ", torch.cuda.get_device_name(device)) my_net = SeqNet(input_size, hidden_size1, hidden_size2, output_size) my_net = my_net.to(device) optimizer = torch.optim.Adam( my_net.parameters(), lr=lr) loss_function = nn.CrossEntropyLoss() fmnist_train = datasets.FashionMNIST(root="data", train=True, download=True, transform=ToTensor()) fmnist_test = datasets.FashionMNIST(root="data", train=False, download=True, transform=ToTensor()) fmnist_train_loader = DataLoader(fmnist_train, batch_size=batch_size, shuffle=True) fmnist_test_loader = DataLoader(fmnist_test, batch_size=batch_size, shuffle=True) train(my_net, fmnist_train_loader, loss_function, optimizer, num_epochs) correct = 0 total = 0 for images,labels in fmnist_test_loader: images = torch.div(images, 255.) images = images.to(device) labels = labels.to(device) output = my_net(images) _, predicted = torch.max(output,1) correct += (predicted == labels).sum() total += labels.size(0) print('Accuracy of the model: %.3f %%' %((100*correct)/(total+1)))
Now create a file named train_neural_network.slurm with the following contents:
#!/bin/bash
#SBATCH --account=accre_iacc # Change to your group with an "_iacc" suffix
#SBATCH --partition=interactive_gpu
#SBATCH --qos=debug_iacc
#SBATCH --mem=8G
#SBATCH --cpus-per-task=2
#SBATCH --gpus=nvidia_rtx_a4000:1
# Setup python interpreter from the CC software stack
# Note that PyTorch has its own CUDA libraries so we
# do not need to load cuda/12.6
echo -e "Setting up virtual environment for training model...\n\n"
setup_accre_software_stack
ml python/3.12.4 scipy-stack/2025a
# Create a temporary directory on the local storage
# of the compute node and set up a python virtual environment
# and install PyTorch in the virtual environment
source setup_accre_runtime_dir
python -m venv ${ACCRE_RUNTIME_DIR}/venv
source ${ACCRE_RUNTIME_DIR}/venv/bin/activate
pip install torch torchvision
# Run the training script
echo -e "\n\nRunning the training script...\n\n"
python train_neural_network.py
You will need to change the line #SBATCH --account=... to an account that you have access to. Use the slurm_resources command to find a valid Slurm account ending in "_iacc" for which you have access to the "debug_iacc" QOS. Note that all ACCRE users should have access to the "debug_iacc" QOS for debugging short jobs on a select set of GPU resources. If you are part of a group with access to other interactive GPU resources, you can also change the QOS and GPU type to a different model.
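For example, based on the sample slurm_resources output shown earlier, a user whose group also has access to the fe_accre_lab_iacc QOS and Titan X GPUs could hypothetically swap in the following directives instead of the debug QOS (the exact account, QOS, and GPU type must match what slurm_resources reports for you):

#SBATCH --account=accre_iacc             # use the _iacc account slurm_resources lists for you
#SBATCH --partition=interactive_gpu
#SBATCH --qos=fe_accre_lab_iacc          # a group QOS from the sample output above
#SBATCH --gres=gpu:nvidia_titan_x:1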
Finally, submit your Slurm script with the command sbatch train_neural_network.slurm. The Slurm scheduler will return a job ID number. You can check the status of your job with the command squeue -j [JOBID], where JOBID is your job ID number. If your job does not start quickly, it may be that other users are occupying all of the GPUs available to the debug QOS. To check the current usage of this QOS, use the command qosstate debug_iacc.
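Putting those commands together, a typical submit-and-monitor sequence looks like the sketch below; the job ID is only a placeholder for whatever number sbatch prints:

sbatch train_neural_network.slurm   # prints: Submitted batch job <JOBID>
squeue -j <JOBID>                   # replace <JOBID> with the number printed above
squeue -u $USER                     # or list all of your own pending/running jobs
qosstate debug_iacc                 # check current usage of the debug QOS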
When your job has finished, you will see a file of the form slurm-NNN.out, where NNN is your job ID. The standard output from the model training will be written to this file.
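For example, with a placeholder job ID, the training progress and the final accuracy line can be inspected with ordinary shell tools:

# Follow the training output as it is written (placeholder job ID)
tail -f slurm-1234567.out

# Or pull out just the final accuracy line once the job has completed
grep "Accuracy" slurm-1234567.out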