AlphaFold on ACCRE

Overview

AlphaFold is deep-learning software for protein structure prediction developed by DeepMind. This page describes how to run AlphaFold 2.3.2 on the ACCRE cluster at Vanderbilt University.

Prerequisites

Before running AlphaFold on ACCRE, you need to have:

  • An ACCRE account (See Instructions)
  • Access to CSB GPUs (check your group memberships by running id at the ACCRE command prompt)
  • A working knowledge of Linux commands and ACCRE’s batch job submission system, SLURM
  • The amino acid sequences, in FASTA format, of the protein(s) whose structure you want to predict
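
As a quick sanity check before submitting, the sketch below counts the records in a FASTA file and flags sequence lines that contain anything other than the 20 standard one-letter amino-acid codes. The file name example.fasta and its sequence are made-up placeholders, and check_fasta is a hypothetical helper, not part of AlphaFold or ACCRE.

```shell
#!/bin/bash
# Hypothetical example input; replace with your own FASTA file.
cat > example.fasta <<'EOF'
>example_protein
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ
EOF

# check_fasta FILE: print the record count and fail on unexpected characters.
check_fasta() {
    local file=$1
    local n_records
    n_records=$(grep -c '^>' "$file")
    echo "records: $n_records"
    # Sequence lines (those not starting with '>') should contain only the
    # 20 standard one-letter amino-acid codes, in upper case.
    if grep -v '^>' "$file" | grep -q '[^ACDEFGHIKLMNPQRSTVWY]'; then
        echo "warning: non-standard characters found in $file" >&2
        return 1
    fi
    # Succeed only if the file holds at least one record.
    [ "$n_records" -ge 1 ]
}

check_fasta example.fasta
```

The character check also catches Windows line endings and lower-case letters, which are easy to introduce when copying sequences between tools.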

Instructions

  1. If you do not have an account, you can apply for one at ACCRE
  2. Instructions for requesting a new CSB/Mchaourab account (New Users):
    • In a web browser, open the New CSB User page and log in with login: sbrequest, password: welcomevsb
    • Fill out the form completely. Request a password that is NOT the same as your Vanderbilt University e-password. Leave the shell as tcsh.
    • Under the Email section fill in item #1 with your Vanderbilt email address, leave #2 blank.
    • For the associated lab, choose ‘Mchaourab’ in the drop-down box. If you have an office address and phone number, enter them in the corresponding fields. The home phone number is optional.
    • Click ‘Continue’ at the bottom.
    • Check your information and, once it is verified, click ‘Submit’ to send the request.
  3. After logging into ACCRE, copy the script below and edit it for your run.
    • Set the input/output data path. Replace /path/to/your/input/and/output/data in the CALCDIR variable with your own input/output data directory.
    • Set the input FASTA file. Replace CTD-EF.fasta in the FASTA variable with the name of your own FASTA file, which must be located in CALCDIR.
    • Make sure CALCDIR points to your input files and AF2_MINICONDA points to the miniconda installation of the AlphaFold version you want to use.
  4. Submit the script using SLURM by running the command sbatch <filename>, where <filename> is the name of the script file you created.
  5. While waiting for the job to finish, you can check its status using the command squeue -u <username>. (You can log out of ACCRE while you wait for your jobs to finish.)
  6. The output files will be in the input/output data path you set in the script.
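
Steps 3 to 5 can be sketched as a short shell session. The script name af232-demo.slurm, the directory af2_demo, and the file my_protein.fasta are hypothetical, and the three-line stand-in file only mimics the lines of the real SBATCH script that need editing. The sketch assumes GNU sed and calls sbatch with --test-only, which validates the script without submitting a job.

```shell
#!/bin/bash
# Hypothetical names for illustration; use your own script name and paths.
SCRIPT=af232-demo.slurm
MY_CALCDIR="$PWD/af2_demo"
MY_FASTA=my_protein.fasta

# Stand-in for the copied SBATCH script: only the lines that need editing.
printf '%s\n' \
    '#!/bin/bash' \
    'CALCDIR=/path/to/your/input/and/output/data' \
    'FASTA=CTD-EF.fasta' > "$SCRIPT"

# Point CALCDIR and FASTA at your own data (GNU sed, as found on Linux).
sed -i \
    -e "s|^CALCDIR=.*|CALCDIR=$MY_CALCDIR|" \
    -e "s|^FASTA=.*|FASTA=$MY_FASTA|" \
    "$SCRIPT"
grep -E '^(CALCDIR|FASTA)=' "$SCRIPT"

# Validate the script and look at the queue only where SLURM is available.
# Drop --test-only to submit the job for real.
if command -v sbatch >/dev/null 2>&1; then
    sbatch --test-only "$SCRIPT"
    squeue -u "$USER"
else
    echo "sbatch not found: run the submit step on an ACCRE login node"
fi
```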

To use AlphaFold 2.3.2 on the ACCRE cluster, you can use the following SBATCH script:

#!/bin/bash --norc

#SBATCH --account=csb_gpu_acc
#SBATCH --partition=batch_gpu
#SBATCH --constraint=csbtmp
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --nodes=1
#SBATCH --ntasks=6
#SBATCH --gres=gpu:nvidia_rtx_a6000:1
#SBATCH --mem=24G
#SBATCH --time=16:00:00
#SBATCH --job-name=af232-test
#SBATCH --output=af232-test.log

# Set your input/output data path
CALCDIR=/path/to/your/input/and/output/data

# Your input fasta should be in the directory above:
FASTA=CTD-EF.fasta
# Where is the AF2 miniconda environment
AF2_MINICONDA=/sb/apps/alphafold232/miniconda3
# Where is the AF2 Inference data
AF2_DATADIR=/sb/apps/alphafold-data.230
# Where is the AF2 Git?
AF2_REPO=/sb/apps/alphafold232/alphafold

# Stop early if the data directory does not exist
cd "$CALCDIR" || exit 1

#Look at the driver and GPUs
nvidia-smi

echo -n "Running on "
echo $SLURM_JOB_NODELIST

# Activate CSB Alphafold2 miniconda environment
source $AF2_MINICONDA/bin/activate af232
export LD_LIBRARY_PATH=$AF2_MINICONDA/envs/af232/lib:$LD_LIBRARY_PATH

python $AF2_REPO/run_alphafold.py \
        --fasta_paths=$FASTA \
        --max_template_date=9999-12-31 \
        --data_dir=$AF2_DATADIR \
        --output_dir=$CALCDIR \
        --uniref90_database_path=$AF2_DATADIR/uniref90/uniref90.fasta \
        --mgnify_database_path=$AF2_DATADIR/mgnify/mgy_clusters_2022_05.fa \
        --uniref30_database_path=$AF2_DATADIR/uniref30/UniRef30_2021_03 \
        --bfd_database_path=$AF2_DATADIR/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
        --template_mmcif_dir=$AF2_DATADIR/pdb_mmcif/mmcif_files \
        --pdb_seqres_database_path=$AF2_DATADIR/pdb_seqres/pdb_seqres.txt \
        --obsolete_pdbs_path=$AF2_DATADIR/pdb_mmcif/obsolete.dat \
        --uniprot_database_path=$AF2_DATADIR/uniprot/uniprot.fasta \
        --model_preset=multimer \
        --use_gpu_relax
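
A missing database file is easier to catch before the job waits in the queue than after it fails. The sketch below checks that some of the database paths used in the script exist. check_paths is a hypothetical helper, and the UniRef30 and BFD entries are left out because those options take a file-name prefix rather than a single file.

```shell
#!/bin/bash
# Data directory from the SBATCH script above.
AF2_DATADIR=/sb/apps/alphafold-data.230

# check_paths PATH...: report each missing path on stderr and
# return non-zero if at least one is missing.
check_paths() {
    local missing=0
    local p
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            echo "missing: $p" >&2
            missing=1
        fi
    done
    return "$missing"
}

if check_paths \
    "$AF2_DATADIR/uniref90/uniref90.fasta" \
    "$AF2_DATADIR/mgnify/mgy_clusters_2022_05.fa" \
    "$AF2_DATADIR/pdb_mmcif/mmcif_files" \
    "$AF2_DATADIR/pdb_mmcif/obsolete.dat" \
    "$AF2_DATADIR/pdb_seqres/pdb_seqres.txt" \
    "$AF2_DATADIR/uniprot/uniprot.fasta"
then
    echo "all checked database paths exist"
else
    echo "fix the paths listed above before submitting"
fi
```

Run it on an ACCRE node that can see /sb/apps. Elsewhere it reports every path as missing.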

Using the Multimer Feature in AF2 Version 2.3

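AlphaFold-Multimer predicts the structure of complexes made of multiple protein chains. It has been part of AlphaFold since version 2.1, and version 2.3 ships updated multimer model parameters. To use it, the run_alphafold.py command in your SBATCH script must include the flag --model_preset=multimer. The --pdb_seqres_database_path and --uniprot_database_path options in the script above are only accepted with this preset.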
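
For a multimer run, the input FASTA file holds one record per chain. For a homomer, the same sequence is repeated once per copy. The sketch below writes a two-chain input and counts its chains. The file name complex.fasta and both sequences are made-up placeholders.

```shell
#!/bin/bash
# Hypothetical two-chain input; the sequences are placeholders, not real proteins.
cat > complex.fasta <<'EOF'
>chain_A
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ
>chain_B
MSDNELKKLAEEFGVSVDELRKRG
EOF

# Each '>' header starts one chain; the multimer preset models them together.
n_chains=$(grep -c '^>' complex.fasta)
echo "chains in complex.fasta: $n_chains"
```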

Monitoring the Job

You can monitor the status of your job using the squeue command:

squeue -u your_username

Conclusion

That’s it! This guide should have given you a good idea of how to run AlphaFold 2.3.2 on the ACCRE cluster. If you have any questions or run into any issues, feel free to contact the ACCRE support team for assistance.

Documentation

See here for more information on AlphaFold.
AlphaFold2 on GitHub.

AlphaFold Changelog

Version 2.3.2

  • More robust download in Colab with shutil.
  • Ability to only run relax for the best unrelaxed model in run_alphafold.py.
  • Improved documentation for ranked outputs.
  • Removed jax dependency from results pkl.
  • Updated tensorflow to 2.11.0.
  • Documentation improvements for installing aria2c.
  • Case-insensitive logic for `_chem_comp.type` in mmCIF parsing.
  • Enhanced error messages for Colab cell submission order.
  • Corrected type annotations.
  • Upgraded Python to 3.9 in Colab.
  • Improved robustness of masked softmax for bfloat16.
  • Updated pyopenssl in Colab to address cryptography dependency issue.


Version 2.3

  • New AlphaFold-Multimer model parameters with improved accuracy on large protein complexes
  • Later training-data cutoff for the multimer models
  • Updated sequence databases, including UniRef30 in place of UniClust30

Version 2.2

  • Updated AlphaFold-Multimer model parameters that produce far fewer steric clashes and slightly more accurate predictions

Version 2.1

  • Introduced AlphaFold-Multimer for predicting the structure of protein complexes

Version 2.0

  • Initial release of AlphaFold 2
  • Revolutionized protein structure prediction with unprecedented accuracy
  • Based on deep learning neural networks and advanced modeling techniques

AlphaFold 3 on ACCRE

AlphaFold 3 is the next-generation structure prediction tool developed by DeepMind, supporting complex modeling tasks such as protein-ligand and protein-nucleic acid interactions. AlphaFold 3 uses a JSON-based input system and requires a containerized environment for deployment. This section explains how to run AlphaFold 3 on ACCRE using Singularity. AlphaFold 3 only supports GPUs with compute capability 8.0 or greater, that is, A100s or newer.

For a detailed guide from Alliance Canada, see their documentation: AlphaFold 3 Documentation (Alliance CAN)
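
To check the compute capability requirement on a GPU node, a sketch along these lines may help. cap_ok is a hypothetical helper, and the compute_cap query field is assumed to be supported by the nvidia-smi version installed on the node, which is only the case for recent NVIDIA drivers.

```shell
#!/bin/bash
# cap_ok CAP: succeed if compute capability CAP (for example 8.6) is at least 8.0.
cap_ok() {
    awk -v cap="$1" 'BEGIN { exit !(cap + 0 >= 8.0) }'
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # Assumption: this nvidia-smi supports the compute_cap query field.
    nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader |
    while IFS=, read -r name cap; do
        cap=${cap# }   # drop the space that follows the comma in CSV output
        if cap_ok "$cap"; then
            echo "$name: compute capability $cap, meets the AlphaFold 3 requirement"
        else
            echo "$name: compute capability $cap, below the AlphaFold 3 requirement"
        fi
    done
else
    echo "nvidia-smi not found: run this on a GPU node"
fi
```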

Prerequisites

The AlphaFold 3 model parameters must be requested from Google, which aims to respond to requests within 2-3 business days. Please see Obtaining Model Parameters.

Getting Started

Set up a directory for your AlphaFold 3 environment and pull the container:

mkdir alphafold3
cd alphafold3
singularity pull docker://brandonsoubasis/alphafold3:latest

This will create the `alphafold3_latest.sif` Singularity container in your working directory.

Submitting a Job on ACCRE

Below is an example SBATCH script you can modify and submit using the `sbatch` command:

#!/bin/bash

#SBATCH --job-name=alphafold3
#SBATCH --account=mchaourab_acc
#SBATCH --partition=batch_gpu
#SBATCH --gres=gpu:nvidia_a100_80gb:2
#SBATCH --output=/home/USERNAME/alphafold3/run.log
#SBATCH --time=12:00:00
#SBATCH --mem=400G
#SBATCH --cpus-per-task=32

setup_accre_software_stack

# Load CUDA
ml cuda/12.6

# Define paths
BASE_INPUT_DIR="/home/USERNAME/alphafold3"
OUTPUT_DIR="/path/to/your/output/data"
SINGULARITY_IMAGE="/home/USERNAME/alphafold3/alphafold3_latest.sif"
DB_DIR="/data/mchaourab/alphafold3-data"  # Or your custom DB path
MODEL_DIR="/path/to/model/parameters/alphafold3_models"

# Input JSON
INPUT_JSON="/path/to/your/input/INPUT.json"
INPUT_NAME=$(basename "$INPUT_JSON" .json)
INPUT_OUTPUT_DIR="$OUTPUT_DIR/$INPUT_NAME"

echo "Running AlphaFold 3 for: $INPUT_JSON"
mkdir -p "$INPUT_OUTPUT_DIR"

# Run AlphaFold 3
singularity exec --nv \
  --bind "$INPUT_JSON":/root/af_input/fold_input.json \
  --bind "$INPUT_OUTPUT_DIR":/root/af_output \
  --bind "$MODEL_DIR":/root/models \
  --bind "$DB_DIR":/root/public_databases \
  "$SINGULARITY_IMAGE" \
  python3 /app/alphafold/run_alphafold.py \
    --json_path=/root/af_input/fold_input.json \
    --model_dir=/root/models \
    --db_dir=/root/public_databases \
    --output_dir=/root/af_output

Notes

  • Input to AlphaFold 3 must be a JSON file. You can create one based on the examples in the AlphaFold 3 input documentation (see inputs).
  • Databases should be downloaded ahead of time or linked to the `/root/public_databases` mount as shown above.
  • Model weights must be downloaded and made available to `/root/models`.
  • Unlike AlphaFold 2, AlphaFold 3 runs only one structure per job.
  • For the format of the result files, see outputs.
  • For runtime and memory considerations, see performance.
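
As an illustration of the JSON input, the sketch below writes a minimal single-chain input file and checks that it parses. The job name, chain ID, and sequence are placeholders, and the field names follow the layout of the AlphaFold 3 input documentation. Treat it as a starting point and check that documentation for the authoritative schema.

```shell
#!/bin/bash
# Hypothetical input: the job name, chain ID, and sequence are placeholders.
cat > INPUT.json <<'EOF'
{
  "name": "example_job",
  "modelSeeds": [1],
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
      }
    }
  ],
  "dialect": "alphafold3",
  "version": 1
}
EOF

# Catch JSON syntax errors before submitting (needs python3 on the PATH).
if command -v python3 >/dev/null 2>&1; then
    python3 -m json.tool INPUT.json >/dev/null && echo "INPUT.json is valid JSON"
fi
```

Set INPUT_JSON in the SBATCH script to the path of this file. The script derives the name of the output subdirectory from the file name.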

Monitoring Jobs

Monitor your job status with:

squeue -u your_username

Troubleshooting

If you encounter issues with Singularity or data paths, ensure your mounts and file paths are correctly defined. Contact ACCRE support if you're unsure.


This document has been developed by the Center for Applied AI in Protein Dynamics.