AlphaFold on ACCRE
Overview
AlphaFold is deep-learning software for protein structure prediction developed by DeepMind. This guide explains how to run AlphaFold 2.3.2 on the ACCRE cluster at Vanderbilt University.
Prerequisites
Before running AlphaFold on ACCRE, you need to have:
- An ACCRE account (See Instructions)
- Access to CSB GPUs (check by typing `id` at the ACCRE command prompt)
- A working knowledge of Linux commands and ACCRE’s batch job submission system, SLURM
- Amino acid sequences in FASTA format of the protein(s) you want to predict the structure of
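Before submitting a job, it can help to confirm that the input file really is in FASTA format. The following is a minimal sketch using standard tools; the function name `count_fasta_records` and the example sequences are illustrative assumptions, not part of AlphaFold or ACCRE.

```shell
#!/bin/bash
# Hypothetical helper (not part of AlphaFold or ACCRE): count the records in a
# FASTA file by counting header lines, which start with '>'.
count_fasta_records() {
    local fasta="$1"
    if [ ! -s "$fasta" ]; then
        echo "missing or empty file: $fasta" >&2
        return 1
    fi
    # The first line of a FASTA file should be a header line.
    if ! head -n 1 "$fasta" | grep -q '^>'; then
        echo "first line of $fasta is not a FASTA header" >&2
        return 1
    fi
    grep -c '^>' "$fasta"
}

# Example with a made-up two-record file; the sequences are placeholders.
example_fasta="$(mktemp)"
cat > "$example_fasta" <<'EOF'
>chain_A
MKTAYIAKQR
>chain_B
GSHMLEDPVA
EOF
n_records="$(count_fasta_records "$example_fasta")"
echo "records: $n_records"
```

A file that is missing, empty, or does not start with a header line makes the function return a non-zero status, so it can be used as a guard before calling `sbatch`.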
Instructions
- If you do not have an account, you can apply for one at ACCRE
- Instructions for requesting a new CSB/Mchaourab account (New Users):
- Open a web browser to: New CSB User, using login: sbrequest, password: welcomevsb
- Fill out the form completely. Request a password that is NOT the same as your Vanderbilt University e-password. Leave the shell as tcsh.
- Under the Email section fill in item #1 with your Vanderbilt email address, leave #2 blank.
- For the associated lab, choose ‘Mchaourab’ in the drop down box. If you have an office address and phone # please put that information into the fields. The home phone # is optional.
- Click ‘Continue’ at bottom.
- Check your information and once verified, click ‘Submit’ to send this.
- After logging into ACCRE, copy and edit the script below for your run.
- Set up the input/output data path in the script. Replace /path/to/your/input/and/output/data with your own input/output data path.
- Set the path to the input FASTA file. Replace `CTD-EF.fasta` with the name of your own FASTA file.
- Make sure to replace the values of the CALCDIR and AF2_MINICONDA variables with the appropriate paths for your input files and the version of AlphaFold you want to use.
- Submit the script using SLURM by running the command `sbatch <filename>`, where `<filename>` is the name of the script file you created.
- While waiting for the job to finish, you can check the status of your job using the command `squeue -u <username>`. (You can log out of ACCRE while you are waiting for your jobs to finish.)
- The output files will be in the input/output data path you set in the script.
To use AlphaFold 2.3.2 on the ACCRE cluster, you can use the following SBATCH script:
#!/bin/bash --norc
#SBATCH --account=csb_gpu_acc
#SBATCH --partition=batch_gpu
#SBATCH --constraint=csbtmp
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --nodes=1
#SBATCH --ntasks=6
#SBATCH --gres=gpu:nvidia_rtx_a6000:1
#SBATCH --mem=24G
#SBATCH --time=16:00:00
#SBATCH --job-name=af232-test
#SBATCH --output=af232-test.log
# Set your input/output data path
CALCDIR=/path/to/your/input/and/output/data
# Your input fasta should be in the directory above:
FASTA=CTD-EF.fasta
# Where is the AF2 miniconda environment
AF2_MINICONDA=/sb/apps/alphafold232/miniconda3
# Where is the AF2 Inference data
AF2_DATADIR=/sb/apps/alphafold-data.230
# Where is the AF2 Git?
AF2_REPO=/sb/apps/alphafold232/alphafold
cd "$CALCDIR" || exit 1
#Look at the driver and GPUs
nvidia-smi
echo -n "Running on "
echo $SLURM_JOB_NODELIST
# Activate CSB Alphafold2 miniconda environment
source $AF2_MINICONDA/bin/activate af232
export LD_LIBRARY_PATH=$AF2_MINICONDA/envs/af232/lib:$LD_LIBRARY_PATH
python $AF2_REPO/run_alphafold.py \
--fasta_paths=$FASTA \
--max_template_date=9999-12-31 \
--data_dir=$AF2_DATADIR \
--output_dir=$CALCDIR \
--uniref90_database_path=$AF2_DATADIR/uniref90/uniref90.fasta \
--mgnify_database_path=$AF2_DATADIR/mgnify/mgy_clusters_2022_05.fa \
--uniref30_database_path=$AF2_DATADIR/uniref30/UniRef30_2021_03 \
--bfd_database_path=$AF2_DATADIR/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--template_mmcif_dir=$AF2_DATADIR/pdb_mmcif/mmcif_files \
--pdb_seqres_database_path=$AF2_DATADIR/pdb_seqres/pdb_seqres.txt \
--obsolete_pdbs_path=$AF2_DATADIR/pdb_mmcif/obsolete.dat \
--uniprot_database_path=$AF2_DATADIR/uniprot/uniprot.fasta \
--use_gpu_relax
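The edits described in the instructions (setting CALCDIR and FASTA) can also be scripted rather than made by hand. The following is a minimal sketch, assuming the script above has been saved as a template file; the helper name `customize_script`, the example directory, and the FASTA file name are placeholders.

```shell
#!/bin/bash
# Hypothetical helper: write a copy of an SBATCH template with CALCDIR and FASTA
# set to the given values. Assumes both variables are assigned at the start of
# a line, as in the script above, and that the values contain no '|', '&', or
# backslash characters (they would be interpreted by sed).
customize_script() {
    local template="$1" output="$2" calcdir="$3" fasta="$4"
    sed -e "s|^CALCDIR=.*|CALCDIR=${calcdir}|" \
        -e "s|^FASTA=.*|FASTA=${fasta}|" \
        "$template" > "$output"
}

# Demonstration on a two-line stand-in for the full script.
template="$(mktemp)"
printf '%s\n' 'CALCDIR=/path/to/your/input/and/output/data' 'FASTA=CTD-EF.fasta' > "$template"
run_script="$(mktemp)"
customize_script "$template" "$run_script" "/home/USERNAME/af2_run1" "my_protein.fasta"
cat "$run_script"

# On ACCRE, the customized script would then be submitted with:
#   sbatch "$run_script"
```

Keeping the template untouched and generating one script per run makes it easier to see afterwards which settings produced which output directory.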
Using the Multimer Feature in AF2 Version 2.3
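AlphaFold 2.3 includes AlphaFold-Multimer, which predicts the structure of complexes made up of multiple protein chains. To use it, add the flag --model_preset=multimer to the run_alphafold.py command in the SBATCH script above; the script as shown does not set it. The --pdb_seqres_database_path and --uniprot_database_path options already present in that command are the ones the multimer preset uses.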
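For a multimer run, the input FASTA holds one record per chain. The sketch below writes such a file and shows, in comments, where the preset flag would go; the chain names and sequences are made-up placeholders, and the temporary directory stands in for your own CALCDIR.

```shell
#!/bin/bash
# Write a made-up two-chain input file; names and sequences are placeholders.
work_dir="$(mktemp -d)"
multimer_fasta="$work_dir/complex_example.fasta"
cat > "$multimer_fasta" <<'EOF'
>chain_A
MKTAYIAKQRQISFVKSHFSRQ
>chain_B
GSHMLEDPVAVLKELGYTRE
EOF

# One header line per chain.
n_chains="$(grep -c '^>' "$multimer_fasta")"
echo "chains in $multimer_fasta: $n_chains"

# In the SBATCH script, set FASTA to this file and add the preset flag to the
# run_alphafold.py command, for example right after --fasta_paths:
#   --fasta_paths=$FASTA \
#   --model_preset=multimer \
```

For a homo-oligomer, the same sequence is repeated once per copy, each under its own header line.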
Monitoring the Job
You can monitor the status of your job using the squeue command:
squeue -u your_username
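When several jobs are queued, a per-state count can be easier to read than the full listing. The following is a small sketch built on `squeue`; the helper name `summarize_states` is an assumption for illustration, and the scheduler is only queried where `squeue` is actually installed.

```shell
#!/bin/bash
# Hypothetical helper: read one job state per line on stdin (as printed by
# squeue with the %T format) and print a count per state, e.g. "RUNNING=2".
summarize_states() {
    sort | uniq -c | awk '{ print $2 "=" $1 }'
}

# Only query the scheduler where squeue is available (e.g. an ACCRE login node).
if command -v squeue >/dev/null 2>&1; then
    # -h suppresses the header; -o '%T' prints only the state of each job.
    squeue -u "$USER" -h -o '%T' | summarize_states
else
    echo "squeue not found; run this on a cluster login node" >&2
fi
```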
Conclusion
That’s it! This guide should have given you a good idea of how to run AlphaFold 2.3.2 on the ACCRE cluster. If you have any questions or run into any issues, feel free to contact the ACCRE support team for assistance.
Documentation
See here for more information on AlphaFold.
AlphaFold2 on GitHub.
AlphaFold Changelog
Version 2.3.2
- More robust download in Colab with shutil.
- Ability to only run relax for the best unrelaxed model in run_alphafold.py.
- Improved documentation for ranked outputs.
- Removed jax dependency from results pkl.
- Updated tensorflow to 2.11.0.
- Documentation improvements for installing aria2c.
- Case-insensitive logic for `_chem_comp.type` in mmCIF parsing.
- Enhanced error messages for Colab cell submission order.
- Corrected type annotations.
- Upgraded Python to 3.9 in Colab.
- Improved robustness of masked softmax for bfloat16.
- Updated pyopenssl in Colab to address cryptography dependency issue.
Version 2.3
- Improved accuracy of protein structure predictions
- Added support for predicting membrane protein structures
- Updated training data and methods
Version 2.2
- Improved performance and speed of protein structure predictions
- Added support for predicting protein-ligand complex structures
- Updated training data and methods
Version 2.1
- Improved accuracy and reliability of protein structure predictions
- Added support for predicting disordered protein regions
- Updated training data and methods
Version 2.0
- Initial release of AlphaFold 2
- Revolutionized protein structure prediction with unprecedented accuracy
- Based on deep learning neural networks and advanced modeling techniques
AlphaFold 3 on ACCRE
AlphaFold 3 is the next-generation structure prediction tool developed by DeepMind, supporting complex modeling tasks such as protein-ligand and protein-nucleic acid interactions. AlphaFold 3 uses a JSON-based input system and requires containerized environments for deployment. This section explains how to run AlphaFold 3 on ACCRE using Singularity. AlphaFold 3 only supports GPUs with compute capability 8.0 or greater, that is, A100s or newer.
For a detailed guide from Alliance Canada, see their documentation: AlphaFold 3 Documentation (Alliance CAN)
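The compute capability requirement can be checked on a GPU node before running anything expensive. The following is a rough sketch: the helper name `meets_min_cc` is hypothetical, and the `compute_cap` query field of `nvidia-smi` is assumed to be available, which requires a reasonably recent NVIDIA driver.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a compute capability such as "8.6" meets a
# minimum (default 8.0, the requirement stated above).
meets_min_cc() {
    awk -v cc="$1" -v min="${2:-8.0}" 'BEGIN { exit !(cc + 0 >= min + 0) }'
}

# On a GPU node, nvidia-smi can report the name and capability of each GPU.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader |
    while IFS=, read -r gpu_name gpu_cc; do
        if meets_min_cc "$gpu_cc"; then
            echo "$gpu_name: compute capability$gpu_cc (ok)"
        else
            echo "$gpu_name: compute capability$gpu_cc (below 8.0)"
        fi
    done
else
    echo "nvidia-smi not found; run this on a GPU node" >&2
fi
```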
Prerequisites
The AlphaFold 3 model parameters must be requested from Google, who aim to respond to requests within 2-3 business days. Please see Obtaining Model Parameters.
Getting Started
Set up a directory for your AlphaFold 3 environment and pull the container:
mkdir alphafold3
cd alphafold3
singularity pull docker://brandonsoubasis/alphafold3:latest
This will create the `alphafold3_latest.sif` Singularity container in your working directory.
Submitting a Job on ACCRE
Below is an example SBATCH script you can modify and submit using the `sbatch` command:
#!/bin/bash
#SBATCH --job-name=alphafold3
#SBATCH --account=mchaourab_acc
#SBATCH --partition=batch_gpu
#SBATCH --gres=gpu:nvidia_a100_80gb:2
#SBATCH --output=/home/USERNAME/alphafold3/run.log
#SBATCH --time=12:00:00
#SBATCH --mem=400G
#SBATCH --cpus-per-task=32
setup_accre_software_stack
# Load CUDA
ml cuda/12.6
# Define paths
BASE_INPUT_DIR="/home/USERNAME/alphafold3"
OUTPUT_DIR="/path/to/your/output/data"
SINGULARITY_IMAGE="/home/USERNAME/alphafold3/alphafold3_latest.sif"
DB_DIR="/data/mchaourab/alphafold3-data" # Or your custom DB path
MODEL_DIR="/path/to/model/parameters/alphafold3_models"
# Input JSON
INPUT_JSON="/path/to/your/input/INPUT.json"
INPUT_NAME=$(basename "$INPUT_JSON" .json)
INPUT_OUTPUT_DIR="$OUTPUT_DIR/$INPUT_NAME"
echo "Running AlphaFold 3 for: $INPUT_JSON"
mkdir -p "$INPUT_OUTPUT_DIR"
# Run AlphaFold 3
singularity exec --nv \
--bind "$INPUT_JSON":/root/af_input/fold_input.json \
--bind "$INPUT_OUTPUT_DIR":/root/af_output \
--bind "$MODEL_DIR":/root/models \
--bind "$DB_DIR":/root/public_databases \
"$SINGULARITY_IMAGE" \
python3 /app/alphafold/run_alphafold.py \
--json_path=/root/af_input/fold_input.json \
--model_dir=/root/models \
--db_dir=/root/public_databases \
--output_dir=/root/af_output
Notes
- Input to AlphaFold 3 must be a JSON file. You can create these based on the examples in the AlphaFold 3 input documentation (see inputs).
- Databases should be downloaded ahead of time or linked to the `/root/public_databases` mount as shown above.
- Model weights must be downloaded and made available to `/root/models`.
- Unlike AlphaFold 2, AlphaFold 3 runs only one structure per job.
- For a description of the output files, see outputs.
- For notes on run time and memory use, see performance.
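As a starting point for the JSON input mentioned in the notes, the sketch below writes a minimal single-chain input file and checks its syntax. The job name, chain ID, sequence, and seed are placeholders; the field names follow the AlphaFold 3 input format, which should be checked against the input documentation for the version you are running.

```shell
#!/bin/bash
# Write a minimal AlphaFold 3 input for a single protein chain. The job name,
# chain ID, sequence, and seed below are placeholders.
work_dir="$(mktemp -d)"
input_json="$work_dir/example_job.json"
cat > "$input_json" <<'EOF'
{
  "name": "example_job",
  "modelSeeds": [1],
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "MKTAYIAKQRQISFVKSHFSRQ"
      }
    }
  ],
  "dialect": "alphafold3",
  "version": 1
}
EOF

# Optional syntax check, only if python3 is available.
if command -v python3 >/dev/null 2>&1; then
    python3 -m json.tool "$input_json" >/dev/null && echo "valid JSON: $input_json"
fi
```

In the SBATCH script above, INPUT_JSON would point at this file, and the results would be written to a subdirectory of OUTPUT_DIR named after the file (here `example_job`).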
Monitoring Jobs
Monitor your job status with:
squeue -u your_username
Troubleshooting
If you encounter issues with Singularity or data paths, ensure your mounts and file paths are correctly defined. Contact ACCRE support if you're unsure.
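Many path problems can be caught before the container starts. The following is a minimal sketch of such a check; the helper name `check_paths` is hypothetical, and the variable names in the comment are the ones defined in the SBATCH script above.

```shell
#!/bin/bash
# Hypothetical helper: report every argument that is not an existing file or
# directory. Returns non-zero if anything is missing.
check_paths() {
    local rc=0 p
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            echo "missing: $p" >&2
            rc=1
        fi
    done
    return "$rc"
}

# In the SBATCH script this could run before the singularity command, e.g.:
#   check_paths "$INPUT_JSON" "$MODEL_DIR" "$DB_DIR" "$SINGULARITY_IMAGE" || exit 1

# Demonstration with a temporary directory and a path that does not exist.
demo_dir="$(mktemp -d)"
check_paths "$demo_dir" && echo "all paths present"
check_paths "$demo_dir" "$demo_dir/no_such_file" || echo "at least one path is missing"
```

Failing early this way puts a clear "missing" message in the job log instead of a bind-mount error from Singularity.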
References
This document has been developed by the Center for Applied AI in Protein Dynamics.