AlphaFold on ACCRE

From ACCRE Wiki

Overview

AlphaFold is deep-learning software for protein structure prediction developed by DeepMind. This page explains how to run AlphaFold 2.3.2 on the ACCRE cluster at Vanderbilt University.

Prerequisites

Before running AlphaFold on ACCRE, you need to have:

  • An ACCRE account (See Instructions)
  • Access to CSB GPUs (check your group memberships by running the id command at the ACCRE command prompt)
  • A working knowledge of Linux commands and ACCRE’s batch job submission system, SLURM
  • The amino acid sequence(s) of the protein(s) whose structure you want to predict, in FASTA format
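
A quick sanity check of the FASTA input can be done from the shell before submitting anything. The sketch below is illustrative only: check_fasta is a hypothetical helper name, not part of AlphaFold or ACCRE, and it only checks that the file starts with a header line and contains at least one sequence line. It does not validate the residue letters.

```shell
# check_fasta FILE: hypothetical helper, not part of AlphaFold or ACCRE.
# Prints the number of records in FILE; returns non-zero if FILE does not
# look like a FASTA file.
check_fasta() {
    file=$1
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
    # The first non-blank line must be a header starting with '>'.
    first=$(awk 'NF {print; exit}' "$file")
    case $first in
        '>'*) ;;
        *) echo "$file: first line is not a FASTA header" >&2; return 1 ;;
    esac
    # At least one non-header line must contain letters (sequence data).
    if ! awk '!/^>/ && /[A-Za-z]/ {found=1} END {exit !found}' "$file"; then
        echo "$file: no sequence lines found" >&2
        return 1
    fi
    # Count the header lines, i.e. the number of sequences.
    grep -c '^>' "$file"
}
```

For example, check_fasta CTD-EF.fasta prints the number of sequences in the file, which is also the number of chains AlphaFold will see.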

Instructions

  1. If you do not have an account, you can apply for one at ACCRE
  2. Instructions for requesting a new CSB/Mchaourab account (New Users):
    • Open the New CSB User page in a web browser and sign in with login sbrequest and password welcomevsb.
    • Fill out the form completely. Request a password that is NOT the same as your Vanderbilt University e-password. Leave the shell as tcsh.
    • Under the Email section, fill in item #1 with your Vanderbilt email address and leave #2 blank.
    • For the associated lab, choose ‘Mchaourab’ in the drop-down box. If you have an office address and phone number, enter them in the corresponding fields. The home phone number is optional.
    • Click ‘Continue’ at the bottom.
    • Check your information and, once verified, click ‘Submit’ to send the request.
  3. After logging into ACCRE, copy the script below and edit it for your run.
    • Set the input/output data path: replace /path/to/your/input/and/output/data with your own directory.
    • Set the input FASTA file: replace CTD-EF.fasta with the name of your own FASTA file, which must be in that directory.
    • Check that CALCDIR points to your data directory and that AF2_MINICONDA points to the installation of the AlphaFold version you want to use.
  4. Submit the script to SLURM with the command sbatch <filename>, where <filename> is the name of the script file you created.
  5. While the job runs, you can check its status with the command squeue -u <username>. You can log out of ACCRE while you wait for your jobs to finish.
  6. The output files are written to a subdirectory of the data path you set in the script, named after the FASTA file (CTD-EF for CTD-EF.fasta).
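
Steps 3 and 4 can be scripted when you run many predictions. The following is a minimal sketch under stated assumptions: make_job_script is a hypothetical helper, the file names af232.slurm and run1.slurm are placeholders, and the substituted values must not contain '|', '&' or backslash characters because they are passed through sed.

```shell
# make_job_script TEMPLATE CALCDIR FASTA: hypothetical helper.
# Writes a copy of TEMPLATE to stdout with its CALCDIR= and FASTA= lines
# replaced. Assumes the new values contain no '|', '&' or backslashes.
make_job_script() {
    template=$1
    calcdir=$2
    fasta=$3
    sed -e "s|^CALCDIR=.*|CALCDIR=$calcdir|" \
        -e "s|^FASTA=.*|FASTA=$fasta|" "$template"
}

# Example use on ACCRE (file names are placeholders):
#   make_job_script af232.slurm /path/to/run1 myprotein.fasta > run1.slurm
#   jobid=$(sbatch --parsable run1.slurm)   # --parsable prints just the job ID
#   squeue -j "$jobid"
```

Keeping one generated script per run also leaves a record of exactly which paths each job used.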

To use AlphaFold 2.3.2 on the ACCRE cluster, you can use the following SBATCH script:

#!/bin/bash --norc

#SBATCH --account=csb_gpu_acc
#SBATCH --partition=turing
#SBATCH --constraint=csbtmp
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --nodes=1
#SBATCH --ntasks=6
#SBATCH --gres=gpu:1
#SBATCH --mem=24G
#SBATCH --time=16:00:00
#SBATCH --job-name=af232-test
#SBATCH --output=af232-test.log

# Set your input/output data path
CALCDIR=/path/to/your/input/and/output/data

# Your input fasta should be in the directory above:
FASTA=CTD-EF.fasta
# Where is the AF2 miniconda environment
AF2_MINICONDA=/sb/apps/alphafold232/miniconda3
# Where is the AF2 Inference data
AF2_DATADIR=/csbtmp/alphafold-data.230
# Where is the AF2 Git?
AF2_REPO=/sb/apps/alphafold232/alphafold

# Stop immediately if the data directory is missing or mistyped
cd "$CALCDIR" || exit 1

#Look at the driver and GPUs
nvidia-smi

echo -n "Running on "
echo $SLURM_JOB_NODELIST

# Activate CSB Alphafold2 miniconda environment
source $AF2_MINICONDA/bin/activate af232
export LD_LIBRARY_PATH=$AF2_MINICONDA/envs/af232/lib:$LD_LIBRARY_PATH

python $AF2_REPO/run_alphafold.py \
        --fasta_paths=$FASTA \
        --model_preset=multimer \
        --max_template_date=9999-12-31 \
        --data_dir=$AF2_DATADIR \
        --output_dir=$CALCDIR \
        --uniref90_database_path=$AF2_DATADIR/uniref90/uniref90.fasta \
        --mgnify_database_path=$AF2_DATADIR/mgnify/mgy_clusters_2022_05.fa \
        --uniref30_database_path=$AF2_DATADIR/uniref30/UniRef30_2021_03 \
        --bfd_database_path=$AF2_DATADIR/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
        --template_mmcif_dir=$AF2_DATADIR/pdb_mmcif/mmcif_files \
        --pdb_seqres_database_path=$AF2_DATADIR/pdb_seqres/pdb_seqres.txt \
        --obsolete_pdbs_path=$AF2_DATADIR/pdb_mmcif/obsolete.dat \
        --uniprot_database_path=$AF2_DATADIR/uniprot/uniprot.fasta \
        --use_gpu_relax
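
Once the job has finished, run_alphafold.py leaves its results in a subdirectory of the output directory named after the FASTA file, with the models ordered by confidence as ranked_0.pdb, ranked_1.pdb, and so on. The sketch below shows one way to locate them; result_dir and list_ranked are hypothetical helper names introduced here for illustration.

```shell
# result_dir CALCDIR FASTA: hypothetical helper that prints the directory
# run_alphafold.py writes to: CALCDIR/<FASTA file name without extension>.
result_dir() {
    stem=$(basename "$2" | sed 's/\.[^.]*$//')
    printf '%s/%s\n' "$1" "$stem"
}

# list_ranked DIR: hypothetical helper that prints ranked_0.pdb, ranked_1.pdb,
# ... in rank order (ranked_0.pdb is the top-ranked model).
# Returns non-zero if DIR contains no ranked_0.pdb yet.
list_ranked() {
    i=0
    while [ -e "$1/ranked_$i.pdb" ]; do
        echo "$1/ranked_$i.pdb"
        i=$((i + 1))
    done
    [ "$i" -gt 0 ]
}

# Example use (paths are placeholders):
#   list_ranked "$(result_dir /path/to/your/input/and/output/data CTD-EF.fasta)"
```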

Using the Multimer Feature in AF2 Version 2.3

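AlphaFold-Multimer predicts the structure of complexes made of several protein chains. It was introduced in AlphaFold 2.1, and version 2.3 ships updated multimer model parameters. To use it, pass the flag --model_preset=multimer to run_alphafold.py and put every chain of the complex in the input FASTA file as a separate record. The --pdb_seqres_database_path and --uniprot_database_path options in the SBATCH script above are only accepted with this preset.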
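
Because the multimer preset expects one FASTA record per chain (a homodimer is written as the same sequence in two records), counting the records is a simple way to confirm what a file describes. The sketch below is an illustration only; suggest_preset is a hypothetical helper, and a monomer run would need different template database options than the ones in the script above.

```shell
# suggest_preset FASTA: hypothetical helper. Prints "multimer" when the file
# holds more than one record (one record per chain), otherwise "monomer".
suggest_preset() {
    # grep -c exits non-zero when it counts no matches, so fall back to 0.
    n=$(grep -c '^>' "$1") || n=0
    if [ "$n" -gt 1 ]; then
        echo multimer
    else
        echo monomer
    fi
}
```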

Monitoring the Job

You can monitor the status of your job using the squeue command:

squeue -u your_username
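
To follow a single job rather than all of your jobs, squeue can be restricted to one job ID. The following is a rough sketch, assuming you kept the job ID that sbatch printed; job_state and wait_for_job are hypothetical helper names, and the 60-second default polling interval is an arbitrary choice.

```shell
# job_state JOBID: print the job's state (for example PENDING or RUNNING),
# or nothing once the job has left the queue.
# squeue -h suppresses the header; -o "%T" prints only the state.
job_state() {
    squeue -h -j "$1" -o "%T" 2>/dev/null
}

# wait_for_job JOBID [INTERVAL_SECONDS]: hypothetical helper that polls
# until the job no longer appears in the squeue output.
wait_for_job() {
    interval=${2:-60}
    while state=$(job_state "$1") && [ -n "$state" ]; do
        echo "job $1: $state"
        sleep "$interval"
    done
    echo "job $1 is no longer in the queue"
}
```

After the job leaves the queue, the log file named by --output in the script (af232-test.log above) shows whether the run completed; where job accounting is enabled, sacct -j <jobid> reports the final state as well.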

Conclusion

That’s it! This guide should have given you a good idea of how to run AlphaFold 2.3.2 on the ACCRE cluster. If you have any questions or run into any issues, feel free to contact the ACCRE support team for assistance.

Documentation

See here for more information on AlphaFold.
AlphaFold2 on GitHub.

AlphaFold Changelog

Version 2.3.2

  • More robust download in Colab with shutil.
  • Ability to only run relax for the best unrelaxed model in run_alphafold.py.
  • Improved documentation for ranked outputs.
  • Removed jax dependency from results pkl.
  • Updated tensorflow to 2.11.0.
  • Documentation improvements for installing aria2c.
  • Case-insensitive logic for `_chem_comp.type` in mmCIF parsing.
  • Enhanced error messages for Colab cell submission order.
  • Corrected type annotations.
  • Upgraded Python to 3.9 in Colab.
  • Improved robustness of masked softmax for bfloat16.
  • Updated pyopenssl in Colab to address cryptography dependency issue.


Version 2.3

  • Updated AlphaFold-Multimer model parameters, with improved accuracy on large complexes
  • Multimer models retrained with a later training-data cutoff

Version 2.2

  • Updated AlphaFold-Multimer model parameters, producing fewer clashing structures

Version 2.1

  • Added AlphaFold-Multimer for predicting the structure of protein complexes

Version 2.0

  • Initial open-source release of AlphaFold 2
  • Single-chain (monomer) structure prediction based on deep learning neural networks



This document has been developed by the Center for Applied AI in Protein Dynamics.