
# AlphaFold 2

AlphaFold 2 lets users predict the 3-D structure of arbitrary proteins from their amino acid sequences. It was published in Nature (Jumper et al. 2021).

We have made AlphaFold available on the campus cluster through Singularity, together with scripts that help run the container on your particular files. It runs best on GPUs, which accelerate the computation.

##### Preparing to run

Here we show how to load the modules, create a submission script, and submit the job.

To get started, you need the FASTA file containing the sequence you want to predict. If you just want to try it out, you can download a FASTA file from a public source such as the RCSB Protein Data Bank.
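Before going further, it can help to sanity-check the FASTA file. In FASTA format each record starts with a header line beginning with `>`, followed by one or more lines of sequence. The bash function below is a hypothetical helper (it is not part of the cluster's AlphaFold installation): it rejects a missing, empty, or header-less file and otherwise prints how many records the file contains.

```bash
# Hypothetical helper, not part of the AlphaFold install:
# print the number of records in a FASTA file, or fail with a message.
fasta_record_count() {
    local file="$1"
    # Reject a missing or empty file up front.
    if [ ! -s "$file" ]; then
        echo "error: '$file' is missing or empty" >&2
        return 1
    fi
    # The first non-blank line must be a '>' header line.
    local first
    first=$(grep -v '^[[:space:]]*$' "$file" | head -n 1)
    if [ "${first:0:1}" != ">" ]; then
        echo "error: '$file' does not start with a '>' header line" >&2
        return 1
    fi
    # Each record begins with exactly one header line, so count those.
    grep -c '^>' "$file"
}
```

For example, `fasta_record_count ~/fasta_files/rcsb_pdb_3DMW-EDS.fasta` prints the number of records once the example file has been copied into place.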

Next, load the environment modules to put the software in your path:

This example assumes you have the FASTA file in a directory in your home directory called fasta_files. It also assumes you are writing the output files to the scratch directory. You can create the directories like this:

mkdir -p ~/fasta_files

mkdir -p /central/scratch/$USER/alphafold/out

There is an example FASTA file in /central/software/alphafold/examples that you can copy to your fasta_files directory:

cp /central/software/alphafold/examples/rcsb_pdb_3DMW-EDS.fasta ~/fasta_files/.

Next, you will want to create a submission script. An example is available that you can copy to your home directory:

cp /central/software/alphafold/examples/alphafold.sub ~/.

##### The Submission Script

We will go through the script line by line so you understand what it is doing.

The script starts like any normal shell script, typically by calling bash:

#!/bin/bash

Any line that starts with #SBATCH is an instruction to the scheduler. These lines tell the scheduler what resources you need and set various job options. Options given on the sbatch command line when you submit override the matching #SBATCH lines in the script.

Next we give the job a name, which will show up in the scheduler:

#SBATCH --job-name=alphafold_run

Then we say how long the job may run; the job is killed when it reaches this limit. We start with the maximum of 7 days, written in days-hours:minutes format. Once you are more familiar with your job runtimes, you may want to lower this to a realistic value. A realistic limit keeps jobs that are doing the wrong thing from incurring additional costs, and it lets your jobs get through the queue faster, since they may fit into a backfill slot.

#SBATCH --time=7-00:00

The next lines describe the resources your job will use. In this case we use a single node and do not allow other jobs to run on it (exclusive). The job runs one task, but that task can use 28 cores. We are requesting 4 GPUs on the node and 32G of memory.

#SBATCH --nodes=1

#SBATCH --ntasks=1

#SBATCH --gres=gpu:4 # You need to request at least one GPU to be able to run AlphaFold properly

#SBATCH --exclusive

#SBATCH --cpus-per-task=28 # adjust this if you are using parallel commands

#SBATCH --mem=32G # adjust this according to the memory requirement per node you need

The next two lines ask the scheduler to keep you informed by email, for example when the job starts and ends. Make sure to put in your actual email address: environment variables such as $USER are not expanded in #SBATCH lines, so the address must be written out in full. You can also leave these lines out if you prefer not to be emailed.

#SBATCH --mail-user=$USER@caltech.edu

#SBATCH --mail-type=ALL
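Once the #SBATCH header is in place, it can be useful to have the job record what it was actually given. The sketch below is hypothetical (these functions are not in the example alphafold.sub). It assumes, as on typical Slurm GPU setups, that Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task is requested and sets CUDA_VISIBLE_DEVICES to a comma-separated list of the GPU indices assigned to the job.

```bash
# Hypothetical helpers, not part of the example alphafold.sub.
# Assumption: Slurm exports CUDA_VISIBLE_DEVICES (e.g. "0,1,2,3") for the
# GPUs granted via --gres, and SLURM_CPUS_PER_TASK for --cpus-per-task.
count_visible_gpus() {
    local devices="${CUDA_VISIBLE_DEVICES:-}"
    if [ -z "$devices" ]; then
        echo 0
        return
    fi
    # Count the comma-separated entries: "0,1,2,3" -> 4.
    echo "$devices" | tr ',' '\n' | grep -c .
}

# Print one summary line, falling back to placeholders outside a Slurm job.
report_allocation() {
    echo "job ${SLURM_JOB_ID:-unknown}: cpus-per-task=${SLURM_CPUS_PER_TASK:-unset}, gpus=$(count_visible_gpus)"
}
```

Calling `report_allocation` near the top of the script body writes a line such as `job 12345: cpus-per-task=28, gpus=4` to the job's output file, which makes it easy to confirm the --gres and --cpus-per-task requests took effect.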

Next comes the part of the script that actually runs on the compute node.

First we set variables for where your input files are, where to put the output files, and where the AlphaFold data directories are:

INPUT_DIR=/home/$USER/fasta_files/

OUTPUT_DIR=/central/scratch/$USER/alphafold/out
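Because a job may wait in the queue before it starts, a typo in these paths only shows up once it finally runs. A small pre-flight check at this point lets the job fail immediately with a clear message instead. The function below is a hypothetical sketch, not part of the example script; note that it also creates the output directory, which overlaps harmlessly with the script's own mkdir.

```bash
# Hypothetical pre-flight check: confirm the input FASTA exists and the
# output directory is writable before starting the long AlphaFold run.
preflight_check() {
    local fasta="$1" outdir="$2"
    if [ ! -f "$fasta" ]; then
        echo "error: input FASTA '$fasta' not found" >&2
        return 1
    fi
    # Create the output directory if needed (same as mkdir -p in the script).
    mkdir -p "$outdir" || return 1
    if [ ! -w "$outdir" ]; then
        echo "error: output directory '$outdir' is not writable" >&2
        return 1
    fi
    echo "ok"
}
```

In the job body this could be called as `preflight_check "$INPUT_DIR/rcsb_pdb_3DMW-EDS.fasta" "$OUTPUT_DIR" || exit 1`.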

Next we load the modules. This is in case you forgot to load them before submitting.

We create the output directory if you hadn't already, then change to the alphafold directory:

mkdir -p $OUTPUT_DIR

Then we run the wrapper script for AlphaFold, which launches the AlphaFold container via Singularity, passes it the options you set, and fills in defaults for any options you did not set. The time command at the beginning simply reports how long the run took.

time run_alphafold_sing.sh -o$OUTPUT_DIR -m model_1 -f $INPUT_DIR/rcsb_pdb_3DMW-EDS.fasta -t 2020-05-14 -d$DOWNLOAD_DIR
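For reference, here is a sketch of the complete submission script assembled from the pieces above. It is not a copy of /central/software/alphafold/examples/alphafold.sub: the module-loading and cd steps are left as comments because their commands are not shown on this page, the email address is a placeholder, and the DOWNLOAD_DIR check assumes (unconfirmed here) that the loaded modules define that variable.

```bash
#!/bin/bash
#SBATCH --job-name=alphafold_run
#SBATCH --time=7-00:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:4                  # at least one GPU is needed to run AlphaFold properly
#SBATCH --exclusive
#SBATCH --cpus-per-task=28            # adjust this if you are using parallel commands
#SBATCH --mem=32G                     # adjust to the memory you need per node
#SBATCH --mail-user=you@caltech.edu   # placeholder: $USER is not expanded in #SBATCH lines
#SBATCH --mail-type=ALL

INPUT_DIR=/home/$USER/fasta_files
OUTPUT_DIR=/central/scratch/$USER/alphafold/out

# Load the AlphaFold environment modules here (the module commands are not
# reproduced on this page; use the same ones you load interactively).

# Assumption: the modules set DOWNLOAD_DIR (the AlphaFold data directory).
# Stop early with a clear message if it is missing.
: "${DOWNLOAD_DIR:?DOWNLOAD_DIR is not set; load the alphafold modules first}"

mkdir -p "$OUTPUT_DIR"

# The example script also changes to the alphafold directory at this point.

# Run the Singularity wrapper and report how long it took.
time run_alphafold_sing.sh -o"$OUTPUT_DIR" -m model_1 \
    -f "$INPUT_DIR/rcsb_pdb_3DMW-EDS.fasta" -t 2020-05-14 -d"$DOWNLOAD_DIR"
```

Submit the script with `sbatch ~/alphafold.sub`, and check on it with `squeue -u $USER`. The options are quoted here so that paths containing spaces would not be split; otherwise the command is the same as the one above.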