From 35cc251be2e4962bfa1ef02b5089ea0402d367c1 Mon Sep 17 00:00:00 2001 From: Konstantin Gilep <82955438+gilep@users.noreply.github.com> Date: Mon, 14 Oct 2024 13:52:12 +0200 Subject: [PATCH 1/7] Small corrections README.md --- README.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/README.md b/README.md index fd295d9a..4bbd74fc 100644 --- a/README.md +++ b/README.md @@ -10,7 +10,7 @@ * [About AlphaPulldown](#about-alphapulldown) * [Overview](#overview) * [Alphafold databases](#alphafold-databases) -* [Snakemake AlphaPulldown](#snakemake-alphapulldown-) +* [Snakemake AlphaPulldown](#snakemake-alphapulldown) * [1. Installation](#1-installation) * [2. Configuration](#2-configuration) * [3. Execution](#3-execution) @@ -70,7 +70,7 @@ * [Next step](#next-step-6) * [Downstream analysis](#downstream-analysis) * [Jupyter notebook](#jupyter-notebook) - * [Results table](#results-table-) + * [Results table](#results-table) * [Results management scripts](#results-management-scripts) * [Decrease the size of AlphaPulldown output](#decrease-the-size-of-alphapulldown-output) * [Convert Models from PDB Format to ModelCIF Format](#convert-models-from-pdb-format-to-modelcif-format) @@ -83,9 +83,9 @@ # About AlphaPulldown -AlphaPulldown is an implementation of [AlphaFold-Multimer](https://github.com/google-deepmind/alphafold) designed for customizable high-throughput screening of protein-protein interactions. In addition, AlphaPulldown provides additional customizations of AlphaFold, including custom structural multimeric templates (TrueMultimer), MMseqs2 multiple sequence alignment (MSA) and [ColabFold](https://github.com/sokrypton/ColabFold) databases, proteins fragments predictions, and implementation of cross-link mass spec data using [AlphaLink2](https://github.com/Rappsilber-Laboratory/AlphaLink2/tree/main). 
+AlphaPulldown is a customized implementation of [AlphaFold-Multimer](https://github.com/google-deepmind/alphafold) designed for customizable high-throughput screening of protein-protein interactions. It extends AlphaFold’s capabilities with additional run options, such as customizable multimeric structural templates (TrueMultimer), [MMseqs2](https://github.com/soedinglab/MMseqs2) multiple sequence alignment (MSA) via [ColabFold](https://github.com/sokrypton/ColabFold) databases, protein fragment predictions, and the use of cross-linking mass spectrometry data as input via [AlphaLink2](https://github.com/Rappsilber-Laboratory/AlphaLink2/tree/main). -AlphaPulldown can be used in two ways: either by a two-step pipeline made of **python scripts**, which this manual covers, or by a **Snakemake pipeline** as a whole. For details on using the Snakemake pipeline, please refer to the separate GitHub [**repository**](https://github.com/KosinskiLab/AlphaPulldownSnakemake). +AlphaPulldown can be used in two ways: either by a two-step pipeline made of **python scripts**, or by a **Snakemake pipeline** as a whole. For details on using the Snakemake pipeline, please refer to the separate GitHub [**repository**](https://github.com/KosinskiLab/AlphaPulldownSnakemake). ## Overview @@ -179,7 +179,7 @@ alphafold_database/ # Total: ~ 2.2 TB (download: 438 > [!NOTE] -> Uniclust30 is the version of the database generated before 2019, UniRef30 is the one generated after 2019. Please note that AlphaPulldown is using UniRef30_2023_02 by default. This version can be downloaded by [this script](https://github.com/KosinskiLab/alphafold/blob/main/scripts/download_uniref30.sh). Alternatively, please overwrite the default path to the uniref30 database using --uniref30_database_path flag of create_individual_features.py. +> Uniclust30 is the version of the database generated before 2019, UniRef30 is the one generated after 2019.
Please note that AlphaPulldown uses UniRef30_2023_02 by default. This version can be downloaded by [this script](https://github.com/KosinskiLab/alphafold/blob/main/scripts/download_uniref30.sh). Alternatively, please override the default path to the uniref30 database using the `--uniref30_database_path` flag of `create_individual_features.py`. > [!NOTE] > Since the local installation of all genetic databases is space-consuming, you can alternatively use the [remotely-run MMseqs2 and ColabFold databases](https://github.com/sokrypton/ColabFold). Follow the corresponding [instructions](#13-run-using-mmseqs2-and-colabfold-databases-faster). However, for AlphaPulldown to function, you must download the parameters stored in the `params/` directory of the AlphaFold database. @@ -242,7 +242,7 @@ After responding to these prompts, your Slurm profile named *slurm_noSidecar* fo **Download The Pipeline**: -This will download the version specified by '--tag' of the snakemake pipeline and create the repository AlphaPulldownSnakemake, or any other name you choose. +This will download the version specified by '--tag' of the snakemake pipeline and create the repository AlphaPulldownSnakemake or any other name you choose. ```bash snakedeploy deploy-workflow \ https://github.com/KosinskiLab/AlphaPulldownSnakemake \ @@ -1414,7 +1414,7 @@ create_notebook.py --cutoff=5.0 --output_dir= * `--pae_figsize`: Figsize of pae_plot, default is 50. -This command will generate an `output.ipynb`, which you can open using JupyterLab. JupyterLab is installed with AlphaPulldown via pip. To view the notebook, launch it with: +This command will generate an `output.ipynb`, which you can open using JupyterLab. JupyterLab is installed with AlphaPulldown.
To view the notebook, launch it with: ```bash jupyter-lab output.ipynb From c4fcf6dace63a82fe6c6fe7a99a38bfaa550bfd6 Mon Sep 17 00:00:00 2001 From: Konstantin Gilep <82955438+gilep@users.noreply.github.com> Date: Mon, 14 Oct 2024 16:06:01 +0200 Subject: [PATCH 2/7] Update README.md (Features Database p1) --- README.md | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 50 insertions(+) diff --git a/README.md b/README.md index 4bbd74fc..9f0b1361 100644 --- a/README.md +++ b/README.md @@ -1700,3 +1700,53 @@ ranked_0.zip #### Miscellaneous Options At this time, there is only one option left unexplained: `--compress`. It tells the script to compress ModelCIF files using Gzip. In the case of `--add_associated`, the ModelCIF files in the associated Zip archive are also compressed. + +# Features Database +Alternatively, to generate feature files locally, you can download them from the AlphaPulldown Features Database, which covers proteins from major model organisms. + +## Installation +[MinIO Client](https://min.io/docs/minio/linux/reference/minio-mc.html) (`mc`) needs to be installed to access the Features Database. + +Download the mc binary, make it executable and move to your PATH: + +```bash +curl -O https://dl.min.io/client/mc/release/linux-amd64/mc +chmod +x mc +sudo mv mc /usr/local/bin/ +``` + +Verify `mc` works: + +```bash +mc --help +``` + +## Configuration + +Create an Alias for the Features Database: + +```bash + mc alias set embl https://s3.embl.de "" "" --api S3v4 +``` + +## Download Features +When `mc` is successfully installed and configured you can use it to access the Features Database. The commands are similar to bash commands. + +Check the list of the available organisms: + +```bash + mc ls embl/alphapulldown/input_features +``` + +Every organism directory contains compressed .pkl.xz features files. + +To download the features file for Q6BF25 protein from *E. 
coli* run: + +```bash +mc cp embl/alphapulldown/input_features/Escherichia_coli/Q6BF25.pkl.xz Q6BF25.pkl.xz +``` + +You can also download all features of the organism proteins by copying the whole organism directory: +``` +mc cp embl/alphapulldown/input_features/Escherichia_coli/Escherichia_coli ./Escherichia_coli +``` From e47e3c4becb98bb5a7aaed335c9f1045915f9a60 Mon Sep 17 00:00:00 2001 From: Konstantin Gilep <82955438+gilep@users.noreply.github.com> Date: Tue, 15 Oct 2024 11:04:26 +0200 Subject: [PATCH 3/7] Update README.md (Features Database p2) --- README.md | 58 +++++++++++++++++++++++++++++++++++++++++-------------- 1 file changed, 43 insertions(+), 15 deletions(-) diff --git a/README.md b/README.md index 9f0b1361..9b87ed6e 100644 --- a/README.md +++ b/README.md @@ -1701,13 +1701,21 @@ ranked_0.zip At this time, there is only one option left unexplained: `--compress`. It tells the script to compress ModelCIF files using Gzip. In the case of `--add_associated`, the ModelCIF files in the associated Zip archive are also compressed. +
+ # Features Database -Alternatively, to generate feature files locally, you can download them from the AlphaPulldown Features Database, which covers proteins from major model organisms. -## Installation -[MinIO Client](https://min.io/docs/minio/linux/reference/minio-mc.html) (`mc`) needs to be installed to access the Features Database. +Instead of generating feature files locally, you can download them from the **AlphaPulldown Features Database**, which contains precomputed protein **features for major model organisms**. + +## Installation + +To access the Features Database, you need to install the [MinIO Client](https://min.io/docs/minio/linux/reference/minio-mc.html) (`mc`). + +### Steps: -Download the mc binary, make it executable and move to your PATH: +1. Download the `mc` binary. +2. Make the binary executable. +3. Move it to your `PATH` for system-wide access. ```bash curl -O https://dl.min.io/client/mc/release/linux-amd64/mc @@ -1715,7 +1723,9 @@ chmod +x mc sudo mv mc /usr/local/bin/ ``` -Verify `mc` works: +### Verify installation: + +To ensure `mc` is correctly installed, you can run: ```bash mc --help @@ -1723,30 +1733,48 @@ mc --help ## Configuration -Create an Alias for the Features Database: +Set up an alias for easy access to the AlphaPulldown Features Database hosted at EMBL: ```bash - mc alias set embl https://s3.embl.de "" "" --api S3v4 +mc alias set embl https://s3.embl.de "" "" --api S3v4 ``` -## Download Features -When `mc` is successfully installed and configured you can use it to access the Features Database. The commands are similar to bash commands. +This alias allows you to interact with the Features Database as if it were a local directory. + +## Downloading Features + +Once `mc` is installed and configured, you can start accessing the Features Database. The `mc` commands mimic standard bash commands. 
+ +### List available organisms: -Check the list of the available organisms: +To view the list of available organisms with precomputed feature files, run: ```bash - mc ls embl/alphapulldown/input_features +mc ls embl/alphapulldown/input_features ``` -Every organism directory contains compressed .pkl.xz features files. +Each organism directory contains compressed `.pkl.xz` feature files, named according to their **UniProt ID**. -To download the features file for Q6BF25 protein from *E. coli* run: +### Download specific protein features: + +For example, to download the feature file for the protein with UniProt ID Q6BF25 from *Escherichia coli*, use: ```bash mc cp embl/alphapulldown/input_features/Escherichia_coli/Q6BF25.pkl.xz Q6BF25.pkl.xz ``` -You can also download all features of the organism proteins by copying the whole organism directory: +### Download all features for an organism: + +To download all feature files for proteins from a specific organism, such as *E. coli*, copy the entire directory: + +```bash +mc cp --recursive embl/alphapulldown/input_features/Escherichia_coli/ ./Escherichia_coli/ ``` -mc cp embl/alphapulldown/input_features/Escherichia_coli/Escherichia_coli ./Escherichia_coli + +Alternatively, you can mirror the contents of the organism’s directory, ensuring all files are synced between the source and your local directory: + +```bash +mc mirror embl/alphapulldown/input_features/Escherichia_coli/ Escherichia_coli/ ``` + +This command mirrors the remote directory to your local system, keeping both locations in sync. 
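To fetch a handful of proteins rather than a whole organism, the per-file `mc cp` command can be wrapped in a loop. The following is a minimal sketch. It assumes two things: the `embl` alias is configured as above, and files follow the `input_features/<Organism>/<UniProt ID>.pkl.xz` layout described in this section. `fetch_features` and the `MC_CMD` variable are hypothetical names introduced here for illustration; they are not part of AlphaPulldown or `mc`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of AlphaPulldown): download precomputed
# feature files for several UniProt IDs of one organism with the MinIO Client.
# Assumes `mc alias set embl https://s3.embl.de "" "" --api S3v4` has been run
# and files live at embl/alphapulldown/input_features/<Organism>/<ID>.pkl.xz.
set -euo pipefail

fetch_features() {
  local organism="$1"
  local outdir="$2"
  shift 2
  # MC_CMD substitutes another command (e.g. `echo`) for a dry run; defaults to mc.
  local mc_cmd="${MC_CMD:-mc}"
  mkdir -p "$outdir"
  local id
  for id in "$@"; do
    # One `mc cp` per protein, so a missing ID fails visibly on its own line.
    "$mc_cmd" cp \
      "embl/alphapulldown/input_features/${organism}/${id}.pkl.xz" \
      "${outdir}/${id}.pkl.xz"
  done
}
```

For example, `fetch_features Escherichia_coli ./features Q6BF25` would download one file into `./features/`. Prefixing the call with `MC_CMD=echo` prints the `cp` commands without contacting the server. For an entire organism, the `mc cp --recursive` or `mc mirror` commands above remain the simpler choice.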
From 522f25bf31cf76e403499aa9cb20ea1d3c12e7ec Mon Sep 17 00:00:00 2001 From: Konstantin Gilep <82955438+gilep@users.noreply.github.com> Date: Tue, 15 Oct 2024 12:35:06 +0200 Subject: [PATCH 4/7] Update README.md (Features Database p3) --- README.md | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 9b87ed6e..517e8306 100644 --- a/README.md +++ b/README.md @@ -1709,14 +1709,21 @@ Instead of generating feature files locally, you can download them from the **Al ## Installation +>[!NOTE] +>For EMBL cluster users: +>You can access the directory with generated feature files at +>`/g/alphafold/input_features/` + To access the Features Database, you need to install the [MinIO Client](https://min.io/docs/minio/linux/reference/minio-mc.html) (`mc`). ### Steps: -1. Download the `mc` binary. +1. [Download](https://min.io/docs/minio/linux/reference/minio-mc.html#install-mc) the `mc` binary. 2. Make the binary executable. 3. Move it to your `PATH` for system-wide access. +Example for AMD64 architecture: + ```bash curl -O https://dl.min.io/client/mc/release/linux-amd64/mc chmod +x mc From f4471b082d4506a350045957d5f66b7e858ee6b6 Mon Sep 17 00:00:00 2001 From: Konstantin Gilep <82955438+gilep@users.noreply.github.com> Date: Tue, 15 Oct 2024 12:53:51 +0200 Subject: [PATCH 5/7] Update README.md (Install CCP4 and other) --- README.md | 44 +++++++++++++++++++++++++------------------- 1 file changed, 25 insertions(+), 19 deletions(-) diff --git a/README.md b/README.md index 517e8306..f037a770 100644 --- a/README.md +++ b/README.md @@ -184,6 +184,8 @@ alphafold_database/ # Total: ~ 2.2 TB (download: 438 > [!NOTE] > Since the local installation of all genetic databases is space-consuming, you can alternatively use the [remotely-run MMseqs2 and ColabFold databases](https://github.com/sokrypton/ColabFold). Follow the corresponding [instructions](#13-run-using-mmseqs2-and-colabfold-databases-faster).
However, for AlphaPulldown to function, you must download the parameters stored in the `params/` directory of the AlphaFold database. +
+
+
+ # Snakemake AlphaPulldown @@ -243,6 +245,7 @@ After responding to these prompts, your Slurm profile named *slurm_noSidecar* fo **Download The Pipeline**: This will download the version specified by '--tag' of the snakemake pipeline and create the repository AlphaPulldownSnakemake or any other name you choose. + ```bash snakedeploy deploy-workflow \ https://github.com/KosinskiLab/AlphaPulldownSnakemake \ @@ -250,6 +253,28 @@ snakedeploy deploy-workflow \ --tag 1.4.0 cd AlphaPulldownSnakemake ``` +>[!NOTE] +>If you want to use the latest version from GitHub, replace `--tag X.X.X` with `--branch main`. + +**Install CCP4 package**: +To install the software needed for [the analysis step](https://github.com/KosinskiLab/AlphaPulldown?tab=readme-ov-file#3-analysis-and-visualization), please follow these instructions: + +```bash +singularity pull docker://kosinskilab/fold_analysis:latest +singularity build --sandbox fold_analysis_latest.sif +# Download the top one from https://www.ccp4.ac.uk/download/#os=linux +tar xvzf ccp4-9.0.003-linux64.tar.gz +cd ccp4-9 +cp bin/pisa bin/sc /software/ +cp /lib/* /software/lib64/ +singularity build +``` + +Then open `AlphaPulldownSnakemake/config/config.yaml` in a text editor and change the path to the analysis container to: + +```yaml +analysis_container : "/path/to/new_image.sif" +``` ## 2.
Configuration @@ -363,25 +388,6 @@ Executing the command above will perform submit the following jobs to the cluste ![Snakemake rulegraph](manuals/dag.png) -For using CCP4 programs to further enrich generated statistics, please follow these instructions: -```bash -singularity pull docker://kosinskilab/fold_analysis:latest -singularity build --sandbox fold_analysis.sif - -# Download the top one from https://www.ccp4.ac.uk/download/#os=linux -cp -r ccp4-9.0.003-linux64.tar.gz /tmp -cd /tmp -tar xvzf ccp4-9.0.003-linux64.tar.gz -cd ccp4-9 -cp bin/pisa bin/sc /software/ -cp /lib/* /software/lib64/ - -singularity build -``` -Then open AlphaPulldownSnakemake/config/config.yaml in a text editor and change the path to the analysis container to: -```yaml -analysis_container : "/path/to/new_image.sif" -```

From 06fbb7a51d3317a170363fc0c096a6342b93d669 Mon Sep 17 00:00:00 2001 From: Konstantin Gilep <82955438+gilep@users.noreply.github.com> Date: Tue, 15 Oct 2024 13:42:02 +0200 Subject: [PATCH 6/7] Update README.md (Table of content, CCP4 move to install) --- README.md | 193 ++++++++++++++++++++++++++++++------------------------ 1 file changed, 109 insertions(+), 84 deletions(-) diff --git a/README.md b/README.md index f037a770..c96aa00e 100644 --- a/README.md +++ b/README.md @@ -4,82 +4,93 @@ ## Table of Contents - -* [AlphaPulldown: Version 2.0.0 (Beta)](#alphapulldown-version-200-beta) - * [Table of Contents](#table-of-contents) -* [About AlphaPulldown](#about-alphapulldown) - * [Overview](#overview) -* [Alphafold databases](#alphafold-databases) -* [Snakemake AlphaPulldown](#snakemake-alphapulldown) - * [1. Installation](#1-installation) - * [2. Configuration](#2-configuration) - * [3. Execution](#3-execution) -* [Run AlphaPulldown Python Command Line Interface](#run-alphapulldown-python-command-line-interface) - * [0. Installation](#0-installation) - * [0.1. Create Anaconda environment](#01-create-anaconda-environment) - * [0.2. Installation using pip](#02-installation-using-pip) - * [0.3. Installation for the Downstream analysis tools](#03-installation-for-the-downstream-analysis-tools) - * [0.4. Installation for cross-link input data by AlphaLink2 (optional!)](#04-installation-for-cross-link-input-data-by-alphalink2-optional) - * [0.5. Installation for developers](#05-installation-for-developers) - * [1. Compute multiple sequence alignment (MSA) and template features (CPU stage)](#1-compute-multiple-sequence-alignment-msa-and-template-features-cpu-stage) - * [1.1. Basic run](#11-basic-run) - * [Input](#input) - * [Script Execution](#script-execution) - * [Output](#output) - * [Next step](#next-step) - * [1.2. 
Example bash scripts for SLURM (EMBL cluster)](#12-example-bash-scripts-for-slurm-embl-cluster) - * [Input](#input-1) - * [Script Execution](#script-execution-1) - * [Next step](#next-step-1) - * [1.3. Run using MMseqs2 and ColabFold Databases (Faster)](#13-run-using-mmseqs2-and-colabfold-databases-faster) - * [Run MMseqs2 Remotely](#run-mmseqs2-remotely) - * [Output](#output-1) - * [Run MMseqs2 Locally](#run-mmseqs2-locally) - * [Next step](#next-step-2) - * [1.4. Run with custom templates (TrueMultimer)](#14-run-with-custom-templates-truemultimer) - * [Input](#input-2) - * [Script Execution](#script-execution-2) - * [Output](#output-2) - * [Next step](#next-step-3) - * [2. Predict structures (GPU stage)](#2-predict-structures-gpu-stage) - * [2.1. Basic run](#21-basic-run) - * [Input](#input-3) - * [Script Execution: Structure Prediction](#script-execution-structure-prediction) - * [Output](#output-3) - * [Next step](#next-step-4) - * [2.2. Example run with SLURM (EMBL cluster)](#22-example-run-with-slurm-embl-cluster) - * [Input](#input-4) - * [Script Execution](#script-execution-3) - * [Output and the next step](#output-and-the-next-step) - * [2.3. Pulldown mode](#23-pulldown-mode) - * [Multiple inputs "pulldown" mode](#multiple-inputs-pulldown-mode) - * [2.4. All versus All mode](#24-all-versus-all-mode) - * [Output and the next step](#output-and-the-next-step-1) - * [2.5. Run with Custom Templates (TrueMultimer)](#25-run-with-custom-templates-truemultimer) - * [Input](#input-5) - * [Script Execution for TrueMultimer Structure Prediction](#script-execution-for-truemultimer-structure-prediction) - * [Output and the next step](#output-and-the-next-step-2) - * [2.6. Run with crosslinking-data (AlphaLink2)](#26-run-with-crosslinking-data-alphalink2) - * [Input](#input-6) - * [Run with AlphaLink2 prediction via AlphaPulldown](#run-with-alphalink2-prediction-via-alphapulldown) - * [Output and the next step](#output-and-the-next-step-3) - * [3. 
Analysis and Visualization](#3-analysis-and-visualization) - * [Create Jupyter Notebook](#create-jupyter-notebook) - * [Next step](#next-step-5) - * [Create Results table](#create-results-table) - * [Next step](#next-step-6) -* [Downstream analysis](#downstream-analysis) - * [Jupyter notebook](#jupyter-notebook) - * [Results table](#results-table) - * [Results management scripts](#results-management-scripts) - * [Decrease the size of AlphaPulldown output](#decrease-the-size-of-alphapulldown-output) - * [Convert Models from PDB Format to ModelCIF Format](#convert-models-from-pdb-format-to-modelcif-format) - * [1. Convert all models to separate ModelCIF files](#1-convert-all-models-to-separate-modelcif-files) - * [2. Only convert a specific single model for each complex](#2-only-convert-a-specific-single-model-for-each-complex) - * [3. Have a representative model and keep associated models](#3-have-a-representative-model-and-keep-associated-models) - * [Associated Zip Archives](#associated-zip-archives) - * [Miscellaneous Options](#miscellaneous-options) - + + +- [AlphaPulldown: Version 2.0.0 (Beta)](#alphapulldown-version-200-beta) + * [Table of Contents](#table-of-contents) +- [About AlphaPulldown](#about-alphapulldown) + * [Overview](#overview) +- [Alphafold databases](#alphafold-databases) +- [Snakemake AlphaPulldown ](#snakemake-alphapulldown) + * [1. Installation](#1-installation) + * [2. Configuration](#2-configuration) + * [3. Execution](#3-execution) +- [Run AlphaPulldown Python Command Line Interface](#run-alphapulldown-python-command-line-interface) + * [0. Installation](#0-installation) + + [0.1. Create Anaconda environment](#01-create-anaconda-environment) + + [0.2. Installation using pip](#02-installation-using-pip) + + [0.3. Installation for the Downstream analysis tools](#03-installation-for-the-downstream-analysis-tools) + + [0.4. 
Installation for cross-link input data by AlphaLink2 (optional!)](#04-installation-for-cross-link-input-data-by-alphalink2-optional) + + [0.5. Installation for developers](#05-installation-for-developers) + * [1. Compute multiple sequence alignment (MSA) and template features (CPU stage)](#1-compute-multiple-sequence-alignment-msa-and-template-features-cpu-stage) + + [1.1. Basic run](#11-basic-run) + - [Input](#input) + - [Script Execution](#script-execution) + - [Output](#output) + - [Next step](#next-step) + + [1.2. Example bash scripts for SLURM (EMBL cluster)](#12-example-bash-scripts-for-slurm-embl-cluster) + - [Input](#input-1) + - [Script Execution](#script-execution-1) + - [Next step](#next-step-1) + + [1.3. Run using MMseqs2 and ColabFold Databases (Faster)](#13-run-using-mmseqs2-and-colabfold-databases-faster) + - [Run MMseqs2 Remotely](#run-mmseqs2-remotely) + - [Output](#output-1) + - [Run MMseqs2 Locally](#run-mmseqs2-locally) + - [Next step](#next-step-2) + + [1.4. Run with custom templates (TrueMultimer)](#14-run-with-custom-templates-truemultimer) + - [Input](#input-2) + - [Script Execution](#script-execution-2) + - [Output](#output-2) + - [Next step](#next-step-3) + * [2. Predict structures (GPU stage)](#2-predict-structures-gpu-stage) + + [2.1. Basic run](#21-basic-run) + - [Input](#input-3) + - [Script Execution: Structure Prediction](#script-execution-structure-prediction) + - [Output](#output-3) + - [Next step](#next-step-4) + + [2.2. Example run with SLURM (EMBL cluster)](#22-example-run-with-slurm-embl-cluster) + - [Input](#input-4) + - [Script Execution](#script-execution-3) + - [Output and the next step](#output-and-the-next-step) + + [2.3. Pulldown mode](#23-pulldown-mode) + - [Multiple inputs "pulldown" mode](#multiple-inputs-pulldown-mode) + + [2.4. All versus All mode](#24-all-versus-all-mode) + - [Output and the next step](#output-and-the-next-step-1) + + [2.5. 
Run with Custom Templates (TrueMultimer)](#25-run-with-custom-templates-truemultimer) + - [Input](#input-5) + - [Script Execution for TrueMultimer Structure Prediction](#script-execution-for-truemultimer-structure-prediction) + - [Output and the next step](#output-and-the-next-step-2) + + [2.6. Run with crosslinking-data (AlphaLink2)](#26-run-with-crosslinking-data-alphalink2) + - [Input](#input-6) + - [Run with AlphaLink2 prediction via AlphaPulldown](#run-with-alphalink2-prediction-via-alphapulldown) + - [Output and the next step](#output-and-the-next-step-3) + * [3. Analysis and Visualization](#3-analysis-and-visualization) + + [Create Jupyter Notebook](#create-jupyter-notebook) + - [Next step](#next-step-5) + + [Create Results table](#create-results-table) + - [Next step](#next-step-6) +- [Downstream analysis](#downstream-analysis) + * [Jupyter notebook](#jupyter-notebook) + * [Results table ](#results-table) + * [Results management scripts](#results-management-scripts) + + [Decrease the size of AlphaPulldown output](#decrease-the-size-of-alphapulldown-output) + + [Convert Models from PDB Format to ModelCIF Format](#convert-models-from-pdb-format-to-modelcif-format) + - [1. Convert all models to separate ModelCIF files](#1-convert-all-models-to-separate-modelcif-files) + - [2. Only convert a specific single model for each complex](#2-only-convert-a-specific-single-model-for-each-complex) + - [3. 
Have a representative model and keep associated models](#3-have-a-representative-model-and-keep-associated-models) + - [Associated Zip Archives](#associated-zip-archives) + - [Miscellaneous Options](#miscellaneous-options) +- [Features Database](#features-database) + * [Installation](#installation) + + [Steps:](#steps) + + [Verify installation:](#verify-installation) + * [Configuration](#configuration) + * [Downloading Features](#downloading-features) + + [List available organisms:](#list-available-organisms) + + [Download specific protein features:](#download-specific-protein-features) + + [Download all features for an organism:](#download-all-features-for-an-organism) + + # About AlphaPulldown @@ -435,14 +446,25 @@ pip install -U "jax[cuda12]" ### 0.3. Installation for the Downstream analysis tools -To create the Results table, you need to have [Singularity](https://apptainer.org/admin-docs/master/installation.html) installed. +**Install CCP4 package**: +To install the software needed for [the analysis step](https://github.com/KosinskiLab/AlphaPulldown?tab=readme-ov-file#3-analysis-and-visualization), please follow these instructions: -Download the singularity image: +```bash +singularity pull docker://kosinskilab/fold_analysis:latest +singularity build --sandbox fold_analysis_latest.sif +# Download the top one from https://www.ccp4.ac.uk/download/#os=linux +tar xvzf ccp4-9.0.003-linux64.tar.gz +cd ccp4-9 +cp bin/pisa bin/sc /software/ +cp /lib/* /software/lib64/ +singularity build +``` -* If your results are from AlphaPulldown prior to version 1.0.0: [alpha-analysis_jax_0.3.sif](https://www.embl-hamburg.de/AlphaPulldown/downloads/alpha-analysis_jax_0.3.sif).
+Then open `AlphaPulldownSnakemake/config/config.yaml` in a text editor and change the path to the analysis container to: -Chrome users may not be able to download it after clicking the link. If so, please right-click and select "Save link as". +```yaml +analysis_container : "/path/to/new_image.sif" +``` ### 0.4. Installation for cross-link input data by [AlphaLink2](https://github.com/Rappsilber-Laboratory/AlphaLink2/tree/main) (optional!) @@ -526,6 +548,9 @@ Please [add your SSH key to your GitHub account](https://docs.github.com/en/auth ## 1. Compute multiple sequence alignment (MSA) and template features (CPU stage) +>[!Note] +>If you work with proteins from model organisms you can directly download the features files from the [AlphaPulldown Features Database](#features-database) and skip this step. + ### 1.1. Basic run This is a general example of `create_individual_features.py` usage. For information on running specific tasks or parallel execution on a cluster, please refer to the corresponding sections of this chapter. @@ -982,7 +1007,7 @@ source activate AlphaPulldown run_multimer_jobs.py \ --mode=custom \ --monomer_objects_dir= \ - --data_dir= \ + --data_dir= I am running a few minutes late; my previous meeting is running over. --protein_lists= \ --output_path= \ --num_cycle= \ @@ -1432,7 +1457,7 @@ For usage of the Jupyter Notebook, refer to the [Downstream analysis](#downstrea ### Create Results table -Making a CSV table with structural properties and scores requires the download of the singularity image `alpha-analysis.sif`. Please refer to the installation [instruction](#3-installation-for-the-downstream-analysis-step-tools). +Making a CSV table with structural properties and scores requires the download of the singularity image `fold_analysis.sif`. Please refer to the installation [instruction](#03-installation-for-the-downstream-analysis-tools). To execute the singularity image (i.e. 
the sif file) run: @@ -1440,7 +1465,7 @@ To execute the singularity image (i.e. the sif file) run: singularity exec \ --no-home \ --bind :/mnt \ - /alpha-analysis_jax_0.4.sif \ + /fold_analysis.sif \ run_get_good_pae.sh \ --output_dir=/mnt \ --cutoff=10 From 15533cdff3475e42bfbc8cf3e6d5295ee19a6fbe Mon Sep 17 00:00:00 2001 From: Konstantin Gilep <82955438+gilep@users.noreply.github.com> Date: Tue, 15 Oct 2024 13:47:26 +0200 Subject: [PATCH 7/7] Update README.md (CCP4 installation fix) --- README.md | 16 +++++----------- 1 file changed, 5 insertions(+), 11 deletions(-) diff --git a/README.md b/README.md index c96aa00e..44a22ef7 100644 --- a/README.md +++ b/README.md @@ -281,16 +281,16 @@ cp /lib/* /software/lib64/ singularity build ``` -Then open `AlphaPulldownSnakemake/config/config.yaml` in a text editor and change the path to the analysis container to: +## 2. Configuration + +Adjust `config/config.yaml` for your particular use case. + +If you want to use CCP4 for analysis, open `config/config.yaml` in a text editor and change the path to the analysis container to: ```yaml analysis_container : "/path/to/new_image.sif" ``` -## 2. Configuration - -Adjust `config/config.yaml` for your particular use case. - **input_files** This variable holds the path to your sample sheet, where each line corresponds to a folding job. For this pipeline we use the following format specification: @@ -460,12 +460,6 @@ cp /lib/* /software/lib64/ singularity build ``` -Then open `AlphaPulldownSnakemake/config/config.yaml` in a text editor and change the path to the analysis container to: - -```yaml -analysis_container : "/path/to/new_image.sif" -``` - ### 0.4. Installation for cross-link input data by [AlphaLink2](https://github.com/Rappsilber-Laboratory/AlphaLink2/tree/main) (optional!) $\text{\color{red}Update the installation manual after resolving the dependency conflict.}$