2DIR

2DIR: Predicting protein dynamic structures using two-dimensional infrared spectroscopy, with unknown structures.

Updated

Our latest dataset now contains 51,728 different proteins, all sourced from RCSB and SWISSPROT (AFDB-SWISSPROT). You can find it in the Quick Start section below; it includes the two-dimensional infrared spectroscopy (2DIR) data and PDB data for all proteins. You will need to randomly split the dataset into training and test sets yourself.
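
Since the split is left to the user, here is a minimal sketch of a reproducible random split. The function name `split_dataset`, the 10% test fraction, and the example identifiers are assumptions for illustration and are not part of this repository.

```python
import random


def split_dataset(protein_ids, test_fraction=0.1, seed=0):
    """Shuffle protein identifiers reproducibly and return (train_ids, test_ids)."""
    # Sort first so the result does not depend on the input order.
    ids = sorted(protein_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_test = int(round(len(ids) * test_fraction))
    return ids[n_test:], ids[:n_test]


if __name__ == "__main__":
    # Hypothetical identifiers; in practice these would be the file stems
    # of the downloaded 2DIR/PDB pairs.
    example_ids = [f"protein_{i:05d}" for i in range(100)]
    train_ids, test_ids = split_dataset(example_ids, test_fraction=0.1, seed=42)
    print(len(train_ids), len(test_ids))
```

Fixing the seed keeps the split identical across runs, which makes results on the test set comparable between experiments.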

Requirements

Operating System: Linux (Recommended)
No non-standard hardware is required.

Getting started

To get started using 2DIR, clone the repo:

git clone https://github.com/ZhuLvs/2DIR.git
cd 2DIR

Install Dependencies

conda create -n 2DIR python=3.8
conda activate 2DIR
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

pip install -r ./install/requirements.txt

Quick Start

First, download the training dataset for 2DIR from this link and save it under the data directory.

The PDBFliess dataset is available at this link.

Then, calculate the distance matrix (distance map) between the CA atoms of the residues in each protein PDB structure file. The calculation script can be found in the helper_scripts directory.
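
For reference, the CA distance map can be sketched as follows. This is an illustration only, not the script shipped in helper_scripts: the function names `ca_coordinates` and `distance_map` are assumptions, and the parser relies only on the fixed-column layout of PDB ATOM records.

```python
import numpy as np


def ca_coordinates(pdb_lines):
    """Collect CA atom coordinates from the ATOM records of the first model."""
    coords = []
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # use only the first model of a multi-model file
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            if line[16] not in (" ", "A"):
                continue  # skip alternate locations other than the first
            coords.append(
                [float(line[30:38]), float(line[38:46]), float(line[46:54])]
            )
    return np.array(coords, dtype=np.float64).reshape(-1, 3)


def distance_map(coords):
    """Return the symmetric matrix of pairwise Euclidean distances."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Usage with a hypothetical file path:
#     with open("data/example.pdb") as handle:
#         dist = distance_map(ca_coordinates(handle))
```

The result is an N x N matrix for a protein with N residues that have a CA atom, with zeros on the diagonal.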

You may manually modify the parameters in model/main.py.

Training

Before training begins, the protein residue distance matrices need to be padded to a uniform size, which simplifies model processing and speeds up training. The padding code can be found in the helper_scripts directory and can be modified as needed.
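
The padding step can be sketched as below. This is a minimal illustration, not the code in helper_scripts: the function name `pad_distance_map`, the choice of zero as the fill value, and the top-left placement of the original matrix are all assumptions.

```python
import numpy as np


def pad_distance_map(dist, target_size, pad_value=0.0):
    """Place an N x N distance map in the top-left corner of a larger square matrix."""
    if dist.ndim != 2 or dist.shape[0] != dist.shape[1]:
        raise ValueError("distance map must be a square 2-D array")
    n = dist.shape[0]
    if n > target_size:
        raise ValueError(f"protein length {n} exceeds target size {target_size}")
    padded = np.full((target_size, target_size), pad_value, dtype=dist.dtype)
    padded[:n, :n] = dist
    return padded
```

With every matrix padded to the same `target_size`, the matrices can be stacked into one array per batch.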

Known Length Protein

bash train.sh

Unknown Length Protein

bash train_unknown_protein.sh

Inference

After inference is complete, the predicted results need to be trimmed based on the protein length, following the format provided in data/output.txt.
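
Trimming reverses the padding applied before training. The sketch below assumes that the predicted matrix keeps the real residues in its top-left block; the function name `trim_prediction` is hypothetical, and the exact format in data/output.txt is not reproduced here.

```python
import numpy as np


def trim_prediction(pred, length):
    """Keep only the top-left length x length block of a padded prediction."""
    if pred.ndim != 2 or pred.shape[0] != pred.shape[1]:
        raise ValueError("prediction must be a square 2-D array")
    if not 0 < length <= pred.shape[0]:
        raise ValueError("length must be between 1 and the padded size")
    # Copy so that later edits do not modify the padded prediction.
    return pred[:length, :length].copy()
```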

Known Length Protein

bash test.sh

Unknown Length Protein

For proteins with unknown lengths, you need to run model/pre_length.py to predict the protein length, and then refer to the scripts in the helper_scripts directory for trimming and processing.

bash test_len.sh

To generate the protein backbone structure from the protein residue distance matrix, please use the gradient descent algorithm available in the PyRosetta protocols, providing the predicted residue distances from the model and constraints from residue_constants.py.
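
To illustrate the idea of recovering coordinates from a distance matrix by gradient descent, here is a NumPy-only sketch. It is not the PyRosetta protocol and does not use the constraints in residue_constants.py; it only minimizes the squared mismatch between the current and target pairwise distances. The names `stress` and `fit_coordinates`, the step size, and the random initialization are assumptions.

```python
import numpy as np


def pairwise_distances(coords):
    diff = coords[:, None, :] - coords[None, :, :]
    return diff, np.sqrt((diff ** 2).sum(axis=-1))


def stress(coords, target):
    """Sum over residue pairs of (current distance - target distance)^2."""
    _, dist = pairwise_distances(coords)
    # The full matrix counts every pair twice, hence the factor 0.5.
    return 0.5 * ((dist - target) ** 2).sum()


def fit_coordinates(target, n_steps=2000, lr=0.25, seed=0):
    """Gradient descent on the stress, starting from random 3-D coordinates."""
    n = target.shape[0]
    rng = np.random.default_rng(seed)
    scale = target.mean() if target.mean() > 0 else 1.0
    coords = rng.normal(scale=scale, size=(n, 3))
    step = lr / n  # smaller steps for larger proteins keep the descent stable
    for _ in range(n_steps):
        diff, dist = pairwise_distances(coords)
        safe = np.where(dist > 1e-9, dist, 1.0)  # avoid division by zero
        coeff = (dist - target) / safe
        np.fill_diagonal(coeff, 0.0)
        grad = 2.0 * (coeff[:, :, None] * diff).sum(axis=1)
        coords -= step * grad
    return coords
```

A distance matrix alone cannot distinguish a structure from its mirror image, and plain gradient descent can stop in a local minimum, which is part of why a dedicated protocol with additional geometric constraints is used in practice.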

To obtain the amino acid sequence of an unknown protein, we recommend using the backbone structure as input to ProteinMPNN. The model typically converges after around 300 epochs.
