Preparing to open the repository to public
khb7840 committed Oct 23, 2024
1 parent 705fe16 commit e6e3146
Showing 17 changed files with 28,446 additions and 0 deletions.
71 changes: 71 additions & 0 deletions README.md
@@ -3,6 +3,77 @@
Folddisco is a bioinformatics tool for indexing and searching
for discontinuous motifs in protein structures.

## Command
### Installation
```bash
# Default
cargo install --features foldcomp --path .
```

### Indexing
```bash
# Default (for small databases)
folddisco index -p <PDB_DIR|FOLDCOMP_DB> -i <INDEX_PATH> -t <THREADS>
# For large databases (more than 65536 structures). This generates a fixed-size 8 GB offset file
folddisco index -p <PDB_DIR|FOLDCOMP_DB> -i <INDEX_PATH> -t <THREADS> -m big

# Setting the number of distance and angle bins and the feature type
folddisco index -p <PDB_DIR|FOLDCOMP_DB> -i <INDEX_PATH> -t <THREADS> -d <DISTANCE_BINS> -a <ANGLE_BINS> -y <FEATURE_TYPE>
```
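
The choice between the default mode and `-m big` depends only on the number of structures, so it can be scripted. The following is a minimal sketch, assuming the structure count is already known; `folddisco_index_cmd` is a hypothetical helper name (not part of folddisco), and it only prints the command rather than running it.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of folddisco): print the indexing command
# for a database, adding `-m big` when it holds more than 65536 structures.
#   $1 = PDB directory or Foldcomp DB, $2 = index path,
#   $3 = threads, $4 = number of structures (assumed known)
folddisco_index_cmd() {
    local db="$1" index_path="$2" threads="$3" n_structures="$4"
    local cmd="folddisco index -p $db -i $index_path -t $threads"
    if [ "$n_structures" -gt 65536 ]; then
        cmd="$cmd -m big"
    fi
    echo "$cmd"
}

# Human proteome (about 23K structures): the default mode is enough.
folddisco_index_cmd h_sapiens index/h_sapiens_folddisco 12 23000
```

For the human proteome this prints the same command as the example below; a count above 65536 appends `-m big`.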

```bash
# Example
# Indexing human proteome with 12 threads
folddisco index -p h_sapiens -i index/h_sapiens_folddisco -t 12
```

You can also download pre-built index files:
- [Human proteome](https://foldcomp.steineggerlab.workers.dev/h_sapiens_folddisco.tar.gz)
- [E. coli proteome](https://foldcomp.steineggerlab.workers.dev/e_coli_folddisco.tar.gz)
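
Fetching and unpacking one of these archives might look like the sketch below. It assumes `curl` and `tar` are installed. The function names and the `index/` target directory are illustrative, and the internal layout of the archives is not documented here, so the contents are listed before extraction.

```bash
#!/usr/bin/env bash
# Base URL taken from the download links above.
FOLDDISCO_INDEX_BASE="https://foldcomp.steineggerlab.workers.dev"

# Print the download URL for a pre-built index name (h_sapiens or e_coli).
prebuilt_index_url() {
    echo "${FOLDDISCO_INDEX_BASE}/${1}_folddisco.tar.gz"
}

# Hypothetical helper: download an index archive and unpack it under index/.
# Assumption: unpacking into index/ matches the paths used in the examples.
fetch_index() {
    local name="$1"
    local archive="${name}_folddisco.tar.gz"
    curl -fL -o "$archive" "$(prebuilt_index_url "$name")" || return 1
    tar -tzf "$archive" | head    # inspect the archive layout first
    mkdir -p index
    tar -xzf "$archive" -C index
}

# Usage: fetch_index e_coli
```

If the listing shows that the archive already contains a top-level `index/` directory, extract into the current directory instead.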

### Querying
```bash
# Default
folddisco query -i <INDEX> -p <QUERY_PDB> -q <QUERY_RESIDUES> -r -t <THREADS>
# The -r flag enables residue matching and RMSD calculation
# The -v flag enables verbose output (reports step-by-step runtime)

# Using the whole structure as a query
folddisco query -i <INDEX> -p <QUERY_PDB> -r -t <THREADS>

# Using a query file (e.g. query.txt)
folddisco query -i <INDEX> -q <QUERY_FILE> -r -t <THREADS>

# Using distance and angle thresholds
folddisco query -i <INDEX> -p <QUERY_PDB> -q <QUERY_RESIDUES> -d <DISTANCE_THRESHOLD> -a <ANGLE_THRESHOLD> -r -t <THREADS>
```
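
In the examples in this README, each entry of `<QUERY_RESIDUES>` is a chain ID followed by a residue number (`F207` is residue 207 of chain F), and the chain may be omitted (`250`). The sketch below splits such a list into chain and number pairs, which can help check a residue list before querying. `split_query_residues` is a hypothetical helper, not something folddisco provides, and it assumes chain IDs contain no digits and residue numbers carry no insertion codes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "chain<TAB>residue_number" for each entry of a
# comma-separated residue list. A missing chain is printed as "-".
split_query_residues() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r token; do
        chain="${token%%[0-9]*}"      # everything before the first digit
        resnum="${token#"$chain"}"    # the remainder is the residue number
        printf '%s\t%s\n' "${chain:--}" "$resnum"
    done
}

# Catalytic triad from the serine protease example (4CHA).
split_query_residues "B57,B102,C195"
```

The call prints three tab-separated lines: `B 57`, `B 102` and `C 195`.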

```bash
# Example
# Zinc finger query to human proteome
folddisco query -i index/h_sapiens_folddisco -p query/1G2F.pdb -q F207,F212,F225,F229 -r -d 0.5 -a 5 -t 12
folddisco query -i index/h_sapiens_folddisco -q query/zinc_finger.txt -r -d 0.5 -a 5 -t 12

# Serine protease query to human proteome
folddisco query -i index/h_sapiens_folddisco -p query/4CHA.pdb -q B57,B102,C195 -r -t 12
folddisco query -i index/h_sapiens_folddisco -q query/serine_peptidase.txt -r -t 12
```


## Index list
- `index/`
  - `h_sapiens_folddisco`: Human proteome, 23K structures
  - `e_coli_folddisco`: E. coli proteome, 4K structures

## Example query list
- `query/`
  - `1G2F.pdb`: Zinc finger protein
  - `4CHA.pdb`: Serine protease
  - `1LAP.pdb`: Aminopeptidase
  - `zinc_finger.txt`: 1G2F.pdb F207,F212,F225,F229
  - `serine_peptidase.txt`: 4CHA.pdb B57,B102,C195
  - `aminopeptidase.txt`: 1LAP.pdb 250,255,273,332,334
  - `knottin.txt`: 2N6N.pdb 3,10,15,16,21,23,28,30
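
Each query file listed above is a single line: the path to the structure, the residue list and, optionally, an output path (as in `query/zinc_finger_with_output.txt`). A small sketch for writing such a file follows. `write_query_file` is a hypothetical helper, and the field separator is an assumption: the committed files show whitespace between fields, and a tab is used here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a one-line query file.
#   $1 = query file to create, $2 = structure path, $3 = residue list,
#   $4 = optional output path for the results
# Assumption: fields are separated by a tab.
write_query_file() {
    local query_file="$1" structure="$2" residues="$3" output="${4:-}"
    if [ -n "$output" ]; then
        printf '%s\t%s\t%s\n' "$structure" "$residues" "$output" > "$query_file"
    else
        printf '%s\t%s\n' "$structure" "$residues" > "$query_file"
    fi
}

# Usage (paths are illustrative):
# write_query_file query/knottin.txt query/2N6N.pdb 3,10,15,16,21,23,28,30
# folddisco query -i index/h_sapiens_folddisco -q query/knottin.txt -r -t 12
```

Keeping the structure path relative to the directory where `folddisco` is run matches the committed query files, which all start with `query/`.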


## Contributor
File renamed without changes.
File renamed without changes.
Empty file added index/.keep
Empty file.
3,548 changes: 3,548 additions & 0 deletions query/1G2F.pdb
5,094 changes: 5,094 additions & 0 deletions query/1LAP.pdb
6,160 changes: 6,160 additions & 0 deletions query/1SU6.pdb
9,397 changes: 9,397 additions & 0 deletions query/2N6N.pdb
4,171 changes: 4,171 additions & 0 deletions query/4CHA.pdb

Large diffs are not rendered by default.


1 change: 1 addition & 0 deletions query/aminopeptidase.txt
@@ -0,0 +1 @@
query/1LAP.pdb 250,255,273,332,334
1 change: 1 addition & 0 deletions query/knottin.txt
@@ -0,0 +1 @@
query/2N6N.pdb 3,10,15,16,21,23,28,30
1 change: 1 addition & 0 deletions query/serine_peptidase.txt
@@ -0,0 +1 @@
query/4CHA.pdb B57,B102,C195
1 change: 1 addition & 0 deletions query/zinc_finger.txt
@@ -0,0 +1 @@
query/1G2F.pdb F207,F212,F225,F229
1 change: 1 addition & 0 deletions query/zinc_finger_with_output.txt
@@ -0,0 +1 @@
query/1G2F.pdb F207,F212,F225,F229 ./zinc_finger.folddisco.out.tsv
File renamed without changes.
