If you use this code or application, please cite the original paper published in Bioinformatics: 10.1093/bioinformatics/btx760
This collection of utility scripts creates files that can be served by the Underdark Go web service, which is part of the FUn Framework.
For a complete overview and detailed installation instructions for this project, please visit the project website.
- Python 3
- numpy
- pandas
- scipy
- sklearn
All input files must be plain-text files containing information on one molecule per line, structured as follows:
c1ccccc1 Benzene 1;0;0;1;0;1;1;1;1;1;0 1.25;6
C1CCCC1 Cyclopentan 1;0;0;1;1;1;1;1;1;0;1 0.75;5
Each line contains the SMILES, an arbitrary label, a fingerprint vector, and any number of numerical properties for which the colour maps will be generated. The default delimiters are whitespace (between fields) and ; (between the values within a field). Both can be changed by modifying the script doitall.sh.
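As a way to check an input file before running the scripts, the line format described above can be parsed with a few lines of Python. This is a minimal sketch, not code from this repository: the names `Molecule` and `parse_line` are hypothetical, and it assumes the default delimiters, a label without whitespace, and integer fingerprint values.

```python
from typing import List, NamedTuple, Optional


class Molecule(NamedTuple):
    smiles: str
    label: str
    fingerprint: List[int]
    properties: List[float]


def parse_line(line: str, field_sep: Optional[str] = None, value_sep: str = ";") -> Molecule:
    """Split one input line into SMILES, label, fingerprint and properties.

    Hypothetical helper: assumes exactly four fields per line.
    """
    # str.split(None) splits on any run of whitespace (the default field delimiter).
    fields = line.strip().split(field_sep)
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    smiles, label, fp_text, prop_text = fields
    # Assumption: fingerprint values are integers, properties are numeric.
    fingerprint = [int(v) for v in fp_text.split(value_sep)]
    properties = [float(v) for v in prop_text.split(value_sep)]
    return Molecule(smiles, label, fingerprint, properties)


mol = parse_line("c1ccccc1 Benzene 1;0;0;1;0;1;1;1;1;1;0 1.25;6")
print(mol.label, len(mol.fingerprint), mol.properties)  # Benzene 11 [1.25, 6.0]
```

A line with a missing field, or with a label that contains spaces, raises a ValueError in this sketch, which makes malformed lines easy to spot.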
To generate the files for Underdark Go, make sure the input files are in the correct format and all dependencies are installed, then clone this repository:
git clone https://github.com/reymond-group/pca.git
Next, make sure the bash script is executable:
chmod +x doitall.sh
Finally, run the bash script, which will in turn run the necessary Python scripts:
./doitall.sh inputFile databaseName fingerprintName n
where inputFile is a plain-text file formatted according to the information provided above, and databaseName and fingerprintName are arbitrary names chosen for the database and the fingerprint respectively. n is an integer setting the resolution of the final cubic map. It is good practice to provide low (n <= 250) and high (n >= 500) resolution versions of each map. While most maps are probably sparse and do not approach the maximum number of rendered bins, n^3, these numbers might have to be lowered for densely populated maps.
Example
./doitall.sh my-awesome-data.txt ACMEbase Xfp 250
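To get a feeling for the resolution parameter, the upper bound n^3 on the number of bins can be computed directly. The sketch below also shows one common way to map a coordinate onto an n-step grid. The function names are hypothetical and the binning shown is an illustrative assumption, not necessarily what the scripts in this repository do.

```python
def max_bins(n: int) -> int:
    """Upper bound on the number of bins in a cubic map of resolution n."""
    return n ** 3


def to_bin(value: float, lo: float, hi: float, n: int) -> int:
    """Map a coordinate in [lo, hi] to an integer bin index in [0, n - 1].

    Illustrative linear binning; assumed, not taken from the repository.
    """
    if hi == lo:
        return 0
    index = int((value - lo) / (hi - lo) * n)
    # Clamp so that value == hi falls into the last bin instead of bin n.
    return min(max(index, 0), n - 1)


for n in (250, 500):
    print(n, max_bins(n))
# 250 15625000
# 500 125000000
```

Doubling the resolution multiplies the maximum number of bins by eight, which is why densely populated maps may need a lower n.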
PCA models generated by the above process can be exposed via a web service that processes additional fingerprints and projects them onto the database space. Faerun can make use of this service to project data points directly onto the currently loaded database. To enable this functionality, set the pcaUrl option in the Faerun configuration file to the address of this service. See here for how to change Faerun's configuration.
The easiest way to run the PCA service is to run the Docker container for the service:
docker run -d -p 80:80 -v /your/host/dir:/models --name pcaservice daenuprobst/planes
where /your/host/dir contains the models you wish to provide via the service. Model files are generated by the script doitall.sh mentioned above and are named databaseName.fingerprintName.3.pkl.
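The naming scheme makes it possible to tell which database and fingerprint a model file belongs to. As a minimal sketch, assuming the databaseName.fingerprintName.3.pkl pattern described above, a hypothetical helper `parse_model_name` could split a file name into its parts:

```python
from pathlib import Path
from typing import Tuple


def parse_model_name(path: str) -> Tuple[str, str, int]:
    """Return (database, fingerprint, dimensions) for a model file name.

    Hypothetical helper; assumes the pattern database.fingerprint.dims.pkl.
    """
    name = Path(path).name
    # Split from the right so a database name containing dots stays intact.
    parts = name.rsplit(".", 3)
    if len(parts) != 4 or parts[3] != "pkl":
        raise ValueError(f"unexpected model file name: {name!r}")
    database, fingerprint, dimensions, _ = parts
    return database, fingerprint, int(dimensions)


print(parse_model_name("/your/host/dir/ACMEbase.Xfp.3.pkl"))  # ('ACMEbase', 'Xfp', 3)
```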
Once the service is up and running, you can POST JSON data to it. The message content should have the following format:
{
"database": "databaseName",
"fingerprint": "fingerprintName",
"dimensions": 3,
"binning": true,
"resolution": 250,
"data": [
[2,2,4,5,3,0,0,0,2,0,0,0,0,19,14,0,0,0,0,0,0,1,1,0,3,5,2,1,1,6,4,0,2,0,0,6,5,0,8,1,0,1],
[0,0,3,4,11,0,0,0,1,0,0,0,0,22,16,1,0,0,0,0,0,0,1,0,4,5,7,4,0,5,1,0,0,0,0,3,3,0,15,1,0,0],
...
]
}
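A request in this format can be assembled with the Python standard library. The following is a minimal sketch under stated assumptions: the function name `build_request` is hypothetical, and the URL (including the path the service listens on) is a placeholder that has to be replaced with the address of your own deployment.

```python
import json
import urllib.request
from typing import List


def build_request(url: str, database: str, fingerprint: str,
                  data: List[List[int]], resolution: int = 250,
                  binning: bool = True) -> urllib.request.Request:
    """Build a POST request carrying the JSON message described above."""
    payload = {
        "database": database,
        "fingerprint": fingerprint,
        "dimensions": 3,
        "binning": binning,
        "resolution": resolution,
        "data": data,
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL: the actual host and path depend on your deployment.
request = build_request("http://localhost/", "ACMEbase", "Xfp",
                        [[1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0]])

# Sending requires a running service, so it is left commented out here:
# with urllib.request.urlopen(request) as response:
#     reply = json.loads(response.read().decode("utf-8"))
```

Each fingerprint submitted in data should have the same length as the fingerprints the model was generated from.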
On success, the service returns a message containing the x, y, z coordinates of the submitted fingerprints:
{
"success": true,
"database": "surechembl",
"fingerprint": "mqn",
"dimensions": 3,
"data": [
[153.12, -23.35653, 27.12],
[282.162, 35.47863, -2.64],
...
]
}
On error, the following message is returned:
{
"success": false,
"error": "Oops! Something went wrong."
}
An example can be found here.
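On the client side, the two reply shapes described above can be handled by checking the success field first. This is a minimal sketch with a hypothetical function name, `extract_coordinates`, and it assumes the reply body is valid JSON:

```python
import json
from typing import List


def extract_coordinates(response_text: str) -> List[List[float]]:
    """Return the projected coordinates, or raise if the service reported an error."""
    message = json.loads(response_text)
    if not message.get("success"):
        # The error text comes from the "error" field of the reply.
        raise RuntimeError(message.get("error", "unknown error"))
    return message["data"]


reply = '{"success": true, "database": "surechembl", "fingerprint": "mqn", "dimensions": 3, "data": [[153.12, -23.35653, 27.12]]}'
for x, y, z in extract_coordinates(reply):
    print(x, y, z)  # 153.12 -23.35653 27.12
```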