This is the code repository for the paper "Deep Learning-based Method for Automatic Resolution of GC-MS Data from Complex Samples". We propose an automatic resolution method (AutoRes) for overlapped peaks in complex GC-MS data, based on the pseudo-Siamese convolutional neural network (pSCNN) architecture. AutoRes consists of two pSCNN models (pSCNN1 and pSCNN2) with the same architecture but different inputs. Each model was trained with 400,000 augmented spectral pairs. The models predict the selective region (pSCNN1) and the elution region (pSCNN2) of each compound in an untargeted manner. The predicted regions are then used as inputs to the full rank resolution (FRR) method, which resolves the overlapping peaks.
We recommend using conda and pip.
The environment.yml and requirements.txt files install all the required packages.
git clone https://github.com/dyjfan/AutoRes.git
cd AutoRes
conda env create -f environment.yml
conda activate autores
The mass spectral pairs for training the pSCNN1 and pSCNN2 models are generated with the data_augmentation_1 and data_augmentation_2 functions.
aug_eval1 = data_augmentation_1(spectra, n, maxn, noise_level=0.001)
aug_eval2 = data_augmentation_2(spectra, c, n, m, maxn, noise_level=0.001)
Optional args
- spectra : Mass spectral library
- c : Similar sublibrary
- n : Number of augmented mass spectral pairs
- m : Number of augmented mass spectral pairs with high similarity
- maxn : Number of components
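To make the roles of spectra, n, maxn and noise_level more concrete, here is a minimal NumPy sketch of one way augmented spectral pairs could be generated. It is not the repository's data_augmentation_1 or data_augmentation_2 implementation: the helper name make_pairs_sketch, the mixing-ratio range, the alternating positive/negative labels and the seed argument are all assumptions made for illustration.

```python
import numpy as np

def make_pairs_sketch(spectra, n, maxn, noise_level=0.001, seed=0):
    """Hypothetical helper: build n (mixture, candidate, label) triples.

    spectra : (n_library, n_mz) array of non-negative library spectra
    n       : number of pairs to generate
    maxn    : maximum number of components mixed into one spectrum
    """
    rng = np.random.default_rng(seed)
    n_lib, n_mz = spectra.shape
    mixtures = np.zeros((n, n_mz))
    candidates = np.zeros((n, n_mz))
    labels = np.zeros(n, dtype=int)
    for i in range(n):
        k = rng.integers(1, maxn + 1)                   # components in this mixture
        ids = rng.choice(n_lib, size=k, replace=False)  # which library spectra to mix
        ratios = rng.uniform(0.2, 1.0, size=k)          # assumed mixing-ratio range
        mix = ratios @ spectra[ids]
        # Gaussian noise scaled to the base peak, then clipped to stay non-negative
        mix = mix + rng.normal(0.0, noise_level, size=n_mz) * mix.max()
        mix = np.clip(mix, 0.0, None)
        mixtures[i] = mix / mix.max()                   # normalise to the base peak
        if i % 2 == 0:
            cand = ids[0]                               # positive pair: candidate is in the mixture
            labels[i] = 1
        else:
            others = np.setdiff1d(np.arange(n_lib), ids)
            cand = rng.choice(others)                   # negative pair: candidate is absent
        candidates[i] = spectra[cand]
    return mixtures, candidates, labels

# Toy library of 20 random "spectra" with 50 m/z channels (synthetic data, not NIST)
lib_rng = np.random.default_rng(1)
library = lib_rng.random((20, 50))
library /= library.max(axis=1, keepdims=True)

mixtures, candidates, labels = make_pairs_sketch(library, n=6, maxn=3)
print(mixtures.shape, candidates.shape, labels)
```

The sketch needs a library with more than maxn spectra, so that a negative candidate always exists.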
Train the model on your own training dataset with the build_pSCNN function.
model = build_pSCNN(para)
Optional args
- para : Hyperparameters for model training
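For readers unfamiliar with the term, "pseudo-Siamese" means that the two inputs pass through branches of the same architecture that do not share weights. The forward pass below illustrates that idea in plain NumPy. It is only a sketch: the layer sizes, the parameter names in params and the function pscnn_forward_sketch are hypothetical and do not reflect what build_pSCNN constructs or what para contains.

```python
import numpy as np

def conv1d_relu(x, kernels):
    """Valid 1-D cross-correlation of a single-channel signal with several kernels, then ReLU."""
    width = kernels.shape[1]
    windows = np.lib.stride_tricks.sliding_window_view(x, width)  # (positions, width)
    return np.maximum(windows @ kernels.T, 0.0)                   # (positions, n_kernels)

def pscnn_forward_sketch(x1, x2, params):
    """Hypothetical pseudo-Siamese forward pass: two branches, separate weights."""
    f1 = conv1d_relu(x1, params["branch1"]).max(axis=0)  # global max pooling, branch 1
    f2 = conv1d_relu(x2, params["branch2"]).max(axis=0)  # global max pooling, branch 2
    merged = np.concatenate([f1, f2])                     # join the two feature vectors
    logit = merged @ params["dense_w"] + params["dense_b"]
    return 1.0 / (1.0 + np.exp(-logit))                   # sigmoid score in (0, 1)

net_rng = np.random.default_rng(0)
params = {
    "branch1": 0.1 * net_rng.normal(size=(8, 5)),  # 8 kernels of width 5 (assumed sizes)
    "branch2": 0.1 * net_rng.normal(size=(8, 5)),  # same shape, independent weights
    "dense_w": 0.1 * net_rng.normal(size=16),
    "dense_b": 0.0,
}
x1 = net_rng.random(50)  # first input spectrum (synthetic)
x2 = net_rng.random(50)  # second input spectrum (synthetic)
prob = pscnn_forward_sketch(x1, x2, params)
print(float(prob))
```

In a true Siamese network, branch1 and branch2 would be the same array. Keeping them separate lets each branch specialise on its own kind of input.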
Resolve GC-MS data files automatically with the AutoRes function.
AutoRes(ncr, model1, model2, filename)
Optional args
- ncr : GC-MS data
- model1 : pSCNN1 model
- model2 : pSCNN2 model
- filename : GC-MS data filename
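The following sketch shows why selective regions are useful for resolution. If each component has a scan range where it elutes alone, its spectrum can be read from that range, and the elution profiles then follow from a least-squares fit. This is a simplified illustration on synthetic data, not the FRR routine inside AutoRes. The function resolve_with_selective_regions and the hand-picked scan ranges are assumptions.

```python
import numpy as np

def resolve_with_selective_regions(X, selective_regions):
    """Hypothetical helper.

    X                 : (n_scans, n_mz) data matrix, assumed to follow X = C @ S
    selective_regions : list of (start, stop) scan ranges, one per component,
                        where only that component is assumed to elute
    """
    spectra = []
    for start, stop in selective_regions:
        s = X[start:stop].sum(axis=0)          # pooled spectrum of the pure region
        spectra.append(s / np.linalg.norm(s))  # unit-length spectrum estimate
    S = np.array(spectra)                      # (n_components, n_mz)
    # Solve S.T @ C.T = X.T in the least-squares sense to get the elution profiles
    C = np.linalg.lstsq(S.T, X.T, rcond=None)[0].T
    return C, S

# Synthetic two-component example with overlapping Gaussian elution profiles
t = np.arange(60)
c1 = np.exp(-0.5 * ((t - 25) / 4.0) ** 2)
c2 = np.exp(-0.5 * ((t - 35) / 4.0) ** 2)
spec_rng = np.random.default_rng(0)
S_true = spec_rng.random((2, 40))              # two random "mass spectra"
X = np.column_stack([c1, c2]) @ S_true         # overlapped data matrix

# Scan ranges where one component dominates (chosen by hand for this toy case)
C_est, S_est = resolve_with_selective_regions(X, [(8, 18), (43, 53)])
print(np.corrcoef(C_est[:, 0], c1)[0, 1], np.corrcoef(C_est[:, 1], c2)[0, 1])
```

In AutoRes, the regions are predicted by pSCNN1 and pSCNN2 rather than chosen by hand.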
An example is provided in the test.ipynb notebook for convenience. After downloading, place the unzipped data file and the AutoRes-1.0 file in the same directory to run it directly.
Note that the NIST_Spec.db database is not available for download for copyright reasons. Users who need the database should obtain it from NIST.