forked from dyjfan/AutoRes

Deep Learning-based Method for Automatic Resolution of GC-MS Data from Complex Samples - modified for .CDF Agilent files from GC-MSD 7890A-5975C

micabake/AutoRes

 
 


AutoRes

This is the code repository for the paper Deep Learning-based Method for Automatic Resolution of GC-MS Data from Complex Samples. We proposed an Automatic Resolution method (AutoRes) for overlapping peaks in complex GC-MS data, based on the pseudo-Siamese convolutional neural network (pSCNN) architecture. AutoRes consists of two pSCNN models (pSCNN1 and pSCNN2) that share the same architecture but take different inputs; each model was trained with 400,000 augmented spectral pairs. Together they predict, in an untargeted manner, the selective region (pSCNN1) and the elution region (pSCNN2) of each compound. The predicted regions are then used as inputs to the full rank resolution (FRR) method, which resolves the overlapping peaks.
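The region-based idea can be illustrated with a small sketch. It is not the FRR algorithm from the paper: it assumes a bilinear model X = C S (scans by m/z), estimates each spectrum from its selective region with an SVD, and then fits the elution profiles scan by scan by least squares, using only the components whose elution region covers that scan. The helper name `resolve_regions` and its region format (half-open `(start, stop)` scan ranges) are hypothetical.

```python
import numpy as np


def resolve_regions(X, selective_regions, elution_regions):
    """Simplified region-based curve resolution (hypothetical helper, not FRR).

    X                 : (n_scans, n_mz) GC-MS data matrix
    selective_regions : one (start, stop) scan range per component where
                        only that component elutes
    elution_regions   : one (start, stop) scan range per component where
                        that component elutes at all
    Returns (C, S): elution profiles (n_scans, k) and unit-norm spectra (k, n_mz).
    """
    k = len(selective_regions)
    S = np.empty((k, X.shape[1]))
    for j, (start, stop) in enumerate(selective_regions):
        # A purely selective block is rank one, so its first right singular
        # vector is the component's (unit-norm) spectrum up to sign.
        _, _, vt = np.linalg.svd(X[start:stop], full_matrices=False)
        S[j] = np.abs(vt[0])
    C = np.zeros((X.shape[0], k))
    for i in range(X.shape[0]):
        # Only components whose elution region covers scan i may contribute.
        active = [j for j, (a, b) in enumerate(elution_regions) if a <= i < b]
        if active:
            # Least squares: X[i] ~= coef @ S[active]
            coef, *_ = np.linalg.lstsq(S[active].T, X[i], rcond=None)
            C[i, active] = coef
    return C, S
```

On noise-free synthetic data with truly selective regions this reproduces X exactly; on real data the result depends on how closely the predicted regions match the true ones.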

Packages required:

We recommend using conda and pip.

The environment.yml and requirements.txt files install all the required packages.

git clone https://github.com/dyjfan/AutoRes.git
cd AutoRes
conda env create -f environment.yml
conda activate autores

Data augmentation

The mass spectral pairs used to train the pSCNN1 and pSCNN2 models are generated with the data_augmentation_1 and data_augmentation_2 functions, respectively.

aug_eval1 = data_augmentation_1(spectra, n, maxn, noise_level=0.001)
aug_eval2 = data_augmentation_2(spectra, c, n, m, maxn, noise_level=0.001)

Arguments

  • spectra : Mass spectral library
  • c : Sublibrary of similar spectra
  • n : Number of augmented mass spectral pairs
  • m : Number of augmented mass spectral pairs with high similarity
  • maxn : Maximum number of components
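For intuition, the following sketch shows one way augmented spectral pairs could be built: mix 1 to maxn randomly chosen library spectra with random ratios, add Gaussian noise scaled by noise_level, and pair the mixture with a query spectrum labeled 1 if it is part of the mixture and 0 otherwise. This is an assumption about the general technique, not the repository's data_augmentation_1 or data_augmentation_2; the helpers `augment_pair` and `augment_dataset` are hypothetical, and the similarity-based arguments c and m are not modeled.

```python
import numpy as np


def augment_pair(spectra, maxn, noise_level=0.001, rng=None):
    """Hypothetical sketch: one (mixture, query, label) training sample.

    spectra : (n_lib, n_mz) non-negative library spectra, with n_lib > maxn
    maxn    : maximum number of components mixed together
    """
    rng = np.random.default_rng() if rng is None else rng
    k = int(rng.integers(1, maxn + 1))                  # 1..maxn components
    idx = rng.choice(spectra.shape[0], size=k + 1, replace=False)
    members, outsider = idx[:k], idx[k]
    ratios = rng.uniform(0.1, 1.0, size=k)              # random mixing ratios
    mixture = ratios @ spectra[members]
    mixture += rng.normal(0.0, noise_level, size=mixture.shape)
    mixture = np.clip(mixture, 0.0, None)
    mixture /= mixture.max()                             # base peak = 1
    if rng.random() < 0.5:
        query, label = spectra[rng.choice(members)], 1  # query is in the mixture
    else:
        query, label = spectra[outsider], 0             # query is absent
    return mixture, query / query.max(), label


def augment_dataset(spectra, n, maxn, noise_level=0.001, seed=0):
    """Stack n augmented pairs into arrays ready for training."""
    rng = np.random.default_rng(seed)
    samples = [augment_pair(spectra, maxn, noise_level, rng) for _ in range(n)]
    mixtures, queries, labels = map(np.array, zip(*samples))
    return mixtures, queries, labels
```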

Model training

Train a model on your own training dataset with the build_pSCNN function.

model = build_pSCNN(para)

Arguments

  • para : Hyperparameters for model training
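The pseudo-Siamese pattern behind pSCNN can be sketched as a NumPy forward pass: two convolutional branches with the same structure but separate (unshared) weights, one per input spectrum, whose pooled features are concatenated and mapped to a single sigmoid score. This only illustrates the architecture pattern with random, untrained weights; the layer sizes, the parameter dictionary and the helper names are hypothetical and do not describe what build_pSCNN(para) actually builds.

```python
import numpy as np


def branch_features(x, kernels):
    """One branch: 1D convolutions + ReLU, then global max pooling."""
    # np.convolve flips each kernel (true convolution); deep-learning layers
    # usually apply cross-correlation, which only reverses kernel indexing.
    maps = np.stack([np.convolve(x, k, mode="valid") for k in kernels])
    return np.maximum(maps, 0.0).max(axis=1)   # one feature per kernel


def pseudo_siamese_forward(spec_a, spec_b, params):
    """Score in (0, 1) for a spectral pair (toy forward pass only)."""
    fa = branch_features(spec_a, params["kernels_a"])
    fb = branch_features(spec_b, params["kernels_b"])  # separate weights
    h = np.concatenate([fa, fb])
    logit = h @ params["w_out"] + params["b_out"]
    return 1.0 / (1.0 + np.exp(-logit))             # sigmoid


def init_params(n_kernels=8, kernel_size=5, seed=0):
    """Random, untrained weights; a real model would learn these."""
    rng = np.random.default_rng(seed)
    return {
        "kernels_a": rng.normal(0.0, 0.1, (n_kernels, kernel_size)),
        "kernels_b": rng.normal(0.0, 0.1, (n_kernels, kernel_size)),
        "w_out": rng.normal(0.0, 0.1, 2 * n_kernels),
        "b_out": 0.0,
    }
```

Tying kernels_b to kernels_a and mirroring the two halves of w_out turns this into an ordinary Siamese network whose score is symmetric in its inputs; keeping the branches separate lets each one adapt to its own input.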

Automatic Resolution

Resolve GC-MS data files automatically with the AutoRes function.

AutoRes(ncr, model1, model2, filename)

Arguments

  • ncr : GC-MS data
  • model1 : pSCNN1 model
  • model2 : pSCNN2 model
  • filename : GC-MS data filename
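Because this fork targets Agilent .CDF files, it may help to see how the scans in such a file could be turned into a scans-by-m/z matrix. The sketch assumes the standard ANDI-MS (netCDF) variables mass_values, intensity_values and point_count, which could be read with, for example, the netCDF4 package (`Dataset(path).variables["mass_values"][:]`). Whether AutoRes expects its ncr argument in this form is not stated here, so treat `andi_to_matrix` as a hypothetical helper that bins intensities to nominal (unit) m/z.

```python
import numpy as np


def andi_to_matrix(mass_values, intensity_values, point_count, mz_min=None, mz_max=None):
    """Bin flat ANDI-MS arrays into an (n_scans, n_mz) matrix at unit m/z.

    mass_values, intensity_values : concatenated (m/z, intensity) points of all scans
    point_count                   : number of points belonging to each scan
    Returns (X, mz_axis).
    """
    mass_values = np.asarray(mass_values, dtype=float)
    intensity_values = np.asarray(intensity_values, dtype=float)
    point_count = np.asarray(point_count, dtype=int)
    mz = np.rint(mass_values).astype(int)             # round to nominal mass
    lo = int(mz.min()) if mz_min is None else mz_min
    hi = int(mz.max()) if mz_max is None else mz_max
    # Scan index of every point, e.g. point_count [2, 3] -> [0, 0, 1, 1, 1]
    scan_of_point = np.repeat(np.arange(len(point_count)), point_count)
    keep = (mz >= lo) & (mz <= hi)
    X = np.zeros((len(point_count), hi - lo + 1))
    # Sum intensities that round to the same nominal m/z within a scan
    np.add.at(X, (scan_of_point[keep], mz[keep] - lo), intensity_values[keep])
    return X, np.arange(lo, hi + 1)
```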

Clone the repository and run it directly

git clone

An example is provided in the test.ipynb notebook. After downloading, place the unzipped data file and AutoRes-1.0 in the same directory, then run the notebook directly.

Note that the NIST_Spec.db database cannot be downloaded for copyright reasons. Users who need it should obtain it from NIST.

Contact
