
A fully automatic method based on Transformer for resolution of overlapping peaks in gas chromatography-mass spectrometry


micabake/GCMSFormer

 
 


GCMSFormer

This is the code repository for the GCMSFormer method. We proposed GCMSFormer to resolve overlapped peaks in complex GC-MS data using a Transformer model. The model was trained, validated and tested on 100,000 augmented simulated overlapped peaks split in a ratio of 8:1:1, and its bilingual evaluation understudy (BLEU) score on the test set was 0.9988. With the aid of the orthogonal projection resolution (OPR) method, GCMSFormer predicts the pure mass spectra of all components in an overlapped peak (the mass spectral matrix S), and the least squares method is then used to obtain the concentration distribution matrix C. In this way, overlapped peaks can be resolved fully automatically.
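The BLEU score quoted above measures n-gram overlap between a predicted token sequence and a reference sequence. The sketch below is a minimal sentence-level BLEU in plain Python, assuming each overlapped peak's target is a sequence of library token IDs (an assumption; the repository's tokenization and its BLEU implementation are not described in this README). It uses uniform weights over 1- to 4-grams, the standard brevity penalty and no smoothing, so sequences shorter than `max_n` score 0. The function names are hypothetical; NLTK's `nltk.translate.bleu_score.sentence_bleu` is an existing library alternative.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, candidate, max_n=4):
    """Hypothetical sentence-level BLEU: one reference, uniform weights, no smoothing."""
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        # Modified precision: clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref_counts[g]) for g, count in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or overlap == 0:
            return 0.0  # without smoothing the geometric mean collapses to zero
        log_precision_sum += math.log(overlap / total) / max_n
    # Brevity penalty: candidates shorter than the reference are penalized.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum)


# Usage with toy token IDs (hypothetical values).
reference = [12, 7, 33, 5, 41]
print(sentence_bleu(reference, reference))  # 1.0
```

A corpus-level score such as the one reported for the test set would normally pool n-gram counts over all sequences before taking the geometric mean, rather than averaging per-sentence scores.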

Packages required:

We recommend using conda and pip.

The environment.yml and requirements.txt files install all the required packages.

git clone https://github.com/zxguocsu/GCMSFormer.git
cd GCMSFormer
conda env create -f environment.yml
conda activate GCMSFormer

Data augmentation

The overlapped peak datasets for training, validating and testing the GCMSFormer model are generated with the gen_datasets function.

TRAIN, VALID, TEST, tgt_vacob = gen_datasets(para)

Arguments

  • para : Data augmentation parameters
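The internals of gen_datasets and the fields of para are not documented here. As an illustration only, the sketch below simulates one overlapped peak under the bilinear model X ≈ C Sᵀ implied by the matrices S and C described above, assuming Gaussian elution profiles, sparse random pure spectra and additive Gaussian noise. The function `simulate_overlapped_peak` and all of its parameters are hypothetical, not the repository's API.

```python
import numpy as np


def simulate_overlapped_peak(n_scans=60, n_mz=200, n_components=3,
                             noise_level=0.01, rng=None):
    """Hypothetical simulator returning X (scans x m/z), C (scans x k), S (m/z x k)."""
    rng = np.random.default_rng(rng)
    t = np.arange(n_scans)
    # Gaussian elution profiles with closely spaced apexes so the peaks overlap.
    centers = rng.uniform(0.35, 0.65, n_components) * n_scans
    widths = rng.uniform(3.0, 6.0, n_components)
    C = np.exp(-0.5 * ((t[:, None] - centers) / widths) ** 2)
    # Sparse non-negative pure spectra (about 10% of m/z channels non-zero).
    S = rng.random((n_mz, n_components)) * (rng.random((n_mz, n_components)) < 0.1)
    # Give every component a base peak of 1.0 so no spectrum is empty.
    S[rng.integers(n_mz, size=n_components), np.arange(n_components)] = 1.0
    # Bilinear mixing plus noise scaled to the maximum intensity.
    X = C @ S.T
    X = X + noise_level * X.max() * rng.standard_normal(X.shape)
    return X, C, S


# Usage: one simulated overlapped peak with a fixed seed.
X, C, S = simulate_overlapped_peak(rng=0)
print(X.shape, C.shape, S.shape)  # (60, 200) (60, 3) (200, 3)
```

Returning C and S alongside X keeps the ground truth available, which is what a supervised training target would be built from.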

Model training

Train the model on your own training dataset with the train_model function.

model, Loss = train_model(para, TRAIN, VALID, tgt_vacob)

Arguments

  • para : Hyperparameters for model training
  • TRAIN : Training set
  • VALID : Validation set
  • tgt_vacob : Library returned by gen_datasets
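The model described above was trained on simulated peaks split 8:1:1 into training, validation and test sets. If you assemble your own samples instead of using gen_datasets, a split in the same proportions could be sketched as follows. `split_8_1_1` is a hypothetical helper, and whether gen_datasets shuffles before splitting is not stated in this README.

```python
import random


def split_8_1_1(samples, seed=0):
    """Hypothetical helper: shuffle, then split into train/valid/test at 8:1:1."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    n_train = len(items) * 8 // 10      # integer arithmetic avoids float rounding
    n_valid = len(items) // 10
    train = items[:n_train]
    valid = items[n_train:n_train + n_valid]
    test = items[n_train + n_valid:]    # remainder, so no sample is dropped
    return train, valid, test


# Usage with 100,000 placeholder sample indices.
train, valid, test = split_8_1_1(range(100_000))
print(len(train), len(valid), len(test))  # 80000 10000 10000
```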

Resolution

Resolve GC-MS data files automatically with the Resolution function.

Resolution(path, filename, model, tgt_vacob, device)

Arguments

  • path : GC-MS data path
  • filename : GC-MS data filename
  • model : GCMSFormer model
  • tgt_vacob : Library returned by gen_datasets
  • device : Device on which the model runs (cuda or cpu)
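As described at the top of this README, once the pure spectra S are predicted, the concentration matrix C is obtained by least squares. A minimal NumPy sketch follows, assuming the orientation X ≈ C Sᵀ with X of shape (scans x m/z), S of shape (m/z x k) and C of shape (scans x k); the orientation and the function name `concentrations_from_spectra` are assumptions, not taken from the repository.

```python
import numpy as np


def concentrations_from_spectra(X, S):
    """Hypothetical helper: least-squares estimate of C in X ≈ C @ S.T."""
    # X.T ≈ S @ C.T, so solve for C.T with S as the design matrix;
    # lstsq treats every scan (column of X.T) as a separate right-hand side.
    C_T, *_ = np.linalg.lstsq(S, X.T, rcond=None)
    return C_T.T


# Usage on a noiseless toy example with known ground truth.
rng = np.random.default_rng(0)
C_true = rng.random((50, 3))    # toy elution profiles (50 scans, 3 components)
S_true = rng.random((120, 3))   # toy pure spectra (120 m/z channels)
X = C_true @ S_true.T
C_hat = concentrations_from_spectra(X, S_true)
print(np.allclose(C_hat, C_true))  # True
```

Unconstrained least squares can return small negative concentrations on noisy data; a non-negative solver such as `scipy.optimize.nnls` applied per scan is one alternative, though which variant GCMSFormer uses is not stated here.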

Clone the repository and run it directly

git clone

For convenience, an example is provided in the test.ipynb notebook. The GC-MS file it uses is available in Essential Oil Data.

Contact


Languages

  • Jupyter Notebook 84.0%
  • Python 16.0%