This is the code repository for the GCMSFormer method. We propose GCMSFormer, a Transformer-based model for resolving overlapped peaks in complex GC-MS data. The GCMSFormer model was trained, validated, and tested on 100,000 augmented simulated overlapped peaks split in a ratio of 8:1:1, and its bilingual evaluation understudy (BLEU) score on the test set was 0.9988. With the aid of the orthogonal projection resolution (OPR) method, GCMSFormer predicts the pure mass spectra of all components in overlapped peaks (the mass spectral matrix S), and the least squares method is then used to find the concentration distribution matrix C. In this way, overlapped peaks can be resolved automatically.
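The least squares step described above can be sketched with NumPy. This is only an illustration under stated assumptions, not the repository's implementation: it assumes the common bilinear model X ≈ C Sᵀ, with X of shape (scans × m/z channels), S of shape (m/z channels × components), and C of shape (scans × components). The function name estimate_concentrations and the synthetic data are hypothetical.

```python
import numpy as np


def estimate_concentrations(X, S):
    """Least-squares estimate of C in X ≈ C @ S.T (assumed bilinear model).

    X: (n_scans, n_mz) data matrix of the overlapped peak.
    S: (n_mz, n_components) pure mass spectra, one column per component.
    """
    # Solve S @ C.T ≈ X.T for C.T in the least-squares sense.
    C_T, *_ = np.linalg.lstsq(S, X.T, rcond=None)
    return C_T.T


# Synthetic two-component example (hypothetical data, noise-free).
rng = np.random.default_rng(0)
n_scans, n_mz, k = 40, 60, 2
scans = np.arange(n_scans)
C_true = np.stack(
    [
        np.exp(-0.5 * ((scans - 15) / 3.0) ** 2),  # elution profile 1
        np.exp(-0.5 * ((scans - 22) / 4.0) ** 2),  # elution profile 2
    ],
    axis=1,
)
S = rng.random((n_mz, k))  # stand-in for the predicted pure spectra
X = C_true @ S.T           # simulated overlapped peak

C_est = estimate_concentrations(X, S)
print(C_est.shape)
```

With real data, X contains noise and the recovered profiles are only approximate; a non-negativity constraint is sometimes added in practice, which this sketch does not do.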
We recommend using conda and pip.
The environment.yml and requirements.txt files list all the required packages; creating the environment from them installs everything needed.
git clone https://github.com/zxguocsu/GCMSFormer.git
cd GCMSFormer
conda env create -f environment.yml
conda activate GCMSFormer
The overlapped peak datasets for training, validating, and testing the GCMSFormer model are generated with the gen_datasets function.
TRAIN, VALID, TEST, tgt_vacob = gen_datasets(para)
Optional args
- para : Data augmentation parameters
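To make the 8:1:1 train/validation/test ratio concrete, here is a minimal standard-library sketch of such a split. It is an illustration only and does not reproduce how gen_datasets builds or partitions its data; the helper name split_811 and the use of integer indices as stand-ins for simulated peaks are assumptions.

```python
import random


def split_811(items, seed=0):
    """Shuffle items and split them into train/valid/test in an 8:1:1 ratio."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded for reproducibility
    n = len(items)
    n_train = n * 8 // 10
    n_valid = n // 10
    train = items[:n_train]
    valid = items[n_train:n_train + n_valid]
    test = items[n_train + n_valid:]
    return train, valid, test


# 100,000 placeholder samples, standing in for simulated overlapped peaks.
train, valid, test = split_811(range(100_000))
print(len(train), len(valid), len(test))  # 80000 10000 10000
```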
Train the model on your own training dataset with the train_model function.
model, Loss = train_model(para, TRAIN, VALID, tgt_vacob)
Optional args
- para : Hyperparameters for model training
- TRAIN : Training set
- VALID : Validation set
- tgt_vacob : Library
Automatically resolve GC-MS data files with the Resolution function.
Resolution(path, filename, model, tgt_vacob, device)
Optional args
- path : GC-MS data path
- filename : GC-MS data filename
- model : GCMSFormer model
- tgt_vacob : Library
- device : Device on which the model and data are placed (cuda/cpu)
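One common way to choose the device value is to use the GPU when PyTorch reports that CUDA is available and fall back to the CPU otherwise. The sketch below assumes PyTorch is the framework in use; the helper name pick_device is hypothetical, and whether Resolution expects a plain string or a torch.device object is not stated here, so adapt accordingly.

```python
def pick_device() -> str:
    """Return "cuda" if PyTorch can see a CUDA device, otherwise "cpu"."""
    try:
        import torch  # assumed to be installed via environment.yml
    except ImportError:
        # Without PyTorch there is no CUDA backend to query.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)
```

The resulting value can then be passed as the device argument, for example Resolution(path, filename, model, tgt_vacob, device).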
For convenience, an example is provided in the test.ipynb notebook. The GC-MS file it uses is available in Essential Oil Data.