Deep Learning for Antibiotic Resistance Gene Detection

PUBH 8885: Computational Biology, Fall 2024
Vedant Mahangade, Matthew Mollerus, Lucia Sanchez

This repo contains our reproduction of DeepARG by Arango-Argoty et al., as well as our efforts to expand upon it by fine-tuning a DNA language model for antibiotic resistance gene (ARG) detection.

The DeepARG directory contains the Jupyter notebooks in which we recreated their short-read and long-read models (named 'reproduce_model_type.ipynb' for each type of model) and expanded upon them by training models that take DNA rather than amino acid sequences as inputs. It also contains notebooks that process their data, retrieve the DNA sequences corresponding to their original amino acid sequences, and perform ablation testing by training linear models on their features.
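To make the relationship between the two input types concrete, here is a minimal sketch of how a DNA coding sequence maps to the amino acid sequence that the original models consume. It is an illustration only, not code from the notebooks: the `translate` helper is hypothetical, it uses the standard genetic code, and it assumes the sequence is already in the correct reading frame.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), with codons enumerated
# over the bases T, C, A, G. This table is an assumption for illustration;
# the notebooks may handle translation differently or not at all.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence, read in frame 0, into amino acids.

    Translation stops at the first stop codon. Codons containing
    characters other than A, C, G, T are reported as 'X'.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*":  # stop codon
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical usage: a 9-base sequence yields at most 3 amino acids.
short_protein = translate("ATGAAATAA")
```

Because each amino acid corresponds to three bases, a DNA input is three times as long as its protein equivalent and the model must handle the reading frame itself. That may be one reason a DNA-input model is a harder learning problem than an amino acid one.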

The scripts directory contains notebooks for training the large language model, as well as for the testing we performed on its output and performance.

Overall, we failed to precisely reproduce their results. We believe this is due to discrepancies between the dataset they described and the one they made publicly available, as well as some unclear points in their description of their modelling. The models we reproduced showed the following results:

Models that took amino acid sequences as inputs generally performed better than equivalent models that took DNA sequences as inputs; however, the latter may be more useful in real-world metagenomic applications.
Ablation testing with linear models shows that for alignment-based methods, such as the one implemented in the DeepARG paper, the complexity of deep learning is likely unjustified, as linear models performed similarly or better.
Fine-tuned language models outperformed the relevant alignment-based models, indicating that their contextual understanding of sequence data may produce superior results.
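The ablation idea in the second result can be sketched as follows: fit a plain linear classifier on the same kind of alignment-derived feature vectors and compare it with the deep model. This is a minimal sketch under loud assumptions, not our notebook code. The feature values and labels below are made up, the two columns are hypothetical stand-ins for similarity scores against two groups of reference genes, and the task is reduced to binary classification for brevity.

```python
import math


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit binary logistic regression with batch gradient descent."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
            err = p - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # Average the gradients over the batch and take one step.
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(X, w, b):
    """Return 1 where the linear score is positive, else 0."""
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0 for xi in X]


# Synthetic toy data (NOT from the DeepARG dataset). Each row is a
# hypothetical pair of similarity scores to two groups of reference genes.
X_train = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3],
           [0.1, 0.9], [0.2, 0.8], [0.3, 0.7]]
y_train = [1, 1, 1, 0, 0, 0]

w, b = train_logistic_regression(X_train, y_train)
train_predictions = predict(X_train, w, b)
```

If a model this simple matches the deep network on held-out data, the alignment features are doing most of the work, which is the conclusion the ablation supports.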

More details of our implementation and results can be found in the notebooks and in the slides (presentation.pdf) in this directory.

