The ChemNLP project aims to
- create an extensive chemistry dataset and
- use it to train large language models (LLMs) that can leverage the data for a wide range of chemistry applications.
For more details, see our information material section below.
- Introduction presentation
- Project proposal
- Task board
- awesome-chemistry-datasets repository to collect interesting chemistry datasets
- Weekly meetings will be set up soon! Please join our Discord community for more information.
Feel free to join the #chemnlp channel on our OpenBioML Discord server to discuss the project in more detail.
ChemNLP is an open-source project, and your involvement is warmly welcome! If you're excited to join us, we recommend the following steps:
- Join our Discord server.
- Have a look at our contributing guide.
- Looking for ideas? Check our task board to see what we may need help with.
- Have an idea? Create an issue!
Our OpenBioML ChemNLP project is not affiliated with the ChemNLP library from NIST; we use "ChemNLP" as a general term to highlight our project's focus. The datasets and models we create through this project will have unique, recognizable names when we release them.
See https://openbioml.org, especially our approach and partners.