SciWatch is a Python package designed to facilitate scientific monitoring for researchers

AghilesAzzoug/SciWatch

SciWatch is a Python package for scientific monitoring, aimed mainly at data scientists and AI researchers. It retrieves scientific papers and technical blog posts that match your queries and emails them to you, so you can stay up to date with developments in your field.

Usage

  1. Set up senders

See the senders documentation for details.

For example, with Gmail, set the following environment variables:

export [email protected]
export gmail_token=your_token
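
Before running the watcher, it can help to confirm that the variables are actually visible to Python. The following is a minimal sketch, not part of SciWatch: `missing_env_vars` is a hypothetical helper, and only `gmail_token` is taken from the example above, so extend the tuple with the other variables your sender needs.

```python
import os
from typing import Iterable, Mapping


def missing_env_vars(
    required: Iterable[str], environ: Mapping[str, str] = os.environ
) -> list[str]:
    """Return the names in `required` that are unset or empty in `environ`."""
    return [name for name in required if not environ.get(name)]


# "gmail_token" comes from the example above; add the remaining variables
# listed in the senders documentation for your setup.
missing = missing_env_vars(("gmail_token",))
if missing:
    print("Missing environment variables:", ", ".join(missing))
```
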

  2. Write a config (scrapping_config.toml)

title = "LLM & AL Watch" # Will be used as email title

end_date = "now" # will search content up to now (exec. time)
time_delta = "02:00:00:00" # will look for content up to two days ago

recipients = ["[email protected]"]

# define your queries
[[query]]
title = "LLM" # LLM query
raw_content = """intitle:(GPT* OR LLM* OR prompt* OR "Large language models"~2) AND incontent:(survey OR review OR evaluation* OR benchmark* OR optimization*)"""

[[query]]
title = "AL" # Active Learning on VRD (or benchmarks/surveys)
raw_content = """intitle:("active learning") AND incontent:(VRD OR documents OR survey* OR benchmark*)"""

# define your sources
[[source]]
type = "arxiv" # check for Computer Science papers on Arxiv
use_abstract_as_content = true
search_topic = "cs"
max_documents = 200

[[source]]
type = "openai_blog" # check for latest blogs on OpenAI blog (mainly for GPT updates)
max_documents = 20

  3. Run the watcher

from sci_watch.sci_watcher import SciWatcher

watcher = SciWatcher.from_toml("scrapping_config.toml")

watcher.exec()  # if any relevant content is retrieved, recipients will receive an email

You might get an email like this:

Documentation

For full documentation, including grammar syntax, check the docs.

Contributing

Contributions are welcome, whether by reporting issues or by opening pull requests. For major changes, please open an issue first to discuss what you would like to change.

  1. Fork the project
  2. Create your feature branch following the convention feature/feature-name (git checkout -b feature/feature-name)
  3. Run pre-commit (make pre-commit)
  4. Commit your changes (git commit -m "a meaningful message please")
  5. Push to the branch (git push origin feature/feature-name)
  6. Open a Pull Request

Roadmap

  • (feat) Add GPT support for paper summarization
  • (feat) Add better error handling (while scraping, calling the OpenAI API, etc.)
  • (refactor) Refactor configuration file parsing (and a lot of other things)
  • (perf) Add short-circuit evaluation for queries
  • (perf) Run sources only once for all queries
  • (perf) Process queries asynchronously
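
Two of the (perf) items above can be illustrated with a minimal sketch. None of this is SciWatch code: the document shape, the predicate helpers and `match_queries` are all hypothetical, and plain substring checks stand in for the real query grammar. The sketch shows short-circuit evaluation (a later clause is not evaluated once an earlier one has decided the result) and fetching the documents once before running every query over them.

```python
from typing import Callable, Iterable

Document = dict[str, str]  # hypothetical shape: {"title": ..., "content": ...}
Predicate = Callable[[Document], bool]


def title_has(term: str) -> Predicate:
    return lambda doc: term.lower() in doc["title"].lower()


def content_has(term: str) -> Predicate:
    return lambda doc: term.lower() in doc["content"].lower()


def all_of(*predicates: Predicate) -> Predicate:
    # all() stops at the first predicate that returns False
    return lambda doc: all(p(doc) for p in predicates)


def any_of(*predicates: Predicate) -> Predicate:
    # any() stops at the first predicate that returns True
    return lambda doc: any(p(doc) for p in predicates)


def match_queries(
    documents: Iterable[Document], queries: dict[str, Predicate]
) -> dict[str, list[str]]:
    """Return the matching document titles for each named query."""
    docs = list(documents)  # materialize once so every query reuses the same fetch
    return {
        name: [doc["title"] for doc in docs if matches(doc)]
        for name, matches in queries.items()
    }


documents = [
    {"title": "A survey of GPT models", "content": "We review recent LLM benchmarks."},
    {"title": "Active learning for documents", "content": "A benchmark survey on VRD."},
]
queries = {
    "LLM": all_of(any_of(title_has("GPT"), title_has("LLM")), content_has("benchmark")),
    "AL": all_of(title_has("active learning"), any_of(content_has("VRD"), content_has("survey"))),
}
print(match_queries(documents, queries))
```

Putting the cheap title check first means the content check only runs on documents whose title already matches.
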

Feel free to open an issue or send an email if you have any ideas :)

License

Copyright 2024 Aghiles Azzoug

SciWatch is free and open-source software distributed under the terms of the MIT license.

Contact

Aghiles Azzoug - LinkedIn - [email protected]