SciWatch is a Python package for scientific monitoring, aimed mainly at data scientists and AI researchers. It retrieves relevant scientific papers and technical blog posts so that researchers can keep up with the latest developments in their fields.
- Set up senders
See the senders documentation for details.
For example, with Gmail, set the following environment variables:
export [email protected]
export gmail_token=your_token
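Before running the watcher, it can help to confirm that the sender variables are actually visible to Python. The following is a minimal sketch, not part of SciWatch: `missing_env_vars` is a hypothetical helper, and only the `gmail_token` name is taken from the example above, so extend the list with the other variables your sender needs.

```python
import os


def missing_env_vars(names, environ=None):
    """Return the names from `names` that are unset or empty in the environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    # `gmail_token` comes from the example above; add other sender variables as needed.
    missing = missing_env_vars(["gmail_token"])
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All listed sender variables are set.")
```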
- Write a config (scrapping_config.toml)
title = "LLM & AL Watch" # Will be used as email title
end_date = "now" # will search content up to now (exec. time)
time_delta = "02:00:00:00" # will look for content up to two days ago
recipients = ["[email protected]"]
# define your queries
[[query]]
title = "LLM" # LLM query
raw_content = """intitle:(GPT* OR LLM* OR prompt* OR "Large language models"~2) AND incontent:(survey OR review OR evaluation* OR benchmark* OR optimization*)"""
[[query]]
title = "AL" # Active Learning on VRD (or benchmarks/surveys)
raw_content = """intitle:("active learning") AND incontent:(VRD OR documents OR survey* OR benchmark*)"""
# define your sources
[[source]]
type = "arxiv" # check for Computer Science papers on Arxiv
use_abstract_as_content = true
search_topic = "cs"
max_documents = 200
[[source]]
type = "openai_blog" # check for latest blogs on OpenAI blog (mainly for GPT updates)
max_documents = 20
- Run the watcher
from sci_watch.sci_watcher import SciWatcher
watcher = SciWatcher.from_toml("scrapping_config.toml")
watcher.exec() # if some relevant content is retrieved, recipients will receive an Email
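For unattended runs (for example from a scheduler), a thin wrapper that fails early on a missing config file can be convenient. This is a minimal sketch: `run_watch` is a hypothetical helper, and it relies only on the `SciWatcher.from_toml` and `exec` calls shown above. The import is deferred so that the path check happens first.

```python
from pathlib import Path


def run_watch(config_path="scrapping_config.toml"):
    """Run one SciWatch pass, raising early if the config file is missing."""
    path = Path(config_path)
    if not path.is_file():
        raise FileNotFoundError(f"config file not found: {path}")

    # Imported lazily so the check above runs even if sci_watch is not installed.
    from sci_watch.sci_watcher import SciWatcher

    watcher = SciWatcher.from_toml(str(path))
    watcher.exec()  # recipients receive an email if relevant content is retrieved
```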
You might get an email like this:
For full documentation, including grammar syntax, check the docs.
Contributions are welcome, whether as issue reports or pull requests. For major changes, please open an issue first to discuss what you would like to change.
- Fork the project
- Create your feature branch following the convention feature/feature-name (git checkout -b feature/feature-name)
- Run pre-commit (make pre-commit)
- Commit your changes (git commit -m "a meaningful message please")
- Push to the branch (git push origin feature/feature-name)
- Open a Pull Request
- (feat) Add GPT support for paper summarization
- (feat) Add better error handling (while scraping, calling the OpenAI API, etc.)
- (refactor) Refactor configuration file parsing (and a lot of other things)
- (perf) Add short-circuit evaluation for queries
- (perf) Run sources only once for all queries
- (perf) Process queries asynchronously
Feel free to post an issue or send an email if you have any ideas :)
Copyright 2024 Aghiles Azzoug
SciWatch is free and open-source software distributed under the terms of the MIT license.
Aghiles Azzoug - LinkedIn - [email protected]