Script and snakemake rule to pull external data #190
Conversation
- Set up an 'external_files' script that pulls external data (where possible).
- Altered the preprocess snakemake file to run the 'external_files' script and generate missing external files as required for other rules.
- Tested with two files from the Harvard Dataverse.
- Added missing file to external_files.py
- Removed external file retrievals from all other scripts.
- Updated preprocess snakefile.
- Now use historical electricity data from EMBER for the regression underlying the demand projections.
- EMBER data only runs from 2020 but will be updated year by year going forward, whereas the Our World in Data dataset was static. The projected demand values and associated model results are comparable, with a slightly better R² based on the EMBER data.
- Added the EMBER dataset to the 'external_files.py' script and associated snakemake rule.
@maartenbrinkerink thanks for getting this! Everything looks good on my side. The only changes I made:
- Moved the retrieve rule to a new `retrieve.smk` file to hold retrieval rules.
- Removed the hardcoding of the files in the `external_files.py` script.
- Removed the retrieved data from the repository, as users will now just automatically download this data! :)

These are just suggestions - please feel free to revert anything you don't like!
@@ -0,0 +1,36 @@
Created a separate rule file, as more data retrieval rules will probably need to get added soon. Just helps with code organization! :)
message:
    "Downloading external files..."
params:
    files = get_external_links()
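A complete version of this rule might look like the sketch below. This is an assumption for illustration only: the rule name, the output paths, and the `EXTERNAL_FILES` list are invented here; only `get_external_links()` and the message appear in the PR.

```snakemake
rule retrieve_external_files:
    message:
        "Downloading external files..."
    params:
        # {filename: url} mapping, so the script knows what to fetch
        files = get_external_links()
    output:
        # one output per external file, so downstream rules can depend on them
        expand("resources/data/{file}", file=EXTERNAL_FILES)
    script:
        "../scripts/external_files.py"
```

Declaring the downloaded files as `output` is what lets Snakemake trigger the retrieval automatically whenever another rule needs one of them.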
I removed the hard-coded portion in the external_files.py script and pass the files/urls in through snakemake. If running the script directly (for debugging), you can only pass in one file + url right now.
if __name__ == "__main__":

    if "snakemake" in globals():
        external_files = snakemake.params.files
When passing arguments directly through snakemake, we can pass in data structures like dictionaries (compared to only strings without a parser via the command line). This is how we get around the hard coding of the files and improve re-usability!
Also added the
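The pattern described above can be read as a small dispatch helper. A minimal sketch of the idea follows; the function name and the command-line argument layout are assumptions for illustration, since the PR only shows the `snakemake.params.files` access:

```python
import sys

def get_external_files():
    """Return a {filename: url} mapping of external files to download.

    Under Snakemake the mapping comes straight from the rule's params,
    which can carry whole data structures rather than just strings.
    When the script is run directly (for debugging), fall back to a
    single file + url taken from the command line.
    """
    if "snakemake" in globals():
        return snakemake.params.files
    # direct invocation: python external_files.py <filename> <url>
    name, url = sys.argv[1], sys.argv[2]
    return {name: url}
```

The fallback branch keeps the script debuggable without Snakemake, at the cost of handling only one file per invocation.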
Thanks @trevorb1, the changes work for me!
Description
-Added 'external_files.py' that pulls all external files based on retrievable URLs.
-Added a rule to the preprocess snakemake file to run said script.
-Removed all try/except statements from other scripts.
-Changed the source for historical electricity demand from Our World In Data to EMBER (the latter is updated yearly).
@trevorb1 On my end this is functional, i.e. you can remove the different PLEXOS files and run the workflow. The added rule will redownload the files first before doing anything else. Note that at this point I added the names of the files for which it is relevant both in the 'external_files.py' script as well as in the preprocess snakemake file. If there is a more efficient way to do this let me know.
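The "generate missing external files" behaviour described above can be sketched as a short helper that skips files already on disk. The function name and default output directory are assumptions, not the actual `external_files.py` implementation:

```python
from pathlib import Path
from urllib.request import urlretrieve

def download_missing(files, out_dir="resources/data"):
    """Download each file in the {filename: url} mapping that is not
    already present in out_dir; return the names actually fetched."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    fetched = []
    for name, url in files.items():
        target = out / name
        if not target.exists():
            urlretrieve(url, target)  # simple blocking download
            fetched.append(name)
    return fetched
```

Re-running the workflow is then cheap: files that already exist are left alone, and only genuinely missing data is redownloaded.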
Issue Ticket Number
#119
#189
Documentation