Clean versioned dataset automatically/periodically #1658
Comments
#406 was also drafted due to issues with limited storage, though I'd like to point out that "limited" can still be pretty generous (e.g. 169 GB of storage was already in use in #406). At some point, it's reasonable to want a way to delete data. :D
Found a related comment in #1076
I have a question too. How do we know what/where to delete?
Closing in favour of #1799 so we can collect all thoughts + comments in the same place.
Description
In the catalog.yml file, we can enable versioning by adding versioned: true, so Kedro will save the data under a folder named with a timestamp each time. Currently, I don't need data older than a week, for example. It would be convenient if Kedro had functionality that lets users automatically/periodically remove older data.
Context
I'm working in an on-premise environment with limited storage, so I don't need generated data older than a certain time period. Another use case would be removing historical data for data privacy/protection reasons.
Possible Implementation
Add a delete option for datasets in catalog.yml when versioned: true is enabled. A possible parameter could be a retention time, which by default could be infinite (i.e., not deleting at all).
A highly relevant previous issue: Support archiving/deleting old/unused datasets #406
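To make the idea concrete, here is a minimal sketch of what such a retention rule could look like as a standalone script. It is not an existing Kedro feature or API. It assumes a local filesystem, that each versioned dataset is stored as <filepath>/<version>/<filename>, and that the version folder name follows the timestamp pattern shown in VERSION_FORMAT below. The function names, the max_age parameter, and the choice to always keep the newest version are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path
import shutil

# Assumed layout of version folder names, e.g. 2023-01-10T00.00.00.000Z
VERSION_FORMAT = "%Y-%m-%dT%H.%M.%S.%fZ"


def find_expired_versions(dataset_dir, max_age, now=None):
    """Return version folders under dataset_dir that are older than max_age.

    The newest version is never returned, so a default load still works.
    """
    now = now or datetime.now(timezone.utc)
    versions = []
    for child in Path(dataset_dir).iterdir():
        if not child.is_dir():
            continue
        try:
            saved_at = datetime.strptime(child.name, VERSION_FORMAT)
        except ValueError:
            continue  # not a version folder, leave it alone
        versions.append((saved_at.replace(tzinfo=timezone.utc), child))
    versions.sort()
    older = versions[:-1]  # drop the newest version from the candidates
    return [path for saved_at, path in older if now - saved_at > max_age]


def prune_versions(dataset_dir, max_age=None, dry_run=True, now=None):
    """Delete expired version folders; max_age=None means keep everything."""
    if max_age is None:
        return []
    expired = find_expired_versions(dataset_dir, max_age, now)
    if not dry_run:
        for path in expired:
            shutil.rmtree(path)
    return expired
```

A call such as prune_versions("data/01_raw/cars.csv", timedelta(days=7)) would only report what it would remove (the path is just an example). Passing dry_run=False actually deletes, so a script like this could be run from a scheduler to get the periodic behaviour described above.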