Add functionality to remove (versioned) datasets #1799

Open
2 tasks
merelcht opened this issue Aug 22, 2022 · 0 comments
Labels
Stage: Technical Design 🎨 (Ticket needs to undergo technical design before implementation)

Comments

@merelcht
Member

Introduction

Several users have asked for a way to remove data saved during pipeline runs, either automatically or programmatically. The motivation may be memory or storage constraints, or simply that data older than a certain age is no longer useful.

Background

#1658
#406

Task

  • Propose a way for users to remove datasets, e.g. by adding a field to the dataset's entry in the catalog, through a CLI command, etc.
  • Investigate how much effort it would take to implement the actual deletion of data. Can we rely on fsspec for this, or would we need custom code to delete data locally, from cloud storage (S3 etc.), from databases, and so on?
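
To make the second task more concrete, here is a minimal sketch of what fsspec-based removal of old dataset versions could look like. It is not an existing Kedro API. It assumes that versioned datasets are stored as `<filepath>/<version>/<filename>`, that each version folder is named with a timestamp such as `2022-08-22T10.15.30.123Z`, and that the filesystem object exposes fsspec-style `ls(path, detail=False)` and `rm(path, recursive=True)` methods. The function names, the `max_age` parameter, and the `dry_run` flag are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed format of a version folder name, e.g. "2022-08-22T10.15.30.123Z".
VERSION_FORMAT = "%Y-%m-%dT%H.%M.%S.%fZ"


def parse_version(name):
    """Return the UTC timestamp encoded in a version folder name, or None."""
    try:
        return datetime.strptime(name, VERSION_FORMAT).replace(tzinfo=timezone.utc)
    except ValueError:
        return None


def find_stale_versions(fs, dataset_path, max_age, now=None):
    """List version folders under dataset_path that are older than max_age."""
    now = now or datetime.now(timezone.utc)
    stale = []
    for entry in fs.ls(dataset_path, detail=False):
        # Keep only the last path component, which should be the version.
        name = entry.rstrip("/").rsplit("/", 1)[-1]
        saved_at = parse_version(name)
        # Entries that do not look like a version folder are never touched.
        if saved_at is not None and now - saved_at > max_age:
            stale.append(entry)
    return sorted(stale)


def remove_stale_versions(fs, dataset_path, max_age, now=None, dry_run=True):
    """Delete stale version folders; with dry_run=True only report them."""
    stale = find_stale_versions(fs, dataset_path, max_age, now)
    if not dry_run:
        for path in stale:
            fs.rm(path, recursive=True)
    return stale


# Example usage (requires fsspec; the dataset path is illustrative):
#   import fsspec
#   fs = fsspec.filesystem("file")
#   remove_stale_versions(fs, "data/01_raw/cars.csv", timedelta(days=30))
```

Because only `ls` and `rm` are used, the same code should work for any fsspec backend that implements them, which would cover local files and object stores. Databases and other non-file datasets are outside what fsspec handles and would presumably still need custom deletion logic. Defaulting to `dry_run=True` is a deliberate choice here, since deletion cannot be undone.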
@merelcht added the Stage: Technical Design 🎨 label Aug 22, 2022
@merelcht added this to the Redesign Catalog and Datasets milestone Feb 6, 2023
@merelcht added this to the Dataset Versioning milestone Jun 28, 2024