Make Vizier GIT-friendly #317

Open
okennedy opened this issue May 22, 2024 · 1 comment
Labels
enhancement (New feature or request), good first issue (Good for newcomers), layer-api (An issue involving the vizier API layer)
Comments

@okennedy
Contributor

What pain point is this feature intended to address? Please describe.
At present, there's no (easy) way to share Vizier database files. The current state of the art is to zip up the vizier.db directory, or to use the 'export' feature. Neither is particularly amenable to collaborative development.

Describe the solution you'd like
Fundamentally, it would be nice to have a way to drop a vizier.db folder into a VCS. The factors currently preventing this are:

  1. The Vizier.db SQLite database can get quite large. We did add a GC/Dedup feature that should keep it more in check, but even so, a database could well eventually exceed the file size limit of public VCS hosts like GitHub.
  2. The Vizier.db SQLite database is updated on every edit, and is a binary file. This means that, for all practical purposes, the SQLite database can't be stored as a delta and ends up being pushed in its entirety on every edit.
  3. The SQLite database can't (easily) be diffed. It is a binary file, and even a logical diff of the database would need to take into account semantic considerations such as key identifiers (which could conflict if two people add workflows, etc. in parallel) and foreign key relationships (e.g., filenames that need to correspond to artifact identifiers). As a result, any conflict requires extensive, error-prone manual resolution by the user.
  4. It's possible that file artifacts could exceed the size cap of a VCS; Git LFS support should be included.
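
As a rough illustration of points 1 and 4, the sketch below walks a directory and reports files that exceed a size threshold, i.e., candidates for Git LFS. The function name `oversized_files` and the default threshold are assumptions made for this example (100 MiB is used as a stand-in for a host's per-file limit); nothing here is part of Vizier.

```python
from pathlib import Path

# Assumed threshold for illustration only; check the actual per-file
# limit of whichever VCS host is being used.
DEFAULT_LIMIT_BYTES = 100 * 1024 * 1024


def oversized_files(root, limit_bytes=DEFAULT_LIMIT_BYTES):
    """Return (relative_path, size_in_bytes) for files under root larger than limit_bytes."""
    root = Path(root)
    found = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        size = path.stat().st_size
        if size > limit_bytes:
            # Report paths relative to root, with forward slashes.
            found.append((path.relative_to(root).as_posix(), size))
    return found
```

Calling `oversized_files("vizier.db")` would list any artifacts (or the SQLite file itself) that are too large to commit directly under the assumed limit.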

Describe alternatives you've considered

  • Zip up and email Vizier.db
  • Vizier's export feature

@okennedy added the enhancement, good first issue, and layer-api labels on May 22, 2024
@okennedy added this to the Eventually milestone on May 22, 2024
@okennedy
Contributor Author

Vizier already supports computing deltas of workflows. One approach that addresses bullets 1-3 might be:

  1. Add a log directory to vizier.db
  2. Add a .gitignore to vizier.db that explicitly ignores Vizier.db (maybe this means we can move the cache directory here too!)
  3. Automatically log updates (more or less the delta bus) to a logfile. We could open a new logfile (marked with a timestamp for chronological integration) or use something like Git's hash-based versioning. Treat this logfile as the canonical system state.
  4. On launch (and maybe while running), detect the presence of new logfiles in the log directory and patch the database accordingly.
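
The four steps above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Vizier's implementation: the `log` directory name, the JSON-lines file format, the `set`/`delete` delta shape, the `state` and `applied_logs` tables, and every function name are hypothetical stand-ins for whatever the delta bus actually emits.

```python
import json
import sqlite3
import time
import uuid
from pathlib import Path


def init_layout(root):
    """Steps 1-2: create the log directory and a .gitignore that excludes Vizier.db."""
    root = Path(root)
    log_dir = root / "log"  # hypothetical directory name
    log_dir.mkdir(parents=True, exist_ok=True)
    (root / ".gitignore").write_text("Vizier.db\n", encoding="utf-8")
    return log_dir


def open_db(db_path):
    """Open the local database with a toy schema plus a record of applied logfiles."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS state (key TEXT PRIMARY KEY, value TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS applied_logs (name TEXT PRIMARY KEY)")
    conn.commit()
    return conn


def write_log(log_dir, deltas, timestamp_ns=None):
    """Step 3: write deltas to a new logfile whose name sorts chronologically."""
    ts = time.time_ns() if timestamp_ns is None else timestamp_ns
    # Zero-padded timestamp keeps lexical order equal to time order; the
    # random suffix avoids name collisions between collaborators.
    path = Path(log_dir) / f"{ts:020d}-{uuid.uuid4().hex[:8]}.jsonl"
    with open(path, "w", encoding="utf-8") as f:
        for delta in deltas:
            f.write(json.dumps(delta) + "\n")
    return path


def apply_new_logs(conn, log_dir):
    """Step 4: patch the database with every logfile not yet applied."""
    applied = {row[0] for row in conn.execute("SELECT name FROM applied_logs")}
    newly_applied = []
    for path in sorted(Path(log_dir).glob("*.jsonl")):
        if path.name in applied:
            continue
        with conn:  # one transaction per logfile: all of it applies, or none
            with open(path, encoding="utf-8") as f:
                for line in f:
                    if not line.strip():
                        continue
                    delta = json.loads(line)
                    if delta["op"] == "set":
                        conn.execute(
                            "INSERT OR REPLACE INTO state (key, value) VALUES (?, ?)",
                            (delta["key"], delta["value"]),
                        )
                    elif delta["op"] == "delete":
                        conn.execute("DELETE FROM state WHERE key = ?", (delta["key"],))
                    else:
                        raise ValueError(f"unknown op: {delta['op']!r}")
            conn.execute("INSERT INTO applied_logs (name) VALUES (?)", (path.name,))
        newly_applied.append(path.name)
    return newly_applied
```

On launch, one would call `apply_new_logs(open_db("vizier.db/Vizier.db"), "vizier.db/log")` after a pull. Note that this sketch applies unseen logfiles in name order only; a logfile that arrives after a merge with an older timestamp than logs already applied would be replayed out of order, which is exactly the case where hash-based versioning or a full rebuild from the log might be needed.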
