[c++] Refactor `metadata` and `create` to respect timestamps #2180
Codecov Report

@@           Coverage Diff           @@
##             main    #2180   +/-   ##
=======================================
  Coverage   90.61%   90.61%
=======================================
  Files          37       37
  Lines        3900     3900
=======================================
  Hits         3534     3534
  Misses        366      366

Flags with carried forward coverage won't be shown.
Branch updated from 056d4b9 to bea1599.
👍
Looks good to me, but this is a fairly comprehensive change, so it may be prudent to get another +1 before merging.
Gotcha, I'll wait for another review. And I'll try to break these PRs up even more next time 😅
This PR looks good to me with the caveat that there were a lot of changes, and I'm not entirely sure I understood the implications of everything. I do have two comments:
- Storing the metadata in memory without re-syncing after a write may lead to an in-memory representation that does not match the actual metadata on disk if another write occurred in the meantime. This is fine as long as we are aware of it and okay with it. (This could happen even if we did reopen the array; it is just more likely to get out of sync here.)
- I added a couple of nits about omitting or removing brackets around single-line `if` statements, but there were many other places where brackets were omitted or removed. The convention in the core TileDB library is to always use brackets, but I didn't want to flag every instance here in case this library intentionally follows a different convention.
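For reference, the "always use brackets" convention mentioned above looks like the sketch below. `describe_mode` is a hypothetical function used only to show the style; it is not part of TileDB or TileDB-SOMA.

```cpp
#include <string>

// Hypothetical helper used only to illustrate bracing style.
// Core TileDB convention (per the review comment): always brace the body of an
// `if`, even when it is a single statement.
std::string describe_mode(bool is_open) {
    if (!is_open) {
        return "closed";
    }
    return "open";
}

// The unbraced style flagged in the review would be:
//     if (!is_open)
//         return "closed";
// Behavior is identical; a commonly cited reason for always bracing is that a
// second statement added later cannot silently fall outside the `if` body.
```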
// Metadata values need to be accessible in write mode as well. When adding
// or deleting values in the array, instead of closing the array to update the
// metadata, reopening it in read mode, and reopening it again to restore
// write mode, we just store the modifications in this cache.
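The cache described in that code comment can be sketched roughly as follows. This is a minimal, self-contained illustration only: `MetadataCache`, `set`, `remove`, `get`, and `pending` are hypothetical names that do not reflect the actual TileDB-SOMA class, and the pending-operation list stands in for the real TileDB metadata write calls.

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical write-mode metadata cache. Values loaded when the object was
// opened are kept in memory, and every modification made in write mode is
// applied to this map, so reads never require closing and reopening.
class MetadataCache {
public:
    using PendingOp = std::pair<std::string, std::optional<std::string>>;

    explicit MetadataCache(std::map<std::string, std::string> initial)
        : values_(std::move(initial)) {}

    // Record a put: update the cache and queue the write for the array.
    void set(const std::string& key, const std::string& value) {
        values_[key] = value;
        pending_.push_back({key, value});
    }

    // Record a delete: drop the key and queue the deletion
    // (std::nullopt marks a delete).
    void remove(const std::string& key) {
        values_.erase(key);
        pending_.push_back({key, std::nullopt});
    }

    // Reads are served from memory, even while in write mode.
    std::optional<std::string> get(const std::string& key) const {
        auto it = values_.find(key);
        if (it == values_.end()) {
            return std::nullopt;
        }
        return it->second;
    }

    // Operations that would be forwarded to the underlying array (assumed).
    const std::vector<PendingOp>& pending() const { return pending_; }

private:
    std::map<std::string, std::string> values_;
    std::vector<PendingOp> pending_;
};
```

Note that nothing here re-reads from disk, which is exactly why the map can drift from the on-disk state if another writer is active.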
If other writes happen simultaneously, this metadata map might not reflect any state you could actually obtain from disk (even with time traveling). Are we okay with this?
What would be a better way to cache the metadata so that we handle simultaneous writes correctly?
Re-reading the data from disk is safer but slower, so it really just depends on what trade-offs we want to make.
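The trade-off in this thread can be made concrete with a toy model: a shared map stands in for the metadata on disk, and a snapshot stands in for the in-memory cache. Everything here (`DiskMetadata`, `CachedReader`, `get_cached`, `get_fresh`, `refresh`) is hypothetical and simulates, rather than calls, TileDB.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Toy stand-in for metadata persisted on disk and shared by all writers.
using DiskMetadata = std::map<std::string, std::string>;

class CachedReader {
public:
    explicit CachedReader(std::shared_ptr<const DiskMetadata> disk)
        : disk_(std::move(disk)), cache_(*disk_) {}

    // Fast path: serve from the snapshot taken at open time (or last refresh).
    // May be stale if another writer changed the metadata since then.
    std::string get_cached(const std::string& key) const {
        auto it = cache_.find(key);
        return it == cache_.end() ? "" : it->second;
    }

    // Safe path: reload everything from "disk" first, then read.
    std::string get_fresh(const std::string& key) {
        refresh();
        return get_cached(key);
    }

    void refresh() { cache_ = *disk_; }

private:
    std::shared_ptr<const DiskMetadata> disk_;  // declared first: initialized first
    DiskMetadata cache_;
};
```

`get_cached` is cheap but can return a value that another writer has since replaced; `get_fresh` always matches the current "disk" state at the cost of a full reload on every call.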
@nguyenv what else needs to happen to get this PR merged? Please let me know if/how I can help.
* Store the read-mode `Array` or `Group` that holds valid metadata values as a class member
* `create` methods take timestamps that indicate when the metadata values for `soma_object_type` and `encoding_version` should be written and when the write-mode `SOMAObject` should be opened
* Make `soma_object_type` and `encoding_version` consts
* Use the keystroke-saver `TimestampRange`
* Refactor unit tests to reflect these changes
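A minimal sketch of the `TimestampRange` shorthand and the const metadata members described in this commit message. It assumes `TimestampRange` is a pair of `uint64_t` start/end timestamps; check the library header for the actual definition. `ObjectInfo` and `covers` are hypothetical and are not the real `SOMAObject` interface.

```cpp
#include <cstdint>
#include <string>
#include <utility>

// Assumed shorthand for an inclusive [start, end] timestamp range.
using TimestampRange = std::pair<uint64_t, uint64_t>;

// Hypothetical holder for the two metadata values every SOMA object carries.
// They are set once at construction and never change, so they can be const.
struct ObjectInfo {
    const std::string soma_object_type;
    const std::string encoding_version;
    const TimestampRange timestamp;

    ObjectInfo(std::string type, std::string version, TimestampRange ts)
        : soma_object_type(std::move(type)),
          encoding_version(std::move(version)),
          timestamp(ts) {}

    // True if time t falls inside the range this object was opened with.
    bool covers(uint64_t t) const {
        return timestamp.first <= t && t <= timestamp.second;
    }
};
```

Making the members `const` means the compiler rejects any later reassignment, which documents that these values are fixed once the object exists.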
Branch updated from 2b355c7 to ca62162.
Thanks @nguyenv!
Issue and/or context: #2182

I am refactoring the `create` and metadata methods in C++ after noticing issues while working on transitioning the Python `SOMADataFrame` write path to use `clib.DataFrame`.

Changes:
- Store the read-mode `Array` or `Group` that holds valid metadata values as a class member
- `create` methods now take timestamps that indicate when the metadata values for `soma_object_type` and `encoding_version` should be written (consistent with how this is done in the TileDB-SOMA API) and when the write-mode `SOMAObject` should be opened
- `create` methods now open the `Array` or `Group` only once
- Make `soma_object_type`, `encoding_version`, and the current encoding version (`1`) consts
- Use the keystroke-saver `TimestampRange` type
- Unit tests verify that `SOMAObject`s write and read metadata correctly with timestamps and that … `create` can be written to

Notes for Reviewer:
These changes have been pulled out of a larger branch: https://github.com/single-cell-data/TileDB-SOMA/tree/viviannguyen/array-write-path
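The timestamp behavior listed under Changes can be illustrated with a toy model. This is a sketch under stated assumptions, not the TileDB-SOMA implementation: `TimestampedMetadata` and `create_object` are hypothetical, metadata writes are assumed to be stamped with the end of the write-mode timestamp range, and falling back to a caller-supplied `now` when no timestamp is given is also an assumption.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using TimestampRange = std::pair<uint64_t, uint64_t>;  // assumed [start, end]

// Toy stand-in for array metadata: every put is stamped with the timestamp
// the (simulated) write-mode handle was opened at.
class TimestampedMetadata {
public:
    void put(const std::string& key, const std::string& value, uint64_t ts) {
        entries_.push_back({key, value, ts});
    }

    // Latest value for `key` written inside the read range, if any.
    std::optional<std::string> get(const std::string& key, TimestampRange range) const {
        std::optional<std::string> result;
        uint64_t best = 0;
        for (const auto& e : entries_) {
            bool in_range = e.ts >= range.first && e.ts <= range.second;
            if (e.key == key && in_range && (!result || e.ts >= best)) {
                result = e.value;
                best = e.ts;
            }
        }
        return result;
    }

private:
    struct Entry {
        std::string key;
        std::string value;
        uint64_t ts;
    };
    std::vector<Entry> entries_;
};

// Hypothetical create: "open" once in write mode at the caller's timestamp
// (or `now` if none is given) and write both required metadata values there.
void create_object(TimestampedMetadata& md, const std::string& soma_type,
                   std::optional<TimestampRange> timestamp, uint64_t now) {
    const uint64_t write_ts = timestamp ? timestamp->second : now;
    md.put("soma_object_type", soma_type, write_ts);
    md.put("encoding_version", "1", write_ts);
}
```

A read whose range ends before the write timestamp sees neither value, which is the time-travel property the timestamp-aware `create` is meant to respect.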