[c++, r] Update and refactor nanoarrow #2188
This pull request has been linked to Shortcut Story #42316: [r] Refactor to make use of |
Codecov Report
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #2188       +/-   ##
===========================================
- Coverage   77.97%   66.49%   -11.48%
===========================================
  Files         140      142        +2
  Lines       10855    12743     +1888
  Branches      217      511      +294
===========================================
+ Hits         8464     8474       +10
- Misses       2303     4169     +1866
- Partials       88      100       +12
The dual nanoarrows (from inside the R package, and inside libtiledbsoma) can likely be remedied. The location in tiledbsoma should be in externals/ anyway -- a follow-up PR seems apt. |
I created #2362 to track this |
@johnkerl One triplet of |
🚢
Thanks @eddelbuettel !! Just to set an expectation, this may not backport automatically to |
@@ -108,17 +154,20 @@ std::unique_ptr<ArrowSchema> ArrowAdapter::arrow_schema_from_tiledb_array(
     auto nattr = tiledb_schema.attribute_num();

     std::unique_ptr<ArrowSchema> arrow_schema = std::make_unique<ArrowSchema>();
-    arrow_schema->format = "+s";
+    arrow_schema->format = strdup("+s");
I know I'm the one who added this but I'm not sure that it's necessary here. I may have overcorrected.
I think it is very much needed. Using strdup() (which allocates) is just a stylistic choice. If you peek into what nanoarrow does, it actually allocates explicitly with malloc (as do I over in tiledb-r):
ArrowErrorCode ArrowSchemaSetFormat(struct ArrowSchema* schema, const char* format) {
  if (schema->format != NULL) {
    ArrowFree((void*)schema->format);
  }
  if (format != NULL) {
    size_t format_size = strlen(format) + 1;
    schema->format = (const char*)ArrowMalloc(format_size);
    if (schema->format == NULL) {
      return ENOMEM;
    }
    memcpy((void*)schema->format, format, format_size);
  } else {
    schema->format = NULL;
  }
  return NANOARROW_OK;
}
This differs from the 'view' initializers that nanoarrow also has for schemas, which just borrow the pointed-to memory, because the contract is that the viewer can assume the viewed object exists.
In short, your strdup() was fine. And after 30+ years of C it possibly taught me another K&R(-style) function, so all good!
Also note that if you peek at the matching release function, they free() the format, so there is an assumption of allocated memory here.
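The ownership point made in this thread can be sketched in a few lines. This is a minimal illustration, not the code from libtiledbsoma or nanoarrow: the struct is a local copy of the Arrow C data interface layout so that the example compiles on its own, and `copy_format`, `release_owned_format` and `init_struct_schema` are hypothetical helper names. It assumes the release callback frees `format`, which is why a string literal must not be stored there.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Local copy of the Arrow C data interface schema struct, so the sketch
// compiles without nanoarrow; real code would include nanoarrow.h instead.
struct ArrowSchema {
    const char* format;
    const char* name;
    const char* metadata;
    int64_t flags;
    int64_t n_children;
    ArrowSchema** children;
    ArrowSchema* dictionary;
    void (*release)(ArrowSchema*);
    void* private_data;
};

// Hypothetical helper: heap-allocate a copy of a format string, following
// the malloc + memcpy pattern quoted above.
const char* copy_format(const char* format) {
    const std::size_t n = std::strlen(format) + 1;
    char* out = static_cast<char*>(std::malloc(n));
    if (out != nullptr) {
        std::memcpy(out, format, n);
    }
    return out;
}

// Hypothetical release callback: frees the owned format string and marks
// the schema as released by clearing the callback.
void release_owned_format(ArrowSchema* schema) {
    std::free(const_cast<char*>(schema->format));
    schema->format = nullptr;
    schema->release = nullptr;
}

// Hypothetical initializer: a childless struct schema ("+s") that owns its
// format string, so the release callback may safely free it.
bool init_struct_schema(ArrowSchema* schema) {
    *schema = ArrowSchema{};  // zero every field first
    schema->format = copy_format("+s");
    if (schema->format == nullptr) {
        return false;
    }
    schema->release = &release_owned_format;
    return true;
}
```

Had `format` pointed at the literal `"+s"`, the `std::free` call in the release callback would have been handed memory that was never allocated, which is undefined behaviour. Allocating the copy keeps allocation and release symmetric.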
        return NANOARROW_TYPE_LARGE_BINARY;
    else
        throw TileDBSOMAError(fmt::format(
            "ArrowAdapter: Unsupported TileDB datatype string: {} ", sv));
Nanoarrow format strings here, not TileDB
Actually an Arrow format string (which is the same as what nanoarrow uses). Will adjust.
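For context, the kind of lookup under review can be sketched as below. This is an illustrative stand-in, not the ArrowAdapter code: `SketchType` and `type_from_arrow_format` are hypothetical names, the enum is a local substitute for nanoarrow's type enum, and only a subset of the primitive and binary format strings from the Arrow C data interface is covered. The error message names the input as an Arrow format string, as suggested in the review.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical local stand-in for a subset of nanoarrow's type enum.
enum class SketchType {
    Bool, Int8, UInt8, Int16, UInt16, Int32, UInt32, Int64, UInt64,
    Float, Double, String, LargeString, Binary, LargeBinary
};

// Map an Arrow C data interface format string to a type tag.
// Throws if the format string is not in the supported subset.
SketchType type_from_arrow_format(const std::string& format) {
    if (format == "b") return SketchType::Bool;
    if (format == "c") return SketchType::Int8;
    if (format == "C") return SketchType::UInt8;
    if (format == "s") return SketchType::Int16;
    if (format == "S") return SketchType::UInt16;
    if (format == "i") return SketchType::Int32;
    if (format == "I") return SketchType::UInt32;
    if (format == "l") return SketchType::Int64;
    if (format == "L") return SketchType::UInt64;
    if (format == "f") return SketchType::Float;
    if (format == "g") return SketchType::Double;
    if (format == "u") return SketchType::String;
    if (format == "U") return SketchType::LargeString;
    if (format == "z") return SketchType::Binary;
    if (format == "Z") return SketchType::LargeBinary;
    throw std::invalid_argument("Unsupported Arrow format string: " + format);
}
```

Note that the format strings are case-sensitive: `"l"` is a signed 64-bit integer while `"L"` is unsigned, and `"z"` and `"Z"` differ only in offset width.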
* Update nanoarrow vendored files to nanoarrow 0.4.0 * Low-level wiring of nanoarrow at sr_* level * More lower-level wiring of nanoarrow * WIP snapshot with nanoarrow wired into libtiledbsoma * Ensure nullable is set correctly in either case * Context wrapped in a special purpose struct should not finalize * Simpler and faster r-ci.yaml * Use nanoarrow 0.4.0 consistently * Refined arrow_adapter * Set increased timeout for download.file to survive GH flakyness * Turn trace back of, do not include carrow in cli * Do not include carrow.h in reindexer.cc * WIP changes expanding type map, suppressing schema release * [c++] Fix segfault issues * Add additional necessary strdup * No longer to protect one statement * Support TILEDB_DATETIME_DAY aka Date as well * Meh * Meh with version 14.0.0 and not 14.0.6 because ... sure * Remove initialization setters covered by nanoarrow use * Ensure DATETIME columns get Arrow coltype reset * Add more date and datetime support * Additional conversion * Post-rebase change * Heeding time to the lord of linting is time well spent some say * Heeding time to the lord of linting is time well spent some say * Correct another delete to free * Additional non-nullptr protection * make format * Additional test conditioner * Correcting one buffer size selection * make format * Remove carrow.h and reference to it * Cleanups * Use nanoarrow.{c,hpp} via tiledbsoma/utils/ * Re-activate -Werror * Chore * High-productivity afternoon * Correct an format string error message --------- Co-authored-by: Dirk Eddelbuettel <[email protected]> Co-authored-by: Vivian Nguyen <[email protected]>
Issue and/or context:
We rely on nanoarrow to take advantage of higher-level structures for the (otherwise 'naked') void * of the C API of Arrow. nanoarrow helps here, and has made good strides in recent releases.

Changes:
We update the 'vendored' nanoarrow to release 0.4.0, and make extended use of it. This works well: simple examples work fine and show the desired memory behaviour. However, on larger examples we see interactions with the (local) handling of Arrow objects in libtiledbsoma, so the scope of the PR will have to be extended to handle allocating, releasing and setting up Arrow objects via nanoarrow.

Notes for Reviewer:
This PR is informational to run some CI; all local tests pass. However, an invasive change in libtiledbsoma currently makes this unsuitable for a merge, meaning it is also not yet fully ready for a review.

The secret cabal powers of CI are currently smiling upon us (after a lot of help from several people, and a last minute bug fix) so the 'draft' label is now off. There are remaining 'drafty' things here (such as a move of nanoarrow.* to a directory below external/) but we can do so in a later PR.