[c++] Integrate SOMAColumn: Arrow adapter methods, part 1 #3405

Open
wants to merge 38 commits into
base: main

Conversation

XanthosXanthopoulos
Collaborator

@XanthosXanthopoulos XanthosXanthopoulos commented Dec 6, 2024

This PR reworks the Arrow schema to TileDB schema transformation so that it uses the SOMAColumn create methods.
It also adds a set of new data converters from Arrow arrays to std::array, to simplify the code.

This migration also enforces the current domain restriction for string dimensions in libtiledbsoma itself; previously the restriction was present only in the R and Python APIs.

@XanthosXanthopoulos XanthosXanthopoulos changed the title Integrate SOMAColumn in Arrow adapter methods [WIP] Integrate SOMAColumn in Arrow adapter methods Part 2 Dec 8, 2024
@XanthosXanthopoulos XanthosXanthopoulos changed the title Integrate SOMAColumn in Arrow adapter methods Part 2 [c++] Integrate SOMAColumn in Arrow adapter methods Part 2 Dec 8, 2024
@XanthosXanthopoulos XanthosXanthopoulos marked this pull request as ready for review December 8, 2024 16:25
@johnkerl johnkerl changed the title [c++] Integrate SOMAColumn in Arrow adapter methods Part 2 [c++] Integrate SOMAColumn in Arrow adapter methods, part 2 Dec 9, 2024
Member

@nguyenv nguyenv left a comment

Can you explain what Skip and Take are for and document them? It looks like Take is the index of the column to retrieve and Skip is relevant only for geometry columns (where it's always 2)?

Also, is there a way to use std::variant or a templated type instead of std::any, or would that make things too complicated?

Comment on lines +743 to +747
/**
* Return a copy of the data in a specified column of an arrow table.
* Complex column types are supported. The for each sub column are an
* std::array<T, 2> casted as an std::any object.
*/
Member

Suggested change
/**
* Return a copy of the data in a specified column of an arrow table.
* Complex column types are supported. The for each sub column are an
* std::array<T, 2> casted as an std::any object.
*/
/**
* Return a copy of the data in a specified column of an arrow table.
* Complex column types are supported. The type for each sub column is
* an std::array<T, 2> casted as an std::any object.
*/

Collaborator Author

Skip and Take are used in 2 places with 2 specific sets of values (either Skip=3 and Take=2, or Skip=0 and Take=2) and are independent of the geometry column. They extract specific subranges of ArrowArray data, which comes in handy during the ArrowSchema -> TileDBSchema conversion, where the Arrow array provided has 5 values per dimension and we only need the last 2 to set the current domain.
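
To illustrate the Skip/Take idea, here is a minimal sketch. The name `take_subrange` is hypothetical and this is not the helper added in this PR; it assumes the values for one dimension have already been copied into a std::vector, whereas the real code reads from an ArrowArray.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not the libtiledbsoma API): copy `Take` consecutive
// values from `values`, starting right after the first `Skip` entries.
template <typename T, std::size_t Skip, std::size_t Take>
std::array<T, Take> take_subrange(const std::vector<T>& values) {
    // Guard against reading past the end of the input.
    if (values.size() < Skip + Take) {
        throw std::out_of_range("take_subrange: not enough values");
    }
    std::array<T, Take> out{};
    for (std::size_t i = 0; i < Take; ++i) {
        out[i] = values[Skip + i];
    }
    return out;
}
```

With five values per dimension, `take_subrange<std::int64_t, 3, 2>(values)` yields the last two entries, and `take_subrange<std::int64_t, 0, 2>(values)` yields the first two.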

As for using std::variant, adding more SOMAColumn types would require changing multiple variants. std::any is used here to enable runtime polymorphism, and it indirectly introduces a runtime type check (via any_cast, make_any) between the templated function and the actual dimension type. std::variant can provide all of the above; it is just a different style, and I am open to discussing it further.
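
For comparison, the two styles can be sketched side by side. Everything below (`Int64Range`, `DoubleRange`, `make_erased_range`, `read_range`, `RangeVariant`, `range_width`) is a hypothetical stand-in for illustration, not code from this PR. It assumes C++17.

```cpp
#include <any>
#include <array>
#include <cstdint>
#include <variant>

// Hypothetical stand-ins for per-column [lo, hi] values.
using Int64Range = std::array<std::int64_t, 2>;
using DoubleRange = std::array<double, 2>;

// std::any style: the producer erases the concrete type.
inline std::any make_erased_range(std::int64_t lo, std::int64_t hi) {
    return std::make_any<Int64Range>(Int64Range{lo, hi});
}

// The templated consumer names the type it expects. If the stored type
// differs, std::any_cast throws std::bad_any_cast at runtime.
template <typename T>
std::array<T, 2> read_range(const std::any& erased) {
    return std::any_cast<std::array<T, 2>>(erased);
}

// std::variant style: the set of allowed types is closed and listed here,
// so supporting a new column type means extending this alias.
using RangeVariant = std::variant<Int64Range, DoubleRange>;

// std::visit dispatches on whichever alternative is currently held.
inline double range_width(const RangeVariant& range) {
    return std::visit(
        [](const auto& r) { return static_cast<double>(r[1] - r[0]); },
        range);
}
```

In the std::any version a type mismatch surfaces only when `read_range<T>` is called with the wrong `T`. In the std::variant version the compiler checks that the visitor handles every listed alternative.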

@XanthosXanthopoulos XanthosXanthopoulos force-pushed the xan/sc-59427/soma-column-arrow-integration branch from b1bd03c to 5485141 Compare December 11, 2024 11:44
@XanthosXanthopoulos XanthosXanthopoulos force-pushed the xan/sc-59427/soma-column-arrow-integration branch from 5485141 to 0e69ed7 Compare December 13, 2024 16:07
@XanthosXanthopoulos XanthosXanthopoulos changed the base branch from xan/sc-59427/soma-column to xan/sc-59427/soma-geometry-column December 13, 2024 16:08
@XanthosXanthopoulos XanthosXanthopoulos changed the title [c++] Integrate SOMAColumn in Arrow adapter methods, part 2 [c++] Integrate SOMAColumn: Arrow adapter methods, part 1 Dec 13, 2024
@XanthosXanthopoulos XanthosXanthopoulos force-pushed the xan/sc-59427/soma-column-arrow-integration branch from 0e69ed7 to d6d6187 Compare December 13, 2024 16:10
@XanthosXanthopoulos XanthosXanthopoulos force-pushed the xan/sc-59427/soma-geometry-column branch 4 times, most recently from 8daf17e to a426c7a Compare January 8, 2025 19:04
Base automatically changed from xan/sc-59427/soma-geometry-column to main January 8, 2025 19:38
Development

Successfully merging this pull request may close these issues.

[c++] Add an abstraction layer between SOMA columns and TileDB dimensions and attributes
2 participants