Accept Python dict from client "write_dataframe" and TableAdapter #771

Merged: 9 commits merged on Aug 6, 2024

@nmaytan (Contributor) commented Jul 17, 2024

In reference to #733.

Both code paths require the dict to have a dataframe-compatible shape, i.e. every column must have the same length. For `write_dataframe`:

```
d = {'a': [1, 2, 3], 'b': [4, 5, 6, 7]}
c.write_dataframe(d, key='n')
[...]
ArrowInvalid: Column 1 named b expected length 3 but got length 4
```

`pyarrow.lib.Table.validate()` appears in the stack trace, and the pyarrow docs define a Table as:

> A collection of top-level named, equal length Arrow arrays.

For TableAdapter:

```
d = {'a': [1, 2, 3, 4, 5], 'b': [4, 5, 6, 7, 8, 9]}
tdf = DataFrameAdapter.from_pydict(d, npartitions=1)
[...]
ValueError: An error occurred while calling the from_dict method registered to the pandas backend.
Original Message: All arrays must be of the same length
```

This error is raised by `dask.dataframe.from_dict`.
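Both failures come from the same equal-length invariant. As an illustration only, here is a minimal client-side pre-check that enforces it before a dict is handed to `write_dataframe` or `from_pydict`. `check_equal_lengths` is a hypothetical helper written for this sketch; it is not part of tiled, pyarrow, or dask, and its error message is not the one those libraries produce.

```python
from typing import Mapping, Sequence


def check_equal_lengths(columns: Mapping[str, Sequence]) -> int:
    """Return the shared column length, or raise ValueError if lengths differ.

    Hypothetical helper: it mirrors the equal-length rule that pyarrow.Table
    and dask.dataframe.from_dict enforce, but it is not part of tiled's API.
    """
    lengths = {name: len(values) for name, values in columns.items()}
    if not lengths:
        return 0  # an empty dict has no rows
    expected = next(iter(lengths.values()))  # length of the first column
    for name, length in lengths.items():
        if length != expected:
            raise ValueError(
                f"Column {name!r} has length {length}, expected {expected}"
            )
    return expected


# The two ragged dicts from this PR both fail the check.
for ragged in (
    {"a": [1, 2, 3], "b": [4, 5, 6, 7]},
    {"a": [1, 2, 3, 4, 5], "b": [4, 5, 6, 7, 8, 9]},
):
    try:
        check_equal_lengths(ragged)
    except ValueError as err:
        print(err)

# A rectangular dict passes and reports its row count.
print(check_equal_lengths({"a": [1, 2, 3], "b": [4, 5, 6]}))
```

Failing early on the client gives a message that names the offending column, rather than surfacing a pyarrow or dask traceback.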

Lastly, Dan pointed out that dict support lets us slightly simplify `generated_minimal.py`, and I've confirmed that a plain dict reproduces the example exactly:

```
In [4]: client['C'].read()
Out[4]:
      x    y    z
0   1.0  2.0  3.0
1   1.0  2.0  3.0
2   1.0  2.0  3.0
3   1.0  2.0  3.0
4   1.0  2.0  3.0
..  ...  ...  ...
95  1.0  2.0  3.0
96  1.0  2.0  3.0
97  1.0  2.0  3.0
98  1.0  2.0  3.0
99  1.0  2.0  3.0

[100 rows x 3 columns]
```
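The `C` table shown above is 100 rows of constant columns (x=1.0, y=2.0, z=3.0). A minimal sketch of a plain dict with that shape follows. It is not the actual contents of `generated_minimal.py`, and the `npartitions` value mentioned in the comment is an assumption. Only the dict construction runs here; the tiled call is left as a comment because it needs tiled installed.

```python
# Sketch: a plain dict with the same shape as the `C` table above.
n_rows = 100
data = {
    "x": [1.0] * n_rows,
    "y": [2.0] * n_rows,
    "z": [3.0] * n_rows,
}

# Every column has the same length, so the dict is dataframe-compatible.
assert len({len(col) for col in data.values()}) == 1

# With tiled installed, the dict could be passed to the adapter method used in
# this PR, e.g. DataFrameAdapter.from_pydict(data, npartitions=1).
# (npartitions=1 is an assumption for this sketch.)
print({name: (len(col), col[0]) for name, col in data.items()})
```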

Checklist

  • Add a Changelog entry
  • Add the ticket number which this PR closes to the comment section

@nmaytan marked this pull request as ready for review July 25, 2024 14:03

Resolved review comments on:
  • tiled/adapters/table.py
  • tiled/structures/table.py
  • tiled/_tests/test_writing.py

@nmaytan requested a review from danielballan July 26, 2024 18:24
@danielballan merged commit 17589bb into bluesky:main Aug 6, 2024
9 checks passed
@nmaytan deleted the df_from_dict branch August 16, 2024 18:51
3 participants