Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: change binary array type from LargeBinaryArray to BinaryArray #3924

Merged
merged 12 commits into from
May 18, 2024

Conversation

etolbakov
Copy link
Collaborator

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

#3913

What's changed and what's your intention?

  • change binary array type from LargeBinaryArray to BinaryArray

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR does not require documentation updates.

@github-actions github-actions bot added the docs-not-required This change does not impact docs. label May 13, 2024
@etolbakov
Copy link
Collaborator Author

To verify the read parquet files functionality the following steps have been taken (many thanks to @evenyag)

  1. The latest official release has been used to start Greptime instance in the standalone mode.
  2. Following the steps of the blob.sql test a table with binary column type has been created
CREATE TABLE blobs (b BYTEA, t timestamp time index);

INSERT INTO blobs VALUES('\xaa\xff\xaa'::BYTEA, 1), ('\xAA\xFF\xAA\xAA\xFF\xAA'::BYTEA, 2), ('\xAA\xFF\xAA\xAA\xFF\xAA\xAA\xFF\xAA'::BYTEA, 3);

SELECT * FROM blobs;
  1. Once the blobs table has been updated, the following SQL select fush_table('blobs'); has been used for flushing the table on disk.
  2. Then the patched version of Greptime has been built and run in the standalone mod (the same data directory as in the step 1 was used)
  3. Attempt to read the blobs table succeeded,

@tisonkun tisonkun requested review from evenyag May 13, 2024 22:51
@tisonkun
Copy link
Collaborator

@etolbakov May I ask what do you think this patch is still a draft? AFAICS it's ready for review.

src/datatypes/src/vectors/helper.rs Outdated Show resolved Hide resolved
src/datatypes/src/vectors/helper.rs Outdated Show resolved Hide resolved
src/datatypes/src/vectors/helper.rs Show resolved Hide resolved
@evenyag
Copy link
Contributor

evenyag commented May 14, 2024

To verify the read parquet files functionality the following steps have been taken

Thanks! This is a good news. I'm also considering constructing a test case that uses the ParquetReader to read a file with a different arrow schema. There are two approaches:

  1. Just uses the parquet file generated by an old DB release
  2. Constructs the parquet file manually using the ArrowWriter

The second way is more readable.

@etolbakov etolbakov marked this pull request as ready for review May 14, 2024 21:19
@etolbakov etolbakov requested a review from a team as a code owner May 14, 2024 21:19
Copy link

codecov bot commented May 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.13%. Comparing base (11ad5b3) to head (1eb1ddf).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3924      +/-   ##
==========================================
- Coverage   85.40%   85.13%   -0.28%     
==========================================
  Files         978      978              
  Lines      169100   169283     +183     
==========================================
- Hits       144427   144124     -303     
- Misses      24673    25159     +486     

Copy link
Contributor

@evenyag evenyag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's almost done. The only issue is that we should cast the BinaryArray to a LargeBinaryArray.

src/mito2/src/test_util/sst_util.rs Outdated Show resolved Hide resolved
src/mito2/src/sst/parquet.rs Outdated Show resolved Hide resolved
src/mito2/src/test_util/sst_util.rs Outdated Show resolved Hide resolved
src/mito2/src/sst/parquet.rs Outdated Show resolved Hide resolved
src/mito2/src/test_util.rs Outdated Show resolved Hide resolved
src/mito2/src/sst/parquet.rs Outdated Show resolved Hide resolved
src/mito2/src/test_util/sst_util.rs Outdated Show resolved Hide resolved
src/mito2/src/test_util/sst_util.rs Outdated Show resolved Hide resolved
src/mito2/src/sst/parquet.rs Outdated Show resolved Hide resolved
src/mito2/src/test_util/sst_util.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@evenyag evenyag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@evenyag evenyag requested a review from fengjiachun May 18, 2024 07:59
Copy link
Collaborator

@fengjiachun fengjiachun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fengjiachun fengjiachun added this pull request to the merge queue May 18, 2024
@fengjiachun
Copy link
Collaborator

Thank you!

Merged via the queue into GreptimeTeam:main with commit 6a9a929 May 18, 2024
20 checks passed
@etolbakov etolbakov deleted the switch-binary-array-type branch May 18, 2024 08:43
WenyXu pushed a commit to WenyXu/greptimedb that referenced this pull request May 21, 2024
…reptimeTeam#3924)

* chore: change binary array type from LargeBinaryArray to BinaryArray

* fix: adjust try_into_vector logic

* fix: apply CR suggestions, add tests

* chore: fix failing test

* chore: fix integration test

* chore: adjust the assertions according to changed implementation

* chore: add a test with LargeBinary type

* chore: apply CR suggestions

* chore: simplify tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs-not-required This change does not impact docs.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants