Wire up the ETag from S3's upload response back to the BlobDTO's MD5 field, to handle multipart upload correctly (#915)

With the late-breaking design change to use subscoped tokens instead of direct S3 PUTs, we ended up using the same file handling logic as BDECs. This meant going through JDBC and the S3 SDK.

During recent testing I discovered that files larger than 16 MB are split into a multipart upload to S3. The ETag of such a file is NOT the MD5 hash of its contents, which is consistent with what S3 documents.

For BDECs, we calculate the MD5 hash ourselves and send it to Snowflake, where it's stored in the fileContentKey field. For parquet files in Iceberg tables specifically, there is a check in XP to ensure that the ETag of the blob being read is identical to the fileContentKey stored in Snowflake metadata.

Connecting these dots: before this fix, for Iceberg ingestion of files larger than 16 MB, the SDK sends the MD5 hash in the fileContentKey property, whereas XP expects the ETag value (which is NOT the MD5 of the contents if it's a multipart upload).

The proper fix is to make JDBC return the ETag value after uploading the file, through all the layers of JDBC classes, to the API that the ingest SDK uses (uploadWithoutConnection). Since we need to fix this right away, this PR copies over the parts of JDBC that are used for Iceberg ingestion. As soon as the JDBC driver has the new fix, we'll remove all these classes.

Note that this PR accidentally changes the timeout to 20 seconds. Another PR tomorrow is going to make that change, and I'll back it out of this branch before merging.
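To illustrate why the two values diverge, here is a minimal sketch of the ETag convention S3 describes for multipart uploads: the MD5 of the concatenated per-part MD5 digests, followed by a dash and the part count. The class and method names are hypothetical and are not part of this PR, the tiny part size is for demonstration only, and the sketch assumes an upload without server-side encryption modes that change how the ETag is derived.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration class; not code from this commit.
class MultipartETagSketch {

  // MD5 digest of a slice of the input.
  static byte[] md5(byte[] data, int offset, int length) {
    try {
      MessageDigest md = MessageDigest.getInstance("MD5");
      md.update(data, offset, length);
      return md.digest();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 is not available", e);
    }
  }

  // Lowercase hex encoding of a byte array.
  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder(bytes.length * 2);
    for (byte b : bytes) {
      sb.append(String.format("%02x", b));
    }
    return sb.toString();
  }

  // What a client that hashes the whole file itself would compute.
  static String plainMd5Hex(byte[] content) {
    return hex(md5(content, 0, content.length));
  }

  // Multipart-style ETag: MD5 over the concatenated part digests, plus "-<parts>".
  static String multipartETag(byte[] content, int partSize) {
    if (partSize <= 0) {
      throw new IllegalArgumentException("partSize must be positive");
    }
    int parts = Math.max(1, (content.length + partSize - 1) / partSize);
    byte[] digests = new byte[parts * 16]; // each MD5 digest is 16 bytes
    for (int i = 0; i < parts; i++) {
      int offset = i * partSize;
      int length = Math.min(partSize, content.length - offset);
      System.arraycopy(md5(content, offset, length), 0, digests, i * 16, 16);
    }
    return hex(md5(digests, 0, digests.length)) + "-" + parts;
  }

  public static void main(String[] args) {
    byte[] content = new byte[20];
    // Assumption: 8-byte parts, purely to force three parts in a small example.
    System.out.println("whole-file MD5: " + plainMd5Hex(content));
    System.out.println("multipart ETag: " + multipartETag(content, 8));
  }
}
```

The two printed values never match, because the multipart form carries a part-count suffix and hashes digests rather than content. That is why a fileContentKey populated with the whole-file MD5 fails an equality check against the blob's ETag once the upload goes multipart.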
1 parent 59ce9e0, commit 84727c3
Showing 22 changed files with 2,645 additions and 31 deletions.
@@ -0,0 +1,19 @@
<component name="ProjectRunConfigurationManager">
  <configuration default="false" name="IcebergBigFilesIT.testMultiplePartUpload" type="JUnit" factoryName="JUnit" nameIsGenerated="true">
    <module name="snowflake-ingest-sdk" />
    <extension name="coverage">
      <pattern>
        <option name="PATTERN" value="net.snowflake.ingest.streaming.internal.it.*" />
        <option name="ENABLED" value="true" />
      </pattern>
    </extension>
    <option name="PACKAGE_NAME" value="net.snowflake.ingest.streaming.internal.it" />
    <option name="MAIN_CLASS_NAME" value="net.snowflake.ingest.streaming.internal.it.IcebergBigFilesIT" />
    <option name="METHOD_NAME" value="testMultiplePartUpload" />
    <option name="TEST_OBJECT" value="method" />
    <option name="VM_PARAMETERS" value="-ea --add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.code=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.parser=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED" />
    <method v="2">
      <option name="Make" enabled="true" />
    </method>
  </configuration>
</component>