
SNOW-1794373: Support DataFrameWriter.insertInto/insert_into #2835

Open

wants to merge 6 commits into base: main
Conversation

@sfc-gh-aling (Contributor) commented Jan 8, 2025

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-1794373

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
    • I acknowledge that I have ensured that my changes are thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
  3. Please describe how your code solves the related issue.

Implements DataFrameWriter.insert_into/insertInto, which inserts the content of the DataFrame into an existing table that has the same schema; see
https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrameWriter.insertInto.html

Supported in both live and Local Testing modes.
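
For illustration, a minimal sketch of how the new API might be called (connection parameters and table/column names are hypothetical; only insert_into/insertInto and the same-schema requirement come from this PR):

    from snowflake.snowpark import Session

    # Hypothetical connection parameters; fill in your own account details.
    connection_parameters = {"account": "...", "user": "...", "password": "..."}
    session = Session.builder.configs(connection_parameters).create()

    # The target table must already exist with the same schema as the DataFrame.
    session.sql("create or replace table my_table (id int, name string)").collect()

    df = session.create_dataframe([(1, "a"), (2, "b")], schema=["id", "name"])

    # Appends the DataFrame's rows into the existing table.
    df.write.insert_into("my_table")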

Comment on lines +942 to +944
full_table_name = (
    table_name if isinstance(table_name, str) else ".".join(table_name)
)
Contributor:
Reminder to add a TODO to build the AST for this API: https://snowflakecomputing.atlassian.net/browse/SNOW-1489960
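
As context for the snippet above, the table name may be given either as a string or as an iterable of identifier parts; a hypothetical sketch of the two forms:

    # A fully qualified name as a single string...
    df.write.insert_into("db.schema.my_table")

    # ...or an iterable of identifier parts, joined with dots.
    df.write.insert_into(["db", "schema", "my_table"])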

Comment on lines +951 to +954
if target_table.schema != self._dataframe.schema:
    raise SnowparkClientException(
        f"Schema of the DataFrame: {self._dataframe.schema} does not match the schema of the table {full_table_name}: {target_table.schema}."
    )
Contributor:
I'm not sure if we should do this check on the client side. I think it is possible to append the data by type coercion even though the schemas are not an exact match.
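
To illustrate the reviewer's point: Snowflake's server-side INSERT implicitly coerces compatible types, so an exact client-side schema comparison is stricter than the server requires (a hypothetical sketch; table and column names are made up):

    # The table column is FLOAT, while the DataFrame column is inferred as an integer type.
    session.sql("create or replace table t_coerce (x float)").collect()
    df = session.create_dataframe([(1,), (2,)], schema=["x"])

    # An exact schema comparison would reject this append, even though a
    # server-side INSERT INTO t_coerce SELECT ... would coerce INT to FLOAT.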

qualified_table_name = (
    parse_table_name(table_name) if isinstance(table_name, str) else table_name
)

target_table = self._dataframe._session.table(qualified_table_name)
Contributor:
Does this work in PySpark even when the table does not exist? I think this would fail if the table doesn't exist.
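
For reference, PySpark's insertInto does require the table to exist; a minimal sketch of the failure mode the reviewer is asking about (the table name is hypothetical):

    from pyspark.sql import SparkSession
    from pyspark.errors import AnalysisException  # pyspark.sql.utils in older versions

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, "a")], ["id", "name"])

    try:
        # insertInto never creates the table; it only appends to an existing one.
        df.write.insertInto("nonexistent_table")
    except AnalysisException as e:
        print(e)  # table or view `nonexistent_table` cannot be found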
