
SNOW-1794373: Support DataFrameWriter.insertInto/insert_into #2835

Open

wants to merge 6 commits into base: main
Conversation

@sfc-gh-aling (Contributor) commented Jan 8, 2025

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-1794373

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
    • I acknowledge that I have ensured that my changes are thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
  3. Please describe how your code solves the related issue.

Implements DataFrameWriter.insert_into/insertInto, which inserts the content of the DataFrame into an existing table that has the same schema; see
https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrameWriter.insertInto.html

Supported in both live and Local Testing modes.
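
For illustration, a minimal sketch of how the new API might be called (connection parameters and table/column names are hypothetical; only insert_into/insertInto and the same-schema requirement come from this PR):

    from snowflake.snowpark import Session

    # Hypothetical connection parameters; fill in your own account details.
    connection_parameters = {"account": "...", "user": "...", "password": "..."}
    session = Session.builder.configs(connection_parameters).create()

    # The target table must already exist with the same schema as the DataFrame.
    session.sql("create or replace table my_table (id int, name string)").collect()

    df = session.create_dataframe([(1, "a"), (2, "b")], schema=["id", "name"])

    # Appends the DataFrame's rows into the existing table.
    df.write.insert_into("my_table")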

Comment on lines +942 to +944
full_table_name = (
    table_name if isinstance(table_name, str) else ".".join(table_name)
)
Contributor:
Reminder to add a TODO to build the AST for this API: https://snowflakecomputing.atlassian.net/browse/SNOW-1489960
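
As context for the snippet above, the table name may be given either as a string or as an iterable of identifier parts; a hypothetical sketch of the two forms:

    # A fully qualified name as a single string...
    df.write.insert_into("db.schema.my_table")

    # ...or an iterable of identifier parts, joined with dots.
    df.write.insert_into(["db", "schema", "my_table"])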

Comment on lines +951 to +954
if target_table.schema != self._dataframe.schema:
    raise SnowparkClientException(
        f"Schema of the DataFrame: {self._dataframe.schema} does not match the schema of the table {full_table_name}: {target_table.schema}."
    )
Contributor:
I'm not sure if we should do this check on the client side. I think it is possible to append the data by type coercion even though the schemas are not an exact match.
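
To illustrate the reviewer's point: Snowflake's server-side INSERT implicitly coerces compatible types, so an exact client-side schema comparison is stricter than the server requires (a hypothetical sketch; table and column names are made up):

    # The table column is FLOAT, while the DataFrame column is inferred as an integer type.
    session.sql("create or replace table t_coerce (x float)").collect()
    df = session.create_dataframe([(1,), (2,)], schema=["x"])

    # An exact schema comparison would reject this append, even though a
    # server-side INSERT INTO t_coerce SELECT ... would coerce INT to FLOAT.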

qualified_table_name = (
    parse_table_name(table_name) if isinstance(table_name, str) else table_name
)

target_table = self._dataframe._session.table(qualified_table_name)
Contributor:
Does this work in PySpark even when the table does not exist? I think this would fail if the table doesn't exist.
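
For reference, PySpark's insertInto does require the table to exist; a minimal sketch of the failure mode the reviewer is asking about (the table name is hypothetical):

    from pyspark.sql import SparkSession
    from pyspark.errors import AnalysisException  # pyspark.sql.utils in older versions

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, "a")], ["id", "name"])

    try:
        # insertInto never creates the table; it only appends to an existing one.
        df.write.insertInto("nonexistent_table")
    except AnalysisException as e:
        print(e)  # table or view `nonexistent_table` cannot be found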
