SNOW-1794373: Support DataFrameWriter.insertInto/insert_into #2835
base: main
Changes from all commits
0e5f74d
c384722
61fb586
5da3d71
6d41a99
96f0abf
@@ -42,6 +42,7 @@
 )
 from snowflake.snowpark.async_job import AsyncJob, _AsyncResultType
 from snowflake.snowpark.column import Column, _to_col_if_str
+from snowflake.snowpark.exceptions import SnowparkClientException
 from snowflake.snowpark.functions import sql_expr
 from snowflake.snowpark.mock._connection import MockServerConnection
 from snowflake.snowpark.row import Row
@@ -913,4 +914,45 @@ def parquet(
         **copy_options,
     )

+    @publicapi
+    def insert_into(
+        self, table_name: Union[str, Iterable[str]], overwrite: bool = False
+    ) -> None:
+        """
+        Inserts the content of the DataFrame into the specified table.
+        It requires that the schema of the DataFrame matches the schema of the table.
+
+        Args:
+            table_name: A string or list of strings representing the table name.
+                If the input is a string, it is the table name; if it is an iterable
+                of strings, it is the fully-qualified object identifier (database
+                name, schema name, and table name).
+            overwrite: If True, the content of the table will be overwritten.
+                If False, the data will be appended to the table. Default is False.
+
+        Example::
+
+            >>> # create a temporary table, then insert additional rows into it
+            >>> df = session.create_dataframe([["John", "Berry"]], schema=["FIRST_NAME", "LAST_NAME"])
+            >>> df.write.save_as_table("my_table", table_type="temporary")
+            >>> df2 = session.create_dataframe([["Rick", "Berry"]], schema=["FIRST_NAME", "LAST_NAME"])
+            >>> df2.write.insert_into("my_table")
+            >>> session.table("my_table").collect()
+            [Row(FIRST_NAME='John', LAST_NAME='Berry'), Row(FIRST_NAME='Rick', LAST_NAME='Berry')]
+        """
+        full_table_name = (
+            table_name if isinstance(table_name, str) else ".".join(table_name)
+        )
+        validate_object_name(full_table_name)
+        qualified_table_name = (
+            parse_table_name(table_name) if isinstance(table_name, str) else table_name
+        )
+
+        target_table = self._dataframe._session.table(qualified_table_name)
Review comment (on `target_table = self._dataframe._session.table(qualified_table_name)`):
Does it work in PySpark even when the table does not exist? I think this would fail if the table doesn't exist.
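A minimal sketch of the failure mode the reviewer is asking about, assuming Snowpark's usual lazy evaluation: session.table() itself succeeds without touching the server, and the error only surfaces when .schema triggers a describe query. The table_exists helper is hypothetical, not part of this PR.

# Hypothetical helper (not in this PR) illustrating when the failure surfaces.
from snowflake.snowpark import Session
from snowflake.snowpark.exceptions import SnowparkSQLException

def table_exists(session: Session, name: str) -> bool:
    """Best-effort existence check; assumes the describe query raises for a missing table."""
    try:
        session.table(name).schema  # lazy until here; triggers a describe query
        return True
    except SnowparkSQLException:
        return False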
+        if target_table.schema != self._dataframe.schema:
+            raise SnowparkClientException(
+                f"Schema of the DataFrame: {self._dataframe.schema} does not match the schema of the table {full_table_name}: {target_table.schema}."
+            )
Review comment (on lines +951 to +954):
I'm not sure if we should do this check on the client side. I think it is possible to append the data by type coercion even though the schemas are not an exact match.
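To make the reviewer's concern concrete, here is a small sketch: a strict StructType equality check rejects a pair of schemas that the server could reconcile by widening, for example an integer column inserted into a DOUBLE column. The column name AMOUNT is a placeholder.

from snowflake.snowpark.types import DoubleType, LongType, StructField, StructType

# Table column is DOUBLE; DataFrame column is a 64-bit integer.
table_schema = StructType([StructField("AMOUNT", DoubleType())])
df_schema = StructType([StructField("AMOUNT", LongType())])

# The exact-match check in this PR would raise here, even though an
# INSERT could coerce the integer column to DOUBLE on the server side.
assert table_schema != df_schema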
+        self.save_as_table(table_name, mode="truncate" if overwrite else "append")
+
+    insertInto = insert_into
+    saveAsTable = save_as_table
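For reference, a short usage sketch of the API as added by this diff; the identifiers MY_DB, MY_SCHEMA, and MY_TABLE are placeholders. Per the diff, overwrite=True maps to mode="truncate" in save_as_table, and insertInto is the PySpark-style alias.

# Append using a fully-qualified identifier passed as an iterable of parts.
df.write.insert_into(["MY_DB", "MY_SCHEMA", "MY_TABLE"])

# Overwrite (truncate, then insert) via the PySpark-style alias.
df.write.insertInto("MY_DB.MY_SCHEMA.MY_TABLE", overwrite=True)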
Review comment:
Reminder to add a TODO to build the AST for this API: https://snowflakecomputing.atlassian.net/browse/SNOW-1489960