Update modin dependency to 0.30.1 #1965
When will modin 0.30.1 be available in Snowflake Anaconda?
@sfc-gh-azhan It should be very soon. Do you know why all the tests would pass even though it isn't in Anaconda yet?
The tests here use PyPI. Only the sproc tests and the Snowflake Streamlit or notebook tests rely on Anaconda.
Once I'm done with https://snowflakecomputing.atlassian.net/browse/SNOW-1563225, I can run your PR against those sproc tests. I believe they will fail until modin is upgraded in Anaconda.
@sfc-gh-joshi Thanks for the PR. I like the clear, detailed description.
I started https://ci-dev-142.int.snowflakecomputing.com/job/SnowparkPandasStoredProcPrecommitTest/158/. Let's see how it goes. @sfc-gh-joshi
LGTM. Thanks for keeping our dependencies up-to-date. (But please resolve all open conversations before merging.)
(From discussion on Slack) Holding this PR until after 1.24 is cut.
Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes SNOW-1552497
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
This PR updates modin to 0.30.1, which unpins the pandas patch version and fixes bugs related to the extensions/docstring modules.
Dependency-related changes:
- A test job (`test-pandas-patch-versions`), which runs the Snowpark pandas daily test suite on AWS with Python 3.9 against all supported pandas versions currently available in Anaconda (2.1 and 2.2; we should add 2.3 once it's available in Snowflake Anaconda).
- The `snowflake-snowpark-python` versions available in Anaconda pin the modin dependency to `0.28.1`, so attempting to specify `modin==0.30.1` when creating a stored procedure makes the solver pick an older version of `snowflake-snowpark-python` from before Snowpark pandas was added (1.16.0).
  - As a workaround, in `test_modin_stored_procedures.py` we need to upload `snowflake-snowpark-python` as a staged package. Calling `apply` and `applymap` (which both create UDFs) within a stored procedure created in this way fails; I have XFAILed those tests for now and need help debugging them (either in this PR or as a separate one).
  - Within `test_modin_stored_procedures.py` we pin `pandas==2.2.1` and `modin==0.28.1` to test the version available in Anaconda; other UDF/stored proc-creating tests are XFAILed when the pandas version is 2.2.3 or newer.

Frontend changes:
- Modin adds a `pd.DataFrame.modin` accessor object for non-pandas methods, such as `pd.DataFrame.modin.to_pandas` and `pd.DataFrame.modin.to_ray`. We will automatically raise `NotImplementedError` for all methods on this accessor object except `to_pandas`.
- `DataFrame`/`Series.drop_duplicates` was changed to use a different query compiler method to better support distributed backends; for now I've added back an override with the old frontend implementation.
- `Series.unique` was changed to have an additional `to_pandas()` call, which does not affect the number of queries since the old implementation called `to_numpy()`, but does change the return type. I've added back the old frontend implementation as an override.
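
For readers unfamiliar with the accessor behavior described above, here is a minimal pure-Python sketch of the idea: an accessor that supports `to_pandas` and raises `NotImplementedError` for every other method. The class name `ModinAccessorSketch` and the list-based stand-in for the data are hypothetical and exist only for illustration; this is not the code from this PR or from Modin.

```python
class ModinAccessorSketch:
    """Hypothetical stand-in for an accessor that only supports to_pandas."""

    def __init__(self, data):
        # In this sketch the "frame" is just an iterable of values.
        self._data = data

    def to_pandas(self):
        # Stand-in for materializing the data locally; returns a plain list here.
        return list(self._data)

    def __getattr__(self, name):
        # __getattr__ runs only when normal attribute lookup fails,
        # so to_pandas never reaches this point.
        if name.startswith("_"):
            raise AttributeError(name)

        def _unsupported(*args, **kwargs):
            # Every other method fails when it is called.
            raise NotImplementedError(
                f"modin.{name} is not supported in this sketch"
            )

        return _unsupported


accessor = ModinAccessorSketch([1, 2, 3])
print(accessor.to_pandas())
try:
    accessor.to_ray()
except NotImplementedError as exc:
    print(f"raised: {exc}")
```

Routing unknown names through `__getattr__` means methods added to the accessor in later releases would fail loudly by default instead of silently running an unsupported code path.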