Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update modin dependency to 0.30.1 #1965

Merged
merged 27 commits into from
Oct 28, 2024
Merged

Conversation

sfc-gh-joshi
Copy link
Contributor

@sfc-gh-joshi sfc-gh-joshi commented Jul 23, 2024

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-1552497

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
    • I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
  3. Please describe how your code solves the related issue.

This PR updates modin to 0.30.1, which unpins the pandas patch version and fixes bugs related to the extensions/docstring modules.

Dependency-related changes:

  • Moves the pinned modin version from 0.28.1 to 0.30.1--it is available in Snowflake anaconda but currently blocked on prod/preprod, and must be manually enabled for testing accounts in those environments. We will unblock modin 0.30.1 when the release containing this change is completed.
  • Adds a new job to the daily test (test-pandas-patch-versions), which runs the Snowpark pandas daily test suite on AWS with Python 3.9 against all supported pandas versions currently available in anaconda (2.1 and 2.2; we should add 2.3 once it's available in Snowflake anaconda)
  • Some native pandas error messages checked in tests have changed; relevant tests now check the error message based on pandas version.
  • IMPORTANT The most recent available versions of snowflake-snowpark-python available in Anaconda pin the modin dependency to 0.28.1, so attempting to specify modin==0.30.1 when creating a stored procedure makes the solver pick an older version of snowflake-snowpark-python from before Snowpark pandas was added (1.16.0). As a workaround, in test_modin_stored_procedures.py we need to upload snowflake-snowpark-python as a staged package. Calling apply and applymap (which both create UDFs) within a stored procedure created in this way fails; I have XFAILed those tests for now and need help debugging them (either in this PR or as a separate one). Within test_modin_stored_procedures.py we pin pandas==2.2.1 and modin==0.28.1 to test the version available in Anaconda; other UDF/stored proc-creating tests are XFAILed when the pandas version is 2.2.3 or newer.

Frontend changes:

  • modin 0.30.1 introduces the pd.DataFrame.modin accessor object for non-pandas methods, such as pd.DataFrame.modin.to_pandas and pd.DataFrame.modin.to_ray. We will automatically raise NotImplementedError for all methods on this accessor object except to_pandas.
  • DataFrame/Series.drop_duplicates was changed to use a different query compiler method to better support distributed backends; for now I've added back an override with the old frontend implementation.
  • Series.unique was changed to have an additional to_pandas() call, which does not affect the number of queries since the old implementation called to_numpy(), but does change the return type. I've added back the old frontend implementation as an override.

@sfc-gh-joshi sfc-gh-joshi added NO-PANDAS-CHANGEDOC-UPDATES This PR does not update Snowpark pandas docs and removed snowpark-pandas labels Jul 23, 2024
@sfc-gh-dpetersohn sfc-gh-dpetersohn marked this pull request as ready for review July 31, 2024 18:50
@sfc-gh-dpetersohn sfc-gh-dpetersohn requested review from a team as code owners July 31, 2024 18:50
@sfc-gh-azhan
Copy link
Collaborator

When modin 0.30.1 will be available in Snowflake anaconda?

@sfc-gh-dpetersohn
Copy link
Contributor

@sfc-gh-azhan should be very soon. Do you know why all the tests would pass even though it is in Anaconda yet?

@sfc-gh-azhan
Copy link
Collaborator

@sfc-gh-azhan should be very soon. Do you know why all the tests would pass even though it is in Anaconda yet?

Those tests here are using pypi. Only the Sproc tests or Snowflake streamlit or notebook tests relies on Anaconda.

@sfc-gh-azhan
Copy link
Collaborator

Once I done with https://snowflakecomputing.atlassian.net/browse/SNOW-1563225, I could run your PR for those sproc tests and I believe they will fail before modin is upgraded in Anaconda.

@sfc-gh-joshi sfc-gh-joshi force-pushed the joshi-SNOW-1552497-modin-0.30.1 branch from 73ecc02 to 941c43d Compare August 19, 2024 18:37
@sfc-gh-joshi sfc-gh-joshi force-pushed the joshi-SNOW-1552497-modin-0.30.1 branch from 941c43d to 56f53b8 Compare October 1, 2024 21:48
@sfc-gh-joshi sfc-gh-joshi marked this pull request as draft October 1, 2024 21:49
@sfc-gh-joshi sfc-gh-joshi force-pushed the joshi-SNOW-1552497-modin-0.30.1 branch 3 times, most recently from 309d69f to ed226f5 Compare October 9, 2024 22:13
@sfc-gh-joshi sfc-gh-joshi force-pushed the joshi-SNOW-1552497-modin-0.30.1 branch from 611471e to a9add72 Compare October 11, 2024 00:02
@sfc-gh-joshi sfc-gh-joshi marked this pull request as ready for review October 11, 2024 00:03
.github/workflows/daily_modin_precommit.yml Outdated Show resolved Hide resolved
.github/workflows/precommit.yml Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
tests/integ/modin/test_modin_stored_procedures.py Outdated Show resolved Hide resolved
tests/integ/modin/test_modin_stored_procedures.py Outdated Show resolved Hide resolved
@sfc-gh-jjiao
Copy link
Contributor

@sfc-gh-joshi Thanks for the PR, I like the very clear description with enough details.

@sfc-gh-azhan
Copy link
Collaborator

tox.ini Show resolved Hide resolved
.github/workflows/daily_modin_precommit.yml Outdated Show resolved Hide resolved
.github/workflows/daily_modin_precommit.yml Outdated Show resolved Hide resolved
.github/workflows/daily_modin_precommit.yml Outdated Show resolved Hide resolved
tests/integ/modin/test_modin_stored_procedures.py Outdated Show resolved Hide resolved
tests/integ/modin/test_modin_stored_procedures.py Outdated Show resolved Hide resolved
@sfc-gh-joshi sfc-gh-joshi force-pushed the joshi-SNOW-1552497-modin-0.30.1 branch from ce6755c to 5f59431 Compare October 23, 2024 20:07
@sfc-gh-joshi sfc-gh-joshi requested a review from a team as a code owner October 23, 2024 20:07
Copy link
Contributor

@sfc-gh-helmeleegy sfc-gh-helmeleegy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for keeping our dependencies up-to-date. (But please resolve all open conversations before merging.)

@sfc-gh-joshi
Copy link
Contributor Author

(from discussion on slack) Holding this PR until after 1.24 is cut.

@sfc-gh-joshi sfc-gh-joshi enabled auto-merge (squash) October 28, 2024 19:57
@sfc-gh-joshi sfc-gh-joshi merged commit db1b634 into main Oct 28, 2024
39 checks passed
@sfc-gh-joshi sfc-gh-joshi deleted the joshi-SNOW-1552497-modin-0.30.1 branch October 28, 2024 20:21
@github-actions github-actions bot locked and limited conversation to collaborators Oct 28, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
NO-PANDAS-CHANGEDOC-UPDATES This PR does not update Snowpark pandas docs snowpark-pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants