Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default template incompatible with uv, rye, pdm #4116

Closed
astrojuanlu opened this issue Aug 23, 2024 · 25 comments · Fixed by #4263
Closed

Default template incompatible with uv, rye, pdm #4116

astrojuanlu opened this issue Aug 23, 2024 · 25 comments · Fixed by #4263
Assignees

Comments

@astrojuanlu
Copy link
Member

astrojuanlu commented Aug 23, 2024

Description

uv add <dep> fails with our default template (projects created with kedro new --tools).

(And so do rye and pdm)

Updated after tech design:
Things to do:

  • There was consensus on the idea of moving dependencies to the [project.dependencies] table of pyproject.toml
  • Add pip install -e . in requirements.txt
  • After a bit more discussion there was convergence around the idea of leaving a -e . with some comments + "environment dependencies" (for example ipython, pytest, ruff, which aren't needed to run the pipelines themselves)

Context

uv expects to be able to write to the [project.dependencies] table, but we're using dynamic dependencies, so in a way this is expected to fail:

Steps to Reproduce

uvx kedro new --name=kedro-telemetry-test-starter --tools=none --example=n
cd kedro-telemetry-test-starter
uv add kedro

Expected Result

Success (no changes to pyproject.toml, uv.lock is created)

Actual Result

Using Python 3.12.3 interpreter at: /opt/homebrew/opt/[email protected]/bin/python3.12
Creating virtualenv at: .venv
warning: No `requires-python` value found in the workspace. Defaulting to `>=3.12`.
error: Failed to build: `kedro-telemetry-test-starter @ file:///private/tmp/kedro-telemetry-test-starter`
  Caused by: Build backend failed to determine extra requires with `build_editable()` with exit status: 1
--- stdout:
configuration error: You cannot provide a value for `project.dependencies` and list it under `project.dynamic` at the same time
DESCRIPTION:
    According to PEP 621:  Build back-ends MUST raise an error if the metadata
    specifies a field statically as well as being listed in dynamic.

GIVEN VALUE:
    {
        "dependencies": [
            "kedro"
        ],
        "...": " # ...",
        "dynamic": [
            "dependencies",
            "version"
        ]
    }

OFFENDING RULE: 'PEP 621'

DEFINITION:
    {
        "see": "https://packaging.python.org/en/latest/specifications/pyproject-toml/#dynamic"
    }
--- stderr:
Traceback (most recent call last):
  File "<string>", line 14, in <module>
  File "/Users/juan_cano/.cache/uv/builds-v0/.tmpH11hii/lib/python3.12/site-packages/setuptools/build_meta.py", line 463, in get_requires_for_build_editable
    return self.get_requires_for_build_wheel(config_settings)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/juan_cano/.cache/uv/builds-v0/.tmpH11hii/lib/python3.12/site-packages/setuptools/build_meta.py", line 332, in get_requires_for_build_wheel
    return self._get_build_requires(config_settings, requirements=[])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/juan_cano/.cache/uv/builds-v0/.tmpH11hii/lib/python3.12/site-packages/setuptools/build_meta.py", line 302, in _get_build_requires
    self.run_setup()
  File "/Users/juan_cano/.cache/uv/builds-v0/.tmpH11hii/lib/python3.12/site-packages/setuptools/build_meta.py", line 318, in run_setup
    exec(code, locals())
  File "<string>", line 1, in <module>
  File "/Users/juan_cano/.cache/uv/builds-v0/.tmpH11hii/lib/python3.12/site-packages/setuptools/__init__.py", line 111, in setup
    return distutils.core.setup(**attrs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/juan_cano/.cache/uv/builds-v0/.tmpH11hii/lib/python3.12/site-packages/setuptools/_distutils/core.py", line 158, in setup
    dist.parse_config_files()
  File "/Users/juan_cano/.cache/uv/builds-v0/.tmpH11hii/lib/python3.12/site-packages/_virtualenv.py", line 22, in parse_config_files
    result = old_parse_config_files(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/juan_cano/.cache/uv/builds-v0/.tmpH11hii/lib/python3.12/site-packages/setuptools/dist.py", line 606, in parse_config_files
    pyprojecttoml.apply_configuration(self, filename, ignore_option_errors)
  File "/Users/juan_cano/.cache/uv/builds-v0/.tmpH11hii/lib/python3.12/site-packages/setuptools/config/pyprojecttoml.py", line 70, in apply_configuration
    config = read_configuration(filepath, True, ignore_option_errors, dist)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/juan_cano/.cache/uv/builds-v0/.tmpH11hii/lib/python3.12/site-packages/setuptools/config/pyprojecttoml.py", line 135, in read_configuration
    validate(subset, filepath)
  File "/Users/juan_cano/.cache/uv/builds-v0/.tmpH11hii/lib/python3.12/site-packages/setuptools/config/pyprojecttoml.py", line 59, in validate
    raise ValueError(f"{error}\n{summary}") from None
ValueError: invalid pyproject.toml config: `project.dependencies`.
configuration error: You cannot provide a value for `project.dependencies` and list it under `project.dynamic` at the same time

Your Environment

  • Kedro version used (pip show kedro or kedro -V): 0.19.8
  • Python version used (python -V): 3.12
  • Operating system and version: macOS Sonoma
@astrojuanlu astrojuanlu changed the title Default template incompatible with uv Default template incompatible with uv, rye, pdm Aug 23, 2024
@astrojuanlu
Copy link
Member Author

(Writing this with uv in mind but it applies to rye and pdm as well)

(For the record, the default template is also incompatible with Poetry, but for different reasons. This might change once Poetry 2.0 ships with PEP 621 support. For now, we're not making any effort to support Poetry, see #1722.)

Notice that there's not such thing as "migrating the template to uv": our pyproject.toml is already compatible with PEP 621 so there's nothing to migrate as long as we stay within the bounds of the standard. The problem is our use of experimental support for requirements.txt-powered dynamic dependencies in setuptools.

So there's several things we can do about this issue:

  • Nothing.
    • Less than ideal if uv gets popular (and I think it will).
  • Duplicate dependencies so they're both in pyproject.toml and requirements.txt.
    • Probably a mess to maintain, and also quite error prone.
  • Move all dependencies to pyproject.toml and remove requirements.txt.
  • Move all dependencies to pyproject.toml and replace the contents of requirements.txt with -e ..
    • This might look "weird" but it addresses the problem 👍🏼 and also keeps pip install -r requirements.txt workflows working 👍🏼 (with the addition that the project itself would be installed)

@Vaslo
Copy link
Contributor

Vaslo commented Aug 24, 2024

Small sample of two people here but I will say after the most recent changes to uv I'd really love to see alignment between this project and uv. Makes 100% sense why Kedro is the way it is given the ubiquity and common practice of using pip, but I'm also seeing more projects move in the this better direction that rye/uv/poetry are taking. Thanks for thinking about it in the meantime.

@pietroppeter
Copy link

I am very new to both kedro and uv but I am wondering, is it possible as a first step to have a starter template that will have the same toml setup as uv? (see also #681 (comment))

@astrojuanlu
Copy link
Member Author

astrojuanlu commented Aug 27, 2024

Workaround from #681 (comment) :

$ uv init --lib
$ uvx kedro-init .
$ uv add kedro
$ uv run kedro registry list
$ uv run kedro pipeline create data_processing

is it possible as a first step to have a starter template that will have the same toml setup as uv?

In principle we'd like to avoid the proliferation of starters. Switching to a static [project.dependencies] table seems (hopefully) like a low-effort thing that would make our templates compatible with all modern workflow managers.

Watch this space and thanks for the patience 🙏🏼

@pietroppeter
Copy link

sounds reasonable (wishing to avoid templates).

as reported here #681 (comment) the workaround works for me and it seems that the rest of the missing things to add are straightforward (I honestly do not mind adding them one by one, it helps me understand better the system instead of using a template).

Thanks for the support, I am much more confident now in starting with uv and kedro together and will keep a look on how it evolve this. Have a great day!

@deepyaman
Copy link
Member

deepyaman commented Aug 27, 2024 via email

@astrojuanlu
Copy link
Member Author

Another workaround: https://github.com/astrojuanlu/copier-kedro

@astrojuanlu
Copy link
Member Author

I said users can be educated over a year ago in the thread you linked: #2569 (comment) I think it's well past time Kedro following modern best practices--this was always part of the value prop of the template. I don't like the weird requirements.txt with just -e . in there; if you want to document an alternative path in the FAQ, fine, but people shouldn't be defaulting to this weird flow.

@deepyaman Yeah and back then I was still on the fence...

On one hand, yes people should understand that library workflows need pip install -e ., full stop.

However,

  • Maybe this workflow is, and has always been, too low level. In fact, uv run automatically does it for you (and possibly other workflow tools too?). So maybe we shouldn't lean too heavily on that?
  • Kedro projects are libraries and applications. We currently (mis)use requirements.txt for unpinned dependencies, but there's a point to be made about having unpinned library dependencies in pyproject.toml, and a resolved version of those in requirements.txt. In fact, I was routinely using uv pip compile --universal pyproject.toml -o requirements.txt. Until we see the outcome of the ongoing PEP 751 discussions, requirements.txt is the de-facto standard for dependency locking, and I don't think it would be wise to completely remove it from Kedro templates.

That being said, I admit that my -e . proposal doesn't really use requirements.txt files as lockfiles. We could instead do something like

$ grep 'dependencies =' -A12 pyproject.toml
dependencies = [
  "ipython>=8.10",
  "jupyterlab>=3.0",
  "kedro~=0.19.8",
  "kedro-datasets[pandas-csvdataset, pandas-exceldataset, pandas-parquetdataset]>=3.0; python_version >= '3.9'",
  "kedro-datasets[pandas.CSVDataset, pandas.ExcelDataset, pandas.ParquetDataset]>=1.0; python_version < '3.9'",
  "kedro-viz>=6.7.0",
  "kedro[jupyter]",
  "notebook",
  "scikit-learn~=1.5.1; python_version >= '3.9'",
  "scikit-learn<=1.4.0,>=1.0; python_version < '3.9'",
]
dynamic = [ "version",]
$ cat requirements.in 
-e .
$ uv pip compile --universal requirements.in -o requirements.txt
...
$ head requirements.txt
# This file was autogenerated by uv via the following command:
#    uv pip compile --universal requirements.in -o requirements.txt
-e .
    # via -r requirements.in
aiofiles==24.1.0
    # via kedro-viz
annotated-types==0.7.0
    # via pydantic
antlr4-python3-runtime==4.9.3
    # via omegaconf

in other words, ship a fully resolved and universal requirements.txt in our template, and maintain the unpinned, library dependencies in pyproject.toml, hence fixing this issue and converging with the rest of the Python ecosystem.

@astrojuanlu
Copy link
Member Author

We discussed this on 2024-08-28 Tech Design. Some notes:

  • We clarified that it's only the uv/poetry/rye/pdm add workflow that breaks, users could potentially keep adding dependencies to their requirements.txt, currently consumed by setuptools
  • Although there was no support for the "do nothing" option
  • There was consensus on the idea of moving dependencies to the [project.dependencies] table of pyproject.toml, the only question left being what to do with requirements.txt
  • After a bit more discussion there was convergence around the idea of leaving a -e . with some comments + "environment dependencies" (for example ipython, pytest, ruff, which aren't needed to run the pipelines themselves)
    • This has the advantage of retaining the requirements.txt and, as @DimedS and @noklam pointed out, also the added benefit of installing the project itself in editable mode, which in turn helps with pytest and other tools
  • As an aside, @deepyaman and @idanov were of the opinion that even removing the requirements.txt technically this wouldn't be a breaking change, and that it would only affect newly created projects.
  • Still, @merelcht and others expressed some reluctance to the idea of outright removing the file, especially given that there's no standard lockfile format in Python (until PEP 751 or a successor are approved)
    • @noklam did some research and found that some projects have removed requirements.txt, but many others haven't (for example pytorch, polars. pandas, fastapi)
  • @merelcht also suggested creating or documenting an automatic process that would help users migrate their dependencies from requirements.txt to pyproject.toml
  • @DimedS suggested updating our docs with pip install [-e] . rather than pip install -r requirements.txt, but there wasn't enough time to discuss the implications of that
    • And in addition, if we do add -e . to requirements.txt, both workflows would be equivalent, save for "environment dependencies"

Extra links:

@astrojuanlu
Copy link
Member Author

@astrojuanlu
Copy link
Member Author

From the 2023 Python survey https://lp.jetbrains.com/python-developers-survey-2023/

image

@astrojuanlu
Copy link
Member Author

Another thing we should probably amend:

warning: No `requires-python` value found in the workspace. Defaulting to `>=3.10`.

Our template and starters should have a requires-python that's consistent with our Python version support policy

@Vaslo
Copy link
Contributor

Vaslo commented Sep 2, 2024

Another workaround: https://github.com/astrojuanlu/copier-kedro

I ran the two commands to get the copier to run successfully and messed around to try and get uv working with Kedro. The only issue is that it didn't create a few of the folders like notebook and data (and it's child folders). Did I do something wrong or is there a different way to do this?

@astrojuanlu
Copy link
Member Author

@Vaslo good point. Try again :)

@Vaslo
Copy link
Contributor

Vaslo commented Sep 3, 2024

@Vaslo good point. Try again :)

This edition is great! Thank you for taking a few minutes to do this. I'll work with this process until all the uv stuff is a little more mature. I'm having a few issues with a package on that end that I'm trying to iron out as well, but it's forcing me to learn both the Kedro and uv way.

@noklam
Copy link
Contributor

noklam commented Sep 9, 2024

Things to do:

  • There was consensus on the idea of moving dependencies to the [project.dependencies] table of pyproject.toml, the only question left being what to do with requirements.txt
  • After a bit more discussion there was convergence around the idea of leaving a -e . with some comments + "environment dependencies" (for example ipython, pytest, ruff, which aren't needed to run the pipelines themselves)

@astrojuanlu
Copy link
Member Author

A bit more data to support this: uv is now the third most common installer for Kedro in the entire history of the project, ahead of Poetry

(query)

image

@egerlach
Copy link

egerlach commented Sep 24, 2024

I'm starting to muck around with kedro today and found that I could get it running within a PDM-managed virtual environment pretty easily. If I ran pdm init after kedro new, then it noticed that there was a requirements.txt present and offered to import it.

I'm only at the point of doing the tutorial (which I'm going to continue tomorrow), but I think the only QoL feature I would like as a PDM user is to have the kedro cmd detect that I'm in a PDM environment by checking for the .pdm_python file and using that Python environment to run any operations in.

EDIT: A few minor conflicts came up when I tried to package the spaceflights demo:

  • Importing requirements.txt directly into the pyproject.toml conflicted with dependencies being present in project.dynamic in pyproject.toml
    • Also removed dependencies from tool.setuptools.dynamic
  • PDM adds a project.version by default which conflicts with version being present in project.dynamic (I removed PDM's addition there)

@lrcouto
Copy link
Contributor

lrcouto commented Oct 31, 2024

Regarding the possibility of putting -e . on the requirements.txt file to mantain the pip install -r requirements.txt workflow while still installing from pyproject.toml:

This does work when testing manually, but when running this change through our CI, Behave does not play well with it and it causes our e2e tests to fail.

ERROR:  . is not a valid editable requirement. It should either be a path to a local project or a VCS URL (beginning with bzr+http, bzr+https, bzr+ssh, bzr+sftp, bzr+ftp, bzr+lp, bzr+file, git+http, git+https, git+ssh, git+git, git+file, hg+file, hg+http, hg+https, hg+ssh, hg+static-http, svn+ssh, svn+http, svn+https, svn+svn, svn+file).

Using only . instead of -e . works with the e2e tests, but then we don't get the editable install functionality.

Because of that plus the fact that we're using the pip install -r requirements.txt method of installing dependencies on several of our tests, recommend it on multiple places on our documentation and docstrings, and it's still the most popular way to install python dependencies, I'd like to suggest breaking this task in two parts. The first would be to ensure you can use our project templates with the aforementioned package managers without issue. The second would be to remove the duplication in requirements.txt while ensure everything still works.

What do you guys think?

@astrojuanlu
Copy link
Member Author

Hmm I think that error has more to do with how do we manually unpack the requirements.txt:

def _install_project_requirements(context):
install_reqs = (
Path("kedro/templates/project/{{ cookiecutter.repo_name }}/requirements.txt")
.read_text(encoding="utf-8")
.splitlines()
)
install_reqs = [req for req in install_reqs if "{" not in req and "#" not in req]
install_reqs.append("kedro-datasets[pandas-csvdataset]")
call([context.pip, "install", *install_reqs], env=context.env)

In the end, this will be calling ["pip", "install", "-e ."], which is equivalent to this cursed call (⚠️ notice the quotes!):

❯ pip install "-e ."
ERROR:  . is not a valid editable requirement. It should either be a path to a local project or a VCS URL (beginning with bzr+http, bzr+https, bzr+ssh, bzr+sftp, bzr+ftp, bzr+lp, bzr+file, git+http, git+https, git+ssh, git+git, git+file, hg+file, hg+http, hg+https, hg+ssh, hg+static-http, svn+ssh, svn+http, svn+https, svn+svn, svn+file).
❯ pip install -e .  # Proceeds normally lol
Obtaining file:///Users/juan_cano/Projects/QuantumBlackLabs/Kedro/kedro
  Installing build dependencies ...

@lrcouto
Copy link
Contributor

lrcouto commented Nov 8, 2024

Reposting what I posted previously on slack here, on the matter of requirements.txt:

Found a possible workaround for the requirements.txt file issue.

Putting "-e" + the absolute path to the project instead of "-e ." on requirements.txt will work when running pip install -r requirements.txt and also will not break Behave. So the idea I had was to write the project path on the requirements file on the moment of project creation.

Something like:

        result_path = cookiecutter(template=template_path, **cookiecutter_args)
        with open(result_path + "/requirements.txt", "w") as requirements_file:
            requirements_file.write("-e " + result_path)

However, because during the tests we pull the latest release of kedro-starters (and that fallback we implemented earlier will not help here because there is no version change), the tests will still break because it's pulling the version of the starter pyproject files that is using dynamic dependencies and pointing to requirements.txt

So I think the easiest way to handle this would be:

Merge #4263 and kedro-org/kedro-starters#246 as they are.
This will leave the main branch of Kedro in a way in which creating a project with the default template would have all of the dependencies on pyproject.toml, and using a starter template will not, unless you use the --checkout flag to point to the main branch of kedro-starters. Because, by default, project creation will use the latest kedro-starters release. But things will not break, because we'll still have requirements.txt as it is.

Then, after the next Kedro release, make the changes to deal with requirements.txt. It will be easier to test then, and it's easier to guarantee it won't break any of the existing workflows.

@galenseilis
Copy link

galenseilis commented Nov 8, 2024

Note that Rye is just using uv under-the-hood: astral-sh/rye#1342

@galenseilis
Copy link

Does this incompatibility also exist for Hatch?

https://hatch.pypa.io/latest/

@astrojuanlu
Copy link
Member Author

astrojuanlu commented Nov 9, 2024

@galenseilis Even though there's no hatch add command, Hatch expects dependencies in the [project.dependencies] table, so I guess the answer is Yes

@galenseilis
Copy link

@galenseilis Even though there's no hatch add command, Hatch expects dependencies in the [project.dependencies] table, so I guess the answer is Yes

Thanks for looking into that!

I also had not noticed that the CLI commands for hatch are not as flushed out as the other tools. Thank you for raising my awareness of that.

@github-project-automation github-project-automation bot moved this from In Review to Done in Kedro Framework Nov 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

10 participants