Implement a --group option for installing from [dependency-groups] found in pyproject.toml files #13065

Open: wants to merge 9 commits into base: main


sirosen
Contributor

@sirosen sirosen commented Nov 5, 2024

Update: This PR has undergone a major revision. See this comment for the revised proposal.
The initial comment here is preserved, as it contains useful context and is necessary to understand the following discussion.


This changeset implements --group as an option for loading dependency groups, leveraging the dependency-groups library [1].
resolves #12963

Example usage

pip install --group GROUP1 --group GROUP2
pip download --group GROUP1

pyproject.toml loading and working dir

Per the discussion in #12963 , this is implemented with support only for loading data out of the pyproject.toml file found in the working directory.

Although in theory there might be a desire to have any of the following interfaces in the future...

pip install --group GROUP1 --pyproject-file /scratch/foo/pyproject.toml
PIP_PYPROJECT_PATH=/scratch/foo/pyproject.toml pip install --group GROUP1
pip install --project-dir /scratch/foo --group GROUP1

...there is not a clear "winner" amongst these and they are only usable with --group.
Until or unless there is a more strongly demonstrated need to specify the path in some way, no attempt is made here to support it. [2]

--group as the Option Name

--group is known to match the option provided by uv for installation from groups.
There are some lines of argument in favor (e.g., the initial post which mentions this), and some arguments against (e.g., my note that identical option names could incorrectly imply identical behavior).
Ultimately, I chose to propose --group because of a uv-independent argument: it is short and declarative.

I am open to arguments from pip maintainers in favor of other names -- but I think --group is a good name and that "potential confusion with uv" is a bad reason not to choose it if it's the best name for the feature.
Any alternative name should have some reason that it would also be good for users. e.g., --dependency-group GROUP1 could work because it's very explicit.

--group as a repeated option vs comma-delimited --groups

I originally proposed an interface along the lines of --groups foo,bar,baz in #12963.
I'll just make a quick note that as soon as I started implementing this, I could not see any particularly good reason to add specialized parsing.
Option parsing can collect these for us trivially, and it removes the potential for ambiguity in cases like --groups foo,bar --groups foo,baz.
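
As a rough illustration of "option parsing can collect these for us", here is a minimal sketch using the standard-library `optparse` module, which pip's CLI is built on. The helper name `parse_groups` is made up for this example, and the parser setup is an assumption rather than code from this PR; only the `dependency_groups` destination name comes from the description here.

```python
from optparse import OptionParser


def parse_groups(argv):
    """Collect every --group occurrence into a list (illustrative only)."""
    parser = OptionParser()
    parser.add_option(
        "--group",
        dest="dependency_groups",  # the more explicit name on the parsed args
        action="append",           # each repetition appends one value
        default=[],                # fresh list per call, since the parser is rebuilt
        metavar="GROUP",
        help="Install a named dependency group (may be repeated).",
    )
    options, _positional = parser.parse_args(argv)
    return options.dependency_groups


print(parse_groups(["--group", "GROUP1", "--group", "GROUP2"]))
```

With repeated options there is nothing to split, so a case like `--groups foo,bar --groups foo,baz` never needs a merge rule.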

Wrapped dependency-groups lib

Use of dependency-groups passes through a wrapper module which conforms the errors from that library to InstallationError.
Details of the implementation do (intentionally) leak out, most notably the error strings.

This can be revised if there's a desire for pip to control the error messages more tightly.
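
To make the wrapper idea concrete, here is a small sketch of how library errors might be conformed to a single pip-facing exception. Everything in it is a stand-in: `InstallationError` is redefined locally so the snippet runs on its own, `_library_resolve` is a hypothetical placeholder for the real library call, and the exception types it raises are assumptions.

```python
class InstallationError(Exception):
    """Local stand-in for pip's InstallationError (so the sketch runs alone)."""


def _library_resolve(groups_table, name):
    """Hypothetical placeholder for the library call; not the real API."""
    if name not in groups_table:
        raise LookupError(f"Dependency group '{name}' not found")
    group = groups_table[name]
    if not isinstance(group, list):
        raise TypeError(f"Dependency group '{name}' is not a list")
    return list(group)


def resolve_group(groups_table, name):
    """Resolve one group, re-raising library errors as InstallationError."""
    try:
        return _library_resolve(groups_table, name)
    except (LookupError, TypeError, ValueError) as exc:
        # The library's message is passed through, which is the "leak"
        # described above.
        raise InstallationError(
            f"[dependency-groups] resolution failed for '{name}': {exc}"
        ) from exc
```

Tightening pip's control over the messages would mean replacing the pass-through `{exc}` with pip-authored text per error type.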

Tests

It was not clear to me exactly how much effort I should invest in unit and functional tests.
I therefore generally tried to follow the guide of "test the subset of features which map to existing tests of requirements.txt files".

New unit tests and functional tests explore some particular cases, but they are intentionally a little bit limited.
If desirable, we can retest most of the behaviors provided by dependency_groups.resolve under the unit tests.

Commits & Review

Here's a summary of the (at time of writing) commit series:

  • Add 'dependency-groups==1.3.0' to vendored libs
  • Implement Dependency Group option: --group
  • Add unit tests for dependency group loading
  • Add initial functional tests for dependency-groups
  • Add a news fragment for --group support

Notably, the first commit contains the whole vendoring step with no usage. So it should be possible to diff the HEAD of the PR against that commit to see the "real work" here.

Footnotes

  1. As a potentially important aside, I intend to open a thread to move dependency-groups into github.com/pypa/ for better continuity of ownership and maintenance.

  2. Some users will likely be disappointed with this decision. But the nice thing about not deciding anything today is that it can easily be added if a strong argument is made. "No is temporary; yes is forever."

Steps taken:
- add `dependency-groups==1.3.0` to vendor.txt
- add dependency-groups to vendor __init__.py
- run vendoring sync
- examine results to confirm apparent correctness (rewritten tomli
  imports)

`--group` is supported on `download` and `install` commands.
The option is parsed into the more verbose and explicit
`dependency_groups` name on the parsed args.

Both of these commands invoke the same processor for resolving
dependency groups, which loads `pyproject.toml` and resolves the list
of provided groups against the `[dependency-groups]` table.
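
For readers unfamiliar with the `[dependency-groups]` table, the load-and-resolve step might look roughly like the sketch below. It is a hand-written resolver for illustration, not the `dependency-groups` library and not the code in this PR; the function name `resolve_groups` is invented, and name normalization is left out for brevity.

```python
try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:  # older interpreters, assuming tomli is installed
    import tomli as tomllib


def resolve_groups(pyproject_text, requested):
    """Expand the requested groups into a flat list of requirement strings."""
    table = tomllib.loads(pyproject_text).get("dependency-groups", {})

    def expand(name, seen):
        if name in seen:
            raise ValueError(f"cycle detected at group {name!r}")
        if name not in table:
            raise LookupError(f"group {name!r} not found")
        result = []
        for item in table[name]:
            if isinstance(item, str):
                # A plain requirement string.
                result.append(item)
            elif isinstance(item, dict) and set(item) == {"include-group"}:
                # A reference to another group, expanded recursively.
                result.extend(expand(item["include-group"], seen | {name}))
            else:
                raise ValueError(f"invalid entry in group {name!r}: {item!r}")
        return result

    requirements = []
    for name in requested:
        requirements.extend(expand(name, frozenset()))
    return requirements


EXAMPLE = """
[dependency-groups]
test = ["pytest"]
typing = ["mypy", {include-group = "test"}]
"""
print(resolve_groups(EXAMPLE, ["typing"]))  # ['mypy', 'pytest']
```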

A small alteration is made to `pip wheel` to initialize
`dependency_groups = []`, as this allows for some lower-level
consistency in the handling of the commands.

A new unit test module is added for parsing dependency groups and used
to verify all of the pip-defined behaviors for handling
dependency-groups.

In one path, the underlying exception message from `dependency-groups`
is exposed to users, where it should offer some explanation of why
parsing failed, and this is therefore tested.

Some related changes are applied to the dependency groups usage sites
in the src tree. The signature of the dependency group requirement
parse function is simplified, and its usage is therefore updated.
A bugfix is applied to add a missing `f` on an intended f-string.

This initial suite of tests is modeled fairly closely on existing
tests for requirements files.

Tests cover the following cases:
- installing an empty dependency group (and nothing else)
- installing from a simple / straightforward group
- installing from multiple groups in a single command
- normalizing names from the CLI and pyproject.toml to match
- applying a constraints file to a dependency-group install
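
The name-normalization case in the list above can be sketched as follows. This assumes group names are compared using the same normalization rule as package names (lowercase, with runs of `-`, `_`, and `.` collapsed to a single `-`); the helper names are invented for the example.

```python
import re


def normalize_group_name(name):
    """Lowercase and collapse runs of '-', '_', '.' into a single '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def lookup_group(table, requested):
    """Find a group by comparing normalized names on both sides."""
    wanted = normalize_group_name(requested)
    for key, value in table.items():
        if normalize_group_name(key) == wanted:
            return value
    raise LookupError(f"group {requested!r} not found")


# The CLI spelling and the pyproject.toml spelling differ, but still match.
print(lookup_group({"Test_Extras": ["pytest"]}, "test-extras"))
```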
@zanieb

zanieb commented Nov 5, 2024

Exciting to see this up!

Per the discussion in #12963 , this is implemented with support only for loading data out of the pyproject.toml file found in the working directory.

To clarify for those that have not read the discussion, I feel like we did not reach consensus that automatically discovering a pyproject.toml in the working directory was the right solution. I'll repeat that I think this is a surprising and significant change for pip.

I'll disclaim that I am one of the people opposed, but so was @pfmoore — I feel like this merits further discussion.

that "potential confusion with uv" is a bad reason not to choose it if it's the best name for the feature.

As a minor note, we have not added dependency group support to the uv pip interface — we're waiting for a name choice here. I still think --group is a good option, but we are very likely to support whatever is chosen here.

@sirosen
Contributor Author

sirosen commented Nov 5, 2024

I feel like we did not reach consensus that automatically discovering a pyproject.toml in the working directory was the right solution.

I agree with that assessment. I'm proposing it as an initial option, but am 100% ready to adjust course if there's an alternative with pip maintainer support.

I'll repeat that I think this is a surprising and significant change for pip.

Maybe I'm inferring too much, but does this suggest that there's an alternative UX you would find less surprising?

I've thought up alternatives, but none of them seem clearly better to me.


And 👍 to the note about keeping uv pip / pip in sync.

@pfmoore
Member

pfmoore commented Nov 5, 2024

I'm OK with --group, if I'd written the code I would probably have gone with the more explicit --dependency-group, but there's very little in it (--group is shorter to type, for what it's worth...)

It did occur to me when this PR was submitted that we hadn't reached agreement on auto-discovering pyproject.toml, and I'm still a little uncomfortable about it. One concern I have is that we will potentially get people asking us to locate "the project's pyproject.toml" when the working directory is not at the root of the project, and that's when we really do need to accept that we're introducing the concept of "the current project" to pip.

I don't know the right answer here. It feels to me like the "traditional" role of pip, as a standalone installer, is being eroded by the ecosystem drift towards the "everything is a project" model. I don't like that, but maybe at some point I have to accept the inevitable and stop blocking things based on an outdated view of what pip is for. I would definitely like to know what the other @pypa/pip-committers think about this, though. There's also a wider discussion about how pip fits into the modern packaging ecosystem that I think the maintainers need to have, but that hasn't happened yet either.

I don't want to block things on the basis of some grand philosophical debate, so if someone says "let's just do it for now and worry about the bigger picture later", I'm OK with that.

@henryiii
Contributor

henryiii commented Nov 5, 2024

What about adding a --project-dir option that defaults to .? Then at least it can be made explicit? ${project-dir}/pyproject.toml would then be the path to the pyproject.toml.

FWIW, I also like --group, it's a lot shorter than --dependency-group; short enough I don't think there would be a strong desire for a short option for it. Pip mostly installs dependencies already.

Original post:

I think users would like for this to work without having to add some new --project option to tell pip where the project is - it's specified in -e. already:

pip install -e. --group test

However, then I think most people would expect this to work:

pip install . --group test

which is a lot more dubious; another clearer example would be

pip install ./some/thing ./other/thing --group test

It isn't clear to me at all which of the two directories would be used - or both. Or neither and just the current directory.

If there was a --project-dir option, this would be solved, though then the command line is really long and repetitive:

pip install -e. --project-dir . --group test

But that's fully explicit. I think either defaulting the project-dir to . or allowing -e to also set the default directory would help shorten the command line usage (both would be too confusing, IMO) - at the expense of potential confusion, but having a clear and consistent rule would help. But maybe it might make sense to start with the "explicit" version, then propose either of those simplifications?

FWIW, I'm pretty happy if this gets in in any state, and I like shorter CLIs if possible. :)

@sirosen
Contributor Author

sirosen commented Nov 5, 2024

What about adding a --project-dir option that defaults to .? Then at least it can be made explicit? ${project-dir}/pyproject.toml would then be the path to the pyproject.toml.

This was one option I considered, as well as the similar --pyproject-file="./pyproject.toml" as a default behavior.
I'll circle back on this below, but I much prefer an option for the filename to one for the dir.

It did occur to me when this PR was submitted that we hadn't reached agreement on auto-discovering pyproject.toml, and I'm still a little uncomfortable about it.

I read some of your comments there as weakly supporting the idea of ./pyproject.toml as the only behavior, but that may have been my mistake.

One concern I have is that we will potentially get people asking us to locate "the project's pyproject.toml" when the working directory is not at the root of the project, and that's when we really do need to accept that we're introducing the concept of "the current project" to pip.

I do not like the idea that pip would expand to do any "discovery" process. Even a simple process, like crawling up parent dirs to find a pyproject.toml, would represent a pretty dramatic change in scope.

Is this perhaps an argument in favor of accepting the pyproject file (or dir?) as an option? Any user who asked "why doesn't pip find my pyproject.toml?" would get a pretty easy answer of "because you didn't pass --pyproject-file=... and it's not in the working dir".


I'm much more comfortable with the idea of adding an option for the pyproject.toml file than one for the project dir.

A file option keeps pip uniformly file-oriented. You can have -c, -r, and now --pyproject-file, all of which may drive behaviors of your commands. This adds the notion that pip may have behaviors (this being the first) which are driven by reading a pyproject.toml file, but it doesn't seem to be a huge paradigm shift to me.

A dir option adds the notion that there is a "project directory", and I'm much less clear on what that means. It would only drive pyproject.toml today, but it implies that pip has some non-file-driven behaviors which this controls.

@zanieb

zanieb commented Nov 5, 2024

Maybe I'm inferring too much, but does this suggest that there's an alternative UX you would find less surprising?

I made two concrete recommendations in the issue (both of which are similar to @henryiii's suggestions here):

  1. Require --project <dir> to explicitly define the project path. Groups can only be read from the project.
  2. Read groups for any source tree in the installation.

For discussion purposes, let's list a couple more proposals:

  3. Discover the pyproject.toml in the working directory
  4. Discover the pyproject.toml in any parent directory

I don't want to just repeat the discussion from there — I think there's a fair bit of context on the upsides and downsides to each of those in the thread already. Unfortunately, I don't think there's an obviously superior option here. I'll try to summarize some thoughts briefly:

(1) is very explicit which is good for teaching but can be repetitive and verbose
(2) is the most intuitive for existing users but hides a lot of complexity which will cause some confusion
(3) is a departure from existing pip behavior (not project or working directory aware)
(4) is what uv does, but it seems like there is consensus this is out of scope for pip right now

Since @sirosen replied while I was authoring this:

I much prefer an option for the filename to one for the dir.

The pyproject.toml filename is standardized. I don't think it makes sense to ask users to type it. It'd need to be included in every invocation of the option, which makes it feel redundant. If we allowed alternative filenames, I'd feel differently. It does have the benefit of clarifying the expected value for the option and might help with the ambiguity I raised previously, like pip install --editable . --project . --group test. I worry there are still confusing cases though, like pip install --editable . --pyproject ./pyproject.toml.

@potiuk
Contributor

potiuk commented Nov 5, 2024

Fascinating discussion. I am also OK with accepting any option, but I think the best way will be to use --group (and I like the short version of it) for all directories that are specified explicitly (either via --editable or not).

Just to make the discussion more concrete (and provide context): we are looking into refactoring the airflow monorepo for Airflow 3 (it's partially done already, but we are also waiting for this one to land), and we will eventually have 90+ sub-projects in the Airflow monorepo (yep, I know it's crazy).

We now recommend uv for contributors (due to the workspace feature), but we also feel strongly about making sure that pip workflows work for our contributors (even if they are a bit more complex to run). We have > 3000 contributors, and I would hate to lose them if - for whatever reason - they are not able to use uv (for example because admins in their corporate entity have some strange rules on what tools could be run). I've already heard stories about "having to configure company proxy for uv" to make it work.

So - for multiple reasons we make sure in our docs and workflows that both uv and pip are supported.

In our case we currently have (this will change and improve, but here it is):

pyproject.toml -> main airflow project
task_sdk
    pyproject.toml -> new task_sdk project for Airflow 3
providers
    pyproject.toml -> 90+ providers - we will split it to 90+ subprojects (90+ pyproject.toml files) 😱

eventually it will be:

airflow
    pyproject.toml -> main airflow project
task_sdk
    pyproject.toml -> new task_sdk project for Airflow 3
providers
    amazon
        pyproject.toml
    google
        pyproject.toml

What I really would like with groups is:

  • pip install -e ./airflow --group devel -> only install "airflow" with devel group of deps for airflow
  • pip install -e ./airflow -e ./task_sdk --group devel -> install both airflow and task_sdk - both with devel group of deps if defined in both
  • pip install -e ./airflow -e ./providers/google --group devel -> install both airflow and google provider, both with devel groups of deps if defined in both (and only from airflow if devel is not defined in google)

Generally I'd be for not having a --project flag, and instead choosing the pyproject.toml files from the "folders" or "--editable" projects specified explicitly. I find it pretty confusing to specify them independently.

In uv with workspace defined, the --group could work on the whole workspace instead:

  • uv sync --devel would automatically install all development deps; additionally, we could specify the default group in the uv workspace that should be used for uv sync by default (details to be worked out).

@pfmoore
Member

pfmoore commented Nov 5, 2024

I read some of your comments there as weakly supporting the idea of ./pyproject.toml as the only behavior, but that may have been my mistake.

Weakly in the sense of "yeah, I guess we might have to do that". It has problems, as you mention, though, so I'm definitely not enthusiastic about it 🙁

Is this perhaps an argument in favor of accepting the pyproject file (or dir?) as an option? Any user who asked "why doesn't pip find my pyproject.toml?" would get a pretty easy answer of "because you didn't pass --pyproject-file=... and it's not in the working dir".

Yes. And stronger, it's an argument for requiring the --pyproject-file option, so we don't even need the "in the working dir" qualifier. The problem with defaulting to the working directory is that people can argue that "the default isn't helpful". Whereas requiring that the user specifies the file every time avoids this by not having a default (helpful or otherwise). If nothing else, I'd argue that we should start with no default, and if experience shows that there's a clear consensus on a default value, we can add it later. Users could experiment with defaults by setting PIP_PYPROJECT_FILE=./pyproject.toml in their environment.

I'm much more comfortable with the idea of adding an option for the pyproject.toml file than one for the project dir.

+1. Your arguments make sense to me. (But I'm not convinced about defaulting the pyproject.toml file).

This adds the notion that pip may have behaviors (this being the first) which are driven by reading a pyproject.toml file, but it doesn't seem to be a huge paradigm shift to me.

There are two aspects I don't like:

  1. It's the first time pip has assumed any sort of file structure. At the moment, we don't default the argument to --editable as ., or the argument of -r as requirements.txt, and we don't assume pip is being run in a source tree. But now we're going to start assuming that --group implies we're running in a source tree. But only when --group is specified. And it doesn't actually need to be a source tree, it could just be a file called pyproject.toml with a dependency groups section and nothing else. It doesn't even need to be called pyproject.toml...
  2. It suggests to users that other behaviours should assume you're working in a source tree. Which will add a maintenance cost of having to explain repeatedly that --group is a special case. Your comment "this being the first" hints at this.

@pfmoore
Member

pfmoore commented Nov 5, 2024

Read groups for any source tree in the installation.

I'm a strong -1 on this. There are so many edge cases that don't have intuitive (or in some cases even plausible) interpretations that I don't even think this is the "user friendly" option. It's deceptively straightforward in simple cases, but will end up causing nasty bugs as soon as people do something unusual.

I suggest that we simply drop this as a possibility, as it feels like people are only thinking about the "obvious" cases, and I have no appetite for coming up with multiple problematic cases just so that people can suggest workaround after workaround. (For example, if a requirements file includes -e some/global/project what happens if the user specifies --group dev and dev is specified in that project as well as the user's project? What if the user doesn't want the dev group from the global project? What if they do?)

@sirosen
Contributor Author

sirosen commented Nov 5, 2024

In general, I am only interested in implementing behaviors which are easy to reason about and don't contain avoidable ambiguities. Even within that confined space, we are debating how to create a UX in which most naive users won't be easily confused.

I worry there are still confusing cases though, like pip install --editable . --pyproject ./pyproject.toml.

I think this is a good point of concern. A user could do something like...

pip install -e ./foo --pyproject ./bar/pyproject.toml

and expect... ?
A user who does this intentionally and expects a special result clearly has some incorrect mental model for what's happening.

I'd like to have a solution in which the above usage, or its analogue, emits a warning or error, instructing the user that they're misapplying the options.

And stronger, it's an argument for requiring the --pyproject-file option, so we don't even need the "in the working dir" qualifier. ... Users could experiment with defaults by setting PIP_PYPROJECT_FILE=./pyproject.toml in their environment.

I find this convincing in principle, but I worry that it makes for a very verbose CLI experience for interactive usage.
Is it reasonable to consider adding a short opt to make it easier? e.g. -p / --pyproject-file FILENAME? That allows

pip install -p ./pyproject.toml --group lint

If we go down this path, I have a question:
Are you suggesting that the default be to read a PIP_PYPROJECT_FILE env var if one is set? I would be ok with that. It seems consistent with the env-config loader behavior which exists (though I haven't investigated much).

And an initial expectation about requirements and behavior

  • --pyproject-file would be required if you use --group
  • --pyproject-file without --group would be an error, indicating that it must be used with --group

The check for --group could be expanded in the future, if other options use --pyproject-file.

I much prefer an option for the filename to one for the dir.

The pyproject.toml filename is standardized. I don't think it makes sense to ask users to type it. It'd need to be included in every invocation of the option, which makes it feel redundant. If we allowed alternative filenames, I'd feel differently.

This is still on my mind. I agree that it seems redundant, but it also is the most in-keeping with pip not having a built-in notion of projects, workflows, etc.

Does it make a difference that the proposed behavior would allow someone to pass something other than pyproject.toml? For example, you could use, under this proposed UX, the following: [1]

pip install --group test --pyproject-file ./tox.toml

Maybe that's a point against this, but I want to note it and understand if it elicits a strong positive or negative response from anyone. The behavior as implemented today doesn't have to worry about the filename. I was thinking --pyproject-file would accept an arbitrary filename, and not validate that the basename is pyproject.toml.
If --pyproject-file is expected to check the filename, I want to clarify that now.

Footnotes

  1. And it's probably a bad idea! But it's kind of interesting.

@zanieb

zanieb commented Nov 5, 2024

In general, I am only interested in implementing behaviors which are easy to reason about and don't contain avoidable ambiguities

I think this rules out (2) — I'm happy not to discuss that option further as suggested by @pfmoore but it's worth reiterating (as you have) that users will expect this and there should be warnings that guide them to the correct behavior.

Does it make a difference that the proposed behavior would allow someone to pass something other than pyproject.toml?

The idea that this would be allowed is what is most concerning to me about including the filename, especially since it's such a clearly defined standard (a file named pyproject.toml in the root directory of the project).

As a minor note regarding -p, this short flag is already pretty overloaded. In uv it's short for --python (selection of an interpreter) and we also have --package (selection of a workspace member package) and --project (selection of a project directory). I agree a short flag would be nice to have here. -p is taken in uv though, so we would not be able to support it.

As I was looking at the CLI to write the above, I realized we recently added --project to target a project directory in uv (astral-sh/uv#7603). That wasn't why I was pushing for it, but that does mean that it would be the best choice for compatibility — uv pip already supports it today.

@pfmoore
Member

pfmoore commented Nov 5, 2024

I find this convincing in principle, but I worry that it makes for a very verbose CLI experience for interactive usage.

I know. That's the biggest drawback here. But I'd rather that we're verbose rather than confusing...

Is it reasonable to consider adding a short opt to make it easier?

I'm against that - while this is important functionality in the wider ecosystem, it's not well aligned to pip's core feature set, and I don't think it deserves one of the limited single-character option names.

Are you suggesting that the default be to read a PIP_PYPROJECT_FILE env var if one is set?

No, pip has a general feature that all command line options can be specified in a config file, or via an environment variable. So simply by having the command line option we automatically get the ability to set a default in those ways.
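
To illustrate the general feature being described: pip's documented convention is that a long option maps to an environment variable named `PIP_<UPPER_LONG_NAME>`, with dashes replaced by underscores. The helper below is only a sketch of that naming rule (it is not pip's code), and `--pyproject-file` is the option proposed in this thread, not one that exists today.

```python
def pip_env_var_for(long_option):
    """Return the environment variable name pip would read for a long option."""
    name = long_option.lstrip("-")
    return "PIP_" + name.upper().replace("-", "_")


print(pip_env_var_for("--pyproject-file"))  # PIP_PYPROJECT_FILE
print(pip_env_var_for("--index-url"))       # PIP_INDEX_URL
```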

Does it make a difference that the proposed behavior would allow someone to pass something other than pyproject.toml?

I'm not comfortable about it. But to be fair, the implementation currently allows an invalid pyproject.toml (no [project] section, no [build-system] section). So this is just another aspect of that. If it matters, you could validate the argument to require it to have a filename of pyproject.toml.

@notatallshaw
Member

There's a lot of design discussion going on in this PR that to me seems to boil down to:

  1. Should pip have a concept of a project?
  2. And if so what should it assume the project structure looks like?

If the answer to 1 for this PR is "no", I'd like to point out that it doesn't stop pip adding a concept of a project in the future. If --pyproject-file is added now, it doesn't stop there being a future PR that makes pip "project aware"

And if the answer to 1 for this PR is "yes", then I'd like people to consider that the answer for 2 could affect a lot more than just this feature, for example if at some point pip reads its own configuration out of pyproject.toml (i.e. #13003). So I would caution to consider this design quite carefully, looking at what other tools have done here and what challenges they've faced. I would be in favor of something as minimal and unopinionated about user workflows as possible, but I don't have a strong sense for what that is.

@sirosen
Contributor Author

sirosen commented Nov 6, 2024

  1. Should pip have a concept of a project?
  2. And if so what should it assume the project structure looks like?

If the answer to 1 for this PR is "no", I'd like to point out that it doesn't stop pip adding a concept of a project in the future. If --pyproject-file is added now, it doesn't stop there being a future PR that makes pip "project aware"

I'm inclined to answer (1) as "no", at least within the scope of this PR.
My intent was to propose the narrowest version of this change that I could.

If a narrow change is not possible, I'd rather step back and make a more complete proposal which starts from the notion that pip will be aware of a "current project", to see if that gets traction. But, at least right now, I still believe that a simple and narrow version of this is possible.

Are you suggesting that the default be to read a PIP_PYPROJECT_FILE env var if one is set?

No, pip has a general feature that all command line options can be specified in a config file, or via an environment variable. So simply by having the command line option we automatically get the ability to set a default in those ways.

Ah, that explains why I didn't see the kind of env var logic I expected! I noticed the config loading PIP_* but didn't follow what it meant.

Upcoming update

For now, I plan to update the PR to implement --pyproject-file as required when --group is used, and with errors if either is used alone. I'll update the PR title as well, once I make the change.

I'll also validate that the filename provided is pyproject.toml. I see no strong reason not to validate this.
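
The planned rules could be expressed roughly as below. This is a sketch of the checks as described in this comment, not the eventual implementation: `OptionUsageError` and `check_group_options` are hypothetical names, and the error wording is invented.

```python
from pathlib import Path


class OptionUsageError(Exception):
    """Hypothetical error type for misuse of the two options."""


def check_group_options(groups, pyproject_file):
    """Enforce that --group and --pyproject-file are only used together."""
    if groups and pyproject_file is None:
        raise OptionUsageError("--group requires --pyproject-file")
    if pyproject_file is not None and not groups:
        raise OptionUsageError("--pyproject-file must be used with --group")
    if pyproject_file is not None and Path(pyproject_file).name != "pyproject.toml":
        # Filename validation: only a file literally named pyproject.toml.
        raise OptionUsageError(
            f"--pyproject-file must point at a file named pyproject.toml, "
            f"got {pyproject_file!r}"
        )


check_group_options(["lint"], "./pyproject.toml")  # accepted
check_group_options([], None)                      # neither option: accepted
```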

Idea: --pyproject-path

At the cost of some mildly more complex behavior, it would be possible to accept a path arg which can be the path to a pyproject.toml file or a directory. Thus allowing:

# equivalent
pip install --group foo --pyproject-path .
pip install --group foo --pyproject-path ./pyproject.toml

# equivalent
pip download --group bar --pyproject-path ./baz
pip download --group bar --pyproject-path ./baz/pyproject.toml

with helptext to the tune of

--pyproject-path   The path to a pyproject.toml file or a directory containing one.
                   Used to resolve `--group` options to `[dependency-groups]`.

Is this a good idea, worth pursuing? It adds some slightly more elaborate behavior for a mildly better user-experience. Because it's still described as "the file or dir containing the file", it doesn't give up the "file and path oriented" direction.
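
A minimal sketch of the "file or directory containing the file" behavior, assuming the only special case is an existing directory. The function name is invented, and `--pyproject-path` itself is only the idea floated above.

```python
from pathlib import Path


def resolve_pyproject_path(value):
    """Map a --pyproject-path value to the pyproject.toml file it denotes."""
    path = Path(value)
    if path.is_dir():
        # A directory means "the pyproject.toml inside this directory".
        return path / "pyproject.toml"
    # Anything else is taken to be the file path itself.
    return path


# Both spellings from the first "equivalent" pair resolve to the same file.
print(resolve_pyproject_path(".") == resolve_pyproject_path("./pyproject.toml"))
```

Whether the filename check applies to the file form is a separate decision from this path resolution.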

I'll update the PR without it for now, but am happy to switch to it if it seems like a good solution.

@pfmoore
Member

pfmoore commented Nov 6, 2024

Is this a good idea, worth pursuing?

That works for me. The --python option uses similar logic.

@potiuk
Contributor

potiuk commented Nov 6, 2024

Is this a good idea, worth pursuing?

That works for me. The --python option uses similar logic.

I like it too. It also works nicely as a building block for the pip part of our guidelines for airflow contributors.

@notatallshaw
Member

I'll also validate that the filename provided is pyproject.toml. I see no strong reason not to validate this.

I still think pip should accept any filename for the reasons I outlined in #12963 (comment).

However, given that the path is explicit, there is the obvious workaround of creating a directory and sticking a custom pyproject.toml there.

And nothing stops this validation from being removed in the future if opinion on this changes.

@potiuk
Contributor

potiuk commented Nov 6, 2024

I still think pip should accept any filename for the reasons I outlined in #12963 (comment).

Side comment @notatallshaw - this problem in the "messy monorepo" could be solved by replacing the "custom named" toml files with a pyproject.toml file for each team, each in a separate sub-folder of the directory where you currently have the custom named files (with directories named the same way as the current files). I think that is a much better approach, especially since IDEs and similar tools already key on the pyproject.toml name to apply the right schema, etc.

@henryiii
Contributor

henryiii commented Nov 6, 2024

I personally don't like the "any filename" support, as the PEP specifically is for pyproject.toml, and I can provide a bit of context from an identical feature in cibuildwheel: there, it looks for [tool.cibuildwheel] in pyproject.toml, but you can also pass a path to a config file. Unlike almost all (usually later) implementations of this, like hatch, pdm, tox, ruff, and uv, the "other" file still keeps the tool.cibuildwheel header, so it's simpler to parse and simpler to teach. However, by not using a fixed file name (like cibuildwheel.toml), there's no way for validation tools like those using SchemaStore to know that your file is actually a cibuildwheel configuration file. If I were to get a second chance, I'd require the file to be cibuildwheel.toml. And because most tools have not kept the same structure for their configuration, tool.<tool>.dependency-groups would be a top-level [dependency-groups] in their configuration, so I don't think you can just drop this into tox.toml or wherever. IMO, this was proposed specifically as a pyproject.toml feature, and multiple groups are allowed instead of multiple files.

Is this a good idea, worth pursuing?

I don't mind this, though. While it does allow someone to put a specific file in, it is optimized for the "correct" case following the PEPs.

Also, I second that subdirectories with pyproject.toml's would be better, IDEs and validators would like that at least.

@notatallshaw
Member

this problem in "messy monorepo" could be solved by converting the "custom named" toml files into putting pyproject.toml file for each team in a separate sub-folder of that directory where you currently have custom named files (directories named same way as current files).

Then we would have a clean monorepo 😛.

Sure, that's the goal, but technical nuances and resource capacity have so far got in the way. For fresh projects we do this and can use opinionated tools like poetry, but for the older projects we stick to tools that don't force specific workflows or project structure, and pip has always been excellent in this regard.

Anyway, I think I made my point in the linked post, and as long as there is an explicit path there is a workaround. I didn't mean to belabor it here.

@sirosen
Contributor Author

sirosen commented Nov 6, 2024

And nothing stops removing this validation in the future if opinion on this changes.

This, to me, is an important reason to include the requirements that the filename is pyproject.toml.

If pip supports only pyproject.toml today and is later loosened to allow use of other filenames, we won't expect this to break workflows. But if we start the other way around -- allowing any filename -- then adding the validation at a later date probably would break workflows.

So the better short-term choice, from a compatibility perspective, is to validate the name.

@zanieb

zanieb commented Nov 6, 2024

Why include the -path suffix? That looks like it breaks from the naming scheme pip uses for flags that accept paths.

@sirosen
Contributor Author

sirosen commented Nov 7, 2024

It can be --pyproject, if that's more consistent. I don't have any desire to break with convention -- I'm genuinely looking for the least controversial path here! 😁

@pfmoore
Member

pfmoore commented Nov 7, 2024

Personally I think --pyproject is too similar to --project. People will mistype it, and it pushes back to the whole idea of "pip should have a concept of a current project". I don't have anything new to say on the latter topic, so I'll just make this comment and leave it at that for now.

@sirosen
Contributor Author

sirosen commented Nov 7, 2024

I think that some minimal concept of project awareness is implied by reading a pyproject.toml file. We can quibble about whether or not that's true, but I don't think deciding that is necessary for us to move forward here.

The fun thing about agreeing with your second sentence is that the comments I might add about the first one become immaterial!
I do not consider "project awareness" to be the sort of topic addressed by this PR. I think we're in very strong agreement here. And this jibes with what Paul is saying about taking a definitive personal stance that "pip is not project aware."

I am focusing narrowly on behaviors which "read from a specified pyproject.toml file" and setting aside what that implies ideologically about "what kind of tool pip is" or similar topics. I leave that decision making entirely to the pip maintainers and not only am not trying to convince them of anything -- I have no desire to try to convince them of anything.

The intention of my commentary there was to help inform the flag name, not increase the scope of behavior.

👍 And the input on this topic is much appreciated. I think it has been a big help.

I'm still attracted to --pyproject-path . as supported usage. I agree that it's not fully consistent with the other option names, but the only alternative which stands up to scrutiny is --pyproject-file, which implicitly requires the full filename and cannot take the dir.
Pragmatically speaking, I think being able to pass . as the directory is desirable.

But that leads me to consider...

Actually, there's an argument that we shouldn't use two options at all. At the risk of proposing yet another colour for the bikeshed, we could invent a syntax for specifying the group and filename together. Something like dev{./pyproject.toml}, for example. That syntax is ugly, IMO, but maybe there's something better that we could use.

Speaking from the perspective of someone who has been trying to work out how to implement this, I actually have been starting to think in a similar way. I was maybe shy about saying it before since I thought I was overcomplicating the interface, but this is really a significant simplification for the implementation.

For example, consider this usage in one of the proposals:

$ export PIP_PYPROJECT_PATH="."
$ pip install -r foo.txt

This can't throw an error, even though only "half of the required options were provided". Confusing, right?

I agree that any syntax would need to be pip-specific. I also have a particular strong preference: I would like the pyproject file to come first. My reason for this is that it gives us an unambiguous parse with minimal extra characters. We can look for the first pyproject.toml: substring and split on it.
I would favor, in this view...

$ pip install --group "./pyproject.toml: test"

# note, also allows:
$ pip install --group "./foolib/pyproject.toml: build" --group "./barlib/pyproject.toml: build"
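
As a rough illustration of the "file first" parse described above, the following sketch splits on the first `pyproject.toml:` substring. The function name `split_file_first` is made up for this example, and the code is only a sketch of the proposal as worded in this comment, not of what the PR ultimately implements.

```python
MARKER = "pyproject.toml:"


def split_file_first(value: str) -> tuple[str, str]:
    """Split '<path>/pyproject.toml: <group>' at the first marker."""
    index = value.find(MARKER)
    if index == -1:
        raise ValueError(
            f"expected '<path>/pyproject.toml: <group>', got {value!r}"
        )
    # The path keeps the filename but drops the trailing colon.
    path = value[: index + len("pyproject.toml")]
    # Whitespace around the group name is ignored.
    group = value[index + len(MARKER):].strip()
    return path, group
```

With this sketch, `"./pyproject.toml: test"` yields the path `./pyproject.toml` and the group `test`.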

@zanieb

zanieb commented Nov 7, 2024

As a minor note, we very briefly discussed including the group and path in a single argument at #12963 (comment).

In uv, we report dependency groups as <project-name>:<group-name> in resolution error messages. I'm not sure how I feel about a --group /path/to/pyproject.toml:dev syntax. (Since group names have a limited character set, I think this would be fine to parse on Windows too).

One other point I'll make here is around the uv pip interface...

I'm not really participating in this thread in the interest of uv — I'm trying to help drive a good decision here.

@zanieb

zanieb commented Nov 7, 2024

I don't have any qualms about supporting --group <path>:<group>, it's not compatible with the current uv interface but I think that'd be fine. I like that it solves the main complexities we've been circling.

@sirosen
Contributor Author

sirosen commented Nov 7, 2024

I'm not sure how I feel about a --group /path/to/pyproject.toml:dev syntax. (Since group names have a limited character set, I think this would be fine to parse on Windows too).

All cards on the table: I'm not sure how I feel about it either.

One of the most difficult parts of PEP 735 discussions for me was pushing back against the various requests for "special dependency groups" syntax. I understand the initial attraction of those ideas, but I'm so happy that I stuck to the position that there would be no new or special syntax -- I believe it would have made these sorts of discussions much harder, not easier.

So coming back around and proposing a pip-specific syntax for exactly the thing which I worked for months to convince folks (including myself) should not exist... Feels very strange!

I do like that it would let you pull groups from multiple files, which otherwise is not possible.


EDIT: A quick note to clarify for anyone who didn't follow the PEP.

People wanted syntax for dependency groups based on package names, project directories, and pyproject.toml files. If we had included a syntax for dependency groups in the spec, it's pretty unlikely that we would have had such perfect foresight that it would have been exactly what we need in this scenario.

@pfmoore
Member

pfmoore commented Nov 7, 2024

I don't have any qualms about supporting --group <path>:<group>

I could live with that, too. The syntax is a lot better than the one I suggested. Honestly, I don't think anything here is ideal syntax, precisely because the whole dependency group feature is a slightly uncomfortable fit with pip's model. So we'll have to compromise somewhere.

I'm not really participating in this thread in the interest of uv — I'm trying to help drive a good decision here.

Understood, and your input is much appreciated. I wanted to be clear on how this would impact uv (in my view) but I wasn't trying to make any sort of point by doing so. Sorry if it seemed that I was.

So coming back around and proposing a pip-specific syntax for exactly the thing which I worked for months to convince folks (including myself) should not exist... Feels very strange!

I understand that feeling. I'm a little uncomfortable about it myself, too. I do think that having it be a detail of pip's CLI syntax, rather than a packaging standard, is an important difference.

While we're mentioning things that make us uncomfortable, it did occur to me that if we made the default for the location of pyproject.toml be the current directory [1], we get back the simple --group dev syntax for the common case where the user is in the project root directory. Which is nice, but like you said it feels like I'm proposing exactly what I argued against in earlier messages 😟

Footnotes

  1. Being very clear that it's the current directory, not the project directory or anything like that...

Comment on lines +327 to +332
script.scratch_path.joinpath("pyproject.toml").write_text(
"""\
[dependency-groups]
empty = []
"""
)
Member

nit: Wrap in a textwrap.dedent?
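
For reference, a small standalone sketch of what the dedent-wrapped version could look like. It writes to a temporary directory rather than the test suite's `script.scratch_path` fixture, so everything apart from the `pyproject.toml` content itself is an assumption made for illustration.

```python
import tempfile
import textwrap
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pyproject = Path(tmp) / "pyproject.toml"
    # dedent strips the common leading whitespace, so the TOML can be
    # indented along with the surrounding code.
    pyproject.write_text(
        textwrap.dedent(
            """\
            [dependency-groups]
            empty = []
            """
        )
    )
    content = pyproject.read_text()
```

The file contents are the same as in the unindented version; only the source layout changes.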

@@ -38,6 +38,7 @@ class DownloadCommand(RequirementCommand):
    def add_options(self) -> None:
        self.cmd_opts.add_option(cmdoptions.constraints())
        self.cmd_opts.add_option(cmdoptions.requirements())
        self.cmd_opts.add_option(cmdoptions.dependency_groups())
Member

I didn't include it because I didn't think we'd want the option to be presented for pip wheel.

I think we do want it on pip wheel -- the mental model I have is pip create-wheelhouse is the longer name for pip wheel.

Essentially, it's creating a bundle of wheels that you can pass to pip install with the same requirement strings/arguments. And, --group makes sense within that context IMO.

@pradyunsg
Member

pradyunsg commented Nov 8, 2024

Per the discussion in #12963 , this is implemented with support only for loading data out of the pyproject.toml file found in the working directory.

My feeling on this whole topic is that the quoted blurb describes the bare-minimum that we can reasonably implement here without strong disagreements on this topic, so that we shouldn't change the scope of this PR (to add any project-level concepts or any custom path handling).

My 2 cents on this: Let's tackle that whole question in a follow up PR/change/issue.

AFAICT, all the proposals at hand are additive (the --group {path}:{groupname} as well as the other variants discussed here) -- something that has already been mentioned here IIUC. I'd prefer that we have this implementation and iterate on top of this, rather than try to figure out a solution for the model-mismatch between pip's operational model & the model needed for a project-based workflow in this PR.

[--group bikeshedding]

--group is the nicest name IMO. It's implied that pip's operating with packages and their dependencies, so I don't think we need a more explicit --dependency-group name here. If someone is really strongly attached, I'd rather have both as aliases.

Note that longer option names is something we've used in the past to indicate that we want to discourage use of these options (I don't have the discussion handy, but one factoid there was UX research that indicated that longer option names had a strong correlation with lower usage) -- so I'm strongly hesitant on a longer option name here.

@pfmoore
Member

pfmoore commented Nov 8, 2024

My feeling on this whole topic is that the quoted blurb describes the bare-minimum that we can reasonably implement here without strong disagreements on this topic

I'm not sure this is true - loading pyproject.toml from the current directory is the part that caused much of the existing debate (because people conflate "current directory" and "current project").

What I want to avoid is having to spend time rehashing this whole debate when someone says that pip should search upward for pyproject.toml because they did something like

. .venv/scripts/activate
cd docs
pip install --group doc
python build_docs.py

Or people do pip install --editable ./lib --group deps and complain because they get the dependency group from the current directory's pyproject.toml rather than the one in lib. Both of these cases have been discussed here, so these are not theoretical cases.

If we can find a way to document that the decision to not try to auto-discover the right pyproject.toml is deliberate, and we do not intend to add that in future, I'm willing to go with the minimal --group dev syntax.

@zanieb

zanieb commented Nov 8, 2024

AFAICT, all the proposals at hand are additive...

Yeah the discussion is not about adding more functionality, it's about alternatives to implicit discovery of the pyproject.toml.

@sirosen
Contributor Author

sirosen commented Nov 8, 2024

My feeling on this whole topic is that the quoted blurb describes the bare-minimum that we can reasonably implement here without strong disagreements on this topic

I'm not sure this is true - loading pyproject.toml from the current directory is the part that caused much of the existing debate (because people conflate "current directory" and "current project").

Right. I started from a low-confidence belief that it was the smallest option with the least disagreement.

The discussion here has inclined me to think that it will cause user confusion (even if it's minor) and lead to issues being filed.

If we can find a way to document that the decision to not try to auto-discover the right pyproject.toml is deliberate, and we do not intend to add that in future, I'm willing to go with the minimal --group dev syntax.

My current, pessimistic, belief is that such documentation would not be sufficient.
When you have a use case and you're able to imagine a "solution", it can be very hard to understand the maintainer perspective.

I would expect users to open issues (or just complain in public forums, which is less impactful but no fun) with the notion that "searching up the directory tree would be a small, practical improvement" and suchlike for any other "easy improvement" pip "should" make.
Maybe that's too dim of a view to take, but that's my concern.

I started work on --group <path>:<name> where the path part is optional.
The thing that I like about this approach is that documenting it is easy. The option help text has to declare both parts, so a user trying to work out how to use it is confronted immediately with how "pyproject discovery" works.

@notatallshaw
Member

notatallshaw commented Nov 8, 2024

I started work on --group <path>:<name> where the path part is optional.

Please make sure to test for obvious awkward paths e.g. C:\my dir\pyproject.toml, /home/: my user :/pyproject.toml

And I'll just throw another syntax suggestion out there (but I'm not going to advocate for it any more than suggesting it this once): --group <name>@<path> , because reading it out loud makes sense to me "group name at this pyproject path". But maybe the current suggestion has other benefits I'm not thinking of (shell tab completion?).

I think that's misrepresenting what I said, but my views have been crystallising as part of this discussion, and it's possible I was less definite in previous posts.

Apologies, I didn't mean to represent your views. I think I cut out some context while attempting to keep things brief. I should have worded it more like "I agree with not using that flag, I would prefer it to be left for some hypothetical future use if pip ever decides to become project aware"

@pfmoore
Member

pfmoore commented Nov 8, 2024

My current, pessimistic, belief is that such documentation would not be sufficient.

It won't stop people asking for enhancements, but it will give us an easy way to reject those requests. "This is by design - see <url> for details".

@pfmoore
Member

pfmoore commented Nov 8, 2024

Apologies, I didn't mean to represent your views

Not a problem. We've been exploring a lot of ideas, and my view has changed over the course of the discussion. I simply wanted to clarify that whatever you were referring to wasn't necessarily how I think now.

@henryiii
Contributor

henryiii commented Nov 8, 2024

By the way, the obvious choice should be considered and rejected if needed, --group path[name]. People already know how to use that, due to the fact it's already used for extras. I'm not really that fond of the square brackets (some shells require quoting), but familiarity is useful when teaching, at least if it's close enough for that familiarity not to be confusing. I do think I like the other suggestions better, and the fact --group .[dev] doesn't install . by itself might be something you can count as "confusing" enough to reject this idea.

Also, what happens if you want multiple groups? Will the group flag be accepted multiple times, and you would just pass the same path if needed? Unless the [] syntax was used above, I'm assuming that's fine, as you can always make composite groups within the same pyproject.toml. (Which is also why this is not just a simple TOML read)

@henryiii
Contributor

henryiii commented Nov 8, 2024

And I'll just throw another syntax suggestion out there

We already have repo@branch, by the way. IMO that's a potential confusion point if we also had --group name@path, path ~= repo.

@notatallshaw
Member

notatallshaw commented Nov 8, 2024

By the way, the obvious choice should be considered and rejected if needed, --group path[name].

Semi-related, this reminded me of #13062, which is to say that, if a special syntax is decided on, it should be made clear whether or not it supports the file:// way of writing paths.

@sirosen
Contributor Author

sirosen commented Nov 8, 2024

By the way, the obvious choice should be considered and rejected if needed, --group path[name].

I appreciate you raising this in a way which is open to rejection... because I'm strongly against it! 😁

I think looking like an extra would be harmful, not helpful. It's not an extra. If a user thinks it's an extra, they're mistaken.
If they think it's "like an extra", they're not far off, but what does "like an extra" mean? IMO, best not to go there.

Also, what happens if you want multiple groups? Will the group flag be accepted multiple times, and you would just pass the same path if needed? Unless the [] syntax was used above, I'm assuming that's fine, as you can always make composite groups within the same pyproject.toml. (Which is also why this is not just a simple TOML read)

It's a good question. I don't have a working implementation yet, but I do have a clear plan:

  • if you want multiple groups from the same path, you pass the path multiple times
  • the parsing logic will internally coalesce these so that we only read each pyproject.toml once

I think this will be a rare use-case, since if you control the source you can, like you said, just add

aggregate = [{include-group = "part1"}, {include-group = "part2"}]

but rare doesn't mean excluded. It will just be a bit more verbose.
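
To show what expanding an `include-group` entry involves, here is a simplified sketch. It is not the implementation in the dependency-groups library: the function name `expand_group` is hypothetical, group-name normalization is omitted, and the package names in the sample data are placeholders.

```python
def expand_group(groups: dict, name: str, _seen: tuple = ()) -> list[str]:
    """Flatten one dependency group, following include-group entries."""
    if name in _seen:
        cycle = " -> ".join((*_seen, name))
        raise ValueError(f"cyclic dependency group include: {cycle}")
    if name not in groups:
        raise LookupError(f"dependency group {name!r} not found")
    requirements: list[str] = []
    for item in groups[name]:
        if isinstance(item, str):
            # A plain requirement string is kept as-is.
            requirements.append(item)
        elif isinstance(item, dict) and set(item) == {"include-group"}:
            # An include is replaced by the contents of the named group.
            requirements.extend(
                expand_group(groups, item["include-group"], (*_seen, name))
            )
        else:
            raise TypeError(f"invalid entry in group {name!r}: {item!r}")
    return requirements


# Placeholder data mirroring the aggregate example above.
groups = {
    "part1": ["pytest"],
    "part2": ["sphinx"],
    "aggregate": [{"include-group": "part1"}, {"include-group": "part2"}],
}
```

Calling `expand_group(groups, "aggregate")` returns the requirements of `part1` followed by those of `part2`, which is why a single aggregate group can stand in for passing several groups on the command line.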

Per review, support on `pip wheel` is desirable. This is net-net
simpler, since we don't need any trickery to "dodge" the fact that it
is a `RequirementCommand` but wasn't supporting `--group`.

The desire to *not* support `--group` here was based on a mistaken
idea about what `pip wheel` does.
In discussions about the correct interface for `pip` to use
[dependency-groups], no strong consensus arose. However, the option
with the most support appears to be to make it possible to pass a file
path plus a group name.

This change converts the `--group` option to take colon-separated
path:groupname pairs, with the path part optional. The CLI parsing
code is responsible for handling the syntax and for filling in a
default path of `"pyproject.toml"`.

If a path is provided, it must have a basename of `pyproject.toml`.
Failing to meet this constraint is an error at arg parsing time.

The `dependency_groups` usage is updated to create a
DependencyGroupResolver per `pyproject.toml` file provided. This
ensures that we only parse each file once, and we keep the results of
previous resolutions when resolving multiple dependency groups from
the same file. (Technically, the implementation is a resolver per
path, which is subtly different from per-file, in that it doesn't
account for symlinks, hardlinks, etc.)
@sirosen
Contributor Author

sirosen commented Dec 15, 2024

I want to give a quick recap because the last commit changes the interface being added here.
I'll edit the top PR comment + title to link down to this.

Revised PR

Add a new --group flag for passing dependency groups to pip install, pip wheel, and pip download.

Usage is as follows:

  --group <[path:]group>      Install a named dependency-group from a
                              "pyproject.toml" file. If a path is given, it
                              must end in "pyproject.toml:". Defaults to using
                              "pyproject.toml" in the current directory.

This option allows users to pass multiple (path: str, group: str) pairs via the CLI, and in cases where path is omitted, it is simply filled with "pyproject.toml". Dependency groups are expanded before installation begins, and any errors in the process are raised immediately (cyclic groups, missing files, non-TOML file contents, etc).

If a path is given but it has any basename other than pyproject.toml, it is immediately rejected with an error.

Implementation Details

The CLI parsing is done with str.rpartition(":"), meaning a rightmost-: char is used to split the string. This is important for correctness when : may appear in the path itself -- it is a valid dirname char on many platforms, and appears in Windows drive letter paths. If we consider pathological cases, like /home/pyproject.toml/pyproject.toml:wat/myproject/pyproject.toml:this, we can quickly convince ourselves that this will split in the right location.
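
A minimal sketch of the rightmost-colon split, assuming only what is described above (`str.rpartition(":")` plus a default path of `"pyproject.toml"`). The function name `split_group_arg` is hypothetical, and the filename validation is left out here.

```python
def split_group_arg(value: str) -> tuple[str, str]:
    """Split '[path:]group' on the rightmost colon."""
    path, sep, group = value.rpartition(":")
    if not sep:
        # No colon at all: only a group name was given, so fall back
        # to pyproject.toml in the current directory.
        return "pyproject.toml", group
    return path, group
```

Because the split happens at the last colon, earlier colons (a Windows drive letter, or a directory name containing `:`) stay in the path part.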

The filename check uses PurePath(x).name. We could do this with plain string utilities, but it involves length checking, worrying about os.altsep and other subtleties. pathlib makes this safe and easy.
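
A sketch of the basename check, for illustration. The helper name `is_pyproject_file` and its `path_cls` parameter are assumptions added so the behavior can be shown for both POSIX and Windows path rules on any platform; the description above only says that `PurePath(x).name` is used.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath


def is_pyproject_file(path: str, path_cls: type[PurePath] = PurePath) -> bool:
    """Return True if the final path component is exactly 'pyproject.toml'."""
    return path_cls(path).name == "pyproject.toml"


# pathlib handles separators for us, including the alternate "/" on Windows.
posix_ok = is_pyproject_file("./common/pyproject.toml", PurePosixPath)
windows_ok = is_pyproject_file("C:\\my dir\\pyproject.toml", PureWindowsPath)
windows_alt_ok = is_pyproject_file("C:/my dir/pyproject.toml", PureWindowsPath)
wrong_name = is_pyproject_file("./common/my-pyproject.toml", PurePosixPath)
```

Here the first three checks pass and the last one fails, since `my-pyproject.toml` merely ends with the expected string rather than having it as its basename.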

Dependency group resolution is done such that each path will be parsed once and reused. Caching is keyed by the input path string as given; no path normalization or other checks are done to detect that two strings name the same file. This means that --group ../foo/pyproject.toml:a --group ../foo/pyproject.toml:b will deduplicate work, but --group ../foo/pyproject.toml:a --group ../foo/../foo/pyproject.toml:b will parse the same file twice.
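
The per-path reuse can be pictured with a small cache keyed by the raw path string. This is a sketch only: `GroupTableCache` and `fake_loader` are hypothetical names, and the loader is a stand-in that records calls instead of reading TOML from disk.

```python
from typing import Callable


class GroupTableCache:
    """Load each distinct path string at most once (no normalization)."""

    def __init__(self, loader: Callable[[str], dict]) -> None:
        self._loader = loader
        self._tables: dict[str, dict] = {}

    def get(self, path: str) -> dict:
        # The raw string is the key, so equivalent but differently
        # spelled paths are treated as different files.
        if path not in self._tables:
            self._tables[path] = self._loader(path)
        return self._tables[path]


loads: list[str] = []


def fake_loader(path: str) -> dict:
    # Stand-in for reading and parsing a pyproject.toml file.
    loads.append(path)
    return {"a": [], "b": []}


cache = GroupTableCache(fake_loader)
cache.get("../foo/pyproject.toml")
cache.get("../foo/pyproject.toml")  # same string: reused
cache.get("../foo/../foo/pyproject.toml")  # different string: loaded again
```

After these three lookups the loader has run twice, once per distinct string, even though both strings would point at the same file on disk.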

Example Usage

Because the pyproject.toml part is optional, one can use --group <groupname>, as in:

pip install --group dev --group docs

It's also possible to run against multiple files or to pull a group from just one other file, mixing it with the above syntax. See the following:

pip install --group dev --group ../common/pyproject.toml:dev
pip install --group ./common/pyproject.toml:lint --group ./corelib/pyproject.toml:build-all

Ergonomic Notes

  • Personally, having tried this out a bit after implementing, all of my hesitation about the syntax has vanished. I'm sure I'm biased, but it feels intuitive and straightforward -- similar to using -r. Importantly, it doesn't feel to me like pip is "starting to think about projects and project directories". It has a new file-oriented syntax with a friendly default.
  • Out of the box you get pretty good tab completion by having the path come first. I'm not sure if this is true for all modern shells, but my own zsh and bash tab-complete paths if they don't have a completer. So you get completion for the path part, and then you just hit <backspace>:group.

Review Notes

  • I could probably come up with dozens more tests, especially platform specific ones, but I don't want to make this harder to review or make ongoing pip work unnecessarily slow, so I've kept it pretty minimal. I am, as always, more than happy to do more work on the testing side!
  • It's hard to give a full summary in the helptext. I'd like to add docs for this if it merges (not sure where yet?).
  • Because you can now be parsing multiple files at once, errors need to make sure to present which file was the source of the error. I've done my best on this, but look forward to any review feedback on the error messages.

@sirosen sirosen changed the title Implement a --group option for installing from [dependency-groups] found in pyproject.toml in the current working directory Implement a --group option for installing from [dependency-groups] found in pyproject.toml files Dec 15, 2024
@aretrace

aretrace commented Jan 8, 2025

What are the next steps for this to be considered part of the pip interface?
My understanding is PEP 735 has been accepted but pip support is still being figured out (hence the latest interface proposal up to this point).
Will discussion continue here on a potential merger?
(I ask because I would like to become more familiar with the Python development ecosystem)

@sirosen
Contributor Author

sirosen commented Jan 8, 2025

"PEP 735 - Dependency Groups" acceptance does not imply that this or any other implementation of [dependency-groups] support will merge into pip. I think that's important to understand. This PR, or any other implementation, has to succeed on its own merits and convince the pip maintainers that the feature is appropriate to pip and is built in a maintainable way.

As far as my understanding goes, we're waiting for the pip maintainers to not only review this implementation, but also to agree on whether or not they want this interface. [1] I therefore believe that the correct next step is to wait for feedback and remember that volunteer time is precious and we don't want folks to feel overburdened or burnt out. I would expect this will take place on the order of months, rather than days. Remember, the current design was only posted a couple of weeks ago, and that was close to the typical holiday and New Year's break for much of the world.

Footnotes

  1. FWIW, I'm unclear on where that discussion will happen. I presume in this PR, but maybe not?

@pfmoore
Member

pfmoore commented Jan 8, 2025

FWIW, I'm unclear on where that discussion will happen. I presume in this PR, but maybe not?

Yes, on this PR. The lack of responses simply reflects the fact that the pip maintainers generally don’t have as much time to work on pip as people would like.

I’m broadly in favour of the latest design, although I would like to hear the other maintainers’ views. And I’d still like a discussion document [1] that explains that this is just reading from a file, and not managing a project. My suggestion is to put this alongside the docs for requirements, here. Having 3 sections, “Requirement files”, “Dependency groups”, and “Constraint files”, next to each other emphasises the equivalence while keeping the topics separate.

Footnotes

  1. Which I’d prefer to be in this PR, not in a follow-up.

@sirosen
Contributor Author

sirosen commented Jan 9, 2025

And I’d still like a discussion document [1] that explains that this is just reading from a file, and not managing a project. My suggestion is to put this alongside the docs for requirements, here. Having 3 sections, “Requirement files”, “Dependency groups”, and “Constraint files”, next to each other emphasises the equivalence while keeping the topics separate.


  1. Which I’d prefer to be in this PR, not in a follow-up.

Okay, I hadn't been clear on whether or not it should be in this PR and didn't have a good hint about where to put it.
It will probably take me a few days to get a draft doc put together, but I'll include it in this PR.

Development

Successfully merging this pull request may close these issues.

Feature Proposal: PEP 735 "Dependency Groups" Support
8 participants