Request: more precise license classifiers #17

Open
Steap opened this issue Feb 20, 2018 · 35 comments
Labels
new classifier request Request for a new classifier

Comments

@Steap

Steap commented Feb 20, 2018

Hello,

Going through the list of available classifiers at https://pypi.python.org/pypi?%3Aaction=list_classifiers , I feel like some of the license classifiers are not precise enough. For instance, there is a "License :: OSI Approved :: BSD License" that could refer to multiple licenses: BSD-2-Clause, BSD-2-Clause-Patent, BSD-3-Clause. In order to determine the actual license used by a project that only specifies "License :: OSI Approved :: BSD License", one has to look at the LICENSE file distributed with the source code.

This is an issue for downstream package maintainers for two reasons:

  • automated tools (such as pypi2deb, guix import, upt) meant to help them by parsing PyPI and generating a package may have trouble finding the exact license used by a package;
  • some versions of a license may be GPL/FSF/DFSG compatible while other versions may not: therefore it makes it harder than necessary to know whether a given package may be included in a given distribution.
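
To make the first point concrete, here is a minimal sketch of what an import tool runs into. The mapping table is hypothetical and only covers the BSD family named in this issue plus the unambiguous MIT classifier for contrast; it is not an official PyPI table.

```python
# Hypothetical mapping from existing classifiers to the SPDX identifiers
# they could refer to (illustrative only, not an official table).
CLASSIFIER_TO_SPDX = {
    "License :: OSI Approved :: BSD License": [
        "BSD-2-Clause",
        "BSD-2-Clause-Patent",
        "BSD-3-Clause",
    ],
    "License :: OSI Approved :: MIT License": ["MIT"],
}


def candidate_licenses(classifier):
    """Return the SPDX identifiers a single classifier could stand for."""
    return list(CLASSIFIER_TO_SPDX.get(classifier, []))


def is_ambiguous(classifier):
    """True when the classifier does not pin down exactly one license."""
    return len(candidate_licenses(classifier)) != 1
```

With such a table, a packaging tool can at least detect that "BSD License" needs a manual look at the LICENSE file, while "MIT License" does not.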

I think the following licenses should be added (if possible to both pypi-legacy and warehouse):

  • License :: OSI Approved :: Academic Free License 1.1 (AFL-1.1)
  • License :: OSI Approved :: Academic Free License 1.2 (AFL-1.2)
  • License :: OSI Approved :: Academic Free License 2.0 (AFL-2.0)
  • License :: OSI Approved :: Academic Free License 2.1 (AFL-2.1)
  • License :: OSI Approved :: Academic Free License 3.0 (AFL-3.0)
  • License :: Apache Software License 1.0 (Apache-1.0)
  • License :: OSI Approved :: Apache Software License 1.1 (Apache-1.1)
  • License :: OSI Approved :: Apache Software License 2.0 (Apache-2.0)
  • License :: OSI Approved :: Apple Public Source License 1.0 (APSL-1.0)
  • License :: OSI Approved :: Apple Public Source License 1.1 (APSL-1.1)
  • License :: OSI Approved :: Apple Public Source License 1.2 (APSL-1.2)
  • License :: OSI Approved :: Apple Public Source License 2.0 (APSL-2.0)
  • License :: OSI Approved :: Artistic License 1.0 (Artistic-1.0)
  • License :: OSI Approved :: Artistic License 2.0 (Artistic-2.0)
  • License :: OSI Approved :: BSD 2-Clause "Simplified" License (BSD-2-Clause)
  • License :: OSI Approved :: BSD 2-Clause Plus Patent License (BSD-2-Clause-Patent)
  • License :: OSI Approved :: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause)
  • License :: OSI Approved :: GNU Lesser General Public License v2.0 (LGPLv2.0)
  • License :: OSI Approved :: GNU Lesser General Public License v2.0 or later (LGPLv2.0+)
  • License :: OSI Approved :: GNU Lesser General Public License v2.1 (LGPLv2.1)
  • License :: OSI Approved :: GNU Lesser General Public License v2.1 or later (LGPLv2.1+)
  • License :: OSI Approved :: GNU Lesser General Public License v3.0 (LGPLv3.0)
  • License :: OSI Approved :: GNU Lesser General Public License v3.0 or later (LGPLv3.0+)

In parentheses are the SPDX identifiers (see https://spdx.org/licenses/), except for LGPL*, where I used identifiers similar to those currently used for the various versions of the GPL.

Regarding the LGPL classifiers, we may also state that v2 and v2+ (currently in the list of valid classifiers) refer to v2.0 and v2.0+ and not to v2.1 and v2.1+, which would remove the need for the LGPLv2.0 and LGPLv2.0+ classifiers.

I decided not to include less-used variants of the BSD licenses; they may be added in the future if need be.

What do you think about this?

@di
Member

di commented Feb 20, 2018

Thanks for the report, @Steap. We're aware that the licenses are not as fine-grained as they could be.

Right now the existing classifiers are shared between pypi-legacy and Warehouse, and our current priority is to achieve feature parity between the two so we can shut down legacy. As such, I've added this to a post-launch milestone. (this is done)

We need to ship a method to deprecate existing license classifiers (see pypi/legacy#91), and possibly pypi/warehouse#2649 as well, before we can tackle adding more fine-grained and accurate licenses. (this is done)

@di
Member

di commented Apr 11, 2018

Blocked on pypi/warehouse#3628. done

@di
Member

di commented Apr 11, 2018

Per pypi/legacy#91 we should also add:

License :: OSI Approved :: Apache License, Version 2.0 (Apache-2.0)
License :: Apache License, Version 1.1 (Apache-1.1)
License :: Apache License, Version 1.0 (Apache-1.0)

And deprecate:

License :: OSI Approved :: Apache Software License

@di
Member

di commented Apr 27, 2018

This issue is now unblocked. After compiling all the differences between the following sources:

I think this is what needs done (red "-" lines will be deprecated, green "+" lines will be added):

Academic

- License :: OSI Approved :: Academic Free License (AFL)
+ License :: OSI Approved :: Academic Free License 1.1 (AFL-1.1)
+ License :: OSI Approved :: Academic Free License 1.2 (AFL-1.2)
+ License :: OSI Approved :: Academic Free License 2.0 (AFL-2.0)
+ License :: OSI Approved :: Academic Free License 2.1 (AFL-2.1)
+ License :: OSI Approved :: Academic Free License 3.0 (AFL-3.0)

Apache

- License :: OSI Approved :: Apache Software License
+ License :: Apache License, Version 1.1 (Apache-1.1)
+ License :: Apache License, Version 1.0 (Apache-1.0)
+ License :: OSI Approved :: Apache Software License 2.0 (Apache-2.0)

Apple

- License :: OSI Approved :: Apple Public Source License
+ License :: OSI Approved :: Apple Public Source License 1.0 (APSL-1.0)
+ License :: OSI Approved :: Apple Public Source License 1.1 (APSL-1.1)
+ License :: OSI Approved :: Apple Public Source License 1.2 (APSL-1.2)
+ License :: OSI Approved :: Apple Public Source License 2.0 (APSL-2.0)

Artistic

- License :: OSI Approved :: Artistic License
+ License :: OSI Approved :: Artistic License 1.0 (Artistic-1.0)
+ License :: OSI Approved :: Artistic License 2.0 (Artistic-2.0)

BSD

- License :: OSI Approved :: BSD License
+ License :: OSI Approved :: BSD 2-Clause Plus Patent License (BSD-2-Clause-Patent)
+ License :: OSI Approved :: BSD 2-Clause "Simplified" License (BSD-2-Clause)
+ License :: OSI Approved :: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause)

GNU

- License :: OSI Approved :: GNU Affero General Public License v3
+ License :: OSI Approved :: GNU Affero General Public License v3.0 only (AGPL-3.0-only)
- License :: OSI Approved :: GNU Affero General Public License v3 or later (AGPLv3+)  
+ License :: OSI Approved :: GNU Affero General Public License v3.0 or later (AGPL-3.0-or-later)
- License :: OSI Approved :: GNU Free Documentation License (FDL)
+ License :: OSI Approved :: GNU Free Documentation License v1.1 only (GFDL-1.1-only)
+ License :: OSI Approved :: GNU Free Documentation License v1.1 or later (GFDL-1.1-or-later)
+ License :: OSI Approved :: GNU Free Documentation License v1.2 only (GFDL-1.2-only)
+ License :: OSI Approved :: GNU Free Documentation License v1.2 or later (GFDL-1.2-or-later)
+ License :: OSI Approved :: GNU Free Documentation License v1.3 only (GFDL-1.3-only)
+ License :: OSI Approved :: GNU Free Documentation License v1.3 or later (GFDL-1.3-or-later)
- License :: OSI Approved :: GNU General Public License (GPL)
+ License :: GNU General Public License v1.0 only (GPL-1.0-only)
+ License :: GNU General Public License v1.0 or later (GPL-1.0-or-later)
- License :: OSI Approved :: GNU General Public License v2 (GPLv2)
+ License :: OSI Approved :: GNU General Public License v2.0 only (GPL-2.0-only)
- License :: OSI Approved :: GNU General Public License v2 or later (GPLv2+)
+ License :: OSI Approved :: GNU General Public License v2.0 or later (GPL-2.0-or-later)
- License :: OSI Approved :: GNU General Public License v3 (GPLv3)
+ License :: OSI Approved :: GNU General Public License v3.0 only (GPL-3.0-only)
- License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)  
+ License :: OSI Approved :: GNU General Public License v3.0 or later (GPL-3.0-or-later)
- License :: OSI Approved :: GNU Lesser General Public License v2 (LGPLv2)
+ License :: OSI Approved :: GNU Library General Public License v2 only (LGPL-2.0-only)
- License :: OSI Approved :: GNU Lesser General Public License v2 or later (LGPLv2+)
+ License :: OSI Approved :: GNU Library General Public License v2 or later (LGPL-2.0-or-later)
+ License :: OSI Approved :: GNU Lesser General Public License v2.1 only (LGPL-2.1-only)
+ License :: OSI Approved :: GNU Lesser General Public License v2.1 or later (LGPL-2.1-or-later)
- License :: OSI Approved :: GNU Lesser General Public License v3 (LGPLv3)
+ License :: OSI Approved :: GNU Lesser General Public License v3.0 only (LGPL-3.0-only)
- License :: OSI Approved :: GNU Lesser General Public License v3 or later (LGPLv3+)
+ License :: OSI Approved :: GNU Lesser General Public License v3.0 or later (LGPL-3.0-or-later)
- License :: OSI Approved :: GNU Library or Lesser General Public License (LGPL)

The deprecated classifiers will affect a lot of projects in some cases:

| Classifier | # of projects |
| --- | --- |
| Academic Free License (AFL) | 40 |
| Apache Software License | 8176 |
| Apple Public Source License | 4 |
| Artistic License | 46 |
| BSD License | >10000 |
| GNU Affero General Public License v3 | 3859 |
| GNU Affero General Public License v3 or later (AGPLv3+) | 524 |
| GNU Free Documentation License (FDL) | 3 |
| GNU General Public License (GPL) | 3794 |
| GNU General Public License v2 (GPLv2) | 1152 |
| GNU General Public License v2 or later (GPLv2+) | 408 |
| GNU General Public License v3 (GPLv3) | 3124 |
| GNU General Public License v3 or later (GPLv3+) | 2081 |
| GNU Lesser General Public License v2 (LGPLv2) | 128 |
| GNU Lesser General Public License v2 or later (LGPLv2+) | 175 |
| GNU Lesser General Public License v3 (LGPLv3) | 831 |
| GNU Lesser General Public License v3 or later (LGPLv3+) | 502 |
| GNU Library or Lesser General Public License (LGPL) | 1194 |

@Steap, can you review?

@dstufft @ewdurbin Can you sanity-check? Currently a user attempting to publish a release with a deprecated classifier will get an error like:

HTTPError: 400 Client Error: Invalid value for classifiers. Error: Classifier
'Topic :: Communications :: Chat :: AOL Instant Messenger' has been deprecated,
see https://pypi.org/classifiers/ for a list of valid classifiers. for url:
http://upload.pypi.org/legacy/

@dstufft
Member

dstufft commented Apr 28, 2018

Ugh, requiring 10k+ projects to modify their setup.py is not great. I guess the thing at the heart of this issue is whether classifiers are designed to be representative of the specific license of a project or if they're intended to act as a lossy mechanism to indicate the family of license something is under.

Overall my big concern here is that I'm not sure that classifiers are good enough even with these changes, in which case we're forcing a lot of churn for little benefit.

@di
Member

di commented Apr 30, 2018

I agree, although the number of affected projects which will actually publish a new release is definitely a small fraction of that 10K number (although without doing some querying, I'm not sure how much).

It seems to me that in the case of the GNU licenses, the original classifiers are trying to be representative of a specific license, e.g. GNU General Public License v2 or later (GPLv2+) refers to a specific license, and according to https://spdx.org/licenses/ referring to this license by that name has been "deprecated".

I agree though that for the others, the classifier is trying to be more general, which only really poses a problem for the Apache license family, as not all licenses in this family are truly OSI-approved.

One other option would be to leave the "General" classifiers in place, and add more specific versions as sub-classifiers:

License :: OSI Approved :: BSD License
+ License :: OSI Approved :: BSD License :: 2-Clause Plus Patent License (BSD-2-Clause-Patent)
+ License :: OSI Approved :: BSD License :: 2-Clause "Simplified" License (BSD-2-Clause)
+ License :: OSI Approved :: BSD License :: 3-Clause "New" or "Revised" License (BSD-3-Clause)

I'd think we'd still need to do some deprecations of the more specific GNU classifiers to make this work though, but the impact would be significantly less.
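
As a rough sketch of how a consumer could treat such sub-classifiers, the helpers below rely only on the " :: " separator that classifiers already use. The function names are hypothetical and not part of any PyPI tooling.

```python
SEPARATOR = " :: "


def parent_of(classifier):
    """Return the parent classifier, or None for a top-level one."""
    parts = classifier.split(SEPARATOR)
    if len(parts) <= 1:
        return None
    return SEPARATOR.join(parts[:-1])


def is_under(classifier, ancestor):
    """True if `classifier` equals `ancestor` or is nested below it."""
    return classifier == ancestor or classifier.startswith(ancestor + SEPARATOR)
```

A search for the general "BSD License" classifier could then use `is_under` to also match projects that declare only a more specific sub-classifier.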

Thoughts?

@stain

stain commented Aug 10, 2018

Well, the whole point of deprecating is to force projects pushing new updates to actually declare which version they are using.

GPL is a special case here, as those classifiers already have versions and the changes just attempt to align them with SPDX identifiers, which I think is a good thing; but it is perhaps not as critical as the other ones, like the ambiguous "Apache License" or "BSD License", which may or may not be OSI Approved depending on what the author meant.

@tieguy

tieguy commented Aug 24, 2018

👋 We've just run into this, so throwing in my two cents in case it helps prioritize/understand the problem.

tl;dr: it'd be nice to get this fix merged :)

We have two use cases for this data:

  • give users of the package more information to decide what their licensing obligations are
  • do a validation pass to make sure various sources of licensing metadata agree (and are therefore, presumably, accurate)

In particular, we just ran into a situation where a package's actual source code is BSD-2-Clause (per GitHub's scanner and our own analysis) but PyPi only reports the ambiguous bsd. So we can't actually do a useful analysis from just the pypi metadata; we have to crack open the source to figure out which BSD is being specified and whether it matches the GitHub metadata correctly. (This is not a small problem; our research suggests something like 15-20% of pypi packages have licensing metadata that doesn't match what GitHub reports; having found this bug I suspect that this problem drives a lot of that number.)
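
The validation pass described here can be sketched as a three-way comparison. This is only an illustration under the assumption that the PyPI classifier has already been expanded into the SPDX identifiers it might mean; the function name and result labels are hypothetical.

```python
def compare_license_sources(pypi_candidates, detected_spdx_id):
    """Compare PyPI-derived candidates with a license detected elsewhere.

    Returns "match" when PyPI names exactly the detected license,
    "ambiguous" when the detected license is only one of several candidates,
    and "mismatch" when it is not among the candidates at all.
    """
    candidates = set(pypi_candidates)
    if detected_spdx_id not in candidates:
        return "mismatch"
    if len(candidates) == 1:
        return "match"
    return "ambiguous"
```

For the case above, the BSD family as candidates and a detected BSD-2-Clause give "ambiguous": the sources do not contradict each other, but the PyPI metadata alone cannot confirm the license.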

We can of course go into the source and figure this out, but it'd be nice if our customers (and presumably anyone else who uses pypi) can actually figure out what license they're required to use/distribute from the pypi metadata instead of having to dig into the code itself.

(Disclaimer: IAAL and I was a programmer, but I'm not your lawyer and no longer usefully a programmer ;)

@di
Member

di commented Aug 25, 2018

Revisiting this, I don't think the "subclassifier" approach I mentioned in https://github.com/pypa/warehouse/issues/2996#issuecomment-385450514 will work, as it wouldn't let us eventually deprecate the "parent" classifier.

I think the right thing to do here is what I outlined in https://github.com/pypa/warehouse/issues/2996#issuecomment-385027197. We can reduce friction a little bit by adding the ability to tell users which new classifiers they should use instead, see pypi/warehouse#4626.
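
A minimal sketch of that idea follows: a deprecation table that also records suggested replacements, so the upload error can point at them. The table contents, exception name, and message wording are assumptions for illustration, not Warehouse's actual implementation.

```python
# Hypothetical table: deprecated classifier -> suggested replacements.
# An empty list means "deprecated with no direct replacement".
DEPRECATED = {
    "License :: OSI Approved :: BSD License": [
        "License :: OSI Approved :: BSD 2-Clause Plus Patent License (BSD-2-Clause-Patent)",
        'License :: OSI Approved :: BSD 2-Clause "Simplified" License (BSD-2-Clause)',
        'License :: OSI Approved :: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause)',
    ],
    "Topic :: Communications :: Chat :: AOL Instant Messenger": [],
}


class DeprecatedClassifierError(ValueError):
    """Raised when an upload uses a deprecated classifier."""


def check_classifiers(classifiers):
    """Raise DeprecatedClassifierError for the first deprecated classifier."""
    for classifier in classifiers:
        replacements = DEPRECATED.get(classifier)
        if replacements is None:
            continue  # not deprecated
        message = f"Classifier {classifier!r} has been deprecated"
        if replacements:
            message += "; use one of: " + ", ".join(replacements)
        raise DeprecatedClassifierError(message)
```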

@brainwane
Contributor

@ewdurbin and @Steap ping for your thoughts?

I'd love less license ambiguity in PyPI (for use in Libraries.io & similar projects) so I would appreciate if we could move forward on this change. But I recognize it might be a multi-step process, kind of like pypi/warehouse#3632 was for improving the quality of our email address verification (data model infrastructure-building, announcing on the announce list, etc.).

@tieguy am I right in presuming that you care more about analyzing license data from the most recent versions of packages than about archival/past releases? If so we might ask maintainers to make license-only point releases to fix this metadata issue. (Unless I am misunderstanding.)

@tieguy

tieguy commented Sep 30, 2018

@brainwane yeah, for our use case we're primarily concerned with the latest version. So I think for our purposes something that allowed people to fix it in future releases, rather than doing mass-changes of old materials, would be sufficient.

I'm not Python's lawyer (you have Van for that ;) but happy to help with any explanatory work or other legal thinking where I can.

@dstufft
Member

dstufft commented Sep 30, 2018

Part of me just wants to remove the license classifiers, and add a metadata field for SPDX version specifier, which seems to be more generally useful?

@di
Member

di commented Oct 1, 2018

@dstufft One nice thing about the current license classifiers is that it's easy to search for and sort by them. We'd need to add a way to do this for the "SPDX identifier" field as well.

@Steap
Author

Steap commented Oct 1, 2018

Sorry for my late answer, real life got in the way :-/

@brainwane I'm also willing to remove as much ambiguity as possible in the classifiers. I like the big patch in #2996 but, as others have already stated, the issue is the "transition" to these new classifiers. Just like @tieguy, I'm also only interested in the latest version of a given package.

@dstufft Every language has their own code for licenses, and every GNU/Linux or *BSD distribution too. It drives me crazy that not everyone uses spdx identifiers, which seem to be a truly unique way of identifying a license. I'm afraid that it would be a bit late to switch to spdx identifiers, though.

@dstufft
Member

dstufft commented Oct 1, 2018

Sure, the flip side is that the current situation really only works in simple cases. For example, if you have the following two classifiers:

License :: OSI Approved :: Apache Software License 2.0 (Apache-2.0)
License :: OSI Approved :: BSD 2-Clause "Simplified" License (BSD-2-Clause)

Can I integrate this work into GPL-2.0-only licensed software? You can't actually tell, because it depends on whether the software is licensed under Apache-2.0 AND BSD-2-Clause or if it is licensed under Apache-2.0 OR BSD-2-Clause.

Assuming you agree with the opinion that GPLv2 and Apache 2.0 are incompatible, if you have to comply with both the Apache-2.0 and the BSD-2-Clause license, then you cannot incorporate that work into a GPLv2 code base. This is why SPDX License Expressions have the ability to specify AND and OR explicitly. There's also the question of exceptions that we don't currently handle at all.
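
The difference between AND and OR can be sketched with a tiny evaluator. Expressions are written here as nested tuples rather than parsed SPDX strings, and the set of acceptable licenses simply encodes the premise above (Apache-2.0 is not acceptable for a GPLv2 code base, BSD-2-Clause is); both are assumptions for illustration.

```python
def can_use(expression, acceptable):
    """Return True if the license expression can be satisfied using only
    licenses from `acceptable`.

    `expression` is either an SPDX identifier string or a tuple of the
    form ("AND", operand, ...) or ("OR", operand, ...).
    """
    if isinstance(expression, str):
        return expression in acceptable
    operator, *operands = expression
    results = [can_use(operand, acceptable) for operand in operands]
    if operator == "AND":
        return all(results)  # every license must be complied with
    if operator == "OR":
        return any(results)  # the user may pick any one license
    raise ValueError(f"unknown operator: {operator!r}")


# Assumption taken from the comment above, not legal advice.
ACCEPTABLE_FOR_GPLV2 = {"BSD-2-Clause"}
```

Under that assumption, `("AND", "Apache-2.0", "BSD-2-Clause")` cannot be used while `("OR", "Apache-2.0", "BSD-2-Clause")` can, which is exactly the distinction a flat list of two classifiers fails to express.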

s-t-e-v-e-n-k referenced this issue in s-t-e-v-e-n-k/psql2mysql Oct 8, 2018
The legacy classifier does not specify the version number for the Apache
license. https://github.com/pypa/warehouse/issues/2996 has all the tears
ever.
@tieguy

tieguy commented Mar 26, 2019

Relevant PEP: pypa/interoperability-peps#46

@SamuelMarks

What's the status of this?

I've got about 60 Python packages that I want to release under (MIT OR Apache-2.0).

@di
Member

di commented Jun 6, 2019

@SamuelMarks This issue is about the Classifier field, which will likely not support such fine-grained classifiers.

You should use the License field for that instead, e.g. in your setup.py:

from setuptools import setup

setup(
    # ... other arguments ...
    license="(MIT OR Apache-2.0)",
    # ...
)

@Steap
Author

Steap commented Jun 6, 2019

@di The documentation states that:

«The license field is a text indicating the license covering the package where the license is not a selection from the “License” Trove classifiers.»

The documentation you quoted is similar, and it shows the issue of having a "free format" for the license.

Should it be updated to specify that:

  1. In simple cases, when possible, you should use a "License ::" classifier
  2. Otherwise, you should use the license "field" with a valid SPDX expression?

Is there an official statement from PyPA regarding spdx identifiers/expressions? In the future, could PyPI check that the license is a valid expression when a maintainer uploads a package?
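
To give an idea of what such an upload-time check could look like, here is a small sketch of a validator for a simplified subset of SPDX license expressions: identifiers, AND, OR, and parentheses, with AND binding tighter than OR. It ignores WITH exceptions and the "+" operator, and the set of known identifiers is a tiny hypothetical sample rather than the real SPDX list.

```python
import re

# Tiny hypothetical sample; a real check would load the full SPDX list.
KNOWN_IDS = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "GPL-2.0-only"}
TOKEN = re.compile(r"\(|\)|[A-Za-z0-9.+-]+")


def tokenize(text):
    """Split an expression into parentheses and word tokens."""
    tokens = TOKEN.findall(text)
    # Reject input containing characters the token pattern does not cover.
    if "".join(tokens) != "".join(text.split()):
        raise ValueError("unexpected character in expression")
    return tokens


def _parse_atom(tokens, i):
    if i >= len(tokens):
        raise ValueError("unexpected end of expression")
    if tokens[i] == "(":
        i = _parse_or(tokens, i + 1)
        if i >= len(tokens) or tokens[i] != ")":
            raise ValueError("missing closing parenthesis")
        return i + 1
    if tokens[i] in KNOWN_IDS:
        return i + 1
    raise ValueError(f"unknown license identifier: {tokens[i]}")


def _parse_and(tokens, i):
    i = _parse_atom(tokens, i)
    while i < len(tokens) and tokens[i] == "AND":
        i = _parse_atom(tokens, i + 1)
    return i


def _parse_or(tokens, i):
    i = _parse_and(tokens, i)
    while i < len(tokens) and tokens[i] == "OR":
        i = _parse_and(tokens, i + 1)
    return i


def is_valid_expression(text):
    """True if `text` is a well-formed expression over KNOWN_IDS."""
    try:
        tokens = tokenize(text)
        end = _parse_or(tokens, 0)
    except ValueError:
        return False
    return end == len(tokens)
```

Free-form values such as "BSD License" are rejected by this sketch, while "(MIT OR Apache-2.0)" is accepted.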

@di
Member

di commented Jun 6, 2019

That's basically what it already says, with the exception of:

with a valid SPDX expression

the License field is a free-form field to allow anyone to license their project under any license. I don't think we have any plans to start enforcing any kind of semantics there.

However, Donald said:

Part of me just wants to remove the license classifiers, and add a metadata field for SPDX version specifier, which seems to be more generally useful?

This would be a separate hypothetical field that could be enforced.

@Steap
Author

Steap commented Jun 6, 2019

However, Donald said:

Part of me just wants to remove the license classifiers, and add a metadata field for SPDX version specifier, which seems to be more generally useful?

This would be a separate hypothetical field that could be enforced.

How would one start a discussion about this? As the author of a tool that makes heavy use of the metadata found on pypi.org, and having worked with distributions that had little manpower (and therefore could really take advantage of non-ambiguous metadata), I would really like to see that. Should I write a PEP, write a message on a mailing-list, reach out to someone in particular?

@di
Member

di commented Jun 6, 2019

PEP 566 changed the canonical source for field specifications to the Core Metadata Specification. In theory you could just make a PR against pypa/packaging.python.org to introduce the new metadata field (and, new metadata version).

This is wrong, we still need a PEP.

@brainwane
Contributor

@di Could you help me understand what it would take to resolve this issue? I got a little confused.

  • design/architecture work: I infer that this requires a little more time, e.g., deciding whether we're changing how people should use the License field in the metadata or whether we add another top-level piece of metadata called "SPDX Classifier".
  • client-side work: would making this change require plumbing metadata handling in packaging, setuptools, wheel, and/or twine?
  • Warehouse work: reasonably straightforward as discussed above, I think
  • documentation: I presume we'd need to update docs on PyPI, and PyPUG -- where else?
  • UX: we'd have to decide how to display this (e.g., whether in the list of Trove classifiers), and what to do if two license-related pieces of metadata conflict.

Tell me where I'm right/wrong?

@pombredanne

FWIW, I had actually started working (or rather slacking) on a PEP to replace or add SPDX expressions to Python packages metadata to convey clearer, simpler and better license info a few years ago but never finished that thing. See pombredanne/spdx-pypi-pep#1

@brainwane we cannot just add another set of classifiers for that IMHO. As @dstufft pointed out in https://github.com/pypa/warehouse/issues/2996#issuecomment-425762903 you cannot handle anything but simple cases with a list of licenses. You need expressions for that.
FWIW, I maintain a small library to deal with expressions if we ever come down to using this and need some validation https://github.com/nexB/license-expression/ ... But I guess there is a bit more discussion needed first!

@dstufft Do you reckon we would need a PEP to get this done right?

Who is game to help work on a PEP?

@taleinat

taleinat commented Aug 14, 2019

@pombredanne, I would help work on such a PEP.

I'm not an expert on licenses, but I've dealt with them quite a bit; see this blog post.

@di
Member

di commented Aug 14, 2019

@brainwane You're mostly right. Since License is currently a free-form field, I think we'd need to add a new field, something like SPDX-License-Identifier. This would require a new Metadata version, so anything writing or reading metadata would need updated.

We may want to:

  • add a restriction that the License field, license classifiers in the Classifier field, and the new field are mutually exclusive
  • make the new field "multiple use".
  • think about how we would display deprecation warnings for the license classifiers.
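
The mutual-exclusion restriction from the first bullet could be checked roughly as follows. This is a sketch only: metadata is modeled as a plain dict, and "SPDX-License-Identifier" is the placeholder field name floated above, not an existing metadata field.

```python
def license_sources(metadata):
    """List which of the three license-related fields are populated."""
    sources = []
    if metadata.get("License"):
        sources.append("License")
    classifiers = metadata.get("Classifier", [])
    if any(c.startswith("License :: ") for c in classifiers):
        sources.append("Classifier")
    # Placeholder field name taken from the suggestion above.
    if metadata.get("SPDX-License-Identifier"):
        sources.append("SPDX-License-Identifier")
    return sources


def check_mutually_exclusive(metadata):
    """Raise ValueError when more than one license source is used."""
    sources = license_sources(metadata)
    if len(sources) > 1:
        raise ValueError(
            "license declared in multiple places: " + ", ".join(sources)
        )
```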

@pombredanne and @taleinat, as I said in https://github.com/pypa/warehouse/issues/2996#issuecomment-499633046, a PEP is not necessary here.

@pombredanne

@di excellent and much simpler! But besides an update to the Metadata spec and version, updates to several tools would be needed (wheel, setuptools, pip to name a few...), correct?

@taleinat let's start crafting something together then! We could meet/chat on #pypa-dev on Freenode. I am pombreda there

@di
Member

di commented Aug 14, 2019

@pombredanne Yep, like I said above:

This would require a new Metadata version, so anything writing or reading metadata would need updated.

@pombredanne

@taleinat @di @brainwane here is a starter pypa/packaging.python.org#635

potto216 added a commit to openpolicedata/openpolicedata that referenced this issue Nov 18, 2021
…es/BSD-3-Clause) to the more general "BSD License" because PyPi only supports the following licenses: https://pypi.org/classifiers/ This issue of having to coarse grain licenses is a known issue which they are working on pypa/trove-classifiers#17. The sample package can be viewed at https://test.pypi.org/project/openpolicedata-sowdm/0.0.1/
robot-piglet pushed a commit to catboost/catboost that referenced this issue Apr 19, 2023
Guts added a commit to qgis-deployment/qgis-deployment-toolbelt-cli that referenced this issue Nov 8, 2023
Make license classifiers compatible with PyPI to fix error during
publishing
(https://github.com/Guts/qgis-deployment-cli/actions/runs/6799893590/job/18487221639):


Upstream:

- pypi/legacy#91
- pypa/trove-classifiers#17

13 participants