Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

3088: add full text field for licenses to default syft-json output #3450

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

spiffcs
Copy link
Contributor

@spiffcs spiffcs commented Nov 17, 2024

Description

This PR updates the syft License model to include a new FullText field without any breaking changes to the current license behavior. We select candidates for this new field based on if the metadata being analyzed contains any new line characters. Because we still want Value to be populated as it is a required field I've included a default string that will be added here when FullText is the selected outcome for a newly constructed license.

Verification

Use the following Dockerfile and build a test image
docker build -t syft-3088:latest .

# Use the official Python 3.9 image from Docker Hub
FROM python:3.9

# Set the working directory in the container
WORKDIR /app

# Install the specific version of NumPy
RUN pip install numpy==1.26.4

# Specify the command to run on container start
CMD ["python"]

Run the latest syft against this image using this branch:
go run cmd/syft/main.go -o json syft-3088 | jq '.artifacts[] | select(.name=="numpy") | { name: .name, licenses: .licenses }'

The large license value extracted from the package should now be listed under the field fullText with value being set to FullText to keep the field required and not incur any breaking changes.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

Checklist:

  • I have added unit tests that cover changed behavior
  • I have tested my code in common scenarios and confirmed there are no regressions
  • I have added comments to my code, particularly in hard-to-understand sections

@github-actions github-actions bot added the json-schema Changes the json schema label Nov 17, 2024
@spiffcs spiffcs requested a review from wagoodman November 18, 2024 17:01
// FullTextValue is a placeholder value when license metadata is discovered to be fullText
// A license with "FullText" in its value refers the consumer to look at the FullText field
// This is so that we can keep Value as a required field up until syft 2.0
var FullTextValue = "FullText"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of a placeholder field that is attempting to have consumers reference the full text, should we just populate this with an empty value? That is, the json schema requires the field to be present but does not state that it must not be empty. I feel that if we don't know the ID we should leave this blank, but allow the new full text field to convey any discovered information. That way we can stay compliant to syft v1 json output but still add full text.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we do things like otherLicenses in SPDX what would we use for the license ID?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wagoodman - it looks like in syft we have a couple of places where we assume value should not be ""

This includes the constructor for creating a license set:

func (s *LicenseSet) Add(licenses ...License) {
if s.set == nil {
s.set = make(map[artifact.ID]License)
}
for _, l := range licenses {
// we only want to add licenses that have a value
// note, this check should be moved to the license constructor in the future
if l.Value != "" {
if id, merged, err := s.addToExisting(l); err == nil && !merged {
// doesn't exist, add it
s.set[id] = l
} else if err != nil {
log.Trace("license set failed to add license %#v: %+v", l, err)
}
}
}
}

If we were to keep this const as a placeholder some alternatives that @willmurphyscode and I discussed could be as follows:

Other Options for Value

  • SeeFullText
  • “<First_50_Characters of file> … [TRUNCATED]”

Do we want a better name for this const or do we want to go back and break up the license set/spdx formatter assumptions about value being present?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this section will need to be updated already to account for filtering out "FullText" values. In that same way, can't we do the same for an empty value and the full text is specified in the other field?

@henrysachs
Copy link
Contributor

I just hit this case and maybe this could get picked up again as the new field would help me a lot :)

@spiffcs
Copy link
Contributor Author

spiffcs commented Jan 23, 2025

I just hit this case and maybe this could get picked up again as the new field would help me a lot :)

Yep! This is in the queue of a few other PR I have to clean up and get merged for the next up coming release. Apologies for the delay on this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
json-schema Changes the json schema
Projects
Status: In Review
Development

Successfully merging this pull request may close these issues.

syft extract the full description of the license in python
3 participants