8000 Populate Files and Relationship fields for spdx-json output by spiffcs · Pull Request #507 · anchore/syft · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Populate Files and Relationship fields for spdx-json output #507

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Sep 17, 2021

Conversation

spiffcs
Copy link
Contributor
@spiffcs spiffcs commented Sep 14, 2021

Work for #476

Initial Support for RELATIONSHIPS in the spdx-json output.

This PR adds support for the following:

  • populating the files field of the spdxJSON output
  • associating spdx Packages with the new File resources in the hasFiles field.
  • populating the Relationship field with the vertices associating Files and Packages

File support only comes from catalogers who support the FileOwner interface.

The current list is

  • apk
  • dpkg
  • python
  • rmdb

Link to JSON schema:
https://github.com/spdx/spdx-spec/blob/a08ffa4b883b35574ae0df8f4566cd031341ba3e/schemas/spdx-schema.json#L423-L558

Snippets of the new output generated with the spdxJson option - pullfrom node:latest

 "files": [
  {
   "SPDXID": "SPDXRef-File-adduser-deluser.conf",
   "name": "deluser.conf",
   "licenseConcluded": "",
   "fileName": "/etc/deluser.conf"
  },
  {
   "SPDXID": "SPDXRef-File-adduser-adduser",
   "name": "adduser",
   "licenseConcluded": "",
   "fileName": "/usr/sbin/adduser"
  },
  {
   "SPDXID": "SPDXRef-File-adduser-deluser",
   "name": "deluser",
   "licenseConcluded": "",
   "fileName": "/usr/sbin/deluser"
  },
  {
   "SPDXID": "SPDXRef-File-adduser-adduser.conf",
   "name": "adduser.conf",
   "licenseConcluded": "",
   "fileName": "/usr/share/adduser/adduser.conf"
  },

   "hasFiles": [
    "SPDXRef-File-autotools-dev-dh_autotools-dev_restoreconfig",
    "SPDXRef-File-autotools-dev-dh_autotools-dev_updateconfig",
    "SPDXRef-File-autotools-dev-NEWS.Debian.gz",
    "SPDXRef-File-autotools-dev-README.Debian.gz",
    "SPDXRef-File-autotools-dev-TODO",
    "SPDXRef-File-autotools-dev-changelog.Debian.gz",
    "SPDXRef-File-autotools-dev-changelog.gz",
    "SPDXRef-File-autotools-dev-copyright",
    "SPDXRef-File-autotools-dev-upstream.mail.template",
    "SPDXRef-File-autotools-dev-dh_autotools-dev_restoreconfig.1.gz",
    "SPDXRef-File-autotools-dev-dh_autotools-dev_updateconfig.1.gz",
    "SPDXRef-File-autotools-dev-config.guess",
    "SPDXRef-File-autotools-dev-config.sub",
    "SPDXRef-File-autotools-dev-autotools_dev.pm"
   ],
 "relationships": [
  {
   "spdxElementId": "SPDXRef-Package-deb-adduser-3.118",
   "relationshipType": "CONTAINS",
   "relatedSpdxElement": "SPDXRef-File-adduser-deluser.conf"
  },
  {
   "spdxElementId": "SPDXRef-Package-deb-adduser-3.118",
   "relationshipType": "CONTAINS",
   "relatedSpdxElement": "SPDXRef-File-adduser-adduser"
  },
  {
   "spdxElementId": "SPDXRef-Package-deb-adduser-3.118",
   "relationshipType": "CONTAINS",
   "relatedSpdxElement": "SPDXRef-File-adduser-deluser"
  },
  {
   "spdxElementId": "SPDXRef-Package-deb-adduser-3.118",
   "relationshipType": "CONTAINS",
   "relatedSpdxElement": "SPDXRef-File-adduser-adduser.conf"
  },
  {
   "spdxElementId": "SPDXRef-Package-deb-adduser-3.118",
   "relationshipType": "CONTAINS",
   "relatedSpdxElement": "SPDXRef-File-adduser-TODO"
  },

@github-actions
Copy link
github-actions bot commented Sep 14, 2021

Benchmark Test Results

Benchmark results from the latest changes vs base branch
name                                                   old time/op    new time/op    delta
ImagePackageCatalogers/ruby-gemspec-cataloger-2           742µs ± 3%     803µs ± 3%  +8.33%  (p=0.008 n=5+5)
ImagePackageCatalogers/python-package-cataloger-2        1.11ms ± 4%    1.14ms ± 1%    ~     (p=0.151 n=5+5)
ImagePackageCatalogers/javascript-package-cataloger-2     372µs ± 3%     390µs ± 1%  +4.90%  (p=0.008 n=5+5)
ImagePackageCatalogers/dpkgdb-cataloger-2                 362µs ± 1%     384µs ± 1%  +6.12%  (p=0.008 n=5+5)
ImagePackageCatalogers/rpmdb-cataloger-2                  383µs ± 3%     396µs ± 1%    ~     (p=0.056 n=5+5)
ImagePackageCatalogers/java-cataloger-2                  8.04ms ± 3%    8.58ms ± 0%  +6.65%  (p=0.008 n=5+5)
ImagePackageCatalogers/apkdb-cataloger-2                  584µs ± 4%     561µs ± 3%  -3.93%  (p=0.032 n=5+5)
ImagePackageCatalogers/go-cataloger-2                     187µs ± 0%     199µs ± 0%  +6.55%  (p=0.008 n=5+5)
ImagePackageCatalogers/rust-cataloger-2                   315µs ± 0%     335µs ± 0%  +6.51%  (p=0.016 n=4+5)

name                                                   old alloc/op   new alloc/op   delta
ImagePackageCatalogers/ruby-gemspec-cataloger-2           133kB ± 0%     133kB ± 0%    ~     (p=0.548 n=5+5)
ImagePackageCatalogers/python-package-cataloger-2         643kB ± 0%     643kB ± 0%  -0.02%  (p=0.008 n=5+5)
ImagePackageCatalogers/javascript-package-cataloger-2     116kB ± 0%     116kB ± 0%    ~     (p=0.056 n=5+5)
ImagePackageCatalogers/dpkgdb-cataloger-2                 127kB ± 0%     126kB ± 0%  -0.17%  (p=0.016 n=5+5)
ImagePackageCatalogers/rpmdb-cataloger-2                  138kB ± 0%     138kB ± 0%  +0.00%  (p=0.008 n=5+5)
ImagePackageCatalogers/java-cataloger-2                  2.64MB ± 0%    2.64MB ± 0%    ~     (p=0.841 n=5+5)
ImagePackageCatalogers/apkdb-cataloger-2                 1.16MB ± 0%    1.16MB ± 0%  -0.00%  (p=0.016 n=4+5)
ImagePackageCatalogers/go-cataloger-2                    52.7kB ± 0%    52.7kB ± 0%    ~     (p=0.222 n=5+5)
ImagePackageCatalogers/rust-cataloger-2                   110kB ± 0%     110kB ± 0%  -0.01%  (p=0.016 n=5+4)

name                                                   old allocs/op  new allocs/op  delta
ImagePackageCatalogers/ruby-gemspec-cataloger-2           2.04k ± 0%     2.04k ± 0%    ~     (all equal)
ImagePackageCatalogers/python-package-cataloger-2         6.14k ± 0%     6.14k ± 0%    ~     (p=0.095 n=5+4)
ImagePackageCatalogers/javascript-package-cataloger-2     1.93k ± 0%     1.93k ± 0%    ~     (all equal)
ImagePackageCatalogers/dpkgdb-cataloger-2                 2.41k ± 0%     2.41k ± 0%    ~     (all equal)
ImagePackageCatalogers/rpmdb-cataloger-2                  3.20k ± 0%     3.20k ± 0%    ~     (all equal)
ImagePackageCatalogers/java-cataloger-2                   34.8k ± 0%     34.8k ± 0%    ~     (p=1.000 n=5+5)
ImagePackageCatalogers/apkdb-cataloger-2                  1.93k ± 0%     1.92k ± 0%    ~     (p=0.079 n=4+5)
ImagePackageCatalogers/go-cataloger-2                     1.42k ± 0%     1.42k ± 0%    ~     (all equal)
ImagePackageCatalogers/rust-cataloger-2                   2.83k ± 0%     2.83k ± 0%    ~     (all equal)

@spiffcs spiffcs force-pushed the 476-SPDX-relationship-mvp branch 2 times, most recently from 705eaf2 to c741b05 Compare September 14, 2021 19:44
modify spdx22 to include relationships
export spdxJson constuctor for poweruser consumption

Signed-off-by: Christopher Angelo Phillips <christopher.phillips@anchore.com>
latest patch for json schema:
https://github.com/spdx/spdx-spec/blob/development/v2.2.1/schemas/spdx-schema.json

See this pr:
spdx/spdx-spec#528

See this comment:
spdx/spdx-spec#528 (comment)

Signed-off-by: Christopher Angelo Phillips <christopher.phillips@anchore.com>
Add basic functionality for generating contains relationships for
packages. Code needs to be refactored with a second pass to clean up
signatures. The right format will be discovered when we try to build
multiple types of relationship for the relationship field.

Signed-off-by: Christopher Angelo Phillips <christopher.phillips@anchore.com>
Signed-off-by: Christopher Angelo Phillips <christopher.phillips@anchore.com>
Signed-off-by: Christopher Angelo Phillips <christopher.phillips@anchore.com>
@spiffcs spiffcs force-pushed the 476-SPDX-relationship-mvp branch from c741b05 to db373d0 Compare September 14, 2021 19:48
@spiffcs spiffcs changed the title 476 spdx relationship mvp 476 json spdx relationship mvp Sep 14, 2021
@seabass-labrax
Copy link

This is a very exciting addition to Syft! Keep up the great work, @spiffcs :)

@@ -1,3 +1,4 @@
//go:build ignore
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you mentioned this line is getting added by accident. You might want to make a local adjustment to avoid accidentally checking this in 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep - thought I removed this configuration but it seems like it's back with a vengence. I'll look at what magic nonsense exists within my defaults that is appending this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@luhring looks like my tooling is trying to migrate us to the current spec:
https://go.googlesource.com/proposal/+/master/design/draft-gobuild.md

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better explanation of the transition
https://stackoverflow.com/questions/68360688/whats-the-difference-between-gobuild-and-build-directives

I'm going to remove it for now, but will keep an eye on the transition going forward. I don't know if it's my LSP that is doing this or if it's some configuration I updated in my dotfiles when go1.17 dropped which has slipped my mind.

@@ -39,7 +39,8 @@ func (pres *SPDXJsonPresenter) Present(output io.Writer) error {
return enc.Encode(&doc)
}

func newSPDXJsonDocument(catalog *pkg.Catalog, srcMetadata source.Metadata) spdx22.Document {
// NewSPDXJsonDocument creates and populates a new JSON document struct that follows the SPDX 2.2 spec from the given cataloging results.
func NewSPDXJsonDocument(catalog *pkg.Catalog, srcMetadata source.Metadata) spdx22.Document {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious: Why did this get exported?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We export NewJsonDocument as it is consumed by the poweruser package. It seemed incorrect to have one Document constructor exposed while the other was private. I also think it's something that should have an official Documentation string highlighted above the function since its counterpart would also end up in the godoc if generated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, interesting! My thoughts, in no particular order:

  • 100% agree re: doc comments 🌟
  • Architectural question: Do we plan on creating SPDX documents for any non-package commands?
  • My pref in general is not to export something unless it's being used internally in another package, or we expect it to be used by a consumer of our module. In this case, it looks like this code is under internal/ so I don't think it would be consumed by another module. Is that right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lowercased - if we start doing anything in poweruser with the function we can export it then.

@wagoodman
Copy link
Contributor

Sub-packages should not be nested inside a Package Information section,
but should be separate and should use a Relationship to clarify.

This is good information. I think today we have some catalogers that raise up transitive dependencies and others that only raise up direct dependencies. This implies a couple of things:

  1. Adding a relationship to the doc ID for all existing packages that are cataloged today may not be correct --we need to only do this for only direct dependencies.
  2. This hints at catalogers should be able to optionally raise up relationship information, to both support declaring when there is a "root" package and to pave the way for transitive dependencies.

@spiffcs
Copy link
Contributor Author
spiffcs commented Sep 14, 2021

Sub-packages should not be nested inside a Package Information section,
but should be separate and should use a Relationship to clarify.

This is good information. I think today we have some catalogers that raise up transitive dependencies and others that only raise up direct dependencies. This implies a couple of things:

  1. Adding a relationship to the doc ID for all existing packages that are cataloged today may not be correct --we need to only do this for only direct dependencies.
  2. This hints at catalogers should be able to optionally raise up relationship information, to both support declaring when there is a "root" package and to pave the way for transitive dependencies.

Yea this is the direction we should proceed.

I can start tonight looking at how to make those changes as part of this PR. I don't want this going in if Adding a relationship to the doc ID for all existing packages that are cataloged is incorrect if only SOME are direct. I'll do some work learning more about the cataloguer so I can find out which are smarter than the others around direct/transitive dependencies.

Signed-off-by: Christopher Angelo Phillips <christopher.phillips@anchore.com>
@spiffcs spiffcs force-pushed the 476-SPDX-relationship-mvp branch from 75ae899 to 8ebd51d Compare September 14, 2021 21:22
@spiffcs
Copy link
Contributor Author
spiffcs commented Sep 15, 2021

After talking with @wagoodman we're going to keep some parts of this branch. We'll table the Package problem for AFTER we bubble up the file relationships. I'm going to proceed with adding the File section of the document and then populating Relationships with the edge file relationships we already generate here.

cc @luhring for an update on progress/direction

Signed-off-by: Christopher Angelo Phillips <christopher.phillips@anchore.com>
Signed-off-by: Christopher Angelo Phillips <christopher.phillips@anchore.com>
Signed-off-by: Christopher Angelo Phillips <christopher.phillips@anchore.com>
Signed-off-by: Christopher Angelo Phillips <christopher.phillips@anchore.com>
@spiffcs spiffcs force-pushed the 476-SPDX-relationship-mvp branch from 57a28f4 to 45cadf6 Compare September 15, 2021 19:21
@spiffcs spiffcs requested a review from a team September 15, 2021 19:36
@spiffcs spiffcs marked this pull request as ready for review September 15, 2021 19:36
@spiffcs spiffcs force-pushed the 476-SPDX-relationship-mvp branch 3 times, most recently from 010268a to 46ee410 Compare September 15, 2021 20:01
SPDX json Schema documentation:
https://github.com/spdx/spdx-spec/blob/a08ffa4b883b35574ae0df8f4566cd031341ba3e/schemas/spdx-schema.json

Required fields:
[ "SPDXID", "fileName", "copyrightText", "licenseConcluded" ]

Add `omitempty` to non required fields.
Add depricated fields to old SPDX fields

Signed-off-by: Christopher Angelo Phillips <christopher.phillips@anchore.com>
@spiffcs spiffcs force-pushed the 476-SPDX-relationship-mvp branch from 46ee410 to ee19181 Compare September 15, 2021 20:14
Signed-off-by: Christopher Angelo Phillips <christopher.phillips@anchore.com>
@spiffcs spiffcs requested a review from a team September 16, 2021 18:37
Copy link
Contributor
@wagoodman wagoodman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We tend to add information onto to common model that gets used by the presenters when possible. This PR is adding the relationship within a specific presenter --I think this is the only way to go to keep this PR small, but that does hint at future refactor that we can iterate on.

// it might to use a matcher like https://github.com/gabriel-vasile/mimetype.
// This would need to be implemented when we move into reading things from disk
// rather than digesting package metadata from the different package managers.
func matchFileTypes(baseFileName string) []string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just for fyi, no action requested: there is a prototype branch for stereoscope to add MIME type detection https://github.com/anchore/stereoscope/compare/add-mimetype , this would need to be supported for the directory-resolver as well, but this at least covers containers.


for _, ownedFilePath := range pkgFileOwner.OwnedFiles() {
baseFileName := filepath.Base(ownedFilePath)
fileSpdxID := spdx22.ElementID(fmt.Sprintf("File-%s-%s", p.Name, baseFileName)).String()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not certain the base name will be good enough for an ID --what if there are two files with the same names in different paths for the same package? Could we use the whole path? or a hash of the path?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point - I can hash the full path and add it as a suffix to this Sprintf


files = append(files, spdx22.File{
FileName: ownedFilePath,
FileTypes: matchFileTypes(baseFileName),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this function isn't implemented yet we could probably delete it and the FileTypes entry here (if the spec permits it). (would the basename be enough to lookup the filetype here? I'm not certain it would)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The basename is not enough in this case for file detection. I'll remove it as it's not a required type.

@spiffcs spiffcs changed the title 476 json spdx relationship mvp Populate Files and Relationship fields for spdx-json output Sep 16, 2021
@spiffcs spiffcs force-pushed the 476-SPDX-relationship-mvp branch from d9af81e to 5f4544e Compare September 16, 2021 20:36
Signed-off-by: Christopher Angelo Phillips <christopher.phillips@anchore.com>
@spiffcs spiffcs force-pushed the 476-SPDX-relationship-mvp branch from 5f4544e to 95861d9 Compare September 16, 2021 20:45
@spiffcs spiffcs merged commit 93d00dc into main Sep 17, 2021
@spiffcs spiffcs deleted the 476-SPDX-relationship-mvp branch September 17, 2021 13:06
@spiffcs spiffcs linked an issue Sep 20, 2021 that may be closed by this pull request
@wagoodman wagoodman added the enhancement New feature or request label Sep 23, 2021
GijsCalis pushed a commit to GijsCalis/syft that referenced this pull request Feb 19, 2024
)

* update spdx22 Document model to include relationships field

Signed-off-by: Christopher Angelo Phillips <christopher.phillips@anchore.com>

* update document and relationship to match current JSON spec
https://github.com/spdx/spdx-spec/blob/development/v2.2.1/schemas/spdx-schema.json
spdx/spdx-spec#528
spdx/spdx-spec#528 (comment)

Signed-off-by: Christopher Angelo Phillips <christopher.phillips@anchore.com>

* update File struct based on SPDX schema

Required fields:
[ "SPDXID", "fileName", "copyrightText", "licenseConcluded" ]
Signed-off-by: Christopher Angelo Phillips <christopher.phillips@anchore.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adapt new and existing package metadata as SPDX relationships
4 participants
0