[Help Wanted] Determine attestation format for vuln scans #58
We (w/@Dentrax) have a concern about this spec because, as you can see, there is a
Hi @Dentrax and @developer-guy! My biggest recommendation is to decide on a few critical "user journeys" and figure out exactly how the attestation will be used in those cases, both generation and consumption. The consumption piece is particularly important. Figure out exactly how a consumer will take this attestation and make a decision on whether the image is acceptable or not. Assume that some library will take care of doing the crypto and the matching of the artifact to the subject; just worry about the predicate itself. A few initial thoughts:
Example use case: Only allow a Kubernetes image to be deployed to a cluster if there exists a scan from "trivy" signed by MyTrustedScanner from the last 7 days showing no high severity vulnerabilities, excluding some allowlisted CVEs. (I'm just guessing here; not sure how trivy vs MyTrustedScanner works out.) Pseudocode in Rego (untested, probably incorrect):

# This just shows how to match the predicate.
# Assume that boilerplate code takes care of signature, subject, and multiple-attestation logic.
allow_predicate(predicateType, predicate, cveAllowlist, now) {
predicateType == "..."
is_within_7_days(predicate.timestamp, now)
scanner := predicate.scanners[_]
scanner.name == "trivy"
not high_severity_vuln_exists(scanner.result, cveAllowlist)
}
high_severity_vuln_exists(result, cveAllowlist) {
vuln := result.vulnerabilities[_] # I'm making up the schema here
vuln.severity == "HIGH"
not is_on_allowlist(vuln.cve, cveAllowlist)
}

I probably don't have all of this right, but hopefully this shows that going through a concrete example will allow you to see how well your predicate design meets your needs.
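A rough Python rendering of the same consumption check may make the control flow easier to follow. Everything here is hypothetical: the predicate type URI, field names, and schema simply mirror the guessed schema in the Rego sketch above.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical predicate type, mirroring the Rego sketch above.
EXAMPLE_TYPE = "https://example.com/VulnScan/v1"

def allow_predicate(predicate_type, predicate, cve_allowlist, now):
    """Accept only if a fresh trivy scan shows no non-allowlisted
    HIGH severity vulnerabilities (guessed schema)."""
    if predicate_type != EXAMPLE_TYPE:
        return False
    scanned = datetime.fromtimestamp(predicate["timestamp"], tz=timezone.utc)
    if now - scanned > timedelta(days=7):
        return False
    return any(
        s["name"] == "trivy"
        and not high_severity_vuln_exists(s["result"], cve_allowlist)
        for s in predicate["scanners"]
    )

def high_severity_vuln_exists(result, cve_allowlist):
    return any(
        v["severity"] == "HIGH" and v["cve"] not in cve_allowlist
        for v in result.get("vulnerabilities", [])
    )
```

As with the Rego version, signature verification, subject matching, and multiple-attestation logic are assumed to happen in boilerplate elsewhere.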
hello @MarkLodato, thank you so much for your suggestions. We (w/@Dentrax) revised our spec according to your suggestions. The following predicate shows our specification:

{
"predicate": {
"invocation": {
"parameters": null,
"uri": "https://github.com/developer-guy/alpine/actions/runs/1071875574",
"event_id": "1071875574",
"builder.id": "github actions"
},
"scanner": {
"name": "trivy",
"version": "0.19.2",
"db": {
"uri": "https://github.com/aquasecurity/trivy-db/commit/4c76bb580b2736d67751410fa4ab66d2b6b9b27d",
"version": "v1-2021080612"
},
"result": "<JSON RESULT>"
},
"metadata": {
"scanStartedOn": 1628256736,
"scanFinishedOn": 1628256736
},
"materials": [
{
"uri": "index.docker.io/devopps/alpine:3.12.1",
"digest": "sha256:ab614a0022ff7e6568ff403034b9114302ac20b1dd2173c842c561f6e746e304"
}
]
}
}
Hi @developer-guy, that schema looks good to me. One question: is
Hi @MarkLodato, it should be a JSON object. I think this option would be better, right? WDYT?
- Is 'materials' meant to show the thing that was scanned, or an image used by the scanner?
- Should 'materials' use DigestSet as the type instead of embedding the type (sha256) and the hash together in one string?
- Can the timestamps in metadata use RFC 3339?
- Is there a Package URL or SPDX location that could be used for the URI in materials that might provide more context? (See https://github.com/in-toto/attestation/blob/main/spec/field_types.md#field-type-definitions)
- Could scanner.name use a URL to identify the scanner (to help avoid conflicts)?
Exactly!
Oh, yes. We missed that.

"materials": [
{
"uri": "index.docker.io/devopps/alpine:3.12.1",
"digest": {
"sha256": "ab614a0022ff7e6568ff403034b9114302ac20b1dd2173c842c561f6e746e304"
}
}
]
Good point. In such a case, we should use RFC 3339:

"metadata": {
"scanStartedOn": "2021-08-06T17:45:50.52Z",
"scanFinishedOn": "2021-08-06T17:50:50.52Z"
},
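For what it's worth, RFC 3339 timestamps like these are straightforward for a consumer to parse with nothing but the standard library; a small Python sketch using the exact values from the snippet above:

```python
from datetime import datetime

def parse_rfc3339(ts):
    # strptime's %z directive accepts a literal "Z" since Python 3.7,
    # and %f accepts 1 to 6 fractional-second digits.
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%f%z")

start = parse_rfc3339("2021-08-06T17:45:50.52Z")
end = parse_rfc3339("2021-08-06T17:50:50.52Z")
print((end - start).total_seconds())  # 300.0
```

This particular format string assumes a fractional-seconds part is present, as in the example values; a robust consumer would accept the other RFC 3339 variants too.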
That makes sense. I think you're proposing to add something like
I'm not sure about this. Maybe representing the
But actually, the name of the scanner is just

Additionally, we can use the following format for the
I don't think you need a new field; I believe the idea is that it goes in uri, e.g.
I think @MarkLodato may know better, however.
we (w/@Dentrax) updated the spec according to your recommendations @MarkLodato @TomHennen. WDYT about the latest updates? If you agree on the spec, we can start working on implementing it in the in-toto attestations repository, and then continue the discussion in a PR. The following summarizes what we have done:
The following are the questions that we want to ask:
{
"predicate": {
"invocation": {
"parameters": null,
"uri": "https://github.com/developer-guy/alpine/actions/runs/1071875574",
"event_id": "1071875574",
"builder.id": "github actions"
},
"scanner": {
"uri": "pkg:github/aquasecurity/trivy@244fd47e07d1004f0aed9", ## changed field from name to uri
"version": "0.19.2",
"db": {
"uri": "pkg:github/aquasecurity/trivy-db/commit/4c76bb580b2736d67751410fa4ab66d2b6b9b27d",
"version": "v1-2021080612"
},
"result": {<JSON RESULT>} ## JSON Object
},
"metadata": {
"scanStartedOn": "2021-08-06T17:45:50.52Z",
"scanFinishedOn": "2021-08-06T17:50:50.52Z"
},
"materials": [
{
"uri": "pkg:docker/devopps/alpine:3.12.1", ## the image was built
"digest": {
"sha256": "ab614a0022ff7e6568ff403034b9114302ac20b1dd2173c842c561f6e746e304" ## digest
}
},
{
"uri":"pkg:github/devopps/alpine@244fd47e07d1004f0aed9c", ## the commit that image was built
"digest": {
"sha256": "244fd47e07d1004f0aed9c403034b9114302ac20b1dd2173c842c561f6e746e304" ## commit sha
}
}
]
}
}
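Since the updated spec leans on purl-style uri values, a consumer will need to pull them apart. A naive sketch of that (deliberately not a full purl parser; qualifiers, subpaths, and percent-encoding are ignored, and the function name is made up):

```python
def split_purl(purl):
    """Naively split 'pkg:type/namespace/name@version' into parts.
    Not a full purl parser: qualifiers, subpaths, encoding are ignored."""
    if not purl.startswith("pkg:"):
        raise ValueError("not a purl: " + purl)
    rest = purl[len("pkg:"):]
    if "@" in rest:
        path, version = rest.rsplit("@", 1)
    else:
        path, version = rest, None
    ptype, _, name = path.partition("/")
    return {"type": ptype, "name": name, "version": version}

print(split_purl("pkg:github/aquasecurity/trivy@244fd47e07d1004f0aed9"))
# {'type': 'github', 'name': 'aquasecurity/trivy', 'version': '244fd47e07d1004f0aed9'}
```

A real implementation would likely use an existing purl library rather than string handling like this.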
I don't think we need this here. The scanning service should encapsulate all of that.
Same as above. I think the
Agreed.
I think that's right. If you really want to include provenance data on the scanner itself then you could emit SLSA provenance for that. My guess is that the consumers of the vuln scan probably don't care about exactly how the scanner was built, they just care about the results of the image scanned. It would probably be useful to have the SLSA provenance available but I don't think it's necessary for this use case.
I don't actually know what's been decided generally, but we've moved the SLSA provenance spec to its own repo and we just link to it here. @adityasaky might have a preference as to submitting it here vs. just linking to it. See #60 for a bit more info. Regardless, I think this review was really helpful and I'm glad you're interested in using these attestations!
@Dentrax @developer-guy, a couple of questions about the spec proposal as of this version:
Right (i.e., https://github.com/aquasecurity/trivy-db). What should we do if a scanner uses multiple DBs? Should we change the format of
Most scanners support the SARIF format, so we can append this into
As far as I understand, you mean "what will happen if the scanner exits non-zero?" So we may have to think about whether we really want to generate a vuln attestation if the scan fails in the first place.
thank you @TomHennen @dlorenc for your suggestions; we are truly grateful for your help.
According to that, we should remove the

{
"predicate": {
"invocation": {
"parameters": null,
"uri": "https://github.com/developer-guy/alpine/actions/runs/1071875574",
"event_id": "1071875574",
"builder.id": "github actions"
},
"scanner": {
"uri": "pkg:github/aquasecurity/trivy@244fd47e07d1004f0aed9",
"version": "0.19.2",
"db": {
"uri": "pkg:github/aquasecurity/trivy-db/commit/4c76bb580b2736d67751410fa4ab66d2b6b9b27d",
"version": "v1-2021080612"
},
"result": {<SCAN RESULT>} ## Formatted as SARIF or JSON.
},
"metadata": {
"scanStartedOn": "2021-08-06T17:45:50.52Z",
"scanFinishedOn": "2021-08-06T17:50:50.52Z"
}
}

cc: @Dentrax
Hi Tom, thanks for the ping. I don't have a strong preference for where this should go yet. Looking at the spec, it seems, to my eyes, to be quite general. I understand the concern with the provenance spec was that it was expressly designed for SLSA, which I think isn't the case here.
@developer-guy Looking at the SARIF spec, it seems like most of that data is already found within the SARIF result. For example, the timestamps, scanner binary and version, and invocation are all there. It would be good to de-dupe and only add the minimum amount necessary that's not there (or add it to SARIF). It looks like what is missing is the hash of the binaries (scanner and db), which would be useful for chaining (so you can determine the provenance of those).
For Trivy, for example, we need to pass a
{
"$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
"version": "2.1.0",
"runs": [
{
"tool": {
"driver": {
"name": "Trivy",
"informationUri": "https://github.com/aquasecurity/trivy",
"fullName": "Trivy Vulnerability Scanner",
"version": "0.15.0",
"rules": []
}
}
}
]
}

Actually, most scanners can generate their own output JSON format. We can use that
{
"SchemaVersion": 2,
"ArtifactName": "foo:bar",
"ArtifactType": "container_image",
"Metadata": {
"OS": {
"Family": "debian",
"Name": "10.10"
},
"RepoTags": [
"foo:bar"
]
},
"Results": [
{
"Target": "foo:bar (debian 10.10)",
"Class": "os-pkgs",
"Type": "debian"
}
]
}

As we can see, we do not have the related info about the scanner name, version, URL, etc. if we use the default result output format. Should we support only one format for the
Exactly!
My thought is that it would be easier for consumers if there were a different predicateType for each type of the 'result' object. I'm assuming some consumers won't know how to interpret all of the various types. Having a different predicateType will let them easily filter out the ones they don't support. If you do support multiple different types in the result field then you need some way to indicate type anyways, and why not use predicateType for that? Maybe it would be easier to have a different resultType field so that processing all the other fields could be the same? I'm not actually sure.
Yeah, you are absolutely right, we have to provide a way for consumers to indicate the type of the result and the

"predicateType": "https://sigstore.dev/v1/VulnerabilityScanResult+sarif"
"predicateType": "https://sigstore.dev/v1/VulnerabilityScanResult+json"

WDYT @dlorenc @Dentrax @TomHennen?
We don't actually need the + syntax because this isn't a media type, so something like this would work:

"predicateType": "https://sigstore.dev/v1/VulnerabilityScanResult/sarif"
thank you so much for warning us @dlorenc, let me fix that:

"predicateType": "https://sigstore.dev/v1/VulnerabilityScanResult/sarif"
"predicateType": "https://sigstore.dev/v1/VulnerabilityScanResult/json"
I thought that we could store the result information somewhere else instead of writing it into the spec, to avoid having a long spec:

"result": {<SCAN RESULT>} ## Formatted as SARIF or JSON.

Instead of the above, we can use other locations to store these results, so we can reference them like the following:

result: {
"uri": <some location> # https, oci, aws s3, gcp storage whatever
"digest": {
"sha256": 2142918498129ea34912849802198041
}
}
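Whatever storage backend is used, the consumer's integrity check after dereferencing uri reduces to comparing hashes. A minimal sketch (the fetch itself is out of scope; the function name is made up and the digest value is computed on the spot for illustration):

```python
import hashlib

def verify_result(blob, digest):
    """Check fetched result bytes against the attested DigestSet entry."""
    return hashlib.sha256(blob).hexdigest() == digest["sha256"]

blob = b'{"vulnerabilities": []}'
digest = {"sha256": hashlib.sha256(blob).hexdigest()}
print(verify_result(blob, digest))         # True
print(verify_result(b"tampered", digest))  # False
```

Note this only guarantees integrity of the referenced results, not their availability, which is exactly the downside raised in the next comment.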
The downside of just including the URI is that now anyone that wants to evaluate this information needs to go fetch it. That may not always be possible, especially if there are tight latency requirements around making a decision.
Oh, you also have to worry about whether the entity evaluating the attestation has permission to retrieve the results.
Any updates? How should we proceed here? 🤔
My personal opinion:
Awesome work here all, thank you @Dentrax and @developer-guy for driving this !! 🚀 I really like the format that has evolved through discussion here. I'd love to see it as a predicate in this repo (with in-toto.io domain in the type). I agree with Tom that we should probably have a different predicate for different result types, that makes it easier to determine what to do with a predicate rather than having to query parts of the predicate to know how to parse it all. I wonder whether scanStartedOn and scanFinishedOn are both required? If you have vulnerability database version information and scan finish time, does the duration matter? What else can we learn by having both start and finish times? Any help wanted here?
Thanks for the kind words, @joshuagl. Everyone here is so passionate 🤗
We agree with you. We should define different predicates for each result type.
Actually, the idea behind this is that we can track the total duration of the whole scan process. It's just for tracing purposes; I don't think it's related to security. However, I'm not sure whether this duration matters, or how we could dig into it to collect valuable information. Could scans taking longer than normal mean something to us in terms of security? Otherwise, we can go with only
Question:
Hmm, I'm curious if you know of a scanner that can output an SBOM as part of the scan? It'd certainly be an interesting use-case for it.
Well, looking here (search the word vuln) you can see that the use of SBOMs to document vulnerabilities has traction. I've come to understand that the predicate type has two contradicting purposes:
If I'm right, something has to be done at the predicate type level, and I strongly suggest prioritizing the first use.
Connecting this issue and the context issue (see my comment on issue #46), and since these issues are not a detail in a specific predicate definition, it may be better to open a separate discussion regarding the metadata required at the statement level.
Hi! I'm not sure I understand. Do you mean to say "this predicate type" rather than "the predicate type"? I'm assuming so, let me know if I'm wrong.
I'm not sure I follow the rest. We are exactly trying to figure out what would be a good format for this use-case in this issue. As for your last paragraph, definitely! The issue in #46 is whether we need a timestamp at the subject level (i.e., the metadata layer), or at the predicate layer. The context being that it may not make sense to introduce timestamps in certain types of operations on the chain. Either way, to reel back on things: I'm somewhat for the idea of making SARIF first-class in this context, yet I'm not so convinced that SBOMs should be used to convey vuln information exclusively. Ideally, you should be able to aggregate a series of attestations to answer different pieces of information from trusted parties. I think this belongs in a broader conversation on where to draw the line on what an SBOM is expected to do (and, having been in the ecosystem long enough, I don't think there is consensus on what the scope of SBOMs should be). To put things differently, think of the following use-case:
Now, with this in mind: do we agree, then, that the mapping between SBOMs and software artifacts is not bijective, but rather injective (and not surjective)? Now, I think this discussion is better kept in SBOM circles, but it's interesting context for this conversation nevertheless... What do you think?
Hello folks, there is a cross-issue on the cosign side, please see. We want to start working on implementing this, at least moving forward with baby steps, because IMHO there is great potential in doing this. WDYT about the latest spec we defined? Are there any updates or additions? Please tell us, then we can complete the final spec according to your feedback and start both implementing and documenting it. Thanks in advance.
Do you mean the one we were discussing here? If so, IIRC everything sounded fine generally. Can you send a PR to hash out the details?
Predicate type issue
Sorry, but I meant "the predicate type field in any statement". I understand the different levels manifest as different entities, which are interested in different levels of enforcement; sometimes there is only a need to verify that an attestation of a certain statement exists, and sometimes there is a need to verify specific details that are deep in the predicates. In other words, sometimes I just want to test that an attestation of a certain type exists, sometimes I also care who signed it, and so on. See #77, where I give examples of policies that can be implemented at the statement level and policies that also need parsing of the predicates.

I'll try again to explain where I see the contradiction: if the predicate type is supposed to enable parsing of the predicate, it should be format-aware, for example

As discussed above, currently the type is in fact a "media type" - the second option. This is limiting.

The context issue

The predicate type and format issue

Use of SBOMs for CVEs
Sorry, the above is a response to @SantiagoTorres above
I mixed up sbom and vuln scan ...
What I meant was predicate type - vuln scan |
Sure, this is something that has been handled by in-toto since its inception. As for the contradiction:
I do not think that there is a contradiction here. Say, for example, that a tool generating SPDX parses out fields that are common between SBOMs and fills in a generic type that can be cross-verified across implementations, and sticks media-type-specific fields in an opaque field. This is not limiting, and can allow for better interoperability. I am not saying this is the definitive solution, but rather that there are more alternatives to face this. Although this sounds closer to what you're saying on the "abstract level", it's essentially halfway through. Producers can provide the abstract where applicable, and the media-type elements where impossible. Philosophically, this is where in-toto used to stand with the generic link types: you stick the known members in the defined field, and anything ancillary can be stuck into an opaque member that is up to policy specification.
I think this is a worthwhile discussion to have in a separate issue (or issues) yeah.
I think this is feasible. In terms of the evolution of the predicates, I wonder if we could create this for testing (e.g., in, say, cosign) and gather some in-the-field experience. @Dentrax, I know you're invested in making this happen in cosign, so it'd be great to hear your input here.
Following all the discussions, we shipped that spec under the cosign project as Cosign Vuln Attestation, and awesome projects such as trivy have already started to provide vulnerability reports in the vuln attestation format we defined; we believe more will come in the upcoming years. So, we (w/@Dentrax) thought it might be a good time to add it to the official in-toto attestations. WDYT folx? /cc @danlorenc @SantiagoTorres @TomHennen @trishankatdatadog
yeah, I wonder how hard it would be to unify this. Thanks for the ping @developer-guy! Thoughts from other attestation maintainers would be super helpful: @in-toto/attestation-maintainers! I'm personally OK with, say, starting to document these somewhere so people know they exist, and then distilling a unified type. If it's cosign's, then that'd be the easiest :P
I think it would be helpful if someone could summarize the value of defining this new predicate data structure, if it's not too much trouble. I've been getting a little lost. I'm wondering if the net effect is basically "just use the existing vuln report standards". As long as we can map an existing format to a

Regarding the "Cosign vuln spec", I had opened an issue in Cosign to ask clarifying questions about the new spec, and most of those questions have still not been answered, even though the spec has since been put into Trivy.
I think SARIF is certainly possible, though I find the spec quite lengthy myself. I do agree that using existing formats where possible would be the most useful.
I definitely agree. And Trivy and Grype's SARIF JSON outputs are not the same as each other. I've found generally that SARIF isn't the best for machine-to-machine interchange of vuln data, it's mostly used to drive web UIs (has a lot of English prose, etc.). I actually think there might be room for a new vulnerability report format in general, FWIW. I'm just not wrapping my head around the "thin wrapper as a predicate format" approach, at least just yet.
@developer-guy I really like the direction this is going and the evolution that it has taken based on the comments. I agree with @TomHennen in terms of the inline results but do have similar questions/concerns about the different result types. It would be great if you open a PR that we can discuss during our next attestation meeting.
+1 to discussing this at our next maintainers meeting. I would echo Mark's concern above to make sure we're not duplicating formats. I'm also still on the fence about how to handle different result types. @Dentrax @developer-guy Thanks so much for working on this! For your PR, I would recommend writing up your predicate using the ITE-9 format. This will give you an opportunity to describe the motivation, the usage model, and the predicate all in one place. If you'd like examples, please take a look at a few of the currently open PRs for new predicates, and please let us know if you have any other questions!
Regarding duplication, I do wonder if it's actually desirable for the in-toto attestation maintainers to prevent duplication. I feel like there would be a lot of nuance and tricky judgement calls, and we might find ourselves trapped in a corner. Some users may have tools that output format X, and they want to stick that in an in-toto attestation, but we've only defined predicateTypes for format Y. Are those users out of luck? What about predicate types that are similar to one another? Some (like @MarkLodato) have argued that it might be possible to drop the SLSA provenance type and just use SPDX instead. There's enough overlap that it's plausible. However, a lot of folks didn't like that proposal. Would it be good enough to say that you can put whatever data you like in these attestations, but that we do try to avoid having multiple ways to use the same format in predicates? E.g., let's make sure that we don't wind up with predicate types of

Then the attestation approval process is just trying to make sure it's documented well enough, that the author has thought through the use cases well enough, and that there isn't already a predicate that conveys the same data. Thoughts?
Good point, @TomHennen. There is a lot of nuance in what "duplication" means, and I think part of our job as maintainers should be to provide some guidelines for what counts as duplicated and what doesn't, ways to avoid ambiguity, etc. Totally agree, though, that we don't want to be overly restrictive; maybe a solution is to provide some hints as to different formats in the predicate, but I could be wrong. I think this warrants a longer, more general discussion. I, in particular, would like to see a few more concrete examples to understand what cases of (near-)duplication might look like.
@hectorj2f @TomHennen @pxp928 @adityasaky I assume this issue can be closed now that #268 is merged? |
@marcelamelara Yes, let's close it. |
This is a follow-up issue for the sigstore/cosign#442.
I thought it would be more appropriate to continue the discussion for the aforementioned issue, since the spec is non-cosign-related.
With @developer-guy, we are currently trying to determine a vuln scan spec as best we can. We can enrich the following brand-new attestation SPEC:
We called the predicateType SARIF. But I think that name doesn't fit this type, since the content is not in SARIF format. We may have to reconsider the name.
It's obviously a bit hard to think of best practices while implementing the first version of the spec. It would be great if you maintainers got involved and gave us a hand to improve the overall structure, so we can easily implement the model in the in-toto project to validate and generate the attestation. Does that make sense to you? Thanks! We are waiting for your feedback on this.
FYI @dlorenc @NitinJain2 @trishankatdatadog