# Ratify Vulnerability Report Verifier

**PRD doc [here](https://hackmd.io/VlhTy7Q7TM2gTMXdfujowQ)**

Vulnerability reports are JSON documents attached to the subject as a referrer. Deciding what is verified, and how, surfaces several considerations:

* Which tools will generate these reports?
  * Trivy, Grype, Syft
  * We will focus on Trivy and Grype initially
* What report format will we support?
  * For now, we will focus on SARIF reports.
  * Should the implementation be extensible to support non-SARIF formats in the future?
* Does it matter which tool generates the report if it's a common standard?
  * Vulnerability reports generated by each tool in SARIF format vary considerably in how critical information is surfaced. See [below](#SARIF-Format) for more details.
  * Parsing logic will need to be tool-specific.
* How is the creation date attached to a report?
  * An OCI annotation is one option.
* How do we add flexibility so it's easy to add new custom policies on the reports?
  * Certain pre-canned policies are built in and controlled via CRD config.
  * Add a passthrough flag that dumps the full SARIF contents into the verifier report so Rego can parse it and apply custom policy.

## SARIF Format

SARIF documents provide a schema framework for static analysis reporting. In a nutshell, a set of rules configured on a tool's driver is applied to a subject, and each rule's findings are output as a list of results with details on violations. Many optional and custom fields can be added to rules and results. Official specification [here](https://docs.oasis-open.org/sarif/sarif/v2.0/sarif-v2.0.html)

**There is no standardization between SARIF reports generated by scanning tools**

Here are sample SARIF reports generated by Trivy and Grype for the same image.
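To make the rule/result relationship concrete before looking at the tool-specific samples, here is a minimal sketch in Python. The SARIF fragment is hand-written for illustration (it is not output from Trivy or Grype); it shows how each entry in `results` points back to a rule declared on the tool's driver:

```python
import json

# Minimal hand-written SARIF-shaped document (illustrative only):
# the driver declares rules; each result references a rule by id/index.
sarif = json.loads("""
{
  "version": "2.1.0",
  "runs": [{
    "tool": {"driver": {
      "name": "example-scanner",
      "rules": [{"id": "CVE-2023-2253", "help": {"text": "Severity: HIGH"}}]
    }},
    "results": [{"ruleId": "CVE-2023-2253", "ruleIndex": 0,
                 "message": {"text": "vulnerable package found"}}]
  }]
}
""")

run = sarif["runs"][0]
rules_by_id = {rule["id"]: rule for rule in run["tool"]["driver"]["rules"]}

# Join each result to the rule that produced it.
for result in run["results"]:
    rule = rules_by_id[result["ruleId"]]
    print(result["ruleId"], "->", rule["help"]["text"])
    # prints: CVE-2023-2253 -> Severity: HIGH
```

Any verifier or policy that wants per-finding metadata (severity, fixed version, etc.) has to perform this join, because the interesting fields often live on the rule rather than the result, as the samples below show.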
#### Trivy ```json "rules": [ { "id": "CVE-2023-2253", "name": "LanguageSpecificPackageVulnerability", "shortDescription": { "text": "DoS from malicious API request" }, "fullDescription": { "text": "A flaw was found in the `/v2/_catalog` endpoint in distribution/distribution, which accepts a parameter to control the maximum number of records returned (query string: `n`). This vulnerability allows a malicious user to submit an unreasonably large value for `n,` causing the allocation of a massive string array, possibly causing a denial of service through excessive use of memory." }, "defaultConfiguration": { "level": "error" }, "helpUri": "https://avd.aquasec.com/nvd/cve-2023-2253", "help": { "text": "Vulnerability CVE-2023-2253\nSeverity: HIGH\nPackage: github.com/docker/distribution\nFixed Version: 2.8.2-beta.1\nLink: [CVE-2023-2253](https://avd.aquasec.com/nvd/cve-2023-2253)\nA flaw was found in the `/v2/_catalog` endpoint in distribution/distribution, which accepts a parameter to control the maximum number of records returned (query string: `n`). This vulnerability allows a malicious user to submit an unreasonably large value for `n,` causing the allocation of a massive string array, possibly causing a denial of service through excessive use of memory.", "markdown": "**Vulnerability CVE-2023-2253**\n| Severity | Package | Fixed Version | Link |\n| --- | --- | --- | --- |\n|HIGH|github.com/docker/distribution|2.8.2-beta.1|[CVE-2023-2253](https://avd.aquasec.com/nvd/cve-2023-2253)|\n\nA flaw was found in the `/v2/_catalog` endpoint in distribution/distribution, which accepts a parameter to control the maximum number of records returned (query string: `n`). This vulnerability allows a malicious user to submit an unreasonably large value for `n,` causing the allocation of a massive string array, possibly causing a denial of service through excessive use of memory." 
}, "properties": { "precision": "very-high", "security-severity": "7.5", "tags": [ "vulnerability", "security", "HIGH" ] } }, . . . "results": [ { "ruleId": "CVE-2023-2253", "ruleIndex": 0, "level": "error", "message": { "text": "Package: github.com/docker/distribution\nInstalled Version: v2.8.1+incompatible\nVulnerability CVE-2023-2253\nSeverity: HIGH\nFixed Version: 2.8.2-beta.1\nLink: [CVE-2023-2253](https://avd.aquasec.com/nvd/cve-2023-2253)" }, "locations": [ { "physicalLocation": { "artifactLocation": { "uri": "kubectl", "uriBaseId": "ROOTPATH" }, "region": { "startLine": 1, "startColumn": 1, "endLine": 1, "endColumn": 1 } }, "message": { "text": "kubectl: github.com/docker/distribution@v2.8.1+incompatible" } } ] }, ``` Full report [here](https://gist.github.com/akashsinghal/d06451f7d9a1b3d91e9b12cc02887f5f) #### Grype ```json "rules": [ { "id": "GHSA-69cg-p879-7622-golang.org/x/net", "name": "GoModuleMatcherExactDirectMatch", "shortDescription": { "text": "GHSA-69cg-p879-7622 high vulnerability for golang.org/x/net package" }, "fullDescription": { "text": "golang.org/x/net/http2 Denial of Service vulnerability" }, "helpUri": "https://github.com/anchore/grype", "help": { "text": "Vulnerability GHSA-69cg-p879-7622\nSeverity: high\nPackage: golang.org/x/net\nVersion: v0.0.0-20220722155237-a158d28d115b\nFix Version: 0.0.0-20220906165146-f3363e06e74c\nType: go-module\nLocation: /kubectl\nData Namespace: github:language:go\nLink: [GHSA-69cg-p879-7622](https://github.com/advisories/GHSA-69cg-p879-7622)", "markdown": "**Vulnerability GHSA-69cg-p879-7622**\n| Severity | Package | Version | Fix Version | Type | Location | Data Namespace | Link |\n| --- | --- | --- | --- | --- | --- | --- | --- |\n| high | golang.org/x/net | v0.0.0-20220722155237-a158d28d115b | 0.0.0-20220906165146-f3363e06e74c | go-module | /kubectl | github:language:go | [GHSA-69cg-p879-7622](https://github.com/advisories/GHSA-69cg-p879-7622) |\n" }, "properties": { "security-severity": "7.5" } }, . 
. . . "results": [ { "ruleId": "GHSA-69cg-p879-7622-golang.org/x/net", "message": { "text": "The path /kubectl reports golang.org/x/net at version v0.0.0-20220722155237-a158d28d115b which is a vulnerable (go-module) package installed in the container" }, "locations": [ { "physicalLocation": { "artifactLocation": { "uri": "image//kubectl" }, "region": { "startLine": 1, "startColumn": 1, "endLine": 1, "endColumn": 1 } }, "logicalLocations": [ { "name": "/kubectl", "fullyQualifiedName": "generaltest.azurecr.io/deislabs/ratify-crds:v1@sha256:0a2263a8aa28d86a795c2789b6548c7a1f520fb7ddab042050a3a1a8e5c84752:/kubectl" } ] } ] }, ```

Full report [here](https://gist.github.com/akashsinghal/68ae4daa1dd681e35a48eec3f9b9a678)

Let's take `severity` as an example field to extract. In the `results` objects, Trivy buries the severity in the `message.text` field, and Grype omits it entirely. If we instead inspect the `rule` object associated with each `result`, we get more information and more overlap between the tools. Trivy also stores the severity as a custom `tags` property, but that doesn't exist in Grype. The only commonality is to manually parse the `help.text` field for the `Severity: \w+` substring using a regex and extract it that way, which is how the prototype currently does it. Other information, like the fixed version, has the same requirement. Having standard fields beyond the SARIF-defined ones would be ideal.

There's no guarantee that reports produced by other tools such as Syft or Clair will follow the same format. We will also need to consider versioning the parsing of the reports: report formats may change over time, leading to parsing issues.

## Policies to enforce

1. Only consider reports produced within the last x duration
2. Only allow reports where the count of each severity level [x, y, z...] is below its threshold [x_count, y_count, z_count...]
3. Denylist certain CVEs (e.g. the curl or HTTP/2 vulnerabilities)
4.
Support VEX document filtering

## Filtering Reports based on creation date

Creation metadata is not embedded inside the SARIF reports generated by Trivy and Grype, so we would need to rely upon annotating the referrer manifest. The OCI spec defines the standard `org.opencontainers.image.created` annotation with an RFC 3339 date. This annotation can be used for filtering in the vulnerability verifier.

Each verifier, when invoked, is provided a descriptor of the referrer manifest, which includes the manifest's annotations. An initial date check based on the `maxAge` defined in the verifier config can be performed, and `isSuccess` can be set to false if the report is older than the `maxAge`. If we decide to delegate all evaluation to the Rego template, the SARIF embedded in the verifier report can have the creation annotation date appended to it at a well-known location.

## Verifier vs Rego Policy

Policies can be implemented either at the Rego policy level or built in to the vulnerability verifier.

TLDR:

* Built-in is better for:
  * Maintaining separation of decision making between Rego and Ratify: Ratify makes per-artifact decisions; Rego makes multi-report aggregate decisions
  * Keeping Rego much simpler
    * No user input required in the template
    * Rego is complicated and hard to understand; it is not user friendly
  * An easier configuration experience: users only need to interact with the CRD
  * More complex verifier checks; Rego is a declarative language with a limited set of functions for filtering/parsing
* Rego is better for:
  * Filtering/validation of artifacts that rely on other artifacts
    * e.g. VEX filtering for vulnerability reports
    * Ratify's verifiers are designed to operate on a single artifact in isolation. It is possible to discover and download other artifacts, but each artifact is still verified independently.
  * Errors are propagated to users directly
    * The CT violation message is output in k8s events and in the CLI with `kubectl run`

### Built-in policies

Common policies such as age enforcement and severity filtering can be implemented inside the verifier. Configuration will be done via the verifier config.

* Pros:
  * Policy updates are simple and straightforward
  * The Rego policy is far less complex and can focus on applying aggregate policies to the list of reports rather than parsing individual reports (e.g. Rego would enforce that at least one verifier report is valid and that each report has a valid signature)
* Cons:
  * Not as flexible
  * Each new policy requires updates to the verifier logic
  * VEX filtering is tougher

#### Prototype

Currently supports date filtering based on the OCI image-creation annotation and a list of disallowed severities. PR can be found [here](https://github.com/deislabs/ratify/pull/1123)

[![asciicast](https://asciinema.org/a/614275.svg)](https://asciinema.org/a/614275)

### Rego-based policies

A new Artifact Loader verifier is introduced that returns the entire blob content of the referrer as a `payload` extension field in the report. The only processing is an optional `maximumAge` check. The `artifactloader` verifier can be multipurpose: the SBOM verifier can also leverage it to download the SBOM, with scenario-specific Rego applied afterwards. The verifier's `isSuccess` indicates whether the blob was fetched successfully and/or the creation date is within the maximum age.

Different Constraint Templates (CT) can be used in tandem to achieve the intended policy: ALL vulnerability reports must pass validation, and each report must have at least one Notary signature. This can be achieved by applying both a nested Notary signature CT and a vulnerability report CT.

During CT evaluation:

1. Collect all reports with a matching artifact type of `application/sarif+json` from the `artifactloader` verifier
2. Verify the JSON schema is SARIF
3. Each report must be produced by Trivy or Grype only
4.
Each report evaluated must be true
5. Check if VEX filtering is enabled:
    - Verify the VEX schema
    - Extract CVEs and add them to a whitelist
    - Use the whitelist to filter results out of the SARIF
6. Search for denylisted CVEs
7. Extract the severity based on the matching rule in the report; check for threshold-count violations per severity level

* Pros:
  * Much more flexible; allows adding custom filtering/querying based on future fields
  * Consolidates all policy at the same level: report-aggregation policy as well as individual report validation
  * The violation message in the CT can be very targeted. The message is currently very generic because the published Ratify library templates are not artifact specific; Rego policy can emit violation messages with verbose errors indicating exactly where violations occurred.
  * A single verifier can be shared between SBOM/vulnerability scan/provenance/lifecycle etc.
* Cons:
  * Extremely messy and convoluted Rego policy. Each report has to be manually parsed and evaluated in Rego, which makes it hard for users to understand how to update/manage it.
  * A library of Rego with comprehensive unit testing will need to be published.
  * The external data response may be very large. TODO: investigate whether there are size limits during CT evaluation.

#### Prototype

Capabilities:

* Introduces an `artifactloader` verifier that only filters by `maximumAge` based on the `org.opencontainers.image.created` annotation
* Adds the entire blob content of the first layer in the referrer to the verifier report as a new `payload` extension field
* New Rego template requiring at least one valid Notary signature for each artifact in the referrer tree
* New Rego template for vulnerability report filtering:
  * SARIF schema check based on a remote schema file (URL)
  * Report must be generated by Trivy/Grype only
  * Every report must evaluate to true
  * Extract the severity and check it is not part of the `disallowedSeverity` list

**Artifact Loader Verifier**

```yaml
apiVersion: config.ratify.deislabs.io/v1beta1
kind: Verifier
metadata:
  name: verifier-artifactloader
spec:
  name: artifactloader
  artifactTypes: application/sarif+json
  parameters:
    nestedReferences: application/vnd.cncf.notary.signature
    maximumAge: 24h
```

**Vuln Report REGO**

```yaml
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: vulnerabilityreportvalidation
spec:
  crd:
    spec:
      names:
        kind: VulnerabilityReportValidation
      validation:
        openAPIV3Schema:
          type: object
          properties:
            issuer:
              type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package vulnerabilityreportvalidation

        # TODO: add support for custom reason message propagating to user
        import future.keywords.if
        import future.keywords.in
        import future.keywords.every

        default supported_types := {"trivy", "grype"}
        default disallowed_severity := set()
        default severity_regex := `Severity:\s*(\w+)`

        # Get data from Ratify
        remote_data := response {
          images := [img | img = input.review.object.spec.containers[_].image]
          images_init := [img | img = input.review.object.spec.initContainers[_].image]
          images_ephemeral := [img | img = input.review.object.spec.ephemeralContainers[_].image]
          other_images := array.concat(images_init, images_ephemeral)
          all_images := array.concat(other_images, images)
          response := external_data({"provider": "ratify-provider", "keys": all_images})
        }

        violation[{"msg": msg}] {
          general_violation[{"result": msg}]
        }

        # Check if there are any system errors
        general_violation[{"result": result}] {
          err := remote_data.system_error
          err != ""
          result := sprintf("System error calling external data provider: %s", [err])
        }

        # Check if there are errors for any of the images
        general_violation[{"result": result}] {
          count(remote_data.errors) > 0
          result := sprintf("Error validating one or more images: %s", remote_data.errors)
        }

        # Check if the success criteria is true
        general_violation[{"result": result}] {
          reason := ""
          subject_validation := remote_data.responses[_]
          subject_result := subject_validation[1]
          vuln_results := [res | subject_result.verifierReports[i].artifactType == "application/sarif+json"; res := subject_result.verifierReports[i]]
          not process_vuln_reports(vuln_results)
          result := sprintf("vulnerability report validation failed for subject: %s, reason: %s", [subject_validation[0], reason])
        }

        process_vuln_reports(reports) if {
          # download SARIF json schema
          response := http.send({"method": "get", "url": "https://json.schemastore.org/sarif-2.1.0.json"})
          response.status == "200 OK"
          sarif_schema := response.body

          # ALL reports must be valid
          every vuln_report in reports {
            vuln_report.isSuccess == true
            raw_report := vuln_report.extensions.payload
            json_report := json.unmarshal(raw_report)
            is_report_valid(sarif_schema, json_report)

            # find matching rule per result
            every result in json_report.runs[0].results {
              some i
              json_report.runs[0].tool.driver.rules[i].id == result.ruleId
              json_report.runs[0].tool.driver.rules[i].help.text

              # extract severity
              severity := lower(regex.find_all_string_submatch_n(severity_regex, json_report.runs[0].tool.driver.rules[i].help.text, 1)[0][1])
              severity

              # severity should not exist in disallowed list
              count({severity} & disallowed_severity) == 0
            }
          }

          # there MUST be at least one vulnerability report
          count(reports) > 0
        }

        is_report_valid(schema, report) if {
          # check valid json_report in SARIF based on scanner type
          output := json.match_schema(report, schema)
          output[0]

          # check scanner name exists
          report.runs[0].tool.driver.name

          # check report is from a supported scanner
          count({lower(report.runs[0].tool.driver.name)} & supported_types) > 0

          # check results field exists (0 length ok)
          report.runs[0].results
        }
```

**Nested Notary Validation REGO**

```yaml
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: notationnestedvalidation
spec:
  crd:
    spec:
      names:
        kind: NotationNestedValidation
      validation:
        openAPIV3Schema:
          type: object
          properties:
            issuer:
              type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package notationnestedvalidation

        import future.keywords.if

        remote_data := response {
          images := [img | img = input.review.object.spec.containers[_].image]
          images_init := [img | img = input.review.object.spec.initContainers[_].image]
          images_ephemeral := [img | img = input.review.object.spec.ephemeralContainers[_].image]
          other_images := array.concat(images_init, images_ephemeral)
          all_images := array.concat(other_images, images)
          response := external_data({"provider": "ratify-provider", "keys": all_images})
        }

        violation[{"msg": msg}] {
          general_violation[{"result": msg}]
        }

        # Check if there are any system errors
        general_violation[{"result": result}] {
          err := remote_data.system_error
          err != ""
          result := sprintf("System error calling external data provider: %s", [err])
        }

        # Check if there are errors for any of the images
        general_violation[{"result": result}] {
          count(remote_data.errors) > 0
          result := sprintf("Error validating one or more images: %s", remote_data.errors)
        }

        # Check if the success criteria is true
        general_violation[{"result": result}] {
          subject_validation := remote_data.responses[_]
          subject_result := subject_validation[1]
          failed_verify(subject_result)
          result := sprintf("notation signature validation failed: at least one valid signature must exist for all artifacts attached to subject: %s", [subject_validation[0]])
        }

        failed_verify(reports) if {
          newReports := {"nestedResults": reports.verifierReports}
          has_subject_failed_verify(newReports)
        }

        has_subject_failed_verify(nestedReports) if {
          [path, value] := walk(nestedReports)
          path[count(path) - 1] == "nestedResults"
          not notary_signature_pass_verify(value)
        }

        notary_signature_pass_verify(nestedReports) if {
          count_with_success := notary_signature_signature_count(nestedReports)
          count_with_success > 0
        }

        notary_signature_signature_count(nestedReports) := number if {
          sigs := [x |
            some i
            nestedReports[i].isSuccess == true
            nestedReports[i].artifactType == "application/vnd.cncf.notary.signature"
            x := nestedReports[i].subject
          ]
          number := count(sigs)
        }
```

### Support both verifiers

Some common policies can be implemented in the verifier. For custom policies that need access to the full report, a verifier `passthrough` flag can be used to pass the entire report through, and Rego policy can then apply custom logic. This could make it difficult for users to understand where policy/rules are configured, since configurability would exist at both the Rego and CRD level.

## Open Questions

1. How do we handle multi-arch images, since reports will be arch specific?
2. For schema validation, how do we load the schema?
    * URL-based loading can be tricky for clusters that have locked-down ingress/egress
    * An external URL is also a single point of failure
3. How do we handle changes to the underlying report schema?
4. How does versioning work? If some reports use an old schema and some a newer one, how do verifiers verify both types simultaneously?