KEP-NNNN: Auto-Refreshing Official CVE Feed

### Enhancement Description

- One-line enhancement description (can be used as a release note): Auto-refreshing official Kubernetes CVE feed
- Kubernetes Enhancement Proposal:
- Discussion Link: https://docs.google.com/document/d/1GgmmNYN88IZ2v2NBiO3gdU8Riomm0upge_XNVxEYXp0/edit#heading=h.ash02v8wrjia
- Primary contact (assignee): @nehaLohia27
- Responsible SIGs: @kubernetes/sig-security
- Enhancement target (which target equals to which milestone):
  - Alpha release target (x.y): 1.25
  - Beta release target (x.y):
  - Stable release target (x.y):
- [ ] Alpha
  - [ ] KEP (`k/enhancements`) update PR(s):
  - [ ] Code (`k/k`) update PR(s):
  - [ ] Docs (`k/website`) update PR(s):

_Please keep this description up to date. This will help the Enhancement Team to track the evolution of the enhancement efficiently._

# KEP-NNNN: Auto-Refreshing Official CVE Feed

- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [User Stories (Optional)](#user-stories-optional)
    - [Story 1](#story-1)
    - [Story 2](#story-2)
  - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
  - [Test Plan](#test-plan)
  - [Graduation Criteria](#graduation-criteria)
  - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
  - [Version Skew Strategy](#version-skew-strategy)
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
  - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
  - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
  - [Monitoring Requirements](#monitoring-requirements)
  - [Dependencies](#dependencies)
  - [Scalability](#scalability)
  - [Troubleshooting](#troubleshooting)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
- [Infrastructure Needed (Optional)](#infrastructure-needed-optional)

## Release Signoff Checklist

Items marked with (R) are required *prior to targeting to a milestone / release*.

- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
- [ ] (R) Design details are appropriately documented
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
  - [ ] e2e Tests for all Beta API Operations (endpoints)
  - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
  - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
- [ ] (R) Graduation criteria is in place
  - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
- [ ] (R) Production readiness review completed
- [ ] (R) Production readiness review approved
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation, e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

[kubernetes.io]: https://kubernetes.io/
[kubernetes/enhancements]: https://git.k8s.io/enhancements
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
[kubernetes/website]: https://git.k8s.io/website

## Summary {#summary}

Currently it is not possible to filter for issues or PRs that are related to CVEs announced by Kubernetes.
This KEP addresses this concern by using automation to apply the new label **official-cve-feed** to these issues and PRs. The in-scope issues are closed issues that have a CVE ID and that the SRC has officially announced as a Kubernetes CVE.

## Motivation {#motivation}

With the growing number of eyes on Kubernetes, the number of CVEs related to Kubernetes has increased. Although most CVEs that directly, indirectly, or transitively impact Kubernetes are regularly fixed, there is no single place where end users of Kubernetes can programmatically subscribe to or pull data about fixed CVEs. Current options are either broken or incomplete.

An auto-refreshing CVE feed will allow end users to programmatically fetch the list of CVEs and get up-to-date information from the Kubernetes community.

### Goals {#goals}

Create a periodically auto-refreshing list of official Kubernetes CVEs

## Proposal {#proposal}

### Pre-requisites

- [x] https://github.com/kubernetes/test-infra/pull/23428
- [x] Search for and identify closed issues that have a CVE ID (e.g. CVE-1001-12345) in the issue description or summary. (This search [filter](https://github.com/kubernetes/kubernetes/issues?q=is%3Aissue+in%3Abody+%22CVSS%3A3.%22+label%3Acommittee%2Fsecurity-response+is%3Aclosed+) gives the most accurate data so far.)
- [x] Label those issues with `official-cve-feed` using the GitHub Issues REST API (https://docs.github.com/en/rest/reference/issues)
- [x] https://github.com/kubernetes/committee-security-response/pull/133

### Goals

- Generate a JSON document using the results from the filtered label on the `k/k` repo.
- Create a Prow job to periodically generate this JSON document.
- Update the JSON doc in `k/website` when needed (e.g. when a new CVE is announced).
- Using Hugo, publish the list from this JSON document on the official Kubernetes website.

### Non-Goals

- Triage and vulnerability disclosure: This will continue to be done by the SRC.
- CVEs that are identified in build-time dependencies and container images. Only official CVEs announced by the SRC will be published in the feed.

### User Stories (Optional)

#### Story 1

As a K8s end user, I want a list of CVEs with relevant information that I can fetch programmatically, so I can understand when new CVEs are announced.

#### Story 2

As a K8s maintainer, I want to create a process that auto-updates the CVE feed when the SRC announces new CVEs, so that I do not have to do extra work to maintain this feed manually.

### Risks and Mitigations

#### JSON blob construction will fail

If this happens, we expect the job to fail. The failure will alert the owners of this feature, and we will take action as needed. If the failure cannot be fixed in a reasonable amount of time, the CVE feed will be stale until it is fixed. If the community urgently needs a refreshed feed, the JSON blob will be updated manually via the usual PR review and approval process.

#### Urgent CVE feed refresh

In some extenuating circumstances, we may need to update the CVE feed within minutes of the official CVE announcement, instead of waiting for the merge-based or periodic website rebuild. In those situations, the JSON blob can be updated manually using the usual PR review and approval process.

### Storage of CVE feed blob

There are two options to store the CVE feed JSON blob:

* __Google Cloud Bucket__: A new Google Cloud bucket can be created where the CVE feed is written and read using the `gsutil` tool.
* __Git repository__: Store it as a version-controlled artifact in one of the Kubernetes org websites.

The Google Cloud option has the advantage of transparent updates to the JSON blob, where the Prow job run will be identical every time.
The disadvantage of the Google Cloud bucket is that it will have an unofficial-looking URL, which would make it hard for an end user to decipher its authenticity and provenance.

The advantage of a Git repository, especially if `k/website` hosts the JSON blob, is that the URL would be `k8s.io/static/security/official-cve-feed.json`, which is much more recognizable, intuitive in terms of trust, TLS enabled, and unlikely to be spoofed. The disadvantage is that updates might be delayed by the PR review and approval process. However, this can be mitigated through use of a `skip-review` label.

## Design Details

A Prow job that automates PR creation against `k/website` when a new CVE is announced will be used to keep the CVE feed up to date. A wrapper will need to be implemented based on: https://github.com/kubernetes/test-infra/tree/master/robots/pr-creator

The flow will look something like this:

- The Prow job creates a JSON blob based on the information found using the GitHub label `/official-cve-feed` applied to the `k/k` repo.
- It compares the newly generated JSON blob with the existing JSON blob on the `k/website` page.
  - The JSON blob will be hosted at https://github.com/kubernetes/website/tree/main/static/security/official-cve-feed.json
- If the generated JSON blob differs from the existing JSON blob (including the case where no blob exists yet), a new PR is created to update that JSON blob.
  - An example Prow job could be something like this: https://github.com/kubernetes/test-infra/blob/master/config/jobs/kubernetes/test-infra/test-infra-trusted.yaml#L604-L677
- The directory that contains the JSON blob will also have an OWNERS file made of aliases. The OWNERS file can include the SRC alias `security-response-committee` and a new alias for `sig-security-tooling`. This will allow manual modification when needed.
- A `skip-review` label will need to be added to `k/website`.
  This can be used for PRs created by the `pr-creator` bot to reduce or bypass the dependence on approver and reviewer availability.

### Test Plan

### Graduation Criteria

### Upgrade / Downgrade Strategy

Not applicable

### Version Skew Strategy

Not applicable

## Production Readiness Review Questionnaire

Not applicable

### Feature Enablement and Rollback

###### How can this feature be enabled / disabled in a live cluster?

- [ ] Feature gate (also fill in values in `kep.yaml`)
  - Feature gate name:
  - Components depending on the feature gate:
- [ ] Other
  - Describe the mechanism:
  - Will enabling / disabling the feature require downtime of the control plane?
  - Will enabling / disabling the feature require downtime or reprovisioning of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled).

###### Does enabling the feature change any default behavior?

###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

###### What happens if we reenable the feature if it was previously rolled back?

###### Are there any tests for feature enablement/disablement?

### Rollout, Upgrade and Rollback Planning

###### How can a rollout or rollback fail? Can it impact already running workloads?

###### What specific metrics should inform a rollback?

###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

### Monitoring Requirements

###### How can an operator determine if the feature is in use by workloads?

###### How can someone using this feature know that it is working for their instance?

- [ ] Events
  - Event Reason:
- [ ] API .status
  - Condition name:
  - Other field:
- [ ] Other (treat as last resort)
  - Details:

###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

- [ ] Metrics
  - Metric name:
  - [Optional] Aggregation method:
  - Components exposing the metric:
- [ ] Other (treat as last resort)
  - Details:

###### Are there any missing metrics that would be useful to have to improve observability of this feature?

### Dependencies

###### Does this feature depend on any specific services running in the cluster?

### Scalability

###### Will enabling / using this feature result in any new API calls?

###### Will enabling / using this feature result in introducing new API types?

###### Will enabling / using this feature result in any new calls to the cloud provider?

###### Will enabling / using this feature result in increasing size or count of the existing API objects?

###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?

### Troubleshooting

###### How does this feature react if the API server and/or etcd is unavailable?

###### What are other known failure modes?

###### What steps should be taken if SLOs are not being met to determine the problem?

## Implementation History

## Drawbacks

## Alternatives

## Infrastructure Needed (Optional)
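
As an illustration of the flow described under Design Details (generate the JSON blob from labelled issues, compare it with the existing blob, and open a PR only when they differ), the following is a minimal sketch rather than the actual implementation. It assumes issue objects shaped like those returned by the GitHub Issues REST API (`title`, `body`, `html_url`, `labels[].name`). The output schema (`items` with `id`, `summary`, `url`) and the function names `build_feed`, `serialize`, and `needs_update` are hypothetical and chosen only for this example; fetching issues and creating the PR are left out.

```python
import json
import re

# Matches CVE identifiers such as the placeholder CVE-1001-12345 used in this KEP.
CVE_ID = re.compile(r"CVE-\d{4}-\d{4,}")
FEED_LABEL = "official-cve-feed"


def build_feed(issues):
    """Build a feed from GitHub issue objects carrying the feed label.

    The output schema (items with id/summary/url) is hypothetical.
    """
    items = []
    for issue in issues:
        label_names = {label["name"] for label in issue.get("labels", [])}
        if FEED_LABEL not in label_names:
            continue
        text = "{} {}".format(issue.get("title", ""), issue.get("body") or "")
        match = CVE_ID.search(text)
        if match is None:
            continue  # labelled, but no CVE ID found; skip rather than guess
        items.append({
            "id": match.group(0),
            "summary": issue.get("title", ""),
            "url": issue.get("html_url", ""),
        })
    # Sort so that the same set of issues always yields the same blob.
    items.sort(key=lambda item: item["id"])
    return {"items": items}


def serialize(feed):
    """Serialize deterministically so that a plain string comparison is meaningful."""
    return json.dumps(feed, indent=2, sort_keys=True) + "\n"


def needs_update(generated, existing):
    """Return True when a PR should be opened.

    `existing` is None when the blob does not exist yet.
    """
    return existing is None or generated != existing
```

A job built along these lines would call `serialize(build_feed(issues))`, read the current blob from `k/website` (or pass `None` if it is missing), and invoke the PR-creation wrapper only when `needs_update` returns `True`. Sorting the items and the keys keeps repeated runs over unchanged issues from producing spurious diffs.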