PSP++ Extension for Runtime Classes

---
title: gate-runtime-classes-with-scc
authors:
  - "@haircommander"
  - "@mrunalp"
reviewers:
  - "@deads2k"
  - "@sttts"
approvers:
  - "@deads2k"
  - "@sttts"
  - "@mrunalp"
creation-date: 2020/12/04
last-updated: 2020/12/11
status: provisional
---

# Gate Runtime Classes with SCC

## Release Signoff Checklist

- [ ] Enhancement is `implementable`
- [ ] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/)

## Summary

Create a PSP++ extension that is aware of the specific runtime classes OpenShift will support.

## Motivation

[PSP++](https://docs.google.com/document/d/1dpfDF3Dk4HhbQe74AyCpzUYMjp4ZhiEgGXSMpVWLlqQ/edit#) is a mechanism to gate namespaces from having access to different security context fields. However, the runtime classes that will be created for [OpenShift Sandboxed Containers](https://github.com/openshift/enhancements/pull/677) and openshift-builder will change the risk of some items in the security context object. Luckily, the PSP++ proposal includes support for adding extensions that change the risk level of fields.

### Goals

- Create a PSP++ extension that allows RunAsUser/Capabilities in the baseline/restricted profiles for the OpenShift Sandboxed Containers and openshift-builder runtime classes.

### Non-Goals

- Updating the SCC to be runtime class aware.
- Improving SCC/PSP++ to gate users from accessing the HighPerformance hooks (also known as openshift-low-latency).
- Changing the behavior of the default runtime class.
- Adding new runtime classes.

## Proposal

[RuntimeClasses](#runtime-classes) is a feature in Kubernetes that allows a user to request a certain runtime configuration. CRI-O has enhanced this feature to not only branch on different runtimes, but also customize the way the pod is run.
Allowing this customization creates the opportunity for users to gain access to features previously gated for security. More concretely, the following runtime classes are currently supported by CRI-O, and will eventually be added to OpenShift (in follow-up enhancements):

- `openshift-builder`: Allows users to allocate a new user namespace for their pod, and gives access to /dev/fuse. This runtime class mitigates the risk of the Capabilities field.
- `openshift-sandboxed-containers`: Allows pods to be run as kernel-separated containers by a VM-based runtime, like [kata](https://katacontainers.io/).

### User Stories [optional]

#### Story 1

As a cluster admin, I want my users using kata containers to be able to RunAsUser root without being in the privileged policy level.

#### Story 2

As a cluster admin, I want to allow my unprivileged builds to use CAP_SYS_CHROOT without being in the privileged policy level.

### Implementation Details/Notes/Constraints [optional]

Note: this document does not discuss the inherent risk of the `openshift-low-latency` runtime class. After extensive discussion, it was deemed to be out of scope of SCC/PSP++. This is because the unique attack vector of the `openshift-low-latency` runtime class is that pods are able to request exclusive access to CPUs. A malicious user can request enough CPUs to prevent other processes on the node from accessing them. This risk, while valid, is not mitigated by any of the security context fields, but rather needs an external quota validation mechanism (some way to verify that users requesting a CPUSet are allowed to do so, and are not requesting too many). This will be done as a separate enhancement.
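
To illustrate the goal and user stories above, the decision such an extension would make could be sketched as follows. This is a minimal sketch under stated assumptions, not the PSP++ extension API (which is not yet defined): the names `RELAXED_FIELDS`, `PodRequest`, `gated_fields`, and `is_allowed` are hypothetical, the mapping from runtime class to relaxed field is inferred from the two stories, and the sketch assumes for simplicity that running as root or adding any capability is rejected below the privileged level.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Hypothetical mapping: which otherwise-gated fields each runtime class
# is assumed to make safe (inferred from Story 1 and Story 2).
RELAXED_FIELDS = {
    "openshift-sandboxed-containers": frozenset({"runAsUser"}),
    "openshift-builder": frozenset({"capabilities"}),
}


@dataclass(frozen=True)
class PodRequest:
    """Simplified, hypothetical view of the pod fields the extension inspects."""
    runtime_class: Optional[str]
    run_as_user: Optional[int] = None
    added_capabilities: FrozenSet[str] = frozenset()


def gated_fields(pod: PodRequest) -> set:
    """Return the fields this pod uses that the sketch treats as gated."""
    used = set()
    if pod.run_as_user == 0:  # running as root
        used.add("runAsUser")
    if pod.added_capabilities:  # any added capability
        used.add("capabilities")
    return used


def is_allowed(level: str, pod: PodRequest) -> bool:
    """Admit the pod if every gated field it uses is relaxed by its runtime class."""
    if level == "privileged":
        return True
    relaxed = RELAXED_FIELDS.get(pod.runtime_class, frozenset())
    return gated_fields(pod) <= relaxed
```

In this sketch a relaxation only applies when the pod itself names the runtime class, so a namespace at the baseline or restricted level gains nothing for pods that run under the default runtime class.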
### Risks and Mitigations

- There is a risk in relying on PSP++, which is a very young proposal (at the time of writing this enhancement, the proposal is not yet at KEP stage).
  - This is partially advantageous, as we can drive our requirements into the new proposal to make sure our needs are met.

## Design Details

### Open Questions

### Test Plan

- Unit tests where appropriate.
- e2e tests should be added to verify functionality along with CRI-O (the other entity validating against runtime classes).

### Graduation Criteria

**Note:** *Section not required until targeted at a release.*

Define graduation milestones.

These may be defined in terms of API maturity, or as something else. Initial proposal should keep this high-level with a focus on what signals will be looked at to determine graduation.

Consider the following in developing the graduation criteria for this enhancement:

- Maturity levels
  - [`alpha`, `beta`, `stable` in upstream Kubernetes](#maturity-levels)
  - `Dev Preview`, `Tech Preview`, `GA` in OpenShift
- [Deprecation policy](#deprecation-policy)

Clearly define what graduation means by either linking to the [API doc definition](https://kubernetes.io/docs/concepts/overview/kubernetes-api/#api-versioning), or by redefining what graduation means.

In general, we try to use the same stages (alpha, beta, GA), regardless of how the functionality is accessed.

[maturity-levels]: https://git.k8s.io/community/contributors/devel/sig-architecture/api_changes.md#alpha-beta-and-stable-versions
[deprecation-policy]: https://kubernetes.io/docs/reference/using-api/deprecation-policy/

#### Examples

These are generalized examples to consider, in addition to the aforementioned [maturity levels](#maturity-levels).
##### Dev Preview -> Tech Preview

- Ability to utilize the enhancement end to end
- End user documentation, relative API stability
- Sufficient test coverage
- Gather feedback from users rather than just developers

##### Tech Preview -> GA

- More testing (upgrade, downgrade, scale)
- Sufficient time for feedback
- Available by default

**For non-optional features moving to GA, the graduation criteria must include end to end tests.**

##### Removing a deprecated feature

- Announce deprecation and support policy of the existing feature
- Deprecate the feature

### Upgrade / Downgrade Strategy

##### Upgrade:

The upgrade and downgrade behavior is partially dependent on how the extensions are introduced into the kube-apiserver. However, there are some risks that should be considered:

- We must ensure that users who gain access to this feature are actually using the specified runtime class, so that the risk is properly mitigated.

##### Downgrade:

TODO think about this more

### Version Skew Strategy

How will the component handle version skew with other components? What are the guarantees? Make sure this is in the test plan.

Consider the following in developing a version skew strategy for this enhancement:

- During an upgrade, we will always have skew among components. How will this impact your work?
- Does this enhancement involve coordinating behavior in the control plane and in the kubelet? How does an n-2 kubelet without this feature available behave when this feature is used?
- Will any other components on the node change? For example, changes to CSI, CRI or CNI may require updating that component before the kubelet.

## Implementation History

Major milestones in the life cycle of a proposal should be tracked in `Implementation History`.

## Drawbacks

The idea is to find the best form of an argument why this enhancement should _not_ be implemented.
## Alternatives

Similar to the `Drawbacks` section the `Alternatives` section is used to highlight and record other possible approaches to delivering the value proposed by an enhancement.

## Infrastructure Needed [optional]

Use this section if you need things from the project. Examples include a new subproject, repos requested, github details, and/or testing infrastructure.

Listing these here allows the community to get the process for these resources started right away.