# OPA Gatekeeper Detailed Troubleshooting

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
**Table of Contents**

- [What is OPA Gatekeeper?](#what-is-opa-gatekeeper)
- [What is the motivation behind using this kind of Policy Engine?](#what-is-the-motivation-behind-using-this-kind-of-policy-engine)
- [Integration Process of using OPA Gatekeeper in Trendyol](#integration-process-of-using-opa-gatekeeper-in-trendyol)
- [Troubleshooting Details](#troubleshooting-details)
  - [Part 1](#part-1)
    - [Environment](#environment)
    - [Description](#description)
    - [Scenarios](#scenarios)
      - [Scenario 1: Create some Kubernetes Resources except for Namespace](#scenario-1-create-some-kubernetes-resources-except-for-namespace)
      - [Scenario 2: Reboot one of the Master Nodes and wait for them to work properly](#scenario-2-reboot-one-of-the-master-nodes-and-wait-for-them-to-work-properly)
      - [Scenario 3: Restart all the nodes within the cluster](#scenario-3-restart-all-the-nodes-within-the-cluster)
  - [Part 2](#part-2)
    - [Environment](#environment-1)
    - [Description](#description-1)
    - [Scenarios](#scenarios-1)
      - [Scenario 1: Restart all the nodes within the cluster](#scenario-1-restart-all-the-nodes-within-the-cluster)
      - [Scenario 2: Narrow down the Mutating Webhook Scope from `*` to `pods`](#scenario-2-narrow-down-the-mutating-webhook-scope-from--to-pods)
      - [Scenario 3: Set values for `timeout` and `sideEffects` fields to Mutating Webhook Configuration](#scenario-3-set-values-for-timeout-and-sideeffects-fields-to-mutating-webhook-configuration)
      - [Scenario 4: Exempting Namespaces](#scenario-4-exempting-namespaces)
        - [4.1. Exempting Namespaces from Gatekeeper using Config resource](#41-exempting-namespaces-from-gatekeeper-using-config-resource)
        - [4.2. Exempting Namespaces from the Gatekeeper Admission Webhook using --exempt-namespace flag](#42-exempting-namespaces-from-the-gatekeeper-admission-webhook-using---exempt-namespace-flag)
  - [Part 3](#part-3)
    - [Environment](#environment-2)
    - [Description](#description-2)
    - [Scenarios](#scenarios-2)
      - [Scenario 1: Restart all the nodes within the cluster](#scenario-1-restart-all-the-nodes-within-the-cluster-1)
- [Solutions](#solutions)
- [Known Risks](#known-risks)
- [References](#references)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->

## What is OPA Gatekeeper?

![gatekeeper_arch](https://i.imgur.com/lFmksmT.png)

Kubernetes allows decoupling policy decisions from the API server by means of admission controller webhooks, which intercept admission requests before they are persisted as objects in Kubernetes. Gatekeeper was created to enable users to customize admission control via configuration, not code, and to bring awareness of the cluster’s state, not just the single object under evaluation at admission time. Gatekeeper is a customizable admission webhook for Kubernetes that enforces policies executed by the Open Policy Agent (OPA), a policy engine for Cloud Native environments hosted by CNCF.

For more detail, please see the [Gatekeeper documentation](https://open-policy-agent.github.io/gatekeeper/website/docs/).

## What is the motivation behind using this kind of Policy Engine?

With the upcoming deprecation and subsequent removal of Pod Security Policies (PSPs) in Kubernetes, the time is near to find suitable alternatives. Those alternatives, it seems clear at present anyway, will need to be sourced from outside the Kubernetes project itself as there will be no replacement provided. The two leading CNCF projects which are prime candidates for PSP replacement are Open Policy Agent (OPA) via Gatekeeper and Kyverno, each with their own strengths and weaknesses.
If you want to learn about the evaluation that we did before deciding which one fits our needs, please [see](https://github.com/developer-guy/policy-as-code-war). You can also visit the following links:

- [Kubernetes Policy Comparison: OPA/Gatekeeper vs Kyverno](https://neonmirrors.net/post/2021-02/kubernetes-policy-comparison-opa-gatekeeper-vs-kyverno/)
- [Kubernetes Policy Management Tools Compared: OPA with Gatekeeper vs. Kyverno](https://technologyconversations.com/2021/07/01/kubernetes-policy-management-tools-compared-opa-with-gatekeeper-vs-kyverno/)

## Integration Process of using OPA Gatekeeper in Trendyol

First, we started using OPA Gatekeeper as a `Validating` webhook, because we only wanted to enforce some organizational policies across Kubernetes clusters. We maintain this project in the [k8s-common](https://gitlab.trendyol.com/platform/devops/base/services/k8s-common) repository under the path `security/gatekeeper`. We also maintain the `ConstraintTemplates` that we want to apply in the [gatekeeper-library](https://gitlab.trendyol.com/platform/base/apps/gatekeeper-library) repository, which is a fork of the official `Gatekeeper Library` repository.

Second, we enabled the `Mutating` webhook feature to populate `Pod` resources with the recommended security fields. This support came with version [v3.4.0](https://github.com/open-policy-agent/gatekeeper/releases/tag/v3.4.0).

<span style="color:red">**So, we upgraded OPA Gatekeeper from `v3.3.0` to `v3.4.0` to be able to use this feature, and this is where the problems that we have encountered until today began.**</span>

## Troubleshooting Details

We divided this process into several parts, with several scenarios within each part.

> There is an issue on GitLab where we track the whole process of troubleshooting OPA Gatekeeper, please [see](https://gitlab.trendyol.com/platform/devops/base/teams/pe-container/-/issues/133).
### Part 1

#### Environment

* Kubernetes v1.16.11 (poc-p1-2platform-mars-os)
  > Retrieved with the following command:
  > `k version skew -ojson | jq -r '.serverVersion.gitVersion'`
* OPA Gatekeeper v3.4.0

#### Description

First, we studied the internals of the `OPA Gatekeeper` project to understand what happens behind the scenes. We followed these documents while learning the internals:

* https://open-policy-agent.github.io/gatekeeper/website/docs/
* https://kubernetes.io/blog/2019/08/06/opa-gatekeeper-policy-and-governance-for-kubernetes/

Then, as a first step, we scaled down the replica count of the `gatekeeper-controller-manager` deployment to 0 to simulate errors. We examined this issue in **3** different scenarios:

#### Scenarios

##### Scenario 1: Create some Kubernetes Resources except for Namespace

We expected that, with Gatekeeper unavailable, requests unrelated to creating or updating namespaces would not be denied. So, we created a Pod called `alpine` within the default namespace. We expected it to work because the value of the `failurePolicy` field is `Ignore` for all resources except namespaces.

```shell
$ kubectl run alpine --namespace default --image=alpine --restart='Never' -- sh -c "sleep 600"
pod/alpine created
```

Then, we tried to create a namespace, and we expected it to fail.

```shell
$ kubectl create namespace testing
Error from server (InternalError): Internal error occurred: failed calling webhook "check-ignore-label.gatekeeper.sh": Post https://gatekeeper-webhook-service.gatekeeper-system.svc:443/v1/admitlabel?timeout=3s: no endpoints available for service "gatekeeper-webhook-service"
```

As a result, at the end of this scenario, no problem was observed. Everything worked as we expected.
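The different outcomes for the Pod and the namespace follow from the `failurePolicy` of each webhook. A minimal sketch for listing that field per webhook, assuming `kubectl` points at the target cluster and that the validating configuration is named `gatekeeper-validating-webhook-configuration` (an assumption, check the name in your installation). The function name `list_failure_policies` is hypothetical:

```shell
# Print "<webhook name><TAB><failurePolicy>" for every webhook in a
# ValidatingWebhookConfiguration.
# Assumption: the default configuration name below matches your installation.
list_failure_policies() {
  config="${1:-gatekeeper-validating-webhook-configuration}"
  kubectl get validatingwebhookconfiguration "$config" \
    -o jsonpath='{range .webhooks[*]}{.name}{"\t"}{.failurePolicy}{"\n"}{end}'
}
```

Calling `list_failure_policies` without arguments uses the default name. A webhook listed with `Fail` rejects matching requests while Gatekeeper is unreachable, whereas one listed with `Ignore` lets them through.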
##### Scenario 2: Reboot one of the Master Nodes and wait for them to work properly

We selected one of the master nodes and opened an SSH connection to that machine to run the `reboot` command.

```shell
$ ssh user@node
# reboot
```

We watched the pods that belong to the master node from another terminal to see what would happen.

```shell
$ watch kubectl get pods --namespace kube-system
```

As a result, at the end of this scenario, no problem was observed. Everything worked as we expected.

##### Scenario 3: Restart all the nodes within the cluster

We restarted all the nodes that belong to the cluster and watched the state of the cluster to see what would happen.

```shell
$ ansible all -i '10.43.202.20,10.43.203.199,10.43.200.160,10.43.203.168,10.43.201.31,10.43.202.233,10.43.202.152,10.43.201.133,10.43.201.159,10.43.204.21,10.43.202.71,10.43.203.169' -m ansible.builtin.command -a "sudo reboot" -b -u centos
```

> The node IPs were retrieved with the following command:
> `$ kubectl get nodes -o jsonpath='{range .items[*]}{"name:"}{"\t"}{.metadata.name}{"\t"}{"ip:"}{"\t"}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}'`

As a result, at the end of this scenario, no problem was observed. Everything worked as we expected.

### Part 2

#### Environment

* Kubernetes v1.16.11 (poc-p1-2platform-mars-os)
* OPA Gatekeeper v3.4.0

#### Description

We held a second troubleshooting session on Friday, June 25th. This time we didn't scale down the deployment of the OPA Gatekeeper project, so all the replicas were up and running.
#### Scenarios

##### Scenario 1: Restart all the nodes within the cluster

We restarted all the nodes that belong to the cluster and watched the state of the cluster to see what would happen, using the following command:

```shell
$ ansible all -i '10.43.202.20,10.43.203.199,10.43.200.160,10.43.203.168,10.43.201.31,10.43.202.233,10.43.202.152,10.43.201.133,10.43.201.159,10.43.204.21,10.43.202.71,10.43.203.169' -m ansible.builtin.command -a "sudo reboot" -b -u centos
```

As a result:

- Once all the nodes restarted, we encountered a problem this time: we couldn't reach the API Server even though all the API Server pods were up and running.
- We couldn't create Pods due to timeout errors.
- All the replicas of OPA Gatekeeper went into the CrashLoopBackOff state.
- We couldn't fetch the logs of the pods; we got an error like this:

```shell
Error from server (InternalError): Internal error occurred: Authorization Error (user=kube-apiserver-kubelet-client, verb=get, resource=nodes, subresource=proxy
```

![not_working_example](https://i.imgur.com/oM1yx8n.png)

Once we removed the `MutatingWebhookConfiguration` of the OPA Gatekeeper project, we noticed that everything went back to normal immediately.
```shell
$ kubectl delete mutatingwebhookconfigurations.admissionregistration.k8s.io gatekeeper-mutating-webhook-configuration
```

##### Scenario 2: Narrow down the Mutating Webhook Scope from `*` to `pods`

By default, the Mutating Webhook intercepts requests for all resource types because the `resources` field within the `rules` section is set to `*`, as in the following:

```yaml
rules:
- apiGroups:
  - '*'
  apiVersions:
  - '*'
  operations:
  - CREATE
  - UPDATE
  resources:
  - '*'
```

So in the second scenario, we narrowed down the scope of the Mutating Webhook by setting the value of the `resources` field to `pods`, which means that it only intercepts `CREATE` or `UPDATE` requests for resources of type `Pod`. Then we restarted all the nodes.

```yaml
rules:
- apiGroups:
  - '*'
  apiVersions:
  - '*'
  operations:
  - CREATE
  - UPDATE
  resources:
  - 'pods'
```

As a result:

- All the replicas of OPA Gatekeeper worked as expected; they restarted a couple of times but did not go into the CrashLoopBackOff state.
- We were able to create Pods, fetch logs, etc.

![working_example](https://i.imgur.com/SXSDbHx.png)

##### Scenario 3: Set values for `timeout` and `sideEffects` fields to Mutating Webhook Configuration

We noticed that the `sideEffects` and `timeoutSeconds` fields were not set in the Mutating Webhook Configuration. These are the explanations of both fields:

```shell
$ kubectl explain mutatingwebhookconfigurations.webhooks.sideEffects
SideEffects states whether this webhook has side effects. Acceptable values are: None, NoneOnDryRun (webhooks created via v1beta1 may also specify Some or Unknown). Webhooks with side effects MUST implement a reconciliation system, since a request may be rejected by a future step in the admission chain and the side effects therefore need to be undone. Requests with the dryRun attribute will be auto-rejected if they match a webhook with sideEffects == Unknown or Some.

$ kubectl explain mutatingwebhookconfigurations.webhooks.timeoutSeconds
TimeoutSeconds specifies the timeout for this webhook. After the timeout passes, the webhook call will be ignored or the API call will fail based on the failure policy. The timeout value must be between 1 and 30 seconds. Default to 10 seconds.
```

We set values for these fields, then we restarted all the nodes again.

```yaml
sideEffects: None
timeoutSeconds: 3
```

As a result:

- Nothing changed; the results were the same as in Scenario 1.

##### Scenario 4: Exempting Namespaces

In OPA Gatekeeper, there is a feature called [Exempting Namespaces](https://open-policy-agent.github.io/gatekeeper/website/docs/exempt-namespaces/). We can use this feature in two different ways:

###### 4.1. Exempting Namespaces from Gatekeeper using Config resource

The config resource can be used as follows to exclude namespaces from certain processes for all constraints in the cluster. To exclude namespaces at a constraint level, use `excludedNamespaces` in the constraint instead. [[0]](https://open-policy-agent.github.io/gatekeeper/website/docs/exempt-namespaces#exempting-namespaces-from-gatekeeper-using-config-resource)

###### 4.2. Exempting Namespaces from the Gatekeeper Admission Webhook using --exempt-namespace flag

In a nutshell, this feature helps us protect some of the important namespaces from being intercepted by the Admission Webhooks. So, we have to decide which namespaces are important for us and pass them to the `--exempt-namespace` flag one by one. [[1]](https://open-policy-agent.github.io/gatekeeper/website/docs/exempt-namespaces#exempting-namespaces-from-the-gatekeeper-admission-webhook-using---exempt-namespace-flag)

In this scenario, we used this flag in the deployment of the OPA Gatekeeper project as follows:

![exempt_namespaces](https://i.imgur.com/uQI5vZd.png)

Then, we restarted all the nodes.

As a result:

- The results were the same as in Scenario 2; everything worked properly.
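The `--exempt-namespace` flag from 4.2 can also be appended to a running deployment with a JSON patch. This is only a sketch, assuming Gatekeeper runs in the `gatekeeper-system` namespace and that the manager is the first container in the pod template (both are assumptions). The helper name `add_exempt_namespaces` is hypothetical:

```shell
# Append one --exempt-namespace=<ns> argument per given namespace to the
# first container of the gatekeeper-controller-manager deployment.
# Assumptions: namespace "gatekeeper-system", manager container at index 0.
add_exempt_namespaces() {
  for ns in "$@"; do
    kubectl --namespace gatekeeper-system patch deployment gatekeeper-controller-manager \
      --type=json \
      --patch "[{\"op\":\"add\",\"path\":\"/spec/template/spec/containers/0/args/-\",\"value\":\"--exempt-namespace=${ns}\"}]"
  done
}
```

For example, `add_exempt_namespaces kube-system kube-public` issues one patch per namespace. Each patch changes the pod template and therefore triggers a rollout, so for a long list it may be simpler to edit the deployment manifest once.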
### Part 3

#### Environment

* Kubernetes v1.21.2 (p-platform-p1-3moon)
* OPA Gatekeeper v3.5.1

#### Description

This time we installed OPA Gatekeeper using its Helm Chart. We changed nothing else.

##### Scenarios

###### Scenario 1: Restart all the nodes within the cluster

We restarted all the nodes that belong to the cluster and watched the state of the cluster to see what would happen, using the following command:

```shell
$ ansible all -i '10.147.0.154,10.147.0.158,10.147.0.157,10.147.0.161,10.147.0.159,10.147.0.152,10.147.0.155,10.147.0.153,10.147.0.156' -m ansible.builtin.command -a "sudo reboot" -b -u pe
```

As a result:

- Once all the nodes restarted, we encountered a problem this time: we couldn't reach the API Server even though all the API Server pods were up and running.
- The replicas of calico-node went into the Unknown state.
- The replicas of OPA Gatekeeper went into the Unknown state because the calico-node pods were in the Unknown state.

These are some of the errors that we saw while inspecting one of the calico pods.

```shell
calico-node-x82mp calico-node 2021-07-28 09:37:14.825 [INFO][58] felix/route_table.go 1096: Failed to access interface because it doesn't exist. error=Link not found ifaceName="cali7455238789c" ifaceRegex="^cali.*" ipVersion=0x4
ERROR: Error accessing the Calico datastore: context deadline exceeded
Warning Unhealthy 21m kubelet Liveness probe failed: calico/node is not ready: Felix is not live: Get "http://localhost:9099/liveness": dial tcp 127.0.0.1:9099: connect: connection refused
Warning Unhealthy 21m (x3 over 21m) kubelet Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/bird/bird.ctl: connect: no such file or directory
```

Once we removed the `ValidatingWebhookConfiguration` resource of OPA Gatekeeper, everything started to work fine.
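A sketch of that removal step which first keeps a copy of the configuration for reference. It assumes the resource is the `ValidatingWebhookConfiguration` named `gatekeeper-validating-webhook-configuration`. That default name, the backup path, and the helper name `remove_validating_webhook` are assumptions made for illustration:

```shell
# Save the validating webhook configuration to a file, then delete it.
# The deletion is skipped if the backup command fails.
# Assumption: the default resource name below matches your installation.
remove_validating_webhook() {
  config="${1:-gatekeeper-validating-webhook-configuration}"
  backup="${2:-./${config}.backup.yaml}"
  kubectl get validatingwebhookconfiguration "$config" -o yaml > "$backup" &&
    kubectl delete validatingwebhookconfiguration "$config"
}
```

While the configuration is absent, no validation policies are enforced. The saved file can serve as a reference when restoring the webhook; server-populated fields such as `resourceVersion` and `uid` need to be removed before it can be created again.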
###### Scenario 2: Intercept only Pod resources instead of everything

We know that by default the Validating and Mutating webhooks intercept all the resource types in the cluster. However, we checked our current policies and saw that we only care about Pod events at the moment, so we can change our rules from `'*'` to `'pods'`.

![*_to_pods](https://i.imgur.com/WP428yE.png)

> https://gitlab.trendyol.com/platform/devops/base/services/k8s-common/-/commit/08dc213d10e2d4a7bbf316ae291d9be12c94d53b

Then, we restarted all the nodes again to see what would happen.

As a result:

* Everything worked as expected.

## Solutions

As a result, we can tackle the problem with the following improvements:

1. A feature called _exempting namespaces_ in OPA Gatekeeper prevents some of the important namespaces from being intercepted by the webhooks. So we have to add the `admission.gatekeeper.sh/ignore` label to the namespaces that are important for us, such as `kube-system` and `kube-public`. This means we have to do two things:

   **1.1.** Add the `admission.gatekeeper.sh/ignore` label to the namespace.

   **1.2.** Add the labeled namespace to the arguments of the `gatekeeper-controller-manager` deployment with the `--exempt-namespace` flag.

2. Use the Helm Chart of OPA Gatekeeper or the plain YAML file of OPA Gatekeeper instead of the one in the k8s-common repository.
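Step 1.1 above can be sketched as follows. The label value `true` is a placeholder chosen for this sketch (the text above only names the label key), and the helper name `label_ignored_namespaces` is hypothetical:

```shell
# Add the admission.gatekeeper.sh/ignore label to each given namespace.
# Assumption: the value "true" is a placeholder; only the key is taken
# from the steps above.
label_ignored_namespaces() {
  for ns in "$@"; do
    kubectl label namespace "$ns" admission.gatekeeper.sh/ignore=true --overwrite
  done
}
```

For example, `label_ignored_namespaces kube-system kube-public`. Gatekeeper's `check-ignore-label.gatekeeper.sh` webhook is expected to reject this label on namespaces that are not listed via `--exempt-namespace`, so step 1.2 typically has to be in place first.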
<details> <summary>Current OPA Gatekeeper YAML</summary> ```yaml= apiVersion: v1 kind: Namespace metadata: labels: admission.gatekeeper.sh/ignore: no-self-managing control-plane: controller-manager gatekeeper.sh/system: "yes" name: gatekeeper-system --- apiVersion: apiextensions.k8s.io/v1beta1 kind: CustomResourceDefinition metadata: annotations: controller-gen.kubebuilder.io/version: v0.3.0 creationTimestamp: null name: assign.mutations.gatekeeper.sh spec: group: mutations.gatekeeper.sh names: kind: Assign listKind: AssignList plural: assign singular: assign scope: Cluster validation: openAPIV3Schema: description: Assign is the Schema for the assign API properties: apiVersion: description: 'APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources' type: string kind: description: 'Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds' type: string metadata: type: object spec: description: AssignSpec defines the desired state of Assign properties: applyTo: description: 'INSERT ADDITIONAL SPEC FIELDS - desired state of cluster Important: Run "make" to regenerate code after modifying this file' items: description: ApplyTo determines what GVKs items the mutation should apply to. Globs are not allowed. 
properties: groups: items: type: string type: array kinds: items: type: string type: array versions: items: type: string type: array type: object type: array location: type: string match: properties: excludedNamespaces: items: type: string type: array kinds: items: description: Kinds accepts a list of objects with apiGroups and kinds fields that list the groups/kinds of objects to which the mutation will apply. If multiple groups/kinds objects are specified, only one match is needed for the resource to be in scope. properties: apiGroups: description: APIGroups is the API groups the resources belong to. '*' is all groups. If '*' is present, the length of the slice must be one. Required. items: type: string type: array kinds: items: type: string type: array type: object type: array labelSelector: description: A label selector is a label query over a set of resources. The result of matchLabels and matchExpressions are ANDed. An empty label selector matches all objects. A null label selector matches no objects. properties: matchExpressions: description: matchExpressions is a list of label selector requirements. The requirements are ANDed. items: description: A label selector requirement is a selector that contains values, a key, and an operator that relates the key and values. properties: key: description: key is the label key that the selector applies to. type: string operator: description: operator represents a key's relationship to a set of values. Valid operators are In, NotIn, Exists and DoesNotExist. type: string values: description: values is an array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. This array is replaced during a strategic merge patch. items: type: string type: array required: - key - operator type: object type: array matchLabels: additionalProperties: type: string description: matchLabels is a map of {key,value} pairs. 
A single {key,value} in the matchLabels map is equivalent to an element of matchExpressions, whose key field is "key", the operator is "In", and the values array contains only "value". The requirements are ANDed. type: object type: object namespaceSelector: description: A label selector is a label query over a set of resources. The result of matchLabels and matchExpressions are ANDed. An empty label selector matches all objects. A null label selector matches no objects. properties: matchExpressions: description: matchExpressions is a list of label selector requirements. The requirements are ANDed. items: description: A label selector requirement is a selector that contains values, a key, and an operator that relates the key and values. properties: key: description: key is the label key that the selector applies to. type: string operator: description: operator represents a key's relationship to a set of values. Valid operators are In, NotIn, Exists and DoesNotExist. type: string values: description: values is an array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. This array is replaced during a strategic merge patch. items: type: string type: array required: - key - operator type: object type: array matchLabels: additionalProperties: type: string description: matchLabels is a map of {key,value} pairs. A single {key,value} in the matchLabels map is equivalent to an element of matchExpressions, whose key field is "key", the operator is "In", and the values array contains only "value". The requirements are ANDed. 
type: object type: object namespaces: items: type: string type: array scope: description: ResourceScope is an enum defining the different scopes available to a custom resource type: string required: - scope type: object parameters: properties: assign: description: Assign.value holds the value to be assigned type: object x-kubernetes-preserve-unknown-fields: true ifIn: description: IfIn Only mutate if the current value is in the supplied list items: type: string type: array ifNotIn: description: IfNotIn Only mutate if the current value is NOT in the supplied list items: type: string type: array pathTests: items: description: "PathTests allows the user to customize how the mutation works if parent paths are missing. It traverses the list in order. All sub paths are tested against the provided condition, if the test fails, the mutation is not applied. All `subPath` entries must be a prefix of `location`. Any glob characters will take on the same value as was used to expand the matching glob in `location`. 
\n Available Tests: * MustExist - the path must exist or do not mutate * MustNotExist - the path must not exist or do not mutate" properties: condition: enum: - MustExist - MustNotExist type: string subPath: type: string type: object type: array type: object type: object status: description: AssignStatus defines the observed state of Assign type: object type: object version: v1alpha1 versions: - name: v1alpha1 served: true storage: true status: acceptedNames: kind: "" plural: "" conditions: [] storedVersions: [] --- apiVersion: apiextensions.k8s.io/v1beta1 kind: CustomResourceDefinition metadata: annotations: controller-gen.kubebuilder.io/version: v0.3.0 creationTimestamp: null name: assignmetadata.mutations.gatekeeper.sh spec: group: mutations.gatekeeper.sh names: kind: AssignMetadata listKind: AssignMetadataList plural: assignmetadata singular: assignmetadata scope: Cluster validation: openAPIV3Schema: description: AssignMetadata is the Schema for the assignmetadata API properties: apiVersion: description: 'APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources' type: string kind: description: 'Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. 
More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds' type: string metadata: type: object spec: description: AssignMetadataSpec defines the desired state of AssignMetadata properties: location: type: string match: properties: excludedNamespaces: items: type: string type: array kinds: items: description: Kinds accepts a list of objects with apiGroups and kinds fields that list the groups/kinds of objects to which the mutation will apply. If multiple groups/kinds objects are specified, only one match is needed for the resource to be in scope. properties: apiGroups: description: APIGroups is the API groups the resources belong to. '*' is all groups. If '*' is present, the length of the slice must be one. Required. items: type: string type: array kinds: items: type: string type: array type: object type: array labelSelector: description: A label selector is a label query over a set of resources. The result of matchLabels and matchExpressions are ANDed. An empty label selector matches all objects. A null label selector matches no objects. properties: matchExpressions: description: matchExpressions is a list of label selector requirements. The requirements are ANDed. items: description: A label selector requirement is a selector that contains values, a key, and an operator that relates the key and values. properties: key: description: key is the label key that the selector applies to. type: string operator: description: operator represents a key's relationship to a set of values. Valid operators are In, NotIn, Exists and DoesNotExist. type: string values: description: values is an array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. This array is replaced during a strategic merge patch. 
items: type: string type: array required: - key - operator type: object type: array matchLabels: additionalProperties: type: string description: matchLabels is a map of {key,value} pairs. A single {key,value} in the matchLabels map is equivalent to an element of matchExpressions, whose key field is "key", the operator is "In", and the values array contains only "value". The requirements are ANDed. type: object type: object namespaceSelector: description: A label selector is a label query over a set of resources. The result of matchLabels and matchExpressions are ANDed. An empty label selector matches all objects. A null label selector matches no objects. properties: matchExpressions: description: matchExpressions is a list of label selector requirements. The requirements are ANDed. items: description: A label selector requirement is a selector that contains values, a key, and an operator that relates the key and values. properties: key: description: key is the label key that the selector applies to. type: string operator: description: operator represents a key's relationship to a set of values. Valid operators are In, NotIn, Exists and DoesNotExist. type: string values: description: values is an array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. This array is replaced during a strategic merge patch. items: type: string type: array required: - key - operator type: object type: array matchLabels: additionalProperties: type: string description: matchLabels is a map of {key,value} pairs. A single {key,value} in the matchLabels map is equivalent to an element of matchExpressions, whose key field is "key", the operator is "In", and the values array contains only "value". The requirements are ANDed. 
type: object type: object namespaces: items: type: string type: array scope: description: ResourceScope is an enum defining the different scopes available to a custom resource type: string required: - scope type: object parameters: properties: assign: description: Assign.value holds the value to be assigned type: object x-kubernetes-preserve-unknown-fields: true type: object type: object status: description: AssignMetadataStatus defines the observed state of AssignMetadata type: object type: object version: v1alpha1 versions: - name: v1alpha1 served: true storage: true status: acceptedNames: kind: "" plural: "" conditions: [] storedVersions: [] --- apiVersion: apiextensions.k8s.io/v1beta1 kind: CustomResourceDefinition metadata: annotations: controller-gen.kubebuilder.io/version: v0.3.0 creationTimestamp: null labels: gatekeeper.sh/system: "yes" name: configs.config.gatekeeper.sh spec: group: config.gatekeeper.sh names: kind: Config listKind: ConfigList plural: configs singular: config scope: Namespaced validation: openAPIV3Schema: description: Config is the Schema for the configs API properties: apiVersion: description: 'APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources' type: string kind: description: 'Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. 
More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds' type: string metadata: type: object spec: description: ConfigSpec defines the desired state of Config properties: match: description: Configuration for namespace exclusion items: properties: excludedNamespaces: items: type: string type: array processes: items: type: string type: array type: object type: array readiness: description: Configuration for readiness tracker properties: statsEnabled: type: boolean type: object sync: description: Configuration for syncing k8s objects properties: syncOnly: description: If non-empty, only entries on this list will be replicated into OPA items: properties: group: type: string kind: type: string version: type: string type: object type: array type: object validation: description: Configuration for validation properties: traces: description: List of requests to trace. Both "user" and "kinds" must be specified items: properties: dump: description: Also dump the state of OPA with the trace. Set to `All` to dump everything. 
type: string kind: description: Only trace requests of the following GroupVersionKind properties: group: type: string kind: type: string version: type: string type: object user: description: Only trace requests from the specified user type: string type: object type: array type: object type: object status: description: ConfigStatus defines the observed state of Config type: object type: object version: v1alpha1 versions: - name: v1alpha1 served: true storage: true status: acceptedNames: kind: "" plural: "" conditions: [] storedVersions: [] --- apiVersion: apiextensions.k8s.io/v1beta1 kind: CustomResourceDefinition metadata: annotations: controller-gen.kubebuilder.io/version: v0.3.0 creationTimestamp: null labels: gatekeeper.sh/system: "yes" name: constraintpodstatuses.status.gatekeeper.sh spec: group: status.gatekeeper.sh names: kind: ConstraintPodStatus listKind: ConstraintPodStatusList plural: constraintpodstatuses singular: constraintpodstatus scope: Namespaced validation: openAPIV3Schema: description: ConstraintPodStatus is the Schema for the constraintpodstatuses API properties: apiVersion: description: 'APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources' type: string kind: description: 'Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. 
More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds' type: string metadata: type: object status: description: ConstraintPodStatusStatus defines the observed state of ConstraintPodStatus properties: constraintUID: description: Storing the constraint UID allows us to detect drift, such as when a constraint has been recreated after its CRD was deleted out from under it, interrupting the watch type: string enforced: type: boolean errors: items: description: Error represents a single error caught while adding a constraint to OPA properties: code: type: string location: type: string message: type: string required: - code - message type: object type: array id: type: string observedGeneration: format: int64 type: integer operations: items: type: string type: array type: object type: object version: v1beta1 versions: - name: v1beta1 served: true storage: true status: acceptedNames: kind: "" plural: "" conditions: [] storedVersions: [] --- apiVersion: apiextensions.k8s.io/v1beta1 kind: CustomResourceDefinition metadata: annotations: controller-gen.kubebuilder.io/version: v0.3.0 creationTimestamp: null labels: gatekeeper.sh/system: "yes" name: constrainttemplatepodstatuses.status.gatekeeper.sh spec: group: status.gatekeeper.sh names: kind: ConstraintTemplatePodStatus listKind: ConstraintTemplatePodStatusList plural: constrainttemplatepodstatuses singular: constrainttemplatepodstatus scope: Namespaced validation: openAPIV3Schema: description: ConstraintTemplatePodStatus is the Schema for the constrainttemplatepodstatuses API properties: apiVersion: description: 'APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. 
More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources' type: string kind: description: 'Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds' type: string metadata: type: object status: description: ConstraintTemplatePodStatusStatus defines the observed state of ConstraintTemplatePodStatus properties: errors: items: description: CreateCRDError represents a single error caught during parsing, compiling, etc. properties: code: type: string location: type: string message: type: string required: - code - message type: object type: array id: description: 'Important: Run "make" to regenerate code after modifying this file' type: string observedGeneration: format: int64 type: integer operations: items: type: string type: array templateUID: description: UID is a type that holds unique ID values, including UUIDs. Because we don't ONLY use UUIDs, this is an alias to string. Being a type captures intent and helps make sure that UIDs and names do not get conflated. type: string type: object type: object version: v1beta1 versions: - name: v1beta1 served: true storage: true status: acceptedNames: kind: "" plural: "" conditions: [] storedVersions: [] --- apiVersion: apiextensions.k8s.io/v1beta1 kind: CustomResourceDefinition metadata: creationTimestamp: null labels: controller-tools.k8s.io: "1.0" gatekeeper.sh/system: "yes" name: constrainttemplates.templates.gatekeeper.sh spec: group: templates.gatekeeper.sh names: kind: ConstraintTemplate plural: constrainttemplates scope: Cluster subresources: status: {} validation: openAPIV3Schema: properties: apiVersion: description: 'APIVersion defines the versioned schema of this representation of an object. 
Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources' type: string kind: description: 'Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds' type: string metadata: type: object spec: properties: crd: properties: spec: properties: names: properties: kind: type: string shortNames: items: type: string type: array type: object validation: type: object type: object type: object targets: items: properties: libs: items: type: string type: array rego: type: string target: type: string type: object type: array type: object status: properties: byPod: items: properties: errors: items: properties: code: type: string location: type: string message: type: string required: - code - message type: object type: array id: description: a unique identifier for the pod that wrote the status type: string observedGeneration: format: int64 type: integer type: object type: array created: type: boolean type: object version: v1beta1 versions: - name: v1beta1 served: true storage: true - name: v1alpha1 served: true storage: false status: acceptedNames: kind: "" plural: "" conditions: [] storedVersions: [] --- apiVersion: admissionregistration.k8s.io/v1beta1 kind: MutatingWebhookConfiguration metadata: creationTimestamp: null name: gatekeeper-mutating-webhook-configuration webhooks: - clientConfig: caBundle: Cg== service: name: gatekeeper-webhook-service namespace: gatekeeper-system path: /v1/mutate failurePolicy: Ignore name: mutation.gatekeeper.sh rules: - apiGroups: - '*' apiVersions: - '*' operations: - CREATE - UPDATE resources: - '*' --- apiVersion: v1 kind: ServiceAccount metadata: labels: 
gatekeeper.sh/system: "yes" name: gatekeeper-admin namespace: gatekeeper-system --- apiVersion: policy/v1beta1 kind: PodSecurityPolicy metadata: annotations: seccomp.security.alpha.kubernetes.io/allowedProfileNames: '*' labels: gatekeeper.sh/system: "yes" name: gatekeeper-admin spec: allowPrivilegeEscalation: false fsGroup: ranges: - max: 65535 min: 1 rule: MustRunAs requiredDropCapabilities: - ALL runAsUser: rule: MustRunAsNonRoot seLinux: rule: RunAsAny supplementalGroups: ranges: - max: 65535 min: 1 rule: MustRunAs volumes: - configMap - projected - secret - downwardAPI --- apiVersion: rbac.authorization.k8s.io/v1 kind: Role metadata: creationTimestamp: null labels: gatekeeper.sh/system: "yes" name: gatekeeper-manager-role namespace: gatekeeper-system rules: - apiGroups: - "" resources: - events verbs: - create - patch - apiGroups: - "" resources: - secrets verbs: - create - delete - get - list - patch - update - watch --- apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRole metadata: creationTimestamp: null labels: gatekeeper.sh/system: "yes" name: gatekeeper-manager-role rules: - apiGroups: - '*' resources: - '*' verbs: - get - list - watch - apiGroups: - apiextensions.k8s.io resources: - customresourcedefinitions verbs: - create - delete - get - list - patch - update - watch - apiGroups: - config.gatekeeper.sh resources: - configs verbs: - create - delete - get - list - patch - update - watch - apiGroups: - config.gatekeeper.sh resources: - configs/status verbs: - get - patch - update - apiGroups: - constraints.gatekeeper.sh resources: - '*' verbs: - create - delete - get - list - patch - update - watch - apiGroups: - mutations.gatekeeper.sh resources: - '*' verbs: - create - delete - get - list - patch - update - watch - apiGroups: - policy resourceNames: - gatekeeper-admin resources: - podsecuritypolicies verbs: - use - apiGroups: - status.gatekeeper.sh resources: - '*' verbs: - create - delete - get - list - patch - update - watch - apiGroups: - 
templates.gatekeeper.sh resources: - constrainttemplates verbs: - create - delete - get - list - patch - update - watch - apiGroups: - templates.gatekeeper.sh resources: - constrainttemplates/finalizers verbs: - delete - get - patch - update - apiGroups: - templates.gatekeeper.sh resources: - constrainttemplates/status verbs: - get - patch - update - apiGroups: - admissionregistration.k8s.io resourceNames: - gatekeeper-validating-webhook-configuration resources: - validatingwebhookconfigurations verbs: - create - delete - get - list - patch - update - watch - apiGroups: - admissionregistration.k8s.io resourceNames: - gatekeeper-mutating-webhook-configuration resources: - mutatingwebhookconfigurations verbs: - create - delete - get - list - patch - update - watch --- apiVersion: rbac.authorization.k8s.io/v1 kind: RoleBinding metadata: labels: gatekeeper.sh/system: "yes" name: gatekeeper-manager-rolebinding namespace: gatekeeper-system roleRef: apiGroup: rbac.authorization.k8s.io kind: Role name: gatekeeper-manager-role subjects: - kind: ServiceAccount name: gatekeeper-admin namespace: gatekeeper-system --- apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: labels: gatekeeper.sh/system: "yes" name: gatekeeper-manager-rolebinding roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: gatekeeper-manager-role subjects: - kind: ServiceAccount name: gatekeeper-admin namespace: gatekeeper-system --- apiVersion: v1 kind: Secret metadata: labels: gatekeeper.sh/system: "yes" name: gatekeeper-webhook-server-cert namespace: gatekeeper-system --- apiVersion: v1 kind: Service metadata: labels: gatekeeper.sh/system: "yes" name: gatekeeper-webhook-service namespace: gatekeeper-system spec: ports: - port: 443 targetPort: 8443 selector: control-plane: controller-manager gatekeeper.sh/operation: webhook gatekeeper.sh/system: "yes" --- apiVersion: apps/v1 kind: Deployment metadata: labels: control-plane: audit-controller gatekeeper.sh/operation: 
audit gatekeeper.sh/system: "yes" name: gatekeeper-audit namespace: gatekeeper-system spec: replicas: 1 selector: matchLabels: control-plane: audit-controller gatekeeper.sh/operation: audit gatekeeper.sh/system: "yes" template: metadata: annotations: container.seccomp.security.alpha.kubernetes.io/manager: runtime/default labels: control-plane: audit-controller gatekeeper.sh/operation: audit gatekeeper.sh/system: "yes" spec: imagePullSecrets: - name: ty-docker-registry automountServiceAccountToken: true containers: - args: - --operation=audit - --operation=status - --logtostderr command: - /manager env: - name: POD_NAMESPACE valueFrom: fieldRef: apiVersion: v1 fieldPath: metadata.namespace - name: POD_NAME valueFrom: fieldRef: fieldPath: metadata.name image: ${GATEKEEPER_IMAGE} imagePullPolicy: IfNotPresent livenessProbe: httpGet: path: /healthz port: 9090 name: manager ports: - containerPort: 8888 name: metrics protocol: TCP - containerPort: 9090 name: healthz protocol: TCP readinessProbe: httpGet: path: /readyz port: 9090 resources: limits: cpu: 1000m memory: 512Mi requests: cpu: 100m memory: 256Mi securityContext: allowPrivilegeEscalation: false capabilities: drop: - all readOnlyRootFilesystem: true runAsGroup: 999 runAsNonRoot: true runAsUser: 1000 nodeSelector: kubernetes.io/os: linux serviceAccountName: gatekeeper-admin terminationGracePeriodSeconds: 60 --- apiVersion: apps/v1 kind: Deployment metadata: labels: control-plane: controller-manager gatekeeper.sh/operation: webhook gatekeeper.sh/system: "yes" name: gatekeeper-controller-manager namespace: gatekeeper-system spec: replicas: 3 selector: matchLabels: control-plane: controller-manager gatekeeper.sh/operation: webhook gatekeeper.sh/system: "yes" template: metadata: annotations: container.seccomp.security.alpha.kubernetes.io/manager: runtime/default prometheus.io/port: "8888" prometheus.io/scrape: "true" labels: control-plane: controller-manager gatekeeper.sh/operation: webhook gatekeeper.sh/system: "yes" 
spec: affinity: podAntiAffinity: preferredDuringSchedulingIgnoredDuringExecution: - podAffinityTerm: labelSelector: matchExpressions: - key: gatekeeper.sh/operation operator: In values: - webhook topologyKey: kubernetes.io/hostname weight: 100 automountServiceAccountToken: true imagePullSecrets: - name: ty-docker-registry containers: - args: - --port=8443 - --logtostderr - --exempt-namespace=gatekeeper-system - --operation=webhook - --enable-mutation=true command: - /manager env: - name: POD_NAMESPACE valueFrom: fieldRef: apiVersion: v1 fieldPath: metadata.namespace - name: POD_NAME valueFrom: fieldRef: fieldPath: metadata.name image: ${GATEKEEPER_IMAGE} imagePullPolicy: IfNotPresent livenessProbe: httpGet: path: /healthz port: 9090 name: manager ports: - containerPort: 8443 name: webhook-server protocol: TCP - containerPort: 8888 name: metrics protocol: TCP - containerPort: 9090 name: healthz protocol: TCP readinessProbe: httpGet: path: /readyz port: 9090 resources: limits: cpu: 1000m memory: 512Mi requests: cpu: 100m memory: 256Mi securityContext: allowPrivilegeEscalation: false capabilities: drop: - all readOnlyRootFilesystem: true runAsGroup: 999 runAsNonRoot: true runAsUser: 1000 volumeMounts: - mountPath: /certs name: cert readOnly: true nodeSelector: kubernetes.io/os: linux serviceAccountName: gatekeeper-admin terminationGracePeriodSeconds: 60 volumes: - name: cert secret: defaultMode: 420 secretName: gatekeeper-webhook-server-cert --- apiVersion: admissionregistration.k8s.io/v1beta1 kind: ValidatingWebhookConfiguration metadata: creationTimestamp: null labels: gatekeeper.sh/system: "yes" name: gatekeeper-validating-webhook-configuration webhooks: - clientConfig: caBundle: Cg== service: name: gatekeeper-webhook-service namespace: gatekeeper-system path: /v1/admit failurePolicy: Ignore name: validation.gatekeeper.sh namespaceSelector: matchExpressions: - key: admission.gatekeeper.sh/ignore operator: DoesNotExist rules: - apiGroups: - '*' apiVersions: - '*' 
operations: - CREATE - UPDATE resources: - '*' sideEffects: None timeoutSeconds: 3 - clientConfig: caBundle: Cg== service: name: gatekeeper-webhook-service namespace: gatekeeper-system path: /v1/admitlabel failurePolicy: Fail name: check-ignore-label.gatekeeper.sh rules: - apiGroups: - "" apiVersions: - '*' operations: - CREATE - UPDATE resources: - namespaces sideEffects: None timeoutSeconds: 3
```

</details>

> You can use a Go tool that we developed to report the differences between the manifests of two OPA Gatekeeper versions.
> https://gitlab.trendyol.com/platform/base/poc/get-diff-report-between-yaml-files

![go-diff-report-between-yaml-files](https://i.imgur.com/vOVSvyF.png)

## What are we planning to do?

We will move forward with the second option and narrow our interception scope from `*` to `pods`. This has a consequence: if we later decide to write a policy that targets a resource other than Pods, we must first re-deploy OPA Gatekeeper so that it intercepts that resource too, and only then apply the policy.

You can follow the whole process in the following merge request:

> https://gitlab.trendyol.com/platform/devops/base/services/k8s-common/-/merge_requests/80

## What have we done so far?

* We changed our installation method from plain YAML to a Helm chart. You can see all the details in the [commits of the merge request](https://gitlab.trendyol.com/platform/devops/base/services/k8s-common/-/merge_requests/80/commits).
* We opened some pull requests to improve the OPA Gatekeeper Helm chart and its library.
  * https://github.com/open-policy-agent/gatekeeper/pull/1408
  * https://github.com/open-policy-agent/gatekeeper/pull/1425
  * https://github.com/open-policy-agent/gatekeeper-library/pull/98
  * https://github.com/open-policy-agent/gatekeeper/pull/1464
* We opened an issue about the problem that we encountered.
  * https://github.com/open-policy-agent/gatekeeper/issues/1472

## Known Risks

- Once we define a namespace as exempt, it will **no longer be intercepted** by OPA Gatekeeper, which means that its resources will never be validated against the constraints. This leads to another problem: end users can run privileged containers in the exempt namespaces, because we do not have strict RBAC rules or taints on our Kubernetes clusters right now.

## References

- [OPA Gatekeeper: Policy and Governance for Kubernetes](https://kubernetes.io/blog/2019/08/06/opa-gatekeeper-policy-and-governance-for-kubernetes/)
- [Getting Started OPA Gatekeeper](https://open-policy-agent.github.io/gatekeeper/website/docs/)
- [Differences between OPA and Gatekeeper for Kubernetes Admission Control](https://www.infracloud.io/blogs/opa-and-gatekeeper/)