# OPA Gatekeeper Detailed Troubleshooting
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
**Table of Contents**
- [What is OPA Gatekeeper?](#what-is-opa-gatekeeper)
- [What is the motivation behind using this kind of Policy Engine?](#what-is-the-motivation-behind-using-this-kind-of-policy-engine)
- [Integration Process of using OPA Gatekeeper in Trendyol](#integration-process-of-using-opa-gatekeeper-in-trendyol)
- [Troubleshooting Details](#troubleshooting-details)
- [Part 1](#part-1)
- [Environment](#environment)
- [Description](#description)
- [Scenarios](#scenarios)
- [Scenario 1: Create some Kubernetes Resources except for Namespace](#scenario-1-create-some-kubernetes-resources-except-for-namespace)
- [Scenario 2: Reboot one of the Master Nodes and wait for them to work properly](#scenario-2-reboot-one-of-the-master-nodes-and-wait-for-them-to-work-properly)
- [Scenario 3: Restart all the nodes within the cluster](#scenario-3-restart-all-the-nodes-within-the-cluster)
- [Part 2](#part-2)
- [Environment](#environment-1)
- [Description](#description-1)
- [Scenarios](#scenarios-1)
- [Scenario 1: Restart all the nodes within the cluster](#scenario-1-restart-all-the-nodes-within-the-cluster)
- [Scenario 2: Narrow down the Mutating Webhook Scope from `*` to `pods`](#scenario-2-narrow-down-the-mutating-webhook-scope-from--to-pods)
- [Scenario 3: Set values for `timeout` and `sideEffects` fields to Mutating Webhook Configuration](#scenario-3-set-values-for-timeout-and-sideeffects-fields-to-mutating-webhook-configuration)
- [Scenario 4: Exempting Namespaces](#scenario-4-exempting-namespaces)
- [4.1. Exempting Namespaces from Gatekeeper using Config resource](#41-exempting-namespaces-from-gatekeeper-using-config-resource)
- [4.2. Exempting Namespaces from the Gatekeeper Admission Webhook using --exempt-namespace flag](#42-exempting-namespaces-from-the-gatekeeper-admission-webhook-using---exempt-namespace-flag)
- [Part 3](#part-3)
- [Environment](#environment-2)
- [Description](#description-2)
- [Scenarios](#scenarios-2)
- [Scenario 1: Restart all the nodes within the cluster](#scenario-1-restart-all-the-nodes-within-the-cluster-1)
- [Solutions](#solutions)
- [Known Risks](#known-risks)
- [References](#references)
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
## What is OPA Gatekeeper?

Kubernetes allows decoupling policy decisions from the API server by means of admission controller webhooks to intercept admission requests before they are persisted as objects in Kubernetes. Gatekeeper was created to enable users to customize admission control via configuration, not code and to bring awareness of the cluster’s state, not just the single object under evaluation at admission time. Gatekeeper is a customizable admission webhook for Kubernetes that enforces policies executed by the Open Policy Agent (OPA), a policy engine for Cloud Native environments hosted by CNCF.
For more details, see the [Gatekeeper documentation](https://open-policy-agent.github.io/gatekeeper/website/docs/).
## What is the motivation behind using this kind of Policy Engine?
With the upcoming deprecation and subsequent removal of Pod Security Policies (PSPs) in Kubernetes, the time is near to find suitable alternatives. Those alternatives, it seems clear at present anyway, will need to be sourced from outside the Kubernetes project itself as there will be no replacement provided. The two leading CNCF projects which are prime candidates for PSP replacement are Open Policy Agent (OPA) via Gatekeeper and Kyverno, each with their own strengths and weaknesses.
To learn about the evaluation we ran before deciding which one fits our needs, see [policy-as-code-war](https://github.com/developer-guy/policy-as-code-war). The following comparisons are also useful:
- [Kubernetes Policy Comparison: OPA/Gatekeeper vs Kyverno](https://neonmirrors.net/post/2021-02/kubernetes-policy-comparison-opa-gatekeeper-vs-kyverno/)
- [Kubernetes Policy Management Tools Compared: OPA with Gatekeeper vs. Kyverno](https://technologyconversations.com/2021/07/01/kubernetes-policy-management-tools-compared-opa-with-gatekeeper-vs-kyverno/)
## Integration Process of using OPA Gatekeeper in Trendyol
First, we started using OPA Gatekeeper as a `Validating` webhook, because we only wanted to enforce some organizational policies across our Kubernetes clusters. We maintain this project in the [k8s-common](https://gitlab.trendyol.com/platform/devops/base/services/k8s-common) repository under the path `security/gatekeeper`. We also maintain the `ConstraintTemplates` that we want to apply in the [gatekeeper-library](https://gitlab.trendyol.com/platform/base/apps/gatekeeper-library) repository, which is a fork of the official `Gatekeeper Library` repository.
Second, we enabled the `Mutating` webhook feature to populate `Pod` resources with the recommended security fields. This support arrived in version [v3.4.0](https://github.com/open-policy-agent/gatekeeper/releases/tag/v3.4.0). <span style="color:red">**So, we upgraded OPA Gatekeeper from `v3.3.0` to `v3.4.0` to be able to use this feature, and this is where the problems that we have encountered to date began.**</span>
## Troubleshooting Details
We divided this process into several parts, each containing several scenarios.
> We track the whole process of troubleshooting OPA Gatekeeper in a GitLab issue, please [see](https://gitlab.trendyol.com/platform/devops/base/teams/pe-container/-/issues/133).
### Part 1
#### Environment
* Kubernetes v1.16.11 (poc-p1-2platform-mars-os)
> Retrieved with the following command:
> `kubectl version -o json | jq -r '.serverVersion.gitVersion'`
* OPA Gatekeeper v3.4.0
#### Description
First, we discussed the internals of the `OPA Gatekeeper` project so that we could understand what happens behind the scenes.
We followed these documents while learning about the internals of the `OPA Gatekeeper` project:
* https://open-policy-agent.github.io/gatekeeper/website/docs/
* https://kubernetes.io/blog/2019/08/06/opa-gatekeeper-policy-and-governance-for-kubernetes/
Then, as a first step, we scaled the `gatekeeper-controller-manager` deployment down to 0 replicas to simulate errors.
We examined this issue in **3** different scenarios:
#### Scenarios
##### Scenario 1: Create some Kubernetes Resources except for Namespace
With no Gatekeeper replicas running, requests should still be admitted unless they create or update namespaces.
So, we created a Pod called `alpine` within the default namespace. We expected this to work because the `failurePolicy` field is set to `Ignore` for all resources except namespaces.
```shell
$ kubectl run alpine --namespace default --image=alpine --restart='Never' -- sh -c "sleep 600"
pod/alpine created
```
Then, we tried to create a namespace, which we expected to fail.
```shell
$ kubectl create namespace testing
Error from server (InternalError): Internal error occurred: failed calling webhook "check-ignore-label.gatekeeper.sh": Post https://gatekeeper-webhook-service.gatekeeper-system.svc:443/v1/admitlabel?timeout=3s: no endpoints available for service "gatekeeper-webhook-service"
```
As a result, at the end of this scenario, no problem was observed. Everything worked as we expected.
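The behavior above follows from the `failurePolicy` of each webhook. The following is an abbreviated sketch of the validating webhook configuration, assuming the layout of the upstream v3.4.0 manifest. The namespace webhook's name, service, path, and 3-second timeout match the error message above; the configuration name, the `validation.gatekeeper.sh` entry, and the remaining values are assumptions that should be checked against the cluster.

```yaml
# Sketch only: values not visible in the error message above are assumptions.
apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingWebhookConfiguration
metadata:
  name: gatekeeper-validating-webhook-configuration  # assumed name
webhooks:
  # General validation webhook: if Gatekeeper is unreachable, the request
  # is admitted anyway because failurePolicy is Ignore.
  - name: validation.gatekeeper.sh                    # assumed name
    clientConfig:
      service:
        name: gatekeeper-webhook-service
        namespace: gatekeeper-system
        path: /v1/admit                               # assumed path
    failurePolicy: Ignore
    rules:
      - apiGroups: ['*']
        apiVersions: ['*']
        operations: [CREATE, UPDATE]
        resources: ['*']
  # Namespace label webhook: if Gatekeeper is unreachable, namespace
  # requests are rejected because failurePolicy is Fail.
  - name: check-ignore-label.gatekeeper.sh
    clientConfig:
      service:
        name: gatekeeper-webhook-service
        namespace: gatekeeper-system
        path: /v1/admitlabel
    failurePolicy: Fail
    timeoutSeconds: 3
    rules:
      - apiGroups: ['']
        apiVersions: ['*']
        operations: [CREATE, UPDATE]
        resources: [namespaces]
```

With this layout, the `alpine` Pod is admitted while the `testing` namespace is rejected, which is what we observed.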
##### Scenario 2: Reboot one of the Master Nodes and wait for them to work properly
We selected one of the master nodes and made an ssh connection to that machine to run the `reboot` command.
```shell
$ ssh user@node
# reboot
```
From another terminal, we watched the pods in the `kube-system` namespace to see what would happen on that master node.
```shell
$ watch kubectl get pods --namespace kube-system
```
As a result, at the end of this scenario, no problem was observed. Everything worked as we expected.
##### Scenario 3: Restart all the nodes within the cluster
We restarted all the nodes in the cluster and watched the cluster state to see what would happen.
```shell
$ ansible all -i '10.43.202.20,10.43.203.199,10.43.200.160,10.43.203.168,10.43.201.31,10.43.202.233,10.43.202.152,10.43.201.133,10.43.201.159,10.43.204.21,10.43.202.71,10.43.203.169' -m ansible.builtin.command -a "sudo reboot" -b -u centos
```
> The node IP addresses were retrieved with the following command:
> `kubectl get nodes -o jsonpath='{range .items[*]}{"name:"}{"\t"}{.metadata.name}{"\t"}{"ip:"}{"\t"}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}'`
As a result, at the end of this scenario, no problem was observed. Everything worked as we expected.
### Part 2
#### Environment
* Kubernetes v1.16.11 (poc-p1-2platform-mars-os)
* OPA Gatekeeper v3.4.0
#### Description
We held a second troubleshooting session on Friday, June 25. This time we didn't scale down the OPA Gatekeeper deployment, so all the replicas were up and running.
#### Scenarios
##### Scenario 1: Restart all the nodes within the cluster
We restarted all the nodes in the cluster with the following command and watched the cluster state to see what would happen:
```shell
$ ansible all -i '10.43.202.20,10.43.203.199,10.43.200.160,10.43.203.168,10.43.201.31,10.43.202.233,10.43.202.152,10.43.201.133,10.43.201.159,10.43.204.21,10.43.202.71,10.43.203.169' -m ansible.builtin.command -a "sudo reboot" -b -u centos
```
As a result:
- Once all the nodes had restarted, we ran into a problem. We couldn't reach the API Server even though all the API Server pods were up and running.
- We couldn't create Pods due to timeout errors.
- All the replicas of OPA Gatekeeper went into the CrashLoopBackOff state.
- We couldn't fetch the logs of the pods; we got an error like this:
```shell
Error from server (InternalError): Internal error occurred: Authorization Error (user=kube-apiserver-kubelet-client, verb=get, resource=nodes, subresource=proxy
```

Once we removed the `MutatingWebhookConfiguration` of the OPA Gatekeeper project, everything went back to normal immediately.
```shell
$ kubectl delete mutatingwebhookconfigurations.admissionregistration.k8s.io gatekeeper-mutating-webhook-configuration
```
##### Scenario 2: Narrow down the Mutating Webhook Scope from `*` to `pods`
By default, the Mutating Webhook intercepts requests for all resource types because the `resources` field in the `rules` section is set to `*`:
```yaml
rules:
- apiGroups:
  - '*'
  apiVersions:
  - '*'
  operations:
  - CREATE
  - UPDATE
  resources:
  - '*'
```
So in the second scenario, we narrowed down the scope of the Mutating Webhook by setting the `resources` field to `pods`, so that it only intercepts `CREATE` and `UPDATE` requests for `Pod` resources. Then we restarted all the nodes.
```yaml
rules:
- apiGroups:
  - '*'
  apiVersions:
  - '*'
  operations:
  - CREATE
  - UPDATE
  resources:
  - 'pods'
```
As a result:
- All the replicas of OPA Gatekeeper worked as expected. They restarted a couple of times but did not go into the CrashLoopBackOff state.
- We were able to create Pods, fetch logs, and so on.

##### Scenario 3: Set values for `timeout` and `sideEffects` fields to Mutating Webhook Configuration
We noticed that the `sideEffects` and `timeoutSeconds` fields were not set in the Mutating Webhook Configuration. `kubectl explain` describes the two fields as follows:
```shell
$ kubectl explain mutatingwebhookconfigurations.webhooks.sideEffects
SideEffects states whether this webhook has side effects. Acceptable values
are: None, NoneOnDryRun (webhooks created via v1beta1 may also specify Some
or Unknown). Webhooks with side effects MUST implement a reconciliation
system, since a request may be rejected by a future step in the admission
chain and the side effects therefore need to be undone. Requests with the
dryRun attribute will be auto-rejected if they match a webhook with
sideEffects == Unknown or Some.
$ kubectl explain mutatingwebhookconfigurations.webhooks.timeoutSeconds
TimeoutSeconds specifies the timeout for this webhook. After the timeout
passes, the webhook call will be ignored or the API call will fail based on
the failure policy. The timeout value must be between 1 and 30 seconds.
Default to 10 seconds.
```
We set both fields as follows, then restarted all the nodes again.
```yaml
sideEffects: None
timeoutSeconds: 3
```
As a result:
- Nothing changed; the results were the same as in Scenario 1.
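For context, both fields are set per webhook entry, next to `rules` and `failurePolicy`. A minimal sketch of where they sit, assuming the upstream naming for the webhook entry and path (only the configuration name `gatekeeper-mutating-webhook-configuration` and the two field values come from this scenario; the `failurePolicy` shown is also an assumption):

```yaml
# Sketch only: webhook name, path, and failurePolicy are assumptions.
apiVersion: admissionregistration.k8s.io/v1beta1
kind: MutatingWebhookConfiguration
metadata:
  name: gatekeeper-mutating-webhook-configuration
webhooks:
  - name: mutation.gatekeeper.sh        # assumed name
    clientConfig:
      service:
        name: gatekeeper-webhook-service
        namespace: gatekeeper-system
        path: /v1/mutate                # assumed path
    failurePolicy: Ignore               # assumed value
    rules:
      - apiGroups: ['*']
        apiVersions: ['*']
        operations: [CREATE, UPDATE]
        resources: ['*']                # scope still wide open in this scenario
    sideEffects: None                   # value set in this scenario
    timeoutSeconds: 3                   # value set in this scenario
```

Because `rules` still matches every resource here, a shorter timeout only bounds how long each intercepted request waits. It does not reduce the number of requests sent to the webhook.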
##### Scenario 4: Exempting Namespaces
In OPA Gatekeeper, there is a feature called [Exempting Namespaces](https://open-policy-agent.github.io/gatekeeper/website/docs/exempt-namespaces/). We can use this feature to do two different things:
###### 4.1. Exempting Namespaces from Gatekeeper using Config resource
The config resource can be used as follows to exclude namespaces from certain processes for all constraints in the cluster. To exclude namespaces at a constraint level, use `excludedNamespaces` in the constraint instead. [[0]](https://open-policy-agent.github.io/gatekeeper/website/docs/exempt-namespaces#exempting-namespaces-from-gatekeeper-using-config-resource)
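As a hedged illustration, a `Config` resource along the lines of the example in the Gatekeeper documentation could look like this. The excluded namespaces are illustrative assumptions, not the values we used. The `match`, `excludedNamespaces`, and `processes` fields correspond to the `configs.config.gatekeeper.sh` CRD schema.

```yaml
# Sketch only: the namespace list is an illustrative assumption.
apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
  name: config
  namespace: gatekeeper-system
spec:
  match:
    - excludedNamespaces: ["kube-system", "gatekeeper-system"]
      # "*" excludes these namespaces from every Gatekeeper process;
      # a narrower list such as ["audit"] limits the exclusion.
      processes: ["*"]
```

Note that this exclusion is evaluated inside Gatekeeper, so the API server still calls the webhook for objects in these namespaces.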
###### 4.2. Exempting Namespaces from the Gatekeeper Admission Webhook using --exempt-namespace flag
In a nutshell, this feature helps us protect important namespaces from being intercepted by the Admission Webhooks. So, we have to decide which namespaces are important for us and pass each of them with the `--exempt-namespace` flag. [[1]](https://open-policy-agent.github.io/gatekeeper/website/docs/exempt-namespaces#exempting-namespaces-from-the-gatekeeper-admission-webhook-using---exempt-namespace-flag)
In this scenario, we set this flag in the deployment of the OPA Gatekeeper project.
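A minimal sketch of how the flag can be passed, written as a partial Deployment fragment. It assumes the container is named `manager` as in the upstream manifest, and it uses `kube-system` purely as an illustrative namespace; the exact list depends on the cluster.

```yaml
# Sketch only: container name and the kube-system entry are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gatekeeper-controller-manager
  namespace: gatekeeper-system
spec:
  template:
    spec:
      containers:
        - name: manager
          args:
            # The flag is repeated once per exempted namespace.
            - --exempt-namespace=gatekeeper-system
            - --exempt-namespace=kube-system
```

Each namespace listed this way can then carry the `admission.gatekeeper.sh/ignore` label.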

Then, we restarted all the nodes.
As a result:
- The results were the same as in Scenario 2; everything worked properly.
### Part 3
#### Environment
* Kubernetes v1.21.2 (p-platform-p1-3moon)
* OPA Gatekeeper v3.5.1
#### Description
This time, we installed OPA Gatekeeper using its Helm chart. Nothing else was different.
#### Scenarios
##### Scenario 1: Restart all the nodes within the cluster
We restarted all the nodes in the cluster with the following command and watched the cluster state to see what would happen:
```shell
$ ansible all -i '10.147.0.154,10.147.0.158,10.147.0.157,10.147.0.161,10.147.0.159,10.147.0.152,10.147.0.155,10.147.0.153,10.147.0.156' -m ansible.builtin.command -a "sudo reboot" -b -u pe
```
As a result:
- Once all the nodes had restarted, we ran into a problem again. We couldn't reach the API Server even though all the API Server pods were up and running.
- The replicas of calico-node went into the Unknown state.
- The replicas of OPA Gatekeeper went into the Unknown state because the calico-node pods were in the Unknown state.

These are some of the errors that we saw while inspecting one of the calico pods.
```shell
calico-node-x82mp calico-node 2021-07-28 09:37:14.825 [INFO][58] felix/route_table.go 1096: Failed to access interface because it doesn't exist. error=Link not found ifaceName="cali7455238789c" ifaceRegex="^cali.*" ipVersion=0x4
ERROR: Error accessing the Calico datastore: context deadline exceeded
Warning Unhealthy 21m kubelet Liveness probe failed: calico/node is not ready: Felix is not live: Get "http://localhost:9099/liveness": dial tcp 127.0.0.1:9099: connect: connection refused
Warning Unhealthy 21m (x3 over 21m) kubelet Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/bird/bird.ctl: connect: no such file or directory
```
Once we removed the `ValidatingWebhookConfiguration` resource of OPA Gatekeeper, everything started to work fine.
##### Scenario 2: Intercept only Pod resources instead of everything
By default, the Validating and Mutating webhooks intercept all resource types in the cluster. We reviewed our current policies and saw that they only concern `Pod` events at the moment, so we changed our rules from `'*'` to `'pods'`.

> https://gitlab.trendyol.com/platform/devops/base/services/k8s-common/-/commit/08dc213d10e2d4a7bbf316ae291d9be12c94d53b
Then, we restarted all the nodes again to see what would happen.
As a result:
* Everything worked as expected.
## Solutions
We can tackle the problem with the following improvements:
1. Use the _exempting namespaces_ feature of OPA Gatekeeper, which prevents important namespaces such as `kube-system` and `kube-public` from being intercepted by the webhooks. This requires two steps:
**1.1.** Add the `admission.gatekeeper.sh/ignore` label to the namespace.
**1.2.** Add the labeled namespace to the arguments of the `gatekeeper-controller-manager` deployment with the `--exempt-namespace` flag.
2. Use the official Helm chart or the plain YAML manifest of OPA Gatekeeper instead of the copy in the k8s-common repository.
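As a minimal sketch of step 1.1, a labeled namespace could look like the following. The label value `no-self-managing` mirrors the one used on `gatekeeper-system` in our manifest; using the same value for other namespaces is an assumption, since the feature is keyed on the presence of the label.

```yaml
# Sketch only: kube-system is an illustrative choice of namespace.
apiVersion: v1
kind: Namespace
metadata:
  name: kube-system
  labels:
    # The namespace must also be passed to the controller with
    # --exempt-namespace (step 1.2), otherwise the label is rejected.
    admission.gatekeeper.sh/ignore: no-self-managing
```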
<details>
<summary>Current OPA Gatekeeper YAML</summary>
```yaml
apiVersion: v1
kind: Namespace
metadata:
  labels:
    admission.gatekeeper.sh/ignore: no-self-managing
    control-plane: controller-manager
    gatekeeper.sh/system: "yes"
  name: gatekeeper-system
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
annotations:
controller-gen.kubebuilder.io/version: v0.3.0
creationTimestamp: null
name: assign.mutations.gatekeeper.sh
spec:
group: mutations.gatekeeper.sh
names:
kind: Assign
listKind: AssignList
plural: assign
singular: assign
scope: Cluster
validation:
openAPIV3Schema:
description: Assign is the Schema for the assign API
properties:
apiVersion:
description: 'APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
type: string
kind:
description: 'Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
type: string
metadata:
type: object
spec:
description: AssignSpec defines the desired state of Assign
properties:
applyTo:
description: 'INSERT ADDITIONAL SPEC FIELDS - desired state of cluster
Important: Run "make" to regenerate code after modifying this file'
items:
description: ApplyTo determines what GVKs items the mutation should
apply to. Globs are not allowed.
properties:
groups:
items:
type: string
type: array
kinds:
items:
type: string
type: array
versions:
items:
type: string
type: array
type: object
type: array
location:
type: string
match:
properties:
excludedNamespaces:
items:
type: string
type: array
kinds:
items:
description: Kinds accepts a list of objects with apiGroups and
kinds fields that list the groups/kinds of objects to which
the mutation will apply. If multiple groups/kinds objects are
specified, only one match is needed for the resource to be in
scope.
properties:
apiGroups:
description: APIGroups is the API groups the resources belong
to. '*' is all groups. If '*' is present, the length of
the slice must be one. Required.
items:
type: string
type: array
kinds:
items:
type: string
type: array
type: object
type: array
labelSelector:
description: A label selector is a label query over a set of resources.
The result of matchLabels and matchExpressions are ANDed. An empty
label selector matches all objects. A null label selector matches
no objects.
properties:
matchExpressions:
description: matchExpressions is a list of label selector requirements.
The requirements are ANDed.
items:
description: A label selector requirement is a selector that
contains values, a key, and an operator that relates the
key and values.
properties:
key:
description: key is the label key that the selector applies
to.
type: string
operator:
description: operator represents a key's relationship
to a set of values. Valid operators are In, NotIn, Exists
and DoesNotExist.
type: string
values:
description: values is an array of string values. If the
operator is In or NotIn, the values array must be non-empty.
If the operator is Exists or DoesNotExist, the values
array must be empty. This array is replaced during a
strategic merge patch.
items:
type: string
type: array
required:
- key
- operator
type: object
type: array
matchLabels:
additionalProperties:
type: string
description: matchLabels is a map of {key,value} pairs. A single
{key,value} in the matchLabels map is equivalent to an element
of matchExpressions, whose key field is "key", the operator
is "In", and the values array contains only "value". The requirements
are ANDed.
type: object
type: object
namespaceSelector:
description: A label selector is a label query over a set of resources.
The result of matchLabels and matchExpressions are ANDed. An empty
label selector matches all objects. A null label selector matches
no objects.
properties:
matchExpressions:
description: matchExpressions is a list of label selector requirements.
The requirements are ANDed.
items:
description: A label selector requirement is a selector that
contains values, a key, and an operator that relates the
key and values.
properties:
key:
description: key is the label key that the selector applies
to.
type: string
operator:
description: operator represents a key's relationship
to a set of values. Valid operators are In, NotIn, Exists
and DoesNotExist.
type: string
values:
description: values is an array of string values. If the
operator is In or NotIn, the values array must be non-empty.
If the operator is Exists or DoesNotExist, the values
array must be empty. This array is replaced during a
strategic merge patch.
items:
type: string
type: array
required:
- key
- operator
type: object
type: array
matchLabels:
additionalProperties:
type: string
description: matchLabels is a map of {key,value} pairs. A single
{key,value} in the matchLabels map is equivalent to an element
of matchExpressions, whose key field is "key", the operator
is "In", and the values array contains only "value". The requirements
are ANDed.
type: object
type: object
namespaces:
items:
type: string
type: array
scope:
description: ResourceScope is an enum defining the different scopes
available to a custom resource
type: string
required:
- scope
type: object
parameters:
properties:
assign:
description: Assign.value holds the value to be assigned
type: object
x-kubernetes-preserve-unknown-fields: true
ifIn:
description: IfIn Only mutate if the current value is in the supplied
list
items:
type: string
type: array
ifNotIn:
description: IfNotIn Only mutate if the current value is NOT in
the supplied list
items:
type: string
type: array
pathTests:
items:
description: "PathTests allows the user to customize how the mutation
works if parent paths are missing. It traverses the list in
order. All sub paths are tested against the provided condition,
if the test fails, the mutation is not applied. All `subPath`
entries must be a prefix of `location`. Any glob characters
will take on the same value as was used to expand the matching
glob in `location`. \n Available Tests: * MustExist - the
path must exist or do not mutate * MustNotExist - the path must
not exist or do not mutate"
properties:
condition:
enum:
- MustExist
- MustNotExist
type: string
subPath:
type: string
type: object
type: array
type: object
type: object
status:
description: AssignStatus defines the observed state of Assign
type: object
type: object
version: v1alpha1
versions:
- name: v1alpha1
served: true
storage: true
status:
acceptedNames:
kind: ""
plural: ""
conditions: []
storedVersions: []
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
annotations:
controller-gen.kubebuilder.io/version: v0.3.0
creationTimestamp: null
name: assignmetadata.mutations.gatekeeper.sh
spec:
group: mutations.gatekeeper.sh
names:
kind: AssignMetadata
listKind: AssignMetadataList
plural: assignmetadata
singular: assignmetadata
scope: Cluster
validation:
openAPIV3Schema:
description: AssignMetadata is the Schema for the assignmetadata API
properties:
apiVersion:
description: 'APIVersion defines the versioned schema of this representation
of an object. Servers should convert recognized schemas to the latest
internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
type: string
kind:
description: 'Kind is a string value representing the REST resource this
object represents. Servers may infer this from the endpoint the client
submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
type: string
metadata:
type: object
spec:
description: AssignMetadataSpec defines the desired state of AssignMetadata
properties:
location:
type: string
match:
properties:
excludedNamespaces:
items:
type: string
type: array
kinds:
items:
description: Kinds accepts a list of objects with apiGroups and
kinds fields that list the groups/kinds of objects to which
the mutation will apply. If multiple groups/kinds objects are
specified, only one match is needed for the resource to be in
scope.
properties:
apiGroups:
description: APIGroups is the API groups the resources belong
to. '*' is all groups. If '*' is present, the length of
the slice must be one. Required.
items:
type: string
type: array
kinds:
items:
type: string
type: array
type: object
type: array
labelSelector:
description: A label selector is a label query over a set of resources.
The result of matchLabels and matchExpressions are ANDed. An empty
label selector matches all objects. A null label selector matches
no objects.
properties:
matchExpressions:
description: matchExpressions is a list of label selector requirements.
The requirements are ANDed.
items:
description: A label selector requirement is a selector that
contains values, a key, and an operator that relates the
key and values.
properties:
key:
description: key is the label key that the selector applies
to.
type: string
operator:
description: operator represents a key's relationship
to a set of values. Valid operators are In, NotIn, Exists
and DoesNotExist.
type: string
values:
description: values is an array of string values. If the
operator is In or NotIn, the values array must be non-empty.
If the operator is Exists or DoesNotExist, the values
array must be empty. This array is replaced during a
strategic merge patch.
items:
type: string
type: array
required:
- key
- operator
type: object
type: array
matchLabels:
additionalProperties:
type: string
description: matchLabels is a map of {key,value} pairs. A single
{key,value} in the matchLabels map is equivalent to an element
of matchExpressions, whose key field is "key", the operator
is "In", and the values array contains only "value". The requirements
are ANDed.
type: object
type: object
namespaceSelector:
description: A label selector is a label query over a set of resources.
The result of matchLabels and matchExpressions are ANDed. An empty
label selector matches all objects. A null label selector matches
no objects.
properties:
matchExpressions:
description: matchExpressions is a list of label selector requirements.
The requirements are ANDed.
items:
description: A label selector requirement is a selector that
contains values, a key, and an operator that relates the
key and values.
properties:
key:
description: key is the label key that the selector applies
to.
type: string
operator:
description: operator represents a key's relationship
to a set of values. Valid operators are In, NotIn, Exists
and DoesNotExist.
type: string
values:
description: values is an array of string values. If the
operator is In or NotIn, the values array must be non-empty.
If the operator is Exists or DoesNotExist, the values
array must be empty. This array is replaced during a
strategic merge patch.
items:
type: string
type: array
required:
- key
- operator
type: object
type: array
matchLabels:
additionalProperties:
type: string
description: matchLabels is a map of {key,value} pairs. A single
{key,value} in the matchLabels map is equivalent to an element
of matchExpressions, whose key field is "key", the operator
is "In", and the values array contains only "value". The requirements
are ANDed.
type: object
type: object
namespaces:
items:
type: string
type: array
scope:
description: ResourceScope is an enum defining the different scopes
available to a custom resource
type: string
required:
- scope
type: object
parameters:
properties:
assign:
description: Assign.value holds the value to be assigned
type: object
x-kubernetes-preserve-unknown-fields: true
type: object
type: object
status:
description: AssignMetadataStatus defines the observed state of AssignMetadata
type: object
type: object
version: v1alpha1
versions:
- name: v1alpha1
served: true
storage: true
status:
acceptedNames:
kind: ""
plural: ""
conditions: []
storedVersions: []
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
annotations:
controller-gen.kubebuilder.io/version: v0.3.0
creationTimestamp: null
labels:
gatekeeper.sh/system: "yes"
name: configs.config.gatekeeper.sh
spec:
group: config.gatekeeper.sh
names:
kind: Config
listKind: ConfigList
plural: configs
singular: config
scope: Namespaced
validation:
openAPIV3Schema:
description: Config is the Schema for the configs API
properties:
apiVersion:
description: 'APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
type: string
kind:
description: 'Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
type: string
metadata:
type: object
spec:
description: ConfigSpec defines the desired state of Config
properties:
match:
description: Configuration for namespace exclusion
items:
properties:
excludedNamespaces:
items:
type: string
type: array
processes:
items:
type: string
type: array
type: object
type: array
readiness:
description: Configuration for readiness tracker
properties:
statsEnabled:
type: boolean
type: object
sync:
description: Configuration for syncing k8s objects
properties:
syncOnly:
description: If non-empty, only entries on this list will be replicated into OPA
items:
properties:
group:
type: string
kind:
type: string
version:
type: string
type: object
type: array
type: object
validation:
description: Configuration for validation
properties:
traces:
description: List of requests to trace. Both "user" and "kinds" must be specified
items:
properties:
dump:
description: Also dump the state of OPA with the trace. Set to `All` to dump everything.
type: string
kind:
description: Only trace requests of the following GroupVersionKind
properties:
group:
type: string
kind:
type: string
version:
type: string
type: object
user:
description: Only trace requests from the specified user
type: string
type: object
type: array
type: object
type: object
status:
description: ConfigStatus defines the observed state of Config
type: object
type: object
version: v1alpha1
versions:
- name: v1alpha1
served: true
storage: true
status:
acceptedNames:
kind: ""
plural: ""
conditions: []
storedVersions: []
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
annotations:
controller-gen.kubebuilder.io/version: v0.3.0
creationTimestamp: null
labels:
gatekeeper.sh/system: "yes"
name: constraintpodstatuses.status.gatekeeper.sh
spec:
group: status.gatekeeper.sh
names:
kind: ConstraintPodStatus
listKind: ConstraintPodStatusList
plural: constraintpodstatuses
singular: constraintpodstatus
scope: Namespaced
validation:
openAPIV3Schema:
description: ConstraintPodStatus is the Schema for the constraintpodstatuses API
properties:
apiVersion:
description: 'APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
type: string
kind:
description: 'Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
type: string
metadata:
type: object
status:
description: ConstraintPodStatusStatus defines the observed state of ConstraintPodStatus
properties:
constraintUID:
description: Storing the constraint UID allows us to detect drift, such as when a constraint has been recreated after its CRD was deleted out from under it, interrupting the watch
type: string
enforced:
type: boolean
errors:
items:
description: Error represents a single error caught while adding a constraint to OPA
properties:
code:
type: string
location:
type: string
message:
type: string
required:
- code
- message
type: object
type: array
id:
type: string
observedGeneration:
format: int64
type: integer
operations:
items:
type: string
type: array
type: object
type: object
version: v1beta1
versions:
- name: v1beta1
served: true
storage: true
status:
acceptedNames:
kind: ""
plural: ""
conditions: []
storedVersions: []
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
annotations:
controller-gen.kubebuilder.io/version: v0.3.0
creationTimestamp: null
labels:
gatekeeper.sh/system: "yes"
name: constrainttemplatepodstatuses.status.gatekeeper.sh
spec:
group: status.gatekeeper.sh
names:
kind: ConstraintTemplatePodStatus
listKind: ConstraintTemplatePodStatusList
plural: constrainttemplatepodstatuses
singular: constrainttemplatepodstatus
scope: Namespaced
validation:
openAPIV3Schema:
description: ConstraintTemplatePodStatus is the Schema for the constrainttemplatepodstatuses API
properties:
apiVersion:
description: 'APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
type: string
kind:
description: 'Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
type: string
metadata:
type: object
status:
description: ConstraintTemplatePodStatusStatus defines the observed state of ConstraintTemplatePodStatus
properties:
errors:
items:
description: CreateCRDError represents a single error caught during parsing, compiling, etc.
properties:
code:
type: string
location:
type: string
message:
type: string
required:
- code
- message
type: object
type: array
id:
description: 'Important: Run "make" to regenerate code after modifying this file'
type: string
observedGeneration:
format: int64
type: integer
operations:
items:
type: string
type: array
templateUID:
description: UID is a type that holds unique ID values, including UUIDs. Because we don't ONLY use UUIDs, this is an alias to string. Being a type captures intent and helps make sure that UIDs and names do not get conflated.
type: string
type: object
type: object
version: v1beta1
versions:
- name: v1beta1
served: true
storage: true
status:
acceptedNames:
kind: ""
plural: ""
conditions: []
storedVersions: []
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
creationTimestamp: null
labels:
controller-tools.k8s.io: "1.0"
gatekeeper.sh/system: "yes"
name: constrainttemplates.templates.gatekeeper.sh
spec:
group: templates.gatekeeper.sh
names:
kind: ConstraintTemplate
plural: constrainttemplates
scope: Cluster
subresources:
status: {}
validation:
openAPIV3Schema:
properties:
apiVersion:
description: 'APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
type: string
kind:
description: 'Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
type: string
metadata:
type: object
spec:
properties:
crd:
properties:
spec:
properties:
names:
properties:
kind:
type: string
shortNames:
items:
type: string
type: array
type: object
validation:
type: object
type: object
type: object
targets:
items:
properties:
libs:
items:
type: string
type: array
rego:
type: string
target:
type: string
type: object
type: array
type: object
status:
properties:
byPod:
items:
properties:
errors:
items:
properties:
code:
type: string
location:
type: string
message:
type: string
required:
- code
- message
type: object
type: array
id:
description: a unique identifier for the pod that wrote the status
type: string
observedGeneration:
format: int64
type: integer
type: object
type: array
created:
type: boolean
type: object
version: v1beta1
versions:
- name: v1beta1
served: true
storage: true
- name: v1alpha1
served: true
storage: false
status:
acceptedNames:
kind: ""
plural: ""
conditions: []
storedVersions: []
---
apiVersion: admissionregistration.k8s.io/v1beta1
kind: MutatingWebhookConfiguration
metadata:
creationTimestamp: null
name: gatekeeper-mutating-webhook-configuration
webhooks:
- clientConfig:
caBundle: Cg==
service:
name: gatekeeper-webhook-service
namespace: gatekeeper-system
path: /v1/mutate
failurePolicy: Ignore
name: mutation.gatekeeper.sh
rules:
- apiGroups:
- '*'
apiVersions:
- '*'
operations:
- CREATE
- UPDATE
resources:
- '*'
---
apiVersion: v1
kind: ServiceAccount
metadata:
labels:
gatekeeper.sh/system: "yes"
name: gatekeeper-admin
namespace: gatekeeper-system
---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
annotations:
seccomp.security.alpha.kubernetes.io/allowedProfileNames: '*'
labels:
gatekeeper.sh/system: "yes"
name: gatekeeper-admin
spec:
allowPrivilegeEscalation: false
fsGroup:
ranges:
- max: 65535
min: 1
rule: MustRunAs
requiredDropCapabilities:
- ALL
runAsUser:
rule: MustRunAsNonRoot
seLinux:
rule: RunAsAny
supplementalGroups:
ranges:
- max: 65535
min: 1
rule: MustRunAs
volumes:
- configMap
- projected
- secret
- downwardAPI
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
creationTimestamp: null
labels:
gatekeeper.sh/system: "yes"
name: gatekeeper-manager-role
namespace: gatekeeper-system
rules:
- apiGroups:
- ""
resources:
- events
verbs:
- create
- patch
- apiGroups:
- ""
resources:
- secrets
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
creationTimestamp: null
labels:
gatekeeper.sh/system: "yes"
name: gatekeeper-manager-role
rules:
- apiGroups:
- '*'
resources:
- '*'
verbs:
- get
- list
- watch
- apiGroups:
- apiextensions.k8s.io
resources:
- customresourcedefinitions
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- config.gatekeeper.sh
resources:
- configs
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- config.gatekeeper.sh
resources:
- configs/status
verbs:
- get
- patch
- update
- apiGroups:
- constraints.gatekeeper.sh
resources:
- '*'
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- mutations.gatekeeper.sh
resources:
- '*'
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- policy
resourceNames:
- gatekeeper-admin
resources:
- podsecuritypolicies
verbs:
- use
- apiGroups:
- status.gatekeeper.sh
resources:
- '*'
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- templates.gatekeeper.sh
resources:
- constrainttemplates
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- templates.gatekeeper.sh
resources:
- constrainttemplates/finalizers
verbs:
- delete
- get
- patch
- update
- apiGroups:
- templates.gatekeeper.sh
resources:
- constrainttemplates/status
verbs:
- get
- patch
- update
- apiGroups:
- admissionregistration.k8s.io
resourceNames:
- gatekeeper-validating-webhook-configuration
resources:
- validatingwebhookconfigurations
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- admissionregistration.k8s.io
resourceNames:
- gatekeeper-mutating-webhook-configuration
resources:
- mutatingwebhookconfigurations
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
labels:
gatekeeper.sh/system: "yes"
name: gatekeeper-manager-rolebinding
namespace: gatekeeper-system
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: gatekeeper-manager-role
subjects:
- kind: ServiceAccount
name: gatekeeper-admin
namespace: gatekeeper-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
labels:
gatekeeper.sh/system: "yes"
name: gatekeeper-manager-rolebinding
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: gatekeeper-manager-role
subjects:
- kind: ServiceAccount
name: gatekeeper-admin
namespace: gatekeeper-system
---
apiVersion: v1
kind: Secret
metadata:
labels:
gatekeeper.sh/system: "yes"
name: gatekeeper-webhook-server-cert
namespace: gatekeeper-system
---
apiVersion: v1
kind: Service
metadata:
labels:
gatekeeper.sh/system: "yes"
name: gatekeeper-webhook-service
namespace: gatekeeper-system
spec:
ports:
- port: 443
targetPort: 8443
selector:
control-plane: controller-manager
gatekeeper.sh/operation: webhook
gatekeeper.sh/system: "yes"
---
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
control-plane: audit-controller
gatekeeper.sh/operation: audit
gatekeeper.sh/system: "yes"
name: gatekeeper-audit
namespace: gatekeeper-system
spec:
replicas: 1
selector:
matchLabels:
control-plane: audit-controller
gatekeeper.sh/operation: audit
gatekeeper.sh/system: "yes"
template:
metadata:
annotations:
container.seccomp.security.alpha.kubernetes.io/manager: runtime/default
labels:
control-plane: audit-controller
gatekeeper.sh/operation: audit
gatekeeper.sh/system: "yes"
spec:
imagePullSecrets:
- name: ty-docker-registry
automountServiceAccountToken: true
containers:
- args:
- --operation=audit
- --operation=status
- --logtostderr
command:
- /manager
env:
- name: POD_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
image: ${GATEKEEPER_IMAGE}
imagePullPolicy: IfNotPresent
livenessProbe:
httpGet:
path: /healthz
port: 9090
name: manager
ports:
- containerPort: 8888
name: metrics
protocol: TCP
- containerPort: 9090
name: healthz
protocol: TCP
readinessProbe:
httpGet:
path: /readyz
port: 9090
resources:
limits:
cpu: 1000m
memory: 512Mi
requests:
cpu: 100m
memory: 256Mi
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- all
readOnlyRootFilesystem: true
runAsGroup: 999
runAsNonRoot: true
runAsUser: 1000
nodeSelector:
kubernetes.io/os: linux
serviceAccountName: gatekeeper-admin
terminationGracePeriodSeconds: 60
---
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
control-plane: controller-manager
gatekeeper.sh/operation: webhook
gatekeeper.sh/system: "yes"
name: gatekeeper-controller-manager
namespace: gatekeeper-system
spec:
replicas: 3
selector:
matchLabels:
control-plane: controller-manager
gatekeeper.sh/operation: webhook
gatekeeper.sh/system: "yes"
template:
metadata:
annotations:
container.seccomp.security.alpha.kubernetes.io/manager: runtime/default
prometheus.io/port: "8888"
prometheus.io/scrape: "true"
labels:
control-plane: controller-manager
gatekeeper.sh/operation: webhook
gatekeeper.sh/system: "yes"
spec:
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- podAffinityTerm:
labelSelector:
matchExpressions:
- key: gatekeeper.sh/operation
operator: In
values:
- webhook
topologyKey: kubernetes.io/hostname
weight: 100
automountServiceAccountToken: true
imagePullSecrets:
- name: ty-docker-registry
containers:
- args:
- --port=8443
- --logtostderr
- --exempt-namespace=gatekeeper-system
- --operation=webhook
- --enable-mutation=true
command:
- /manager
env:
- name: POD_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
image: ${GATEKEEPER_IMAGE}
imagePullPolicy: IfNotPresent
livenessProbe:
httpGet:
path: /healthz
port: 9090
name: manager
ports:
- containerPort: 8443
name: webhook-server
protocol: TCP
- containerPort: 8888
name: metrics
protocol: TCP
- containerPort: 9090
name: healthz
protocol: TCP
readinessProbe:
httpGet:
path: /readyz
port: 9090
resources:
limits:
cpu: 1000m
memory: 512Mi
requests:
cpu: 100m
memory: 256Mi
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- all
readOnlyRootFilesystem: true
runAsGroup: 999
runAsNonRoot: true
runAsUser: 1000
volumeMounts:
- mountPath: /certs
name: cert
readOnly: true
nodeSelector:
kubernetes.io/os: linux
serviceAccountName: gatekeeper-admin
terminationGracePeriodSeconds: 60
volumes:
- name: cert
secret:
defaultMode: 420
secretName: gatekeeper-webhook-server-cert
---
apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingWebhookConfiguration
metadata:
creationTimestamp: null
labels:
gatekeeper.sh/system: "yes"
name: gatekeeper-validating-webhook-configuration
webhooks:
- clientConfig:
caBundle: Cg==
service:
name: gatekeeper-webhook-service
namespace: gatekeeper-system
path: /v1/admit
failurePolicy: Ignore
name: validation.gatekeeper.sh
namespaceSelector:
matchExpressions:
- key: admission.gatekeeper.sh/ignore
operator: DoesNotExist
rules:
- apiGroups:
- '*'
apiVersions:
- '*'
operations:
- CREATE
- UPDATE
resources:
- '*'
sideEffects: None
timeoutSeconds: 3
- clientConfig:
caBundle: Cg==
service:
name: gatekeeper-webhook-service
namespace: gatekeeper-system
path: /v1/admitlabel
failurePolicy: Fail
name: check-ignore-label.gatekeeper.sh
rules:
- apiGroups:
- ""
apiVersions:
- '*'
operations:
- CREATE
- UPDATE
resources:
- namespaces
sideEffects: None
timeoutSeconds: 3
```
</details>
> You can use a Go tool that we developed to see the differences between the manifests of two OPA Gatekeeper versions:
> https://gitlab.trendyol.com/platform/base/poc/get-diff-report-between-yaml-files

## What are we planning to do?
We will proceed with the second option and narrow our interception scope from `*` to `pods`. This comes with a trade-off: if we later decide to write a policy that targets a resource other than Pod, we first have to redeploy OPA Gatekeeper so that it intercepts that resource as well, and only then apply the policy.
You can follow the whole process in the following merge request:
> https://gitlab.trendyol.com/platform/devops/base/services/k8s-common/-/merge_requests/80
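
As a rough illustration of the narrowed scope, the `rules` of the mutating webhook could look like the sketch below. It reuses the names from the manifest above. The `apiGroups` and `apiVersions` values are assumptions based on Pod living in the core `v1` API, so this is not necessarily the exact configuration from the merge request.

```yaml
# Sketch only: mutating webhook narrowed from '*' to pods.
apiVersion: admissionregistration.k8s.io/v1beta1
kind: MutatingWebhookConfiguration
metadata:
  name: gatekeeper-mutating-webhook-configuration
webhooks:
- name: mutation.gatekeeper.sh
  clientConfig:
    caBundle: Cg==            # placeholder, same as in the manifest above
    service:
      name: gatekeeper-webhook-service
      namespace: gatekeeper-system
      path: /v1/mutate
  failurePolicy: Ignore
  rules:
  - apiGroups:
    - ""                      # assumption: core API group, where Pod lives
    apiVersions:
    - v1                      # assumption: Pod is served as v1
    operations:
    - CREATE
    - UPDATE
    resources:
    - pods                    # previously '*'
```

With rules like these, the API server no longer sends requests for other kinds to `/v1/mutate`. A policy that targets, for example, Deployments would need `deployments` (in the `apps` group) added to `rules` before it can take effect.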
## What have we done so far?
* We changed our installation method from plain YAML to the Helm chart. You can see all the details in the [merge request commits](https://gitlab.trendyol.com/platform/devops/base/services/k8s-common/-/merge_requests/80/commits).
* We opened several pull requests to improve the OPA Gatekeeper Helm chart and its library.
  * https://github.com/open-policy-agent/gatekeeper/pull/1408
  * https://github.com/open-policy-agent/gatekeeper/pull/1425
  * https://github.com/open-policy-agent/gatekeeper-library/pull/98
  * https://github.com/open-policy-agent/gatekeeper/pull/1464
* We opened an issue about the problem that we encountered.
  * https://github.com/open-policy-agent/gatekeeper/issues/1472
## Known Risks
- Once we define namespaces as exempt, OPA Gatekeeper will **no longer intercept** requests to them, which means that the constraints will never validate resources in those namespaces. This brings us to another problem: because we currently have no strict RBAC rules or taints on our Kubernetes clusters, end users can run privileged containers in any namespace that we define as exempt.
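
To make this risk concrete, here is a minimal sketch of an exempt namespace. The namespace name is hypothetical. The label key is the one matched by the `namespaceSelector` of `validation.gatekeeper.sh` in the manifest above. That selector uses `DoesNotExist`, so the label value itself is not inspected. As far as we understand, the `check-ignore-label.gatekeeper.sh` webhook only admits this label on namespaces that are also passed via `--exempt-namespace`.

```yaml
# Sketch only: a namespace that the validating webhook will skip.
apiVersion: v1
kind: Namespace
metadata:
  name: example-exempt-namespace        # hypothetical name
  labels:
    # Presence of this key is what excludes the namespace from
    # validation.gatekeeper.sh; the value is arbitrary.
    admission.gatekeeper.sh/ignore: no-self-managing
```

Any Pod created in such a namespace, including a privileged one, bypasses the constraints, so the set of exempt namespaces should stay as small as possible.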
## References
- [OPA Gatekeeper: Policy and Governance for Kubernetes](https://kubernetes.io/blog/2019/08/06/opa-gatekeeper-policy-and-governance-for-kubernetes/)
- [Getting Started OPA Gatekeeper](https://open-policy-agent.github.io/gatekeeper/website/docs/)
- [Differences between OPA and Gatekeeper for Kubernetes Admission Control](https://www.infracloud.io/blogs/opa-and-gatekeeper/)