***(This document is intended for OCP maintainers, but the core idea behind the changes is equally relevant for debugging application workloads or vanilla Kubernetes components with Delve)***
I have needed Delve as a debugger a few times when the de facto log-printing approach didn't help me sort out an issue in core OpenShift components, but there was never enough time to hack together a working setup.
I wrote this README to shorten the time it takes to prepare an environment for Delve debugging.
[OCM (a.k.a. openshift-controller-manager)](https://github.com/openshift/openshift-controller-manager) and [OCM-o (a.k.a. cluster-openshift-controller-manager-operator)](https://github.com/openshift/cluster-openshift-controller-manager-operator) are used as examples.
All the changes applied in this document can be tracked via these PRs:
OCM: https://github.com/openshift/openshift-controller-manager/pull/279
OCM-o: https://github.com/openshift/cluster-openshift-controller-manager-operator/pull/317
# Server Side Configurations
1. Delve requires the debugged binary to be compiled with the right Go flags, and build-machinery-go supports customizing them via the `GO_BUILD_FLAGS` environment variable.
Add the assignment below to the [OCM Makefile](https://github.com/openshift/openshift-controller-manager/blob/master/Makefile):
```sh
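# -N disables compiler optimizations and -l disables inlining, so Delve can map the binary back to source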
export GO_BUILD_FLAGS="-gcflags=all=-N -l"
```
2. To run the Delve binary, we need to install it in the [OCM Dockerfile](https://github.com/openshift/openshift-controller-manager/blob/master/Dockerfile.rhel):
```sh
RUN GOFLAGS='' go install github.com/go-delve/delve/cmd/dlv@latest
```
`GOFLAGS` needs to be cleared for the Delve installation because vendoring is enabled by default, while Delve is not part of the vendor tree. The resulting binary then needs to be moved onto `$PATH` (`/usr/bin`):
```sh
COPY --from=builder /go/bin/dlv /usr/bin/
```
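For orientation, here is a minimal sketch of how these pieces might fit together in a two-stage build; the image references and build commands are illustrative, and the actual Dockerfile.rhel differs:
```sh
# Builder stage (image reference illustrative; the real Dockerfile.rhel uses an OpenShift builder image)
FROM golang:1.21 AS builder
WORKDIR /go/src/github.com/openshift/openshift-controller-manager
COPY . .
# Picks up GO_BUILD_FLAGS="-gcflags=all=-N -l" from the Makefile change above
RUN make build
# Vendoring is disabled only for this install, since Delve is not vendored
RUN GOFLAGS='' go install github.com/go-delve/delve/cmd/dlv@latest

# Final stage (base image illustrative): put both the component binary and dlv on $PATH
FROM registry.access.redhat.com/ubi9/ubi
COPY --from=builder /go/src/github.com/openshift/openshift-controller-manager/openshift-controller-manager /usr/bin/
COPY --from=builder /go/bin/dlv /usr/bin/
```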
This completes the OCM-related changes.
3. OCM is deployed and started by OCM-o in [deploy.yaml](https://github.com/openshift/cluster-openshift-controller-manager-operator/blob/master/bindata/assets/openshift-controller-manager/deploy.yaml). We need to change the command from
```yaml
command: ["openshift-controller-manager", "start"]
```
to
```yaml
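# --continue starts the process immediately instead of pausing until a client attaches;
# --accept-multiclient keeps the headless server alive across client reconnects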
command: ["dlv", "--listen=:40000", "--headless=true", "--api-version=2", "--accept-multiclient=true", "--allow-non-terminal-interactive=true", "--continue", "--log", "exec", "openshift-controller-manager", "--", "start"]
```
`dlv` now manages the startup of openshift-controller-manager and starts watching it. However, `dlv` requires that the `SYS_PTRACE` capability is added in the security contexts.
4. In the same [deploy.yaml](https://github.com/openshift/cluster-openshift-controller-manager-operator/blob/master/bindata/assets/openshift-controller-manager/deploy.yaml), the securityContext of both the pod and the container must be privileged, and the `SYS_PTRACE` capability must be added, as sketched below.
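A minimal sketch of the container-level part (the container name and surrounding fields follow the actual deploy.yaml and are abbreviated here):
```yaml
      containers:
      - name: controller-manager
        securityContext:
          privileged: true       # privileged container so dlv can trace the process
          capabilities:
            add:
            - SYS_PTRACE         # required by dlv to attach to and control the debuggee
```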
5. Some core components' (e.g. openshift-apiserver) namespaces and service accounts are privileged enough to add these capabilities, but OCM's are not. Therefore, we need to bind the [OCM service account](https://github.com/openshift/cluster-openshift-controller-manager-operator/blob/master/bindata/assets/openshift-controller-manager/tokenreview-clusterrolebinding.yaml) to the cluster-admin role:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:openshift:tokenreview-openshift-controller-manager
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  namespace: openshift-controller-manager
  name: openshift-controller-manager-sa
```
and label its [namespace](https://github.com/openshift/cluster-openshift-controller-manager-operator/blob/1770b702607e0a04e718ee79e6b2ae7229c37361/bindata/assets/openshift-controller-manager/ns.yaml) as privileged:
```yaml
pod-security.kubernetes.io/enforce: privileged
pod-security.kubernetes.io/audit: privileged
pod-security.kubernetes.io/warn: privileged
```
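These are namespace labels, so they belong under `metadata.labels` in ns.yaml; a minimal sketch (other labels and annotations of the real manifest omitted):
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-controller-manager
  labels:
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/audit: privileged
    pod-security.kubernetes.io/warn: privileged
```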
6. Although optional, it is strongly recommended to increase the timeout values of the readiness and liveness probes. Otherwise, the container is killed when a probe times out, e.g. while the process is paused at a breakpoint, and the debugging session has to be restarted, which hurts usability. See the sketch below.
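A sketch of what relaxed probes could look like in deploy.yaml (the endpoint, port, and values are illustrative, not taken from the actual manifest):
```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8443
    scheme: HTTPS
  timeoutSeconds: 300     # illustrative: long enough to survive a process paused at a breakpoint
  failureThreshold: 10    # illustrative
readinessProbe:
  httpGet:
    path: /healthz
    port: 8443
    scheme: HTTPS
  timeoutSeconds: 300
  failureThreshold: 10
```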
This completes the OCM-o-related changes.
Spin up a cluster-bot cluster with these changes via:
```sh
$ launch 4.15-nightly,openshift/cluster-openshift-controller-manager-operator#317,openshift/openshift-controller-manager#279 aws
```
# Client Side Configurations
Actually, there is no configuration on the client side apart from starting a port-forward and launching Delve debugging from an IDE:
```sh
$ oc port-forward pod/controller-manager-master-0 -n openshift-controller-manager 40000:40000
```
**Please be aware that the port number is the same as the one added in deploy.yaml via the `--listen` flag.**
Attach to the OCM process at `localhost:40000` and start Delve debugging.
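If an IDE is not available, you can also connect with the Delve CLI through the same forwarded port:
```sh
$ dlv connect localhost:40000
```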