# Troubleshooting Venafi Kubernetes Agent

### Memory Limits

When the Agent starts up, it downloads and caches all the Kubernetes API resources which it has been configured to report. Depending on the combined size of these resources, the memory usage will briefly spike at startup, and if the memory limit on the Agent Pod is set too low, the Agent may breach the limit and be OOM killed.

To check whether it is being OOM killed:

```bash
kubectl describe -n venafi pod -l app.kubernetes.io/name=venafi-kubernetes-agent
```

> :book: Read more about [scaling TLSPK components](https://docs.venafi.cloud/vaas/k8s-components/c-k8s-components-best-practice/#memory)

### Remove the Memory Limit

Try removing the memory limit and measuring the peak memory usage. To remove the memory limit:

```bash
kubectl patch deployment venafi-kubernetes-agent \
  -n venafi \
  --type=json \
  -p='[{"op": "remove", "path": "/spec/template/spec/containers/0/resources/limits"}]'
```

This will restart the venafi-kubernetes-agent Pod, after which you can measure the peak memory usage as follows:

```bash
NODE_NAME=$(kubectl get pod -n venafi -l app.kubernetes.io/name=venafi-kubernetes-agent -o jsonpath='{.items[].spec.nodeName}')
POD_NAME=$(kubectl get pod -n venafi -l app.kubernetes.io/name=venafi-kubernetes-agent -o jsonpath='{.items[].metadata.name}')
kubectl get --raw /api/v1/nodes/${NODE_NAME}/proxy/metrics/cadvisor \
  | grep container_memory_max_usage_bytes \
  | grep ${POD_NAME}
```

You'll see something like this:

```
container_memory_max_usage_bytes{container="",id="/kubelet.slice/kubelet-kubepods.slice/kubelet-kubepods-burstable.slice/kubelet-kubepods-burstable-pod57ed4b42_d15d_426e_88a2_7378362a3697.slice",image="",name="",namespace="venafi",pod="venafi-kubernetes-agent-7b87757cd8-zkdbm"} 2.1344256e+07 1747396621128
container_memory_max_usage_bytes{container="",id="/kubelet.slice/kubelet-kubepods.slice/kubelet-kubepods-burstable.slice/kubelet-kubepods-burstable-pod57ed4b42_d15d_426e_88a2_7378362a3697.slice/cri-containerd-fd1fc7a71e4d3f213c0d84bde0b9c4d7b1f2e6590994166a4e481559b7432ca6.scope",image="registry.k8s.io/pause:3.10",name="fd1fc7a71e4d3f213c0d84bde0b9c4d7b1f2e6590994166a4e481559b7432ca6",namespace="venafi",pod="venafi-kubernetes-agent-7b87757cd8-zkdbm"} 4.42368e+06 1747396620316
container_memory_max_usage_bytes{container="venafi-kubernetes-agent",id="/kubelet.slice/kubelet-kubepods.slice/kubelet-kubepods-burstable.slice/kubelet-kubepods-burstable-pod57ed4b42_d15d_426e_88a2_7378362a3697.slice/cri-containerd-d7db3c60331a4d3c72e0791072d4e283359c4ceff31e550dce320b65488a02c1.scope",image="registry.venafi.cloud/venafi-agent/venafi-agent:v1.4.1",name="d7db3c60331a4d3c72e0791072d4e283359c4ceff31e550dce320b65488a02c1",namespace="venafi",pod="venafi-kubernetes-agent-7b87757cd8-zkdbm"} 2.1041152e+07 1747396605390
```

### Agent Metrics

The Agent exports metrics in Prometheus format, including metrics about the size of the data which has been collected and uploaded to Venafi. To print the [Agent's metrics](https://docs.venafi.cloud/vaas/k8s-components/c-vka-metrics/#reference):

```bash
POD_NAME=$(kubectl get pod -n venafi -l app.kubernetes.io/name=venafi-kubernetes-agent -o jsonpath='{ .items[0].metadata.name }')
kubectl get --raw "/api/v1/namespaces/venafi/pods/${POD_NAME}:8081/proxy/metrics" | grep -A1 'HELP'
```

### Venafi Workloads

If there are problems with the Agent, there may be clues in the Agent logs: errors and warnings.
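For example, to filter the current Agent logs down to error and warning lines (a quick sketch; adjust the match patterns to taste):

```shell
# Show only log lines mentioning errors or warnings (case-insensitive).
kubectl logs -n venafi deployments/venafi-kubernetes-agent | grep -iE 'error|warn'
```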
Dump the logs and state of all the workloads in the Venafi namespace:

```bash
kubectl cluster-info dump --namespace venafi -o yaml --output-directory venafi.dump
```

### Agent logs, current and previous

If the Agent crashed and restarted, you can look at the logs from the previous agent process as follows:

```bash
kubectl logs -n venafi deployments/venafi-kubernetes-agent --previous
```

### API Object Count by Kind

It is possible that your cluster has a large number of API objects which are being downloaded and cached by the agent, causing excessive memory usage. You can solve this by creating a custom agent configuration which excludes certain API object kinds or namespaces. This command will give the object counts from the [metrics endpoint of the API server](https://kubernetes.io/docs/reference/instrumentation/metrics/):

```bash
kubectl get --raw "/metrics" | grep apiserver_storage_objects
```

### Secret Count by Type

It is possible that your cluster contains a large number of large Secrets which are being downloaded and cached by the agent, causing excessive memory usage. By default the agent ignores various common Secret types, but your cluster may contain other Secret types which are not in the default list and which can be excluded.
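One way to see which Secret types dominate is to aggregate the TYPE column of the table output (a sketch; it assumes TYPE is the third column of `kubectl get secret`'s default output):

```shell
# Count Secrets by type across all namespaces, most common first.
# The table response includes the type but not the Secret data itself.
kubectl get secret --all-namespaces --no-headers \
  | awk '{print $3}' | sort | uniq -c | sort -rn
```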
This command will give a list of all the Secrets and their types, [without downloading any of the data in the Secrets](https://kubernetes.io/docs/reference/using-api/api-concepts/#receiving-resources-as-tables):

```bash
kubectl get secret --all-namespaces
```

### Create a heap profile using pprof

Add the `--enable-pprof` flag to the deployment:

```bash
kubectl edit -n venafi deployment venafi-kubernetes-agent
```

You can also use `kubectl patch` to add the flag:

```bash
kubectl patch deployment venafi-kubernetes-agent \
  -n venafi \
  --type=json \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--enable-pprof"}]'
```

Then, in a separate shell session, run:

```bash
kubectl port-forward -n venafi deploy/venafi-kubernetes-agent 8081
```

Download the heap profile:

```bash
curl http://localhost:8081/debug/pprof/heap > heap.out
```

> You can display the heap profile with:
>
> ```bash
> go tool pprof -http=:9999 heap.out
> ```

![pprof heap profile visualization](https://hackmd.io/_uploads/SkNq-RNbex.png)

### Perform a one-shot discovery to stdout

Sometimes it is useful to examine the raw data that is being collected by the Agent: the data that it uploads to Venafi. You can use `kubectl debug` to [copy a Pod while changing its command](https://kubernetes.io/docs/tasks/debug/debug-application/debug-running-pod/#copying-a-pod-while-changing-its-command). This allows you to make a copy of the Agent Pod that uses the `--one-shot` and `--output-path` arguments to print the data to stdout and save it to a local file. The output file will show both the logs (in JSON format) and the discovered data.
```bash
POD_NAME=$(kubectl get pod -n venafi -l app.kubernetes.io/name=venafi-kubernetes-agent -o jsonpath='{.items[].metadata.name}')
kubectl debug $POD_NAME \
  -n venafi \
  --attach \
  --copy-to=debugger-$RANDOM \
  --container=venafi-kubernetes-agent \
  -- \
  /ko-app/preflight agent \
  --one-shot \
  --output-path /dev/stdout \
  --agent-config-file /etc/venafi/agent/config/config.yaml \
  --venafi-connection not-used \
  --logging-format json > output.json
```
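Note that `kubectl debug --copy-to` leaves the copied Pod behind after the command exits. A sketch for cleaning up any leftover debug Pods, assuming the `debugger-` name prefix used above:

```shell
# Delete leftover debug Pods created by `kubectl debug --copy-to`.
kubectl get pods -n venafi -o name | grep '^pod/debugger-' \
  | xargs -r kubectl delete -n venafi
```

(`xargs -r` is a GNU extension that skips running the command when the input is empty.)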