# K8s Pods Share One Physical GPU
###### tags: `K8s`, `Cloud / K8s`
The [K8s official method](https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/) recommends the NVIDIA device plugin to handle GPU usage on K8s. But it has a limitation: the GPU resource limit can only be set as an integer, so one GPU cannot be shared across pods.
Here are some discussions about this issue on the official Kubernetes and NVIDIA GitHub repositories:
* K8s issue: [Is sharing GPU to multiple containers feasible?](https://github.com/kubernetes/kubernetes/issues/52757)
* NVIDIA issues:
1. [pods cannot share a GPU?](https://github.com/NVIDIA/k8s-device-plugin/issues/51)
2. [Allocating same GPU to multiple requests](https://github.com/NVIDIA/k8s-device-plugin/issues/4)
**NEW - NVIDIA Solution?**: [nvidia-device-plugin](https://github.com/NVIDIA/k8s-device-plugin#shared-access-to-gpus-with-cuda-time-slicing) version 0.12.0+ adds shared access to GPUs with CUDA time-slicing.
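For reference, the plugin README's "Shared Access to GPUs with CUDA Time-Slicing" section describes a configuration roughly like the sketch below; the replica count is illustrative, and the exact schema should be checked against the current README.
```yaml
# Sketch of a time-slicing config for nvidia-device-plugin 0.12.0+,
# following the "Shared Access to GPUs with CUDA Time-Slicing" README section.
# The replica count is an example only.
version: v1
sharing:
  timeSlicing:
    resources:
    - name: nvidia.com/gpu
      replicas: 10   # advertise 10 shared slices per physical GPU
```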
To deal with the issue that `Pods or containers can not share physical GPUs`, some third-party companies offer solutions. Here are the topics:
1. [Tasks and Bugs/To-Do Features](#tasks--bugsto-do-features): the tasks, bugs, and to-do features on Redmine that belong to this issue.
2. [Third-party Strategies](#third-party-strategies)
* [AWS](#aws-amazon)
* [Tencent](#tencent)
* [Azure](#azure-microsoft)
* [Aliyun](#aliyun-alibaba)
## Third-party Strategies
### AWS (Amazon)
The AWS scheme is based on `nvidia-container-runtime`. It splits one physical GPU into multiple vGPUs with MPS; by default, one physical GPU is split into 10 vGPUs. Here are the limitations:
1. GPU memory usage cannot be limited per pod. If users serve models with TensorFlow or PyTorch, GPU memory limits can be set through the framework's own features.
2. The test pod `resnet-api-server` offered by AWS works successfully, but NVIDIA Triton Inference Server still occupies at least ONE physical GPU.
3. The AWS scheme follows the same K8s resource convention as `nvidia-device-plugin`, and the vGPU limit MUST equal the vGPU request (see the example pod spec at the end of this subsection).
4. [https://github.com/awslabs/aws-virtual-gpu-device-plugin#limitations](https://github.com/awslabs/aws-virtual-gpu-device-plugin#limitations): per this limitations section, although `aws-virtual-gpu-device-plugin` is based on `nvidia-docker2`, it is not compatible with the `k8s-device-plugin` released by NVIDIA.
All of the above have been tested in practice, and the reasons can be found in the official documents:
* [https://github.com/awslabs/aws-virtual-gpu-device-plugin](https://github.com/awslabs/aws-virtual-gpu-device-plugin)
* [https://aws.amazon.com/tw/blogs/opensource/virtual-gpu-device-plugin-for-inference-workload-in-kubernetes/](https://aws.amazon.com/tw/blogs/opensource/virtual-gpu-device-plugin-for-inference-workload-in-kubernetes/)
**Blender HAS NOT BEEN TESTED WITH AWS YET**
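For reference, a pod requesting AWS vGPUs should look roughly like the sketch below. The `k8s.amazonaws.com/vgpu` resource name is taken from the `aws-virtual-gpu-device-plugin` README; the image and vGPU count are placeholders for illustration.
```yaml
# Sketch of a pod using the AWS virtual GPU device plugin.
# Resource name per the aws-virtual-gpu-device-plugin README;
# image and vGPU count are illustrative placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: vgpu-test-pod
spec:
  containers:
  - name: inference-server
    image: tensorflow/serving:latest-gpu   # example image
    resources:
      limits:
        k8s.amazonaws.com/vgpu: 1   # limits MUST equal requests
      requests:
        k8s.amazonaws.com/vgpu: 1
```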
### Tencent
The Tencent scheme, `gpu-manager`, depends on Docker's NATIVE `runc`, not `nvidia-container-runtime`. Therefore `nvidia-docker2` must not be installed, and `/etc/docker/daemon.json` must not be modified to set the NVIDIA runtime.
Testing results:
1. Run the docker image `tensorflow/tensorflow:nightly-gpu-jupyter` provided by TensorFlow as a pod and execute example model scripts inside it. -> **GPU memory is limited according to the deployment/pod YAML settings**.
2. NVIDIA Triton Inference Server still occupies at least ONE physical GPU.
3. Blender cannot detect or use GPUs, even when offered one whole physical GPU.
Tencent not only open-sourced `gpu-manager` and related projects on GitHub, but also published an IEEE paper about K8s GPU sharing:
* [https://github.com/tkestack/gpu-manager](https://github.com/tkestack/gpu-manager)
* [https://ieeexplore.ieee.org/abstract/document/8672318](https://ieeexplore.ieee.org/abstract/document/8672318)
* [https://static.sched.com/hosted_files/kccncchina2018english/fc/KubeCon_China_ppt.pdf](https://static.sched.com/hosted_files/kccncchina2018english/fc/KubeCon_China_ppt.pdf)
* [https://github.com/SimpCosm/paper-reading/issues/1](https://github.com/SimpCosm/paper-reading/issues/1)
* [https://github.com/tkestack/gpu-manager/issues/78](https://github.com/tkestack/gpu-manager/issues/78)
Testing K8s YAML (per the gpu-manager README, `tencent.com/vcuda-core` is counted in units of 1% of a physical GPU and `tencent.com/vcuda-memory` in units of 256 MiB, so the settings below request half a GPU and roughly 3.75 GiB of GPU memory):
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jupyter-deployment
  annotations:
    tencent.com/vcuda-core-limit: "50"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jupyter-server
  template:
    metadata:
      labels:
        app: jupyter-server
    spec:
      containers:
      - name: jupyter-container
        image: tensorflow/tensorflow:nightly-gpu-jupyter
        ports:
        - containerPort: 8888
        resources:
          limits:
            tencent.com/vcuda-core: 50
            tencent.com/vcuda-memory: 15
          requests:
            tencent.com/vcuda-core: 50
            tencent.com/vcuda-memory: 15
        volumeMounts:
        - name: notebooks
          mountPath: /tf/notebooks
      volumes:
      - name: notebooks
        persistentVolumeClaim:
          claimName: tf-jupyter-pvc
---
apiVersion: v1
kind: Service
metadata:
  labels:
    run: jupyter-service
  name: jupyter-service
spec:
  ports:
  - port: 8888
    targetPort: 8888
    nodePort: 30088
  selector:
    app: jupyter-server
  type: NodePort
```
#### Testing Results
1. Test item 1: this error occurred when GPU memory usage exceeded the limit:
```bash
File /usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/constant_op.py:102, in convert_to_eager_tensor(value, ctx, dtype)
100 dtype = dtypes.as_dtype(dtype).as_datatype_enum
101 ctx.ensure_initialized()
--> 102 return ops.EagerTensor(value, ctx.device_name, dtype)
FailedPreconditionError: {{function_node wrapped_EagerConst_device/job:localhost/replica:0/task:0/device:GPU:0}} Failed to allocate scratch buffer for device 0
```

2. Test item 3: Blender could not detect or use GPUs, even when offered one whole physical GPU.

### Azure (Microsoft)
The Azure approach also uses MPS to achieve this goal. However, the whole scheme is based on Azure's own platform, Azure Stack Edge Pro, and its Windows-based OS, and everything has to be done through their CLI and commands. That is why I did not test this scheme.
Here are the documents:
* [azure-stack-edge-gpu-deploy-kubernetes-gpu-sharing - markdown](https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/databox-online/azure-stack-edge-gpu-deploy-kubernetes-gpu-sharing.md)
* [azure-stack-edge-gpu-deploy-configure-compute - markdown](https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/databox-online/azure-stack-edge-gpu-deploy-configure-compute.md)
* [gpu-cluster - markdown](https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/aks/gpu-cluster.md)
* [azure-stack-edge-gpu-deploy-kubernetes-gpu-sharing doc](https://docs.microsoft.com/zh-tw/azure/databox-online/azure-stack-edge-gpu-deploy-kubernetes-gpu-sharing)
* [gpu-cluster doc](https://docs.microsoft.com/zh-tw/azure/aks/gpu-cluster)
### Aliyun (Alibaba)
Aliyun's approach also depends on MPS. They state that the scheme aims to isolate both GPU memory and computing resources, but this has not been achieved yet.
Here are the documents:
* [gpushare-scheduler-extender](https://github.com/AliyunContainerService/gpushare-scheduler-extender)
* [gpushare-device-plugin](https://github.com/AliyunContainerService/gpushare-device-plugin)
* [designs - markdown](https://github.com/AliyunContainerService/gpushare-scheduler-extender/blob/master/docs/designs/designs.md)
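For reference, Aliyun's plugin schedules by GPU memory rather than by GPU count: pods request the `aliyun.com/gpu-mem` resource (in GiB, per the gpushare documentation). A minimal sketch, with the image and memory value as illustrative placeholders:
```yaml
# Sketch of a pod requesting shared GPU memory via Aliyun's gpushare plugin.
# Resource name per the gpushare-device-plugin docs; the image and the
# 3 GiB value are illustrative placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: gpushare-test-pod
spec:
  containers:
  - name: cuda-app
    image: tensorflow/tensorflow:latest-gpu   # example image
    resources:
      limits:
        aliyun.com/gpu-mem: 3   # request 3 GiB of GPU memory
```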
## Conclusion
This topic would be more suitably called a “TODO”. I have paused the survey here and leave a TODO list for anyone interested in the solution.
1. Check whether the **NVIDIA solution**, [nvidia-device-plugin](https://github.com/NVIDIA/k8s-device-plugin#shared-access-to-gpus-with-cuda-time-slicing) version 0.12.0+ with CUDA time-slicing, actually works.
2. Replace the NVIDIA device plugin with the NVIDIA GPU Operator?
3. Check whether the AWS strategy works with Blender.
4. Choose a scheme to replace the current one.