# Note on Sharing a GPU Among Multiple Containers

---

###### tags: `Kubernetes`

---

## Preview

1. A device plugin registers itself with the kubelet through a gRPC service.
2. As a client, the device plugin is required to report three things to the kubelet:
    1. The name of its Unix socket.
    2. The Device Plugin API version against which it was built.
    3. The resource name to be advertised, which must follow the extended-resource naming scheme `vendor-domain/resourcetype`.

A minimal sketch of this registration call is included in the appendix at the end of this note.

## Samed Güner, SAP

### Approach

Append a suffix to the ID of a real GPU to create the IDs of virtual GPUs (vGPUs), and report these artificial IDs to the kubelet, so that the kubelet believes the node has more GPUs than it physically does (see the `ListAndWatch` sketch in the appendix).

### Some Questions

1. How many models can fit in GPU memory?
2. How many vGPUs should be created?
3. What are the limitations?

### Limitations

1. The relationship between VRAM and the number and size of models is hard to determine.
2. The relationship between CPU limits and GPU limits, and their effect on throughput and latency, still needs to be investigated.
3. Because VRAM cannot be limited from the device plugin, they had to misuse Kubernetes CPU limits to cap GPU utilization.
4. What happens at the GPU level when multiple processes run on it still needs to be figured out.
5. Nvidia provides no isolation guarantee at the hardware level.
6. VRAM and GPU cores cannot be specified as resources, which makes it hard to use native Kubernetes scheduling to place pods.
7. Resource fragmentation is a problem because the kubelet does not see GPUs as first-class resources.

### What Is Needed

1. Isolation at the GPU level
2. Resource defragmentation
3. Low overhead for scheduling and processing across vGPUs
4. Per-device sub-resources exposed by the device plugin

## KubeShare

### Approach

Enable GPU sharing in Kubernetes and provide first-class GPU scheduling to address the utilization, fragmentation, and interference problems.

### Important Modification

#### Native Kubernetes Mechanism

1. The node daemon (kubelet) decides which GPUs to bind to a pod and sends an Allocate request to the Nvidia GPU device plugin.
2. The Nvidia GPU device plugin then returns an Allocate response that asks the node daemon to set the environment variable `NVIDIA_VISIBLE_DEVICES=A,B`, so the pod can access GPUs A and B. GPUs are scheduled without identity.

#### KubeShare Device Manager

1. Maintains the mapping between a vGPU ID and the identifier (UUID) of the GPU returned by the Nvidia device plugin.
2. If the vGPU ID already exists, the KubeShare Device Manager creates the pod with the corresponding UUID environment variable, so the request does not go through the Nvidia device plugin (see the pod sketch in the appendix).

## References

[Intel® Device Plug-ins for Kubernetes*](https://docs.01.org/kubernetes/device-plugins/index.html)

[CNCF [Cloud Native Computing Foundation]](https://www.youtube.com/watch?v=MDkltK5JLCU&t=674s&ab_channel=CNCF%5BCloudNativeComputingFoundation%5D)

[KubeShare: A Framework to Manage GPUs as First-Class and Shared Resources in Container Cloud](https://www.youtube.com/watch?v=1WQMKCGN9j4&ab_channel=HPDCACM)
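
## Appendix: Code Sketches

### Device plugin registration

A minimal Go sketch of the registration flow from the Preview section, written against the `v1beta1` device plugin API. The socket name `vgpu.sock` and the resource name `example.com/vgpu` are made-up placeholders, not values from either talk.

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"
)

// registerWithKubelet reports the three things the kubelet needs:
// the plugin's own Unix socket name, the device plugin API version it
// was built against, and the extended resource name it advertises.
func registerWithKubelet(socketName, resourceName string) error {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// The kubelet's Registration service listens on a Unix socket under
	// /var/lib/kubelet/device-plugins/.
	conn, err := grpc.DialContext(ctx, "unix://"+pluginapi.KubeletSocket,
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithBlock())
	if err != nil {
		return err
	}
	defer conn.Close()

	_, err = pluginapi.NewRegistrationClient(conn).Register(ctx, &pluginapi.RegisterRequest{
		Version:      pluginapi.Version, // "v1beta1"
		Endpoint:     socketName,        // socket file name, e.g. "vgpu.sock" (placeholder)
		ResourceName: resourceName,      // e.g. "example.com/vgpu" (placeholder)
	})
	return err
}

func main() {
	if err := registerWithKubelet("vgpu.sock", "example.com/vgpu"); err != nil {
		log.Fatalf("device plugin registration failed: %v", err)
	}
	log.Println("registered with kubelet")
}
```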
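
### Advertising virtual GPUs (SAP approach)

A sketch of the ID-suffix trick described in the SAP section: each physical GPU UUID is fanned out into several virtual device IDs inside `ListAndWatch`, so the kubelet sees more schedulable GPUs than actually exist. The plugin struct, its field names, and the `-vgpu-<n>` suffix format are assumptions for illustration, not the talk's actual code.

```go
package vgpu

import (
	"fmt"

	pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"
)

// vgpuPlugin is an assumed skeleton for the device-plugin side of the
// ID-suffix approach; only the parts relevant to ListAndWatch are shown.
type vgpuPlugin struct {
	gpuUUIDs       []string      // real GPU UUIDs, e.g. as reported by NVML
	replicasPerGPU int           // virtual devices advertised per physical GPU
	stop           chan struct{} // closed when the plugin shuts down
}

// vgpuDevices fans each physical GPU out into replicasPerGPU virtual
// devices by appending a suffix to the real ID, so the kubelet believes
// the node has more GPUs than it physically does.
func (p *vgpuPlugin) vgpuDevices() []*pluginapi.Device {
	var devs []*pluginapi.Device
	for _, uuid := range p.gpuUUIDs {
		for i := 0; i < p.replicasPerGPU; i++ {
			devs = append(devs, &pluginapi.Device{
				ID:     fmt.Sprintf("%s-vgpu-%d", uuid, i), // artificial vGPU ID
				Health: pluginapi.Healthy,
			})
		}
	}
	return devs
}

// ListAndWatch streams the (virtual) device list to the kubelet and then
// keeps the stream open, as the device plugin API expects.
func (p *vgpuPlugin) ListAndWatch(_ *pluginapi.Empty, s pluginapi.DevicePlugin_ListAndWatchServer) error {
	if err := s.Send(&pluginapi.ListAndWatchResponse{Devices: p.vgpuDevices()}); err != nil {
		return err
	}
	<-p.stop // a real plugin would also re-send on device health changes
	return nil
}
```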
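
### Pinning a pod to specific GPUs (KubeShare-style)

A sketch of the mechanism the KubeShare Device Manager section describes: instead of requesting an anonymous GPU and letting the Nvidia device plugin choose, a manager component stamps the physical GPU UUIDs it has mapped to a vGPU directly into `NVIDIA_VISIBLE_DEVICES` on the pod it creates. The helper name and example UUIDs are hypothetical, and this assumes the node's default container runtime honours that variable (e.g. the Nvidia container runtime); it is not KubeShare's actual code.

```go
package sharing

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// SharedGPUPod builds a pod pinned to specific physical GPUs by writing
// their UUIDs into NVIDIA_VISIBLE_DEVICES, bypassing the device plugin's
// identity-less allocation.
func SharedGPUPod(name, image, gpuUUIDs string) *corev1.Pod {
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: name},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{
				Name:  name,
				Image: image,
				Env: []corev1.EnvVar{{
					Name:  "NVIDIA_VISIBLE_DEVICES",
					Value: gpuUUIDs, // e.g. "GPU-aaaa,GPU-bbbb" (hypothetical UUIDs)
				}},
			}},
		},
	}
}
```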