---
author: "Tom Avital"
date-created: 20250907
tags:
  - MIG
  - H100
  - OCP-AI
  - configuration
description: |
  MIG slicing guide for an H100 GPU node within OpenShift-AI using nvidia-gpu-operator
---

# 🚀 MIG Setup Guide (H100 + OpenShift + NVIDIA GPU Operator)

## 1. Decide MIG Strategy (`ClusterPolicy`)

The MIG strategy is set in the **`ClusterPolicy`** CRD (`gpu-cluster-policy`), which is managed by the NVIDIA GPU Operator. It controls **how GPUs are exposed to Kubernetes**.

```yaml
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: gpu-cluster-policy
spec:
  mig:
    strategy: mixed  # options: single, mixed, none
```

### Options

* **`single`**
  * GPU presented as a whole device (`nvidia.com/gpu`).
  * MIG disabled.
  * Best for: full-GPU training/fine-tuning of large models.
* **`mixed`**
  * MIG mode enabled; GPUs can be split into slices.
  * MIG Manager watches node labels (`nvidia.com/mig.config`) and applies layouts.
  * Kubernetes advertises MIG resource types (`nvidia.com/mig-1g.10gb`, etc.).
  * Best for: inference clusters, multi-tenant setups, running several smaller jobs concurrently.
* **`none`**
  * Operator does not manage MIG at all.
  * GPU resources are not exposed as MIG slices.
  * Best for: rare cases where you manage MIG manually with `nvidia-smi`.

💡 **Rule of thumb**:

* Big training → `single`
* Multi-tenant inference → `mixed`
* DIY/manual → `none`

## 2. Discover Available MIG Configs

The GPU Operator includes a ConfigMap `default-mig-parted-config` in the `nvidia-gpu-operator` namespace. It defines the valid MIG layouts.

```bash
oc -n nvidia-gpu-operator get cm default-mig-parted-config -o yaml | less
```

Check the keys under `data:` for valid values of `nvidia.com/mig.config`. Examples:

* `all-1g.10gb` (H100 80 GB → 7 × 10 GB slices)
* `all-1g.24gb` (H100 94 GB → 4 × 24 GB slices)
* `all-2g.20gb` (80 GB → 3 × 20 GB slices)
* `all-3g.40gb` (80 GB → 2 × 40 GB slices)
* `all-balanced` (a mix of 1g/2g/3g slices)

## 3. Apply a MIG Layout via Node Label

Label your GPU node with one of the valid configs. MIG Manager will reconfigure the GPU accordingly:

```bash
oc label node <GPU_NODE> nvidia.com/mig.config=all-2g.20gb --overwrite
```

Operator workflow:

1. MIG Manager detects the label.
2. It may drain workloads.
3. The GPU is carved into slices.
4. The node resource list updates (e.g., `nvidia.com/mig-2g.20gb: 3`).

## 4. Monitor Progress

Watch the operator pods as MIG Manager applies the layout:

```bash
oc -n nvidia-gpu-operator get pods | egrep "mig|driver|plugin"
oc -n nvidia-gpu-operator logs -l app=nvidia-mig-manager -f
```

Check node resources:

```bash
oc describe node <GPU_NODE> | egrep "nvidia.com/mig-|nvidia.com/mig.config"
```

When complete, allocatable MIG resources appear on the node.

---

## 5. Verify with `crictl` + `nvidia-smi`

Use `oc debug` to get into the node and exec into the driver container:

```bash
oc debug node/<GPU_NODE>
chroot /host

# Find the driver container
crictl ps | grep nvidia

# Exec into it
crictl exec -it <container-id> nvidia-smi -L

# List GPU instances and compute instances
crictl exec -it <container-id> nvidia-smi mig -lgi
crictl exec -it <container-id> nvidia-smi mig -lci
```

This shows which MIG slices exist and their UUIDs.
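Optionally, you can also confirm the Kubernetes-level allocation by scheduling a throwaway pod that requests a single slice and lists what it sees. This is a minimal sketch, not a production manifest: the pod name, image tag, and the `mig-2g.20gb` profile are assumptions, so adjust them to whatever your layout exposes.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mig-smoke-test                               # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvcr.io/nvidia/cuda:12.4.1-base-ubi9    # any CUDA base image works (tag is an assumption)
      command: ["nvidia-smi", "-L"]                  # should list exactly one MIG device
      resources:
        limits:
          nvidia.com/mig-2g.20gb: 1                  # must match a profile exposed by your layout
```

Apply it with `oc apply -f`, wait for the pod to complete, and `oc logs mig-smoke-test` should show a single MIG device UUID, confirming the slice is schedulable.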
## 6. Re-labeling an Already Labeled Node

If the node already has a MIG config (e.g., `all-disabled`):

```bash
# Safest approach: drain first
oc adm cordon <GPU_NODE>
oc adm drain <GPU_NODE> --ignore-daemonsets --delete-emptydir-data

# Change layout
oc label node <GPU_NODE> nvidia.com/mig.config=all-3g.40gb --overwrite

# Bring it back
oc adm uncordon <GPU_NODE>
```

Alternative: remove, then re-add the label:

```bash
oc label node <GPU_NODE> nvidia.com/mig.config-
oc label node <GPU_NODE> nvidia.com/mig.config=all-3g.40gb
```

## 7. Deployment Redirection (Resources)

A workload is pinned to a specific MIG slice type by requesting the corresponding MIG resource in the Deployment's container resource requests and limits (a complete Deployment sketch appears after the summary):

```yaml
spec:
  containers:
    - resources:
        limits:
          cpu: '6'
          memory: 18Gi
          nvidia.com/mig-1g.24gb: '1'
        requests:
          cpu: '6'
          memory: 18Gi
          nvidia.com/mig-1g.24gb: '1'
```

Requesting `nvidia.com/mig-1g.24gb` schedules the pod onto a GPU node that exposes that slice type and allocates one such slice to the container.

# ✅ Summary

* **MIG strategy** (`ClusterPolicy`): `single` (whole GPU), `mixed` (MIG slices), `none` (disabled).
* **Layouts**: defined in `default-mig-parted-config`.
* **Node labels**: `nvidia.com/mig.config=<layout>` tells MIG Manager how to carve GPUs.
* **Progress**: watch the Operator pods, check the node's allocatable resources.
* **Verification**: `nvidia-smi` via the driver container (`crictl exec`).
* **Relabeling**: use `--overwrite` or remove+add; drain workloads first for safety.
* **Deployment setup**: request a MIG resource (e.g., `nvidia.com/mig-1g.24gb`) in the container's limits and requests.
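## Appendix: Complete Deployment Sketch

For reference, here is the section 7 resources fragment placed in a full Deployment manifest. This is a sketch under assumptions, not a verified workload: the name, labels, and image are placeholders, and the MIG profile must match one your layout actually exposes.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mig-inference-demo               # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mig-inference-demo
  template:
    metadata:
      labels:
        app: mig-inference-demo
    spec:
      containers:
        - name: server
          image: image-registry.example.com/inference-server:latest   # placeholder image
          resources:
            limits:
              cpu: '6'
              memory: 18Gi
              nvidia.com/mig-1g.24gb: '1'   # one 1g.24gb slice per replica
            requests:
              cpu: '6'
              memory: 18Gi
              nvidia.com/mig-1g.24gb: '1'
```

Each replica consumes one slice; once the cluster's `nvidia.com/mig-1g.24gb` capacity is exhausted, additional pods stay `Pending` until slices free up.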